SlideShare une entreprise Scribd logo
1  sur  37
1© Cloudera, Inc. All rights reserved.
Michael Crutcher
Director, Product Management - Storage
Lambda Architecture
2© Cloudera, Inc. All rights reserved.
Agenda
• Big Data Challenges
• What is Lambda?
• Lambda Advantages and Disadvantages
• Kudu as a Lambda alternative
3© Cloudera, Inc. All rights reserved.
Big Data Challenges
4© Cloudera, Inc. All rights reserved.
“Something interesting is happening”
The world’s largest
taxi company owns
ZERO vehicles.
The world’s largest
accommodation provider
owns ZERO real estate.
The world’s most
popular media owner
creates ZERO content.
The world’s leading
music platform owns
no music.
5© Cloudera, Inc. All rights reserved.
Data is now a strategic asset
Instrumentation
Consumerization
Experimentation
Today, everything that can be
measured will be measured.
Today, data IS the
application.
Today, becoming data-driven
is a business imperative.
6© Cloudera, Inc. All rights reserved.
“It will soon be technically
feasible & affordable to
record & store everything…”
— New York Times
“Digital technologies will, in
the near future, accomplish
many tasks once considered
uniquely human.”
.
— Second Machine Age
Data is abundant,
diverse & shared freely
As is how we store,
process and analyze it
Streaming Machine Learning BI
ETL Modeling
7© Cloudera, Inc. All rights reserved.
The new analytics paradigm
Understand
why it
happened
Change
what
happens
next
Determine
what
happened
Make it
happen
consistently
8© Cloudera, Inc. All rights reserved.
So Why Big Data?
What does the reporting look
like at your business today?
What if it could happen in half
the time, or half that time?
What data are you looking at?
What data do you want to know
about your customers? How
can you best use external data?
Too often data is archived,
combined, or simplified to save
space and strain on systems.
Once data is combined we loose
the ability to dig deeper.
Better Business Forecasting Better Views of CustomersFull Fidelity Data Access
9© Cloudera, Inc. All rights reserved.
What is Lambda architecture?
10© Cloudera, Inc. All rights reserved.
What is Lambda Architecture?
Batch Layer
Serving Layer
Speed Layer
New Data
Data Lake
(HDFS)
Precompute
Views
Stream or
Micro Batch
Increment
Views
Data
Application
“Real-time” Increment
Batch Recompute
Merge
Hadoop
Storm/Spark
HBase
Impala
11© Cloudera, Inc. All rights reserved.
Batch Layer
• Manages the master data set, an immutable, append-only set of raw data
• Pre-computes views of the data
• “Traditionally” this has been in HDFS and processed with Map/Reduce
• There has already been some shift to cloud based object storage and processing
in other frameworks like Spark
12© Cloudera, Inc. All rights reserved.
Speed Layer
• This layer ingests streaming data or micro-batches
• Spark and Storm are traditionally used
• In some cases micro-batches are directly ingested into NoSQL data stores like
HBase
• This data is periodically expunged
• In many “Lambda-like” architectures I’ve seen, this layer is used to provide an
“active partition” that provides a limited window of mutability
13© Cloudera, Inc. All rights reserved.
Serving Layer
• As you might guess from the name, this is the layer that serves data
• It would be unusual for raw data to be served directly
• This could be an application written directly against a data store like HBase
• It could be a SQL engine on top of a file system, Impala + Parquet is an example
14© Cloudera, Inc. All rights reserved.
What is a Kappa Architecture?
Batch Layer
Serving Layer
Speed Layer
New Data
Data Lake
(HDFS)
Precompute
Views
Stream or
Micro Batch
Increment
Views
Data
Application
“Real-time” Increment
Batch Recompute
Merge
Hadoop
Storm/Spark
HBase
Impala
15© Cloudera, Inc. All rights reserved.
Everything Has a New Name
Batch Layer
Serving Layer
Speed Layer
New Data
Data Lake
(HDFS)
Precompute
Views
Stream or
Micro Batch
Increment
Views
Data
Application
“Real-time” Increment
Batch Recompute
Merge
System of Record (OLTP)
Operational Data Store Derived Tables (EDW)
In-Memory Database
Star/Snowflake, Cubes,
or In-Memory Tables
16© Cloudera, Inc. All rights reserved.
The Log as Storage
• Lambda and Kappa architectures are both predicated on immutable source data
• Data can be modeled as a series of events recorded at specific points in time
about entities
• Updates are modeled as new events and the current or historic value associated
with an entity can be reconstructed through the collected events
• Kappa calls this ordered set of events “a log”, it’s safe to say they didn’t invent
this term
A B C D B E F A G B
1 10
Ordered over Time
17© Cloudera, Inc. All rights reserved.
Is Raw Data the Right Logical Model?
• It’s possible to derive many higher level logical abstractions from raw data
• As an example, I could construct a customer account balance from raw account
activity data
• This doesn’t mean it’s a good idea
A B C B A C B A A C
t0 t12Account Activity
+$10 +$20 +$15 -$10 +$35 -$5 +$25 +$15 -$20 +$10
Easy:
What was the last account event for Customer C?
Harder:
What is the account balance for Customer A at t12?
18© Cloudera, Inc. All rights reserved.
There are Only Two Hard CS Problems
1) Cache invalidation
2) Naming things
-- Phil Karlton
19© Cloudera, Inc. All rights reserved.
Data Engineering has one hard problem
• When should I denormalize to maximize performance?
• When should I normalize to minimize maintenance problems?
Denormalize Everything!
Normalize Everything!
I wish things were faster!
I wish things were easier
to maintain!
20© Cloudera, Inc. All rights reserved.
Lambda Advantages and
Disadvantages
21© Cloudera, Inc. All rights reserved.
Lambda Advantages
• Marries diverse strengths of existing open source software into a unified
architecture
• Provides scalability via the batch layer
• Provides real time performance via the speed layer
22© Cloudera, Inc. All rights reserved.
Lambda Disadvantages
• Complexity
• Many moving parts
• Restatement is difficult
• Two code bases must be kept in sync
• Proper failure handling is complex
23© Cloudera, Inc. All rights reserved.
Lambda Complexity
Batch Layer
Serving Layer
Speed Layer
New Data
Data Lake
(HDFS)
Precompute
Views
Stream or
Micro Batch
Increment
Views
Data
Application
“Real-time” Increment
Batch Recompute
Merge
Hadoop
Storm/Spark
HBase
Impala
Code must be kept in sync
Restatement is difficult
24© Cloudera, Inc. All rights reserved.
Lambda Complexity
Batch Layer
Serving Layer
Speed Layer
New Data
Data Lake
(HDFS)
Precompute
Views
Stream or
Micro Batch
Increment
Views
Data
Application
“Real-time” Increment
Batch Recompute
Merge
Hadoop
Storm/Spark
HBase
Impala
Hmm… this data looks fishy
Problem Here?
Here?
Here?
Here?
Here?
Here?
Here?
25© Cloudera, Inc. All rights reserved.
The Log as Storage
• The idea of representing data as immutable log information is not new and is not
without tradeoffs:
• Space amplification: how many bytes of data are stored, relative to how many
logical bytes the database contains
• Write amplification: how many bytes of data are written by the database
compared to the number of bytes changed by the user
• Read amplification: how many bytes the database has to physically read to
return values to the user compared to the bytes returned
• Complexity: am I solving a CS problem or a customer problem?
• These are not simple issues and there’s no straightforward “right” answer
26© Cloudera, Inc. All rights reserved.
Premature Optimization
Programmers waste enormous amounts of time thinking about, or worrying about,
the speed of noncritical parts of their programs, and these attempts at efficiency
actually have a strong negative impact when debugging and maintenance are
considered. We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil.
--Donald Knuth
27© Cloudera, Inc. All rights reserved.
Gap Filling vs. Optimization
• Some Lambda implementations are deployed on big data systems that don’t require
significant optimization to deliver desired SLAs
• Often, Lambda architectures are used to fill the very stark difference in workload
processing capabilities of technologies that are used typically used for the batch (long
scan) and fast layers (quick point lookups)
• Anecdotally, Lambda architectures seem to be deployed much more often with current
generation open source technology than they were with legacy commercial offerings
• Part of this is because of data volume, variety, and velocity caused by our increasingly
data driven world, but I think part of this is also because legacy technologies haven’t had
as stark of a difference in what workloads they’re optimal for
• Are you deploying a Lambda architecture because you need to squeeze out all of the
performance possible, or because you have a mixed workload that can’t be deployed on
one single storage technology?
28© Cloudera, Inc. All rights reserved.
Gap Filling v2: Lack of Mutability
• Some Lambda implementations aim to fill the gap of
the lack of mutability in HDFS
• Raw, master data should be immutable, but in the real
world raw data could potentially need to be adjusted
• Sensors could have been miscalibrated, data may have
been incorrectly entered, raw data might be an
approximation before finalization, etc.
• Derived aggregations might more efficiently modified
in place, vs. recalculated from raw data, recalculating
all of history is often not practically possible
Incoming Data
(Messaging
System)
New Partition
Most Recent Partition
Historic Data
HBase
Parquet
File
• Wait for running operations to complete
• Define new Impala partition referencing the
newly written Parquet file
Reporting
Request
Impala on HDFS
29© Cloudera, Inc. All rights reserved.
Kudu as a Lambda alternative
30© Cloudera, Inc. All rights reserved.
HDFS
Fast Scans, Analytics
and Processing of
Stored Data
Fast On-Line
Updates &
Data Serving
Arbitrary Storage
(Active Archive)
Fast Analytics
(on fast-changing or
frequently-updated data)
Kudu: Fast Analytics on Fast-Changing Data
New storage engine enables new Hadoop use cases
Unchanging
Fast Changing
Frequent Updates
HBase
Append-Only
Real-Time
Kudu Kudu fills the Gap
Modern analytic
applications often
require complex data
flow & difficult
integration work to
move data between
HBase & HDFS
Analytic
Gap
Pace of Analysis
PaceofData
31© Cloudera, Inc. All rights reserved.
Kudu Increases the Value of Time Series Data
Time Series
Inserts, updates, scans, lookups
Workload
Examples
Stream market data, fraud detection &
prevention, risk monitoring
Time series data is most valuable if you can
analyze it to change outcomes in real time.
Kudu simulateneously enables:
• Time series data inserted/updated as it arrives
• Analytic scans to find trends on fresh time series data
• Lookups to quickly visit the point in time where an
event occurred for further investigation
32© Cloudera, Inc. All rights reserved.
Kudu can help spot problems before they
happen. Real-time data inserts with the ability to
analyze trends identifies potential problems.
Kudu identifies trouble through:
• Extreme scale, allowing better historic trend analysis
• Fast inserts to enable an up-to-date view of your
business
• Fast scans identify/flag undesired states for remedy
Kudu Keeps Your Business Operational
Machine Data
Analytics
Inserts, scans, lookups
Workload
Examples
Network threat detection, IoT, predictive
maintenance and failure detection
33© Cloudera, Inc. All rights reserved.
More Versatility in Online Reporting
Online
Reporting
Inserts, updates, scans, lookups
Workload
Examples
“Active” Reporting
Online reporting has traditionally been limited by
data volume and analytic capabilitiy, keeping
only recent data designed for granular queries.
Kudu adds online reporting versatility through:
• Fast inserts and updates to keep data fresh
• Fast lookups and analytic scans in one data store
34© Cloudera, Inc. All rights reserved.
Xiaomi use case
• World’s 4th largest smart-phone maker (most popular in China)
• Gather important RPC tracing events from mobile app and backend service.
• Service monitoring & troubleshooting tool.
High write throughput
• >5 Billion records/day and growing
Query latest data and quick response
• Identify and resolve issues quickly
Can search for individual records
• Easy for troubleshooting
35© Cloudera, Inc. All rights reserved.
Xiaomi big data analytics pipeline
Large ETL pipeline delays
● High data visibility latency
(from 1 hour up to 1 day)
● Data format conversion woes
Ordering issues
● Log arrival (storage) not
exactly in correct order
● Must read 2 – 3 days of data
to get all of the data points
for a single day
36© Cloudera, Inc. All rights reserved.
Xiaomi big data analytics pipeline
Simplified with Kudu
Low latency ETL pipeline
● ~10s data latency
● For apps that need to avoid
direct backpressure or need
ETL for record enrichment
Direct zero-latency path
● For apps that can tolerate
backpressure and can use the
NoSQL APIs
● Apps that don’t need ETL
enrichment for storage /
retrieval
OLAP scan
Side table lookup
Result store
37© Cloudera, Inc. All rights reserved.
Conclusions
• Lambda has a real place in big data architectures
• Optimize as needed, but beware of the cost of premature optimization
• Kudu is designed to be a simple solution for when you need a data store that’s
updatable and provides “good enough” performance for analytic and real time
workloads simultaneously

Contenu connexe

Tendances

Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Cloudera, Inc.
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Delta Lake with Azure Databricks
Delta Lake with Azure DatabricksDelta Lake with Azure Databricks
Delta Lake with Azure DatabricksDustin Vannoy
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j InternalsTobias Lindaaker
 
Real-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka StreamsReal-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka Streamsconfluent
 

Tendances (20)

Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
 
Kudu Deep-Dive
Kudu Deep-DiveKudu Deep-Dive
Kudu Deep-Dive
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Delta Lake with Azure Databricks
Delta Lake with Azure DatabricksDelta Lake with Azure Databricks
Delta Lake with Azure Databricks
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j Internals
 
Real-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka StreamsReal-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka Streams
 

Similaire à Moving Beyond Lambda Architectures with Apache Kudu

From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Cloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native appsCloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native appsVMware Tanzu
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerCloudera, Inc.
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...IDERA Software
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - PresentationÉric Dusablon
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Uri Laserson
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiSlim Baltagi
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeDenodo
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Introducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaIntroducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaStudent
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Databricks
 

Similaire à Moving Beyond Lambda Architectures with Apache Kudu (20)

From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Cloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native appsCloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native apps
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator Optimizer
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - Presentation
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Introducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaIntroducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by Jaseela
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 

Plus de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Dernier

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 

Dernier (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 

Moving Beyond Lambda Architectures with Apache Kudu

  • 1. 1© Cloudera, Inc. All rights reserved. Michael Crutcher Director, Product Management - Storage Lambda Architecture
  • 2. 2© Cloudera, Inc. All rights reserved. Agenda • Big Data Challenges • What is Lambda? • Lambda Advantages and Disadvantages • Kudu as a Lambda alternative
  • 3. 3© Cloudera, Inc. All rights reserved. Big Data Challenges
  • 4. 4© Cloudera, Inc. All rights reserved. “Something interesting is happening” The world’s largest taxi company owns ZERO vehicles. The world’s largest accommodation provider owns ZERO real estate. The world’s most popular media owner creates ZERO content. The world’s leading music platform owns no music.
  • 5. 5© Cloudera, Inc. All rights reserved. Data is now a strategic asset Instrumentation Consumerization Experimentation Today, everything that can be measured will be measured. Today, data IS the application. Today, becoming data-driven is a business imperative.
  • 6. 6© Cloudera, Inc. All rights reserved. “It will soon be technically feasible & affordable to record & store everything…” — New York Times “Digital technologies will, in the near future, accomplish many tasks once considered uniquely human.” . — Second Machine Age Data is abundant, diverse & shared freely As is how we store, process and analyze it Streaming Machine Learning BI ETL Modeling
  • 7. 7© Cloudera, Inc. All rights reserved. The new analytics paradigm Understand why it happened Change what happens next Determine what happened Make it happen consistently
  • 8. 8© Cloudera, Inc. All rights reserved. So Why Big Data? What does the reporting look like at your business today? What if it could happen in half the time, or half that time? What data are you looking at? What data do you want to know about your customers? How can you best use external data? Too often data is archived, combined, or simplified to save space and strain on systems. Once data is combined we loose the ability to dig deeper. Better Business Forecasting Better Views of CustomersFull Fidelity Data Access
  • 9. 9© Cloudera, Inc. All rights reserved. What is Lambda architecture?
  • 10. 10© Cloudera, Inc. All rights reserved. What is Lambda Architecture? Batch Layer Serving Layer Speed Layer New Data Data Lake (HDFS) Precompute Views Stream or Micro Batch Increment Views Data Application “Real-time” Increment Batch Recompute Merge Hadoop Storm/Spark HBase Impala
  • 11. 11© Cloudera, Inc. All rights reserved. Batch Layer • Manages the master data set, an immutable, append-only set of raw data • Pre-computes views of the data • “Traditionally” this has been in HDFS and processed with Map/Reduce • There has already been some shift to cloud based object storage and processing in other frameworks like Spark
  • 12. 12© Cloudera, Inc. All rights reserved. Speed Layer • This layer ingests streaming data or micro-batches • Spark and Storm are traditionally used • In some cases micro-batches are directly ingested into NoSQL data stores like HBase • This data is periodically expunged • In many “Lambda-like” architectures I’ve seen, this layer is used to provide an “active partition” that provides a limited window of mutability
  • 13. 13© Cloudera, Inc. All rights reserved. Serving Layer • As you might guess from the name, this is the layer that serves data • It would be unusual for raw data to be served directly • This could be an application written directly against a data store like HBase • It could be a SQL engine on top of a file system, Impala + Parquet is an example
  • 14. 14© Cloudera, Inc. All rights reserved. What is a Kappa Architecture? Batch Layer Serving Layer Speed Layer New Data Data Lake (HDFS) Precompute Views Stream or Micro Batch Increment Views Data Application “Real-time” Increment Batch Recompute Merge Hadoop Storm/Spark HBase Impala
  • 15. 15© Cloudera, Inc. All rights reserved. Everything Has a New Name Batch Layer Serving Layer Speed Layer New Data Data Lake (HDFS) Precompute Views Stream or Micro Batch Increment Views Data Application “Real-time” Increment Batch Recompute Merge System of Record (OLTP) Operational Data Store Derived Tables (EDW) In-Memory Database Star/Snowflake, Cubes, or In-Memory Tables
  • 16. 16© Cloudera, Inc. All rights reserved. The Log as Storage • Lambda and Kappa architectures are both predicated on immutable source data • Data can be modeled as a series of events recorded at specific points in time about entities • Updates are modeled as new events and the current or historic value associated with an entity can be reconstructed through the collected events • Kappa calls this ordered set of events “a log”, it’s safe to say they didn’t invent this term A B C D B E F A G B 1 10 Ordered over Time
  • 17. 17© Cloudera, Inc. All rights reserved. Is Raw Data the Right Logical Model? • It’s possible to derive many higher level logical abstractions from raw data • As an example, I could construct a customer account balance from raw account activity data • This doesn’t mean it’s a good idea A B C B A C B A A C t0 t12Account Activity +$10 +$20 +$15 -$10 +$35 -$5 +$25 +$15 -$20 +$10 Easy: What was the last account event for Customer C? Harder: What is the account balance for Customer A at t12?
  • 18. 18© Cloudera, Inc. All rights reserved. There are Only Two Hard CS Problems 1) Cache invalidation 2) Naming things -- Phil Karlton
  • 19. 19© Cloudera, Inc. All rights reserved. Data Engineering has one hard problem • When should I denormalize to maximize performance? • When should I normalize to minimize maintenance problems? Denormalize Everything! Normalize Everything! I wish things were faster! I wish things were easier to maintain!
  • 20. 20© Cloudera, Inc. All rights reserved. Lambda Advantages and Disadvantages
  • 21. 21© Cloudera, Inc. All rights reserved. Lambda Advantages • Marries diverse strengths of existing open source software into a unified architecture • Provides scalability via the batch layer • Provides real time performance via the speed layer
  • 22. 22© Cloudera, Inc. All rights reserved. Lambda Disadvantages • Complexity • Many moving parts • Restatement is difficult • Two code bases must be kept in sync • Proper failure handling is complex
  • 23. 23© Cloudera, Inc. All rights reserved. Lambda Complexity Batch Layer Serving Layer Speed Layer New Data Data Lake (HDFS) Precompute Views Stream or Micro Batch Increment Views Data Application “Real-time” Increment Batch Recompute Merge Hadoop Storm/Spark HBase Impala Code must be kept in sync Restatement is difficult
  • 24. 24© Cloudera, Inc. All rights reserved. Lambda Complexity Batch Layer Serving Layer Speed Layer New Data Data Lake (HDFS) Precompute Views Stream or Micro Batch Increment Views Data Application “Real-time” Increment Batch Recompute Merge Hadoop Storm/Spark HBase Impala Hmm… this data looks fishy Problem Here? Here? Here? Here? Here? Here? Here?
  • 25. 25© Cloudera, Inc. All rights reserved. The Log as Storage • The idea of representing data as immutable log information is not new and is not without tradeoffs: • Space amplification: how many bytes of data are stored, relative to how many logical bytes the database contains • Write amplification: how many bytes of data are written by the database compared to the number of bytes changed by the user • Read amplification: how many bytes the database has to physically read to return values to the user compared to the bytes returned • Complexity: am I solving a CS problem or a customer problem? • These are not simple issues and there’s no straightforward “right” answer
  • 26. 26© Cloudera, Inc. All rights reserved. Premature Optimization Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. --Donald Knuth
  • 27. 27© Cloudera, Inc. All rights reserved. Gap Filling vs. Optimization • Some Lambda implementations are deployed on big data systems that don’t require significant optimization to deliver desired SLAs • Often, Lambda architectures are used to fill the very stark difference in workload processing capabilities of technologies that are used typically used for the batch (long scan) and fast layers (quick point lookups) • Anecdotally, Lambda architectures seem to be deployed much more often with current generation open source technology than they were with legacy commercial offerings • Part of this is because of data volume, variety, and velocity caused by our increasingly data driven world, but I think part of this is also because legacy technologies haven’t had as stark of a difference in what workloads they’re optimal for • Are you deploying a Lambda architecture because you need to squeeze out all of the performance possible, or because you have a mixed workload that can’t be deployed on one single storage technology?
  • 28. 28© Cloudera, Inc. All rights reserved. Gap Filling v2: Lack of Mutability • Some Lambda implementations aim to fill the gap of the lack of mutability in HDFS • Raw, master data should be immutable, but in the real world raw data could potentially need to be adjusted • Sensors could have been miscalibrated, data may have been incorrectly entered, raw data might be an approximation before finalization, etc. • Derived aggregations might more efficiently modified in place, vs. recalculated from raw data, recalculating all of history is often not practically possible Incoming Data (Messaging System) New Partition Most Recent Partition Historic Data HBase Parquet File • Wait for running operations to complete • Define new Impala partition referencing the newly written Parquet file Reporting Request Impala on HDFS
  • 29. 29© Cloudera, Inc. All rights reserved. Kudu as a Lambda alternative
  • 30. 30© Cloudera, Inc. All rights reserved. HDFS Fast Scans, Analytics and Processing of Stored Data Fast On-Line Updates & Data Serving Arbitrary Storage (Active Archive) Fast Analytics (on fast-changing or frequently-updated data) Kudu: Fast Analytics on Fast-Changing Data New storage engine enables new Hadoop use cases Unchanging Fast Changing Frequent Updates HBase Append-Only Real-Time Kudu Kudu fills the Gap Modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS Analytic Gap Pace of Analysis PaceofData
  • 31. 31© Cloudera, Inc. All rights reserved. Kudu Increases the Value of Time Series Data Time Series Inserts, updates, scans, lookups Workload Examples Stream market data, fraud detection & prevention, risk monitoring Time series data is most valuable if you can analyze it to change outcomes in real time. Kudu simulateneously enables: • Time series data inserted/updated as it arrives • Analytic scans to find trends on fresh time series data • Lookups to quickly visit the point in time where an event occurred for further investigation
  • 32. 32© Cloudera, Inc. All rights reserved. Kudu can help spot problems before they happen. Real-time data inserts with the ability to analyze trends identifies potential problems. Kudu identifies trouble through: • Extreme scale, allowing better historic trend analysis • Fast inserts to enable an up-to-date view of your business • Fast scans identify/flag undesired states for remedy Kudu Keeps Your Business Operational Machine Data Analytics Inserts, scans, lookups Workload Examples Network threat detection, IoT, predictive maintenance and failure detection
  • 33. 33© Cloudera, Inc. All rights reserved. More Versatility in Online Reporting Online Reporting Inserts, updates, scans, lookups Workload Examples “Active” Reporting Online reporting has traditionally been limited by data volume and analytic capabilitiy, keeping only recent data designed for granular queries. Kudu adds online reporting versatility through: • Fast inserts and updates to keep data fresh • Fast lookups and analytic scans in one data store
  • 34. 34© Cloudera, Inc. All rights reserved. Xiaomi use case • World’s 4th largest smart-phone maker (most popular in China) • Gather important RPC tracing events from mobile app and backend service. • Service monitoring & troubleshooting tool. High write throughput • >5 Billion records/day and growing Query latest data and quick response • Identify and resolve issues quickly Can search for individual records • Easy for troubleshooting
  • 35. 35© Cloudera, Inc. All rights reserved. Xiaomi big data analytics pipeline Large ETL pipeline delays ● High data visibility latency (from 1 hour up to 1 day) ● Data format conversion woes Ordering issues ● Log arrival (storage) not exactly in correct order ● Must read 2 – 3 days of data to get all of the data points for a single day
  • 36. 36© Cloudera, Inc. All rights reserved. Xiaomi big data analytics pipeline Simplified with Kudu Low latency ETL pipeline ● ~10s data latency ● For apps that need to avoid direct backpressure or need ETL for record enrichment Direct zero-latency path ● For apps that can tolerate backpressure and can use the NoSQL APIs ● Apps that don’t need ETL enrichment for storage / retrieval OLAP scan Side table lookup Result store
  • 37. 37© Cloudera, Inc. All rights reserved. Conclusions • Lambda has a real place in big data architectures • Optimize as needed, but beware of the cost of premature optimization • Kudu is designed to be a simple solution for when you need a data store that’s updatable and provides “good enough” performance for analytic and real time workloads simultaneously