Part 1: Lambda Architectures: Simplified by Apache Kudu
1© Cloudera, Inc. All rights reserved.
Michael Crutcher
Director, Product Management - Storage
Lambda Architecture
Agenda
• Big Data Challenges
• What is Lambda?
• Lambda Advantages and Disadvantages
• Kudu as a Lambda alternative
Big Data Challenges
“Something interesting is happening”
The world’s largest taxi company owns ZERO vehicles.
The world’s largest accommodation provider owns ZERO real estate.
The world’s most popular media owner creates ZERO content.
The world’s leading music platform owns no music.
Data is now a strategic asset
• Instrumentation: Today, everything that can be measured will be measured.
• Consumerization: Today, data IS the application.
• Experimentation: Today, becoming data-driven is a business imperative.
“It will soon be technically feasible & affordable to record & store everything…” -- New York Times
“Digital technologies will, in the near future, accomplish many tasks once considered uniquely human.” -- Second Machine Age
Data is abundant, diverse & shared freely, as is how we store, process and analyze it:
Streaming, Machine Learning, BI, ETL, Modeling
The new analytics paradigm
• Determine what happened
• Understand why it happened
• Change what happens next
• Make it happen consistently
So Why Big Data?
Better Business Forecasting
• What does the reporting look like at your business today?
• What if it could happen in half the time, or half that time?
Better Views of Customers
• What data are you looking at?
• What data do you want to know about your customers? How can you best use external data?
Full Fidelity Data Access
• Too often data is archived, combined, or simplified to save space and reduce strain on systems.
• Once data is combined, we lose the ability to dig deeper.
What is Lambda architecture?
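What is Lambda Architecture?
[Diagram: New Data flows into two paths. The Batch Layer stores it in the Data Lake (HDFS) and periodically recomputes precomputed views (Hadoop). The Speed Layer processes it as a stream or micro-batches and increments “real-time” views (Storm/Spark). The Serving Layer (HBase, Impala) merges both sets of views for the data application.]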
Batch Layer
• Manages the master data set, an immutable, append-only set of raw data
• Pre-computes views of the data
• “Traditionally” this has been stored in HDFS and processed with MapReduce
• There has already been some shift to cloud-based object storage and to processing in other frameworks like Spark
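As a minimal sketch of the batch layer's job, assume the master data set is an in-memory list of immutable `(ts, entity, amount)` events standing in for files in HDFS or object storage. A batch view is then a full recomputation over every raw event up to some point in time; nothing is updated in place. The `Event` schema and the `recompute_batch_view` name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """One immutable, append-only fact in the master data set (hypothetical schema)."""
    ts: int
    entity: str
    amount: int


def recompute_batch_view(master: list[Event], up_to_ts: int) -> dict[str, int]:
    """Recompute per-entity totals from scratch over every raw event with ts <= up_to_ts."""
    view: dict[str, int] = defaultdict(int)
    for event in master:  # full scan: the batch layer never patches an old view
        if event.ts <= up_to_ts:
            view[event.entity] += event.amount
    return dict(view)


master = [Event(1, "A", 10), Event(2, "B", 20), Event(3, "A", 35)]
print(recompute_batch_view(master, up_to_ts=2))  # {'A': 10, 'B': 20}
```

Recomputing from raw data keeps the view easy to reason about, at the cost of rescanning all of history on every run.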
Speed Layer
• This layer ingests streaming data or micro-batches
• Spark and Storm are traditionally used
• In some cases micro-batches are directly ingested into NoSQL data stores like
HBase
• This data is periodically expunged, typically once the batch layer has caught up with it
• In many “Lambda-like” architectures I’ve seen, this layer is used as an “active partition” that offers a limited window of mutability
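A speed-layer view can be sketched as an incrementally updated aggregate over only the events the batch layer has not absorbed yet. This sketch assumes the expunge rule is "drop every event at or before the last batch run's cutoff". The `RealtimeView` class and its methods are hypothetical and do not correspond to Storm or Spark APIs.

```python
from collections import defaultdict


class RealtimeView:
    """Incrementally maintained totals for events newer than the last batch run (hypothetical)."""

    def __init__(self) -> None:
        self._events: list[tuple[int, str, int]] = []  # (ts, entity, amount)
        self.totals: dict[str, int] = defaultdict(int)

    def apply(self, ts: int, entity: str, amount: int) -> None:
        # Increment the view as each stream or micro-batch record arrives.
        self._events.append((ts, entity, amount))
        self.totals[entity] += amount

    def expunge_through(self, batch_ts: int) -> None:
        # Once a batch run covers everything up to batch_ts, drop those events
        # and rebuild the totals from the events that remain.
        self._events = [e for e in self._events if e[0] > batch_ts]
        self.totals = defaultdict(int)
        for _, entity, amount in self._events:
            self.totals[entity] += amount


rt = RealtimeView()
rt.apply(4, "A", 15)
rt.apply(5, "B", -10)
rt.expunge_through(4)  # a batch run has now absorbed everything up to ts=4
print(dict(rt.totals))  # {'B': -10}
```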
Serving Layer
• As you might guess from the name, this is the layer that serves data
• It would be unusual for raw data to be served directly
• This could be an application written directly against a data store like HBase
• It could be a SQL engine on top of a file system; Impala + Parquet is an example
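For an additive metric, the serving layer's "merge" can be as simple as adding the batch view and the real-time view, provided the two cover disjoint time ranges. This is a minimal sketch under that assumption; `serve_total` is a hypothetical name. Non-additive metrics (distinct counts, medians) do not merge this easily, which is part of the complexity discussed later.

```python
def serve_total(entity: str, batch_view: dict[str, int], realtime_view: dict[str, int]) -> int:
    """Answer a query by merging the precomputed batch view with the speed layer's view.

    Assumes the views cover disjoint time ranges, so summing them does not double count.
    """
    return batch_view.get(entity, 0) + realtime_view.get(entity, 0)


batch_view = {"A": 45, "B": 20}    # recomputed up to the last batch run
realtime_view = {"A": 15, "C": 5}  # events that arrived since then
print(serve_total("A", batch_view, realtime_view))  # 60
print(serve_total("C", batch_view, realtime_view))  # 5
```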
What is a Kappa Architecture?
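[Diagram: the same New Data, Batch Layer (Data Lake on HDFS, Hadoop), Speed Layer (Storm/Spark), and Serving Layer (HBase, Impala) components as the Lambda diagram]
• A Kappa architecture drops the separate batch layer: all data flows through the streaming path, and recomputation is done by replaying the immutable log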
Everything Has a New Name
[Diagram: the Lambda diagram relabeled with traditional data-management terms: System of Record (OLTP); Operational Data Store; Derived Tables (EDW); In-Memory Database; Star/Snowflake, Cubes, or In-Memory Tables]
The Log as Storage
• Lambda and Kappa architectures are both predicated on immutable source data
• Data can be modeled as a series of events recorded at specific points in time
about entities
• Updates are modeled as new events and the current or historic value associated
with an entity can be reconstructed through the collected events
• Kappa calls this ordered set of events “a log”; it’s safe to say they didn’t invent this term
[Figure: a log of events about entities A to G (A B C D B E F A G B) at positions 1 through 10, ordered over time]
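Reconstructing an entity's current or historic value from the log amounts to replaying events in order and keeping the latest one for that entity. In this minimal sketch, the positions and entity order follow the figure, but the values (`v1`, `v2`, ...) and the `value_as_of` helper are hypothetical.

```python
from typing import Optional

# Positions 1-10 follow the entity order in the figure; the values are hypothetical.
LOG = [
    (1, "A", "v1"), (2, "B", "v1"), (3, "C", "v1"), (4, "D", "v1"), (5, "B", "v2"),
    (6, "E", "v1"), (7, "F", "v1"), (8, "A", "v2"), (9, "G", "v1"), (10, "B", "v3"),
]


def value_as_of(log: list[tuple[int, str, str]], entity: str, position: int) -> Optional[str]:
    """Replay the ordered log and return the entity's latest value at or before position."""
    current = None
    for pos, ent, value in log:  # the log is already ordered over time
        if pos > position:
            break
        if ent == entity:
            current = value  # an update is simply a newer event for the same entity
    return current


print(value_as_of(LOG, "B", 10))  # v3 (current value)
print(value_as_of(LOG, "B", 7))   # v2 (historic value)
print(value_as_of(LOG, "G", 5))   # None (no event yet)
```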
Is Raw Data the Right Logical Model?
• It’s possible to derive many higher-level logical abstractions from raw data
• As an example, I could construct a customer account balance from raw account activity data
• This doesn’t mean it’s a good idea
[Figure: account activity from t0 to t12, in order: A +$10, B +$20, C +$15, B -$10, A +$35, C -$5, B +$25, A +$15, A -$20, C +$10]
Easy: What was the last account event for Customer C?
Harder: What is the account balance for Customer A at t12?
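The gap between the two questions can be made concrete with the slide's own activity data. The slide gives no per-event timestamps, so this sketch assumes every event falls between t0 and t12 and that balances start at $0. The "easy" query stops at the most recent matching event; the "harder" one must fold over the customer's entire history, and its cost grows with that history.

```python
from typing import Optional

# Account activity from the slide, in log order (t0 .. t12). Assumes all events fall
# within t0..t12 and that every balance starts at $0 (the slide does not say).
ACTIVITY = [
    ("A", 10), ("B", 20), ("C", 15), ("B", -10), ("A", 35),
    ("C", -5), ("B", 25), ("A", 15), ("A", -20), ("C", 10),
]


def last_event(activity: list[tuple[str, int]], customer: str) -> Optional[int]:
    """Easy: scan backward and stop at the first match."""
    for cust, amount in reversed(activity):
        if cust == customer:
            return amount
    return None


def balance(activity: list[tuple[str, int]], customer: str) -> int:
    """Harder: fold over the customer's entire history."""
    return sum(amount for cust, amount in activity if cust == customer)


print(last_event(ACTIVITY, "C"))  # 10
print(balance(ACTIVITY, "A"))     # 40
```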
There are Only Two Hard CS Problems
1) Cache invalidation
2) Naming things
-- Phil Karlton
Data Engineering has one hard problem
• When should I denormalize to maximize performance?
• When should I normalize to minimize maintenance problems?
• Denormalize Everything! → “I wish things were easier to maintain!”
• Normalize Everything! → “I wish things were faster!”
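The trade-off can be sketched with a hypothetical customer/orders example: the normalized form joins at read time but renames a customer by touching one row, while the denormalized form reads without a join but has to find and fix every copy. All data and function names here are illustrative only.

```python
# Hypothetical data: one customer with three orders.
customers: dict[int, str] = {1: "Acme"}
orders: list[dict] = [{"order_id": n, "customer_id": 1} for n in (101, 102, 103)]

# Denormalized copy: the customer name is duplicated into every order row.
orders_denorm: list[dict] = [{"order_id": o["order_id"], "customer_name": "Acme"} for o in orders]


def read_normalized() -> list[tuple[int, str]]:
    # Reads pay for a join (here, a dict lookup per row), but the name lives in one place.
    return [(o["order_id"], customers[o["customer_id"]]) for o in orders]


def read_denormalized() -> list[tuple[int, str]]:
    # Reads need no join: every row already carries the name.
    return [(o["order_id"], o["customer_name"]) for o in orders_denorm]


def rename_normalized(customer_id: int, new_name: str) -> int:
    customers[customer_id] = new_name
    return 1  # rows touched


def rename_denormalized(old_name: str, new_name: str) -> int:
    touched = 0
    for row in orders_denorm:  # every copy must be found and fixed
        if row["customer_name"] == old_name:
            row["customer_name"] = new_name
            touched += 1
    return touched


print(rename_normalized(1, "Acme Corp"))         # 1
print(rename_denormalized("Acme", "Acme Corp"))  # 3
print(read_normalized() == read_denormalized())  # True
```

With three orders the difference is trivial; with billions of rows, the denormalized rename becomes a large rewrite, while the normalized read becomes a large join.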
Lambda Advantages and Disadvantages
Lambda Advantages
• Marries diverse strengths of existing open source software into a unified
architecture
• Provides scalability via the batch layer
• Provides real-time performance via the speed layer
Lambda Disadvantages
• Complexity
• Many moving parts
• Restatement is difficult
• Two code bases must be kept in sync
• Proper failure handling is complex
Lambda Complexity
[Diagram: the Lambda diagram (batch, speed, and serving layers; Hadoop, Storm/Spark, HBase, Impala) annotated with two callouts:]
• Code must be kept in sync
• Restatement is difficult
Lambda Complexity
[Diagram: the Lambda diagram again, captioned “Hmm… this data looks fishy”, with a “Problem here?” callout at each of seven different points in the pipeline]
The Log as Storage
• The idea of representing data as immutable log information is not new and is not
without tradeoffs:
• Space amplification: how many bytes of data are stored, relative to how many
logical bytes the database contains
• Write amplification: how many bytes of data are written by the database
compared to the number of bytes changed by the user
• Read amplification: how many bytes the database has to physically read to
return values to the user compared to the bytes returned
• Complexity: am I solving a CS problem or a customer problem?
• These are not simple issues and there’s no straightforward “right” answer
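The three amplification ratios can be compared with a deliberately crude toy model. It assumes fixed-size records, an append-only log with no compaction and no index (so a point read scans the whole log), and an update-in-place store that rewrites one whole page per update and reads one page per lookup. Real engines (LSM trees, B-trees with caches) sit somewhere in between; all parameters are hypothetical.

```python
def log_structured(num_keys: int, updates_per_key: int, record_bytes: int) -> dict[str, float]:
    """Append-only log, no compaction, no index (toy assumptions)."""
    logical_bytes = num_keys * record_bytes
    stored_bytes = num_keys * updates_per_key * record_bytes
    return {
        "space": stored_bytes / logical_bytes,  # every old version is kept
        "write": record_bytes / record_bytes,   # each update appends exactly one record
        "read": stored_bytes / record_bytes,    # a point read scans the whole log
    }


def update_in_place(record_bytes: int, page_bytes: int) -> dict[str, float]:
    """Paged store that rewrites a whole page per update (toy assumptions)."""
    return {
        "space": 1.0,                        # only the current version is kept
        "write": page_bytes / record_bytes,  # changing one record rewrites its page
        "read": page_bytes / record_bytes,   # a point read fetches one page
    }


print(log_structured(num_keys=10, updates_per_key=5, record_bytes=100))
# {'space': 5.0, 'write': 1.0, 'read': 50.0}
print(update_in_place(record_bytes=100, page_bytes=4096))
# {'space': 1.0, 'write': 40.96, 'read': 40.96}
```

Neither design wins on every ratio, which illustrates why there is no straightforward "right" answer.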
Premature Optimization
Programmers waste enormous amounts of time thinking about, or worrying about,
the speed of noncritical parts of their programs, and these attempts at efficiency
actually have a strong negative impact when debugging and maintenance are
considered. We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil.
--Donald Knuth
Gap Filling vs. Optimization
• Some Lambda implementations are deployed on big data systems that don’t require
significant optimization to deliver desired SLAs
• Often, Lambda architectures are used to fill the very stark gap in workload processing capabilities between the technologies typically used for the batch layer (long scans) and the speed layer (quick point lookups)
• Anecdotally, Lambda architectures seem to be deployed much more often with current-generation open source technology than they were with legacy commercial offerings
• Part of this is because of the data volume, variety, and velocity caused by our increasingly data-driven world, but I think part of this is also because legacy technologies haven’t had as stark a difference in the workloads they’re optimal for
• Are you deploying a Lambda architecture because you need to squeeze out all of the
performance possible, or because you have a mixed workload that can’t be deployed on
one single storage technology?
Gap Filling v2: Lack of Mutability
• Some Lambda implementations aim to fill the gap left by the lack of mutability in HDFS
• Raw, master data should be immutable, but in the real world raw data may need to be adjusted
• Sensors could have been miscalibrated, data may have been entered incorrectly, raw data might be an approximation before finalization, etc.
• Derived aggregations might be more efficiently modified in place than recalculated from raw data; recalculating all of history is often not practical
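[Diagram: incoming data from a messaging system lands in the most recent partition, held in HBase; historic data lives in Parquet files, and a new partition is added as each Parquet file is written. Reporting requests are served by Impala on HDFS. Rolling over a partition requires:]
• Wait for running operations to complete
• Define a new Impala partition referencing the newly written Parquet file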
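The "active partition" pattern can be modeled in memory to show where mutability lives. In this sketch, a dict stands in for the mutable HBase partition and tuples stand in for immutable Parquet files. The class and method names are hypothetical, and the "wait for running operations" step is noted but not modeled.

```python
class ActivePartitionTable:
    """Hypothetical model: a small mutable partition in front of immutable historic files."""

    def __init__(self) -> None:
        self.active: dict[str, int] = {}                       # stands in for HBase: upserts allowed
        self.historic: list[tuple[tuple[str, int], ...]] = []  # stands in for Parquet files

    def upsert(self, key: str, value: int) -> None:
        # Corrections are only possible while the record is still in the active partition.
        self.active[key] = value

    def roll_over(self) -> None:
        # Freeze the active partition into an immutable "file" and register it as a new
        # historic partition (waiting for running operations is not modeled here).
        self.historic.append(tuple(sorted(self.active.items())))
        self.active = {}

    def scan(self) -> list[tuple[str, int]]:
        # A reporting request must read both tiers and union the results.
        rows = [row for part in self.historic for row in part]
        return rows + sorted(self.active.items())


t = ActivePartitionTable()
t.upsert("s1", 10)
t.upsert("s1", 12)  # correction while still mutable
t.roll_over()
t.upsert("s2", 7)
print(t.scan())  # [('s1', 12), ('s2', 7)]
```

Once `s1` has been rolled into a historic file, this model offers no way to correct it, which is exactly the limited window of mutability described above.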
Kudu as a Lambda alternative
Kudu: Fast Analytics on Fast-Changing Data
New storage engine enables new Hadoop use cases
[Diagram: storage engines plotted by pace of data (append-only/unchanging to fast-changing/frequent updates) against pace of analysis (up to real time):
• HDFS: fast scans, analytics and processing of stored data; arbitrary storage (active archive)
• HBase: fast on-line updates & data serving
• Kudu fills the analytic gap: fast analytics on fast-changing or frequently-updated data]
Modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS
Validation from CERN – Space Utilization
Validation from CERN – Random Lookup
Validation from CERN – Scan Rate
Kudu Increases the Value of Time Series Data
Time Series
Workload: Inserts, updates, scans, lookups
Examples: Stream market data, fraud detection & prevention, risk monitoring
Time series data is most valuable if you can analyze it to change outcomes in real time.
Kudu simultaneously enables:
• Time series data inserted/updated as it arrives
• Analytic scans to find trends on fresh time series data
• Lookups to quickly visit the point in time where an event occurred for further investigation
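The combination of inserts/updates, scans, and lookups in one store can be illustrated with a toy in-memory table that keeps rows sorted by a `(series_id, ts)` primary key. This is NOT Kudu's client API (in Kudu these operations go through its client libraries, or through SQL via Impala); `SortedTable` and its methods are hypothetical and only model the access pattern.

```python
import bisect
from typing import Optional


class SortedTable:
    """Toy model of a primary-key-ordered, updatable table (not the Kudu API)."""

    def __init__(self) -> None:
        self._keys: list[tuple[str, int]] = []            # kept sorted by (series_id, ts)
        self._rows: dict[tuple[str, int], float] = {}

    def upsert(self, series_id: str, ts: int, value: float) -> None:
        key = (series_id, ts)
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep primary-key order for range scans
        self._rows[key] = value             # late corrections overwrite in place

    def lookup(self, series_id: str, ts: int) -> Optional[float]:
        return self._rows.get((series_id, ts))

    def scan(self, series_id: str, start_ts: int, end_ts: int) -> list[tuple[int, float]]:
        """Return (ts, value) pairs for start_ts <= ts < end_ts."""
        lo = bisect.bisect_left(self._keys, (series_id, start_ts))
        hi = bisect.bisect_left(self._keys, (series_id, end_ts))
        return [(k[1], self._rows[k]) for k in self._keys[lo:hi]]


ticks = SortedTable()
for ts, price in [(3, 1.5), (1, 1.0), (2, 1.2)]:  # out-of-order arrival
    ticks.upsert("EURUSD", ts, price)
ticks.upsert("EURUSD", 2, 1.25)                   # correction to an existing point
print(ticks.scan("EURUSD", 1, 3))  # [(1, 1.0), (2, 1.25)]
print(ticks.lookup("EURUSD", 3))   # 1.5
```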
Kudu Keeps Your Business Operational
Machine Data Analytics
Workload: Inserts, scans, lookups
Examples: Network threat detection, IoT, predictive maintenance and failure detection
Kudu can help spot problems before they happen. Real-time data inserts, combined with the ability to analyze trends, identify potential problems.
Kudu identifies trouble through:
• Extreme scale, allowing better historic trend analysis
• Fast inserts to enable an up-to-date view of your business
• Fast scans to identify/flag undesired states for remedy
More Versatility in Online Reporting
Online Reporting
Workload: Inserts, updates, scans, lookups
Examples: “Active” Reporting
Online reporting has traditionally been limited by data volume and analytic capability, keeping only recent data designed for granular queries.
Kudu adds online reporting versatility through:
• Fast inserts and updates to keep data fresh
• Fast lookups and analytic scans in one data store
Xiaomi use case
• World’s 4th largest smartphone maker (most popular in China)
• Gathers important RPC tracing events from mobile apps and backend services
• Service monitoring & troubleshooting tool
High write throughput
• >5 Billion records/day and growing
Query latest data and quick response
• Identify and resolve issues quickly
Can search for individual records
• Easy for troubleshooting
Xiaomi big data analytics pipeline
Large ETL pipeline delays
● High data visibility latency
(from 1 hour up to 1 day)
● Data format conversion woes
Ordering issues
● Logs do not arrive (and are not stored) in exactly the right order
● Must read 2–3 days of data to get all of the data points for a single day
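The ordering issue can be illustrated with a small sketch. It assumes logs are partitioned by arrival day and that no record arrives more than two days late, matching the "2–3 days" above. The record IDs, dates, and function names are hypothetical. With arrival-day partitions, assembling one event day means scanning three partitions and filtering; a store that files each late record under its event time needs only one.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical (event_id, event_day, arrival_day) records.
RECORDS = [
    ("e1", date(2017, 3, 1), date(2017, 3, 1)),
    ("e2", date(2017, 3, 1), date(2017, 3, 2)),  # arrived one day late
    ("e3", date(2017, 3, 1), date(2017, 3, 3)),  # arrived two days late
    ("e4", date(2017, 3, 2), date(2017, 3, 2)),
]


def partition_by(records, key_index):
    """Group records into partitions keyed by the given tuple field."""
    parts = defaultdict(list)
    for rec in records:
        parts[rec[key_index]].append(rec)
    return parts


def events_for_day_by_arrival(records, day, max_lateness_days):
    """Stored by arrival day: scan day .. day + max_lateness, then filter on event day."""
    by_arrival = partition_by(records, 2)
    scanned = [day + timedelta(days=d) for d in range(max_lateness_days + 1)]
    events = sorted(r[0] for p in scanned for r in by_arrival.get(p, []) if r[1] == day)
    return events, len(scanned)


def events_for_day_by_event_time(records, day):
    """Stored by event day: late records were written into their correct partition."""
    by_event = partition_by(records, 1)
    return sorted(r[0] for r in by_event.get(day, [])), 1


print(events_for_day_by_arrival(RECORDS, date(2017, 3, 1), 2))  # (['e1', 'e2', 'e3'], 3)
print(events_for_day_by_event_time(RECORDS, date(2017, 3, 1)))  # (['e1', 'e2', 'e3'], 1)
```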
Xiaomi big data analytics pipeline
Simplified with Kudu
Low latency ETL pipeline
● ~10s data latency
● For apps that need to avoid
direct backpressure or need
ETL for record enrichment
Direct zero-latency path
● For apps that can tolerate
backpressure and can use the
NoSQL APIs
● For apps that don’t need ETL enrichment for storage/retrieval
[Diagram labels: OLAP scan, side table lookup, result store]
Conclusions
• Lambda has a real place in big data architectures
• Optimize as needed, but beware of the cost of premature optimization
• Kudu is designed to be a simple solution for when you need a data store that’s
updatable and provides “good enough” performance for analytic and real time
workloads simultaneously

Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - Presentation
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Introducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaIntroducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by Jaseela
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 

Plus de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Dernier

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 

Dernier (20)

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 

Part 1: Lambda Architectures: Simplified by Apache Kudu

  • 1. Lambda Architecture. Michael Crutcher, Director, Product Management - Storage. © Cloudera, Inc. All rights reserved.
  • 2. Agenda • Big Data Challenges • What is Lambda? • Lambda Advantages and Disadvantages • Kudu as a Lambda alternative
  • 3. Big Data Challenges
  • 4. “Something interesting is happening”: The world’s largest taxi company owns ZERO vehicles. The world’s largest accommodation provider owns ZERO real estate. The world’s most popular media owner creates ZERO content. The world’s leading music platform owns no music.
  • 5. Data is now a strategic asset. Instrumentation: Today, everything that can be measured will be measured. Consumerization: Today, data IS the application. Experimentation: Today, becoming data-driven is a business imperative.
  • 6. “It will soon be technically feasible & affordable to record & store everything…” (New York Times). “Digital technologies will, in the near future, accomplish many tasks once considered uniquely human.” (The Second Machine Age). Data is abundant, diverse & shared freely, as is how we store, process and analyze it: Streaming, Machine Learning, BI, ETL, Modeling.
  • 7. The new analytics paradigm: Determine what happened. Understand why it happened. Change what happens next. Make it happen consistently.
  • 8. So Why Big Data? Better Business Forecasting: What does the reporting look like at your business today? What if it could happen in half the time, or half that time? Better Views of Customers: What data are you looking at? What data do you want to know about your customers? How can you best use external data? Full Fidelity Data Access: Too often data is archived, combined, or simplified to save space and reduce strain on systems. Once data is combined, we lose the ability to dig deeper.
  • 9. What is Lambda architecture?
  • 10. What is Lambda Architecture? [Diagram: New Data feeds a Batch Layer (Data Lake on HDFS, processed with Hadoop; batch recompute of precomputed views) and a Speed Layer (stream or micro-batch processing with Storm/Spark; “real-time” incremental views in HBase). The Serving Layer (Impala) merges both views for the Data Application.]
  • 11. Batch Layer • Manages the master data set, an immutable, append-only set of raw data • Pre-computes views of the data • “Traditionally” this has been stored in HDFS and processed with MapReduce • There has already been some shift toward cloud-based object storage and toward other processing frameworks such as Spark
  • 12. Speed Layer • This layer ingests streaming data or micro-batches • Spark and Storm are the traditional choices • In some cases, micro-batches are ingested directly into NoSQL data stores such as HBase • This data is periodically expunged • In many “Lambda-like” architectures I’ve seen, this layer provides an “active partition” with a limited window of mutability
  • 13. Serving Layer • As you might guess from the name, this is the layer that serves data • It would be unusual for raw data to be served directly • This could be an application written directly against a data store such as HBase • It could be a SQL engine on top of a file system; Impala + Parquet is an example
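The division of labor across the batch, speed, and serving layers can be sketched in a few lines of single-process Python. This is an illustration under stated assumptions, not a production design: the page-view counting workload, the `BatchLayer`/`SpeedLayer`/`serve` names, and the rule that the speed layer expunges events once the batch view covers their timestamp are all hypothetical choices made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event: (timestamp, page); the example workload counts page views.
Event = Tuple[int, str]


class BatchLayer:
    """Keeps the immutable, append-only master data set and recomputes its view from scratch."""

    def __init__(self) -> None:
        self._master: List[Event] = []
        self.view: Counter = Counter()
        self.computed_through = -1  # newest timestamp reflected in the batch view

    def append(self, event: Event) -> None:
        self._master.append(event)  # raw data is only ever appended, never updated

    def recompute(self) -> int:
        # Full recomputation over all raw data (MapReduce or Spark in a real deployment).
        self.view = Counter(page for _, page in self._master)
        self.computed_through = max((ts for ts, _ in self._master), default=-1)
        return self.computed_through


class SpeedLayer:
    """Incrementally maintains a real-time view of events the batch view may not include yet."""

    def __init__(self) -> None:
        self._recent: List[Event] = []
        self.view: Counter = Counter()

    def ingest(self, event: Event) -> None:
        self._recent.append(event)
        self.view[event[1]] += 1  # incremental update, no recomputation

    def expunge_through(self, ts: int) -> None:
        # ASSUMPTION: discard events the batch view now covers, then rebuild the small view.
        self._recent = [e for e in self._recent if e[0] > ts]
        self.view = Counter(page for _, page in self._recent)


def ingest_new_data(batch: BatchLayer, speed: SpeedLayer, events: Iterable[Event]) -> None:
    # New data is sent to both layers.
    for event in events:
        batch.append(event)
        speed.ingest(event)


def serve(batch: BatchLayer, speed: SpeedLayer, page: str) -> int:
    # Serving layer: merge the precomputed batch view with the real-time increment.
    return batch.view[page] + speed.view[page]


batch, speed = BatchLayer(), SpeedLayer()
ingest_new_data(batch, speed, [(1, "home"), (2, "home"), (3, "about")])
print(serve(batch, speed, "home"))  # 2, served entirely from the speed layer
speed.expunge_through(batch.recompute())
print(serve(batch, speed, "home"))  # 2, now served from the batch view
```

In a real deployment each class is a separate system with its own code, so the batch and streaming versions of the same counting logic have to be kept in agreement by hand.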
  • 14. What is a Kappa Architecture? [Diagram: repeats the Lambda diagram labels (Batch, Speed, and Serving Layers; Data Lake (HDFS); Hadoop, Storm/Spark, HBase, Impala).]
  • 15. Everything Has a New Name [Diagram: the Lambda diagram relabeled with traditional data warehousing terms: System of Record (OLTP), Operational Data Store, Derived Tables (EDW), In-Memory Database, and Star/Snowflake, Cubes, or In-Memory Tables.]
  • 16. The Log as Storage • Lambda and Kappa architectures are both predicated on immutable source data • Data can be modeled as a series of events, recorded at specific points in time, about entities • Updates are modeled as new events, and the current or historic value associated with an entity can be reconstructed from the collected events • Kappa calls this ordered set of events “a log”; it’s safe to say they didn’t invent this term [Figure: a log of ten events (A, B, C, D, B, E, F, A, G, B), numbered 1 to 10 and ordered over time.]
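Reconstructing an entity's current or historic value from such a log amounts to replaying the events in order and keeping the latest one for that entity. A minimal Python sketch, assuming each event simply records a new value for one entity; the `Event`/`Log` names and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Event:
    seq: int     # position in the log; events are ordered over time
    entity: str  # the entity the event is about
    value: Any   # the value recorded for that entity at this point in time


class Log:
    """Append-only, ordered event log: an update is a new event, never an in-place edit."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, entity: str, value: Any) -> Event:
        event = Event(seq=len(self._events) + 1, entity=entity, value=value)
        self._events.append(event)
        return event

    def value_as_of(self, entity: str, seq: Optional[int] = None) -> Any:
        # Replay events in order up to `seq` (or the whole log); the last match wins.
        result = None
        for event in self._events:
            if seq is not None and event.seq > seq:
                break
            if event.entity == entity:
                result = event.value
        return result


log = Log()
for entity, value in [("A", "red"), ("B", "small"), ("A", "blue")]:
    log.append(entity, value)
print(log.value_as_of("A"))         # blue (current value)
print(log.value_as_of("A", seq=2))  # red (historic value as of event 2)
```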
  • 17. Is Raw Data the Right Logical Model? • It’s possible to derive many higher-level logical abstractions from raw data • As an example, I could construct a customer account balance from raw account activity data • This doesn’t mean it’s a good idea [Figure: account activity for customers A, B, and C between t0 and t12: ten events (A, B, C, B, A, C, B, A, A, C) with amounts +$10, +$20, +$15, -$10, +$35, -$5, +$25, +$15, -$20, +$10.] Easy: What was the last account event for Customer C? Harder: What is the account balance for Customer A at t12?
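The gap between the "easy" and "harder" questions shows up directly in code. This sketch assumes the slide's customer sequence and amount sequence pair up positionally (the figure layout does not confirm the pairing) and uses a count of events as a stand-in for the slide's timestamps; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: the slide's customer sequence and amount sequence pair up positionally.
CUSTOMERS = ["A", "B", "C", "B", "A", "C", "B", "A", "A", "C"]
AMOUNTS = [10, 20, 15, -10, 35, -5, 25, 15, -20, 10]
ACTIVITY: List[Tuple[str, int]] = list(zip(CUSTOMERS, AMOUNTS))


def last_event(log: List[Tuple[str, int]], customer: str) -> Optional[int]:
    # Easy: scan backwards and stop at the first event for this customer.
    for who, amount in reversed(log):
        if who == customer:
            return amount
    return None


def balance(log: List[Tuple[str, int]], customer: str, upto: Optional[int] = None) -> int:
    # Harder: a balance is derived state, so every one of the customer's events
    # before the cutoff must be read and folded together.
    # `upto` counts events from the start of the log (a stand-in for a timestamp).
    events = log if upto is None else log[:upto]
    return sum(amount for who, amount in events if who == customer)


print(last_event(ACTIVITY, "C"))  # 10
print(balance(ACTIVITY, "A"))     # 40
```

`last_event` can stop as soon as it finds a match, while `balance` always touches the customer's entire history; storing a precomputed balance avoids that cost but creates a derived value that must be kept correct.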
  • 18. There are Only Two Hard CS Problems 1) Cache invalidation 2) Naming things -- Phil Karlton
  • 19. Data Engineering has one hard problem • When should I denormalize to maximize performance? • When should I normalize to minimize maintenance problems? [Cartoon: one camp shouts “Denormalize Everything!”, the other “Normalize Everything!”; the regrets that follow are “I wish things were faster!” and “I wish things were easier to maintain!”]
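The normalize/denormalize tension can be made concrete by counting the work a rename and a read each require. A minimal sketch using hypothetical in-memory `customers` and `orders` records: the normalized layout stores the customer name once, the denormalized one copies it onto every order.

```python
from typing import Dict, List

# Hypothetical data: one customer and three orders that reference it.
customers: Dict[int, str] = {1: "Ada"}
orders_normalized: List[dict] = [{"order_id": i, "cust_id": 1} for i in range(3)]
orders_denormalized: List[dict] = [
    {"order_id": i, "cust_id": 1, "cust_name": "Ada"} for i in range(3)
]


def rename_normalized(cust_id: int, new_name: str) -> int:
    # Normalized: the name is stored once, so a rename touches exactly one row.
    customers[cust_id] = new_name
    return 1


def rename_denormalized(cust_id: int, new_name: str) -> int:
    # Denormalized: every order carries its own copy, and each copy must be updated.
    touched = 0
    for order in orders_denormalized:
        if order["cust_id"] == cust_id:
            order["cust_name"] = new_name
            touched += 1
    return touched


def label_normalized(order: dict) -> str:
    # Reads need an extra lookup (a join, in SQL terms) to find the name.
    return f'{order["order_id"]}: {customers[order["cust_id"]]}'


def label_denormalized(order: dict) -> str:
    # Reads are a single access because the name is already on the row.
    return f'{order["order_id"]}: {order["cust_name"]}'


print(rename_normalized(1, "Ada L."), rename_denormalized(1, "Ada L."))  # 1 3
print(label_normalized(orders_normalized[0]), "|", label_denormalized(orders_denormalized[0]))
```

Missing one of the denormalized copies during a rename is exactly the maintenance problem; the extra lookup on every normalized read is the performance cost.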
  • 20. Lambda Advantages and Disadvantages
  • 21. Lambda Advantages • Marries the diverse strengths of existing open source software into a unified architecture • Provides scalability via the batch layer • Provides real-time performance via the speed layer
  • 22. Lambda Disadvantages • Complexity • Many moving parts • Restatement is difficult • Two code bases must be kept in sync • Proper failure handling is complex
  • 23. Lambda Complexity [Diagram: the Lambda architecture diagram (Batch, Speed, and Serving Layers; Hadoop, Storm/Spark, HBase, Impala) annotated with “Code must be kept in sync” and “Restatement is difficult”.]
  • 24. Lambda Complexity [Diagram: the same Lambda diagram with the callout “Hmm… this data looks fishy” and seven “Problem here?” markers scattered across its components.]
  • 25. The Log as Storage • The idea of representing data as immutable log information is not new and is not without tradeoffs: • Space amplification: how many bytes of data are stored, relative to how many logical bytes the database contains • Write amplification: how many bytes of data are written by the database compared to the number of bytes changed by the user • Read amplification: how many bytes the database has to physically read to return values to the user compared to the bytes returned • Complexity: am I solving a CS problem or a customer problem? • These are not simple issues and there’s no straightforward “right” answer
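The three amplification metrics on slide 25 can be illustrated with simple arithmetic. The hypothetical helper below models a deliberately naive append-only log: no compaction, no index, and every update rewrites a whole record. Real log-structured engines change these numbers through compaction and indexing, so treat the figures as a worst-case illustration, not a benchmark.

```python
def log_amplification(num_keys, updates_per_key, record_bytes, changed_bytes):
    """Back-of-the-envelope amplification for a naive append-only log.

    Assumes no compaction, no index, and that every update appends a full
    record even when the user changed only `changed_bytes` of it.
    """
    logical_bytes = num_keys * record_bytes                      # one live value per key
    physical_bytes = num_keys * updates_per_key * record_bytes   # every version is kept
    return {
        "space": physical_bytes / logical_bytes,  # bytes stored per logical byte
        "write": record_bytes / changed_bytes,    # bytes written per byte changed
        "read": physical_bytes / record_bytes,    # full scan to return one value
    }


print(log_amplification(num_keys=1_000, updates_per_key=10,
                        record_bytes=100, changed_bytes=8))
# {'space': 10.0, 'write': 12.5, 'read': 10000.0}
```

Compaction would pull space and read amplification back toward 1, but it rewrites data in the background, which raises write amplification; that is the kind of trade-off the slide says has no single right answer.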
  • 26. Premature Optimization Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. --Donald Knuth
  • 27. Gap Filling vs. Optimization • Some Lambda implementations are deployed on big data systems that don’t require significant optimization to deliver the desired SLAs • Often, Lambda architectures are used to bridge the very stark difference in workload processing capabilities between the technologies typically used for the batch layer (long scans) and the speed layer (quick point lookups) • Anecdotally, Lambda architectures seem to be deployed much more often with current-generation open source technology than they were with legacy commercial offerings • Part of this is because of the data volume, variety, and velocity caused by our increasingly data-driven world, but I think part of this is also because legacy technologies haven’t had as stark a difference in what workloads they’re optimal for • Are you deploying a Lambda architecture because you need to squeeze out all of the performance possible, or because you have a mixed workload that can’t be deployed on one single storage technology?
  • 28. Gap Filling v2: Lack of Mutability • Some Lambda implementations aim to work around the lack of mutability in HDFS • Raw, master data should be immutable, but in the real world raw data sometimes needs to be adjusted • Sensors could have been miscalibrated, data may have been incorrectly entered, raw data might be an approximation before finalization, etc. • Derived aggregations might be modified in place more efficiently than recalculated from raw data; recalculating all of history is often not practical [Diagram: incoming data (messaging system); most recent partition in HBase; new partition written as a Parquet file; historic data; reporting requests served by Impala on HDFS. Steps: • Wait for running operations to complete • Define a new Impala partition referencing the newly written Parquet file]
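The hybrid layout on slide 28 can be sketched as a toy in-memory model. It assumes a Python dict in place of HBase, frozen tuples in place of Parquet files, and a lock in place of the coordination with running Impala queries; `HybridTable` and its methods are hypothetical names, not an HBase or Impala API. What it shows is that a row is cheap to correct only while it sits in the mutable partition.

```python
import threading


class HybridTable:
    """Toy model of the slide's pattern: one mutable store for the most recent
    partition plus write-once, immutable historic partitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.recent = {}    # key -> row; stand-in for HBase (updatable)
        self.historic = []  # frozen partitions; stand-ins for Parquet files

    def upsert(self, key, row):
        with self._lock:
            self.recent[key] = row  # corrections are cheap while the row is recent

    def roll_partition(self):
        # Holding the lock plays the role of "wait for running operations to complete".
        with self._lock:
            frozen = tuple(sorted(self.recent.items()))  # written once, never modified
            self.historic.append(frozen)                 # "define the new partition"
            self.recent = {}

    def scan(self):
        """Reporting query: the union of historic partitions and the recent store."""
        with self._lock:
            rows = [row for part in self.historic for _key, row in part]
            rows.extend(self.recent.values())
            return rows


table = HybridTable()
table.upsert("s1", {"sensor": "s1", "temp": 20.5})
table.upsert("s1", {"sensor": "s1", "temp": 21.0})  # recalibrated reading replaces the first
table.roll_partition()
table.upsert("s2", {"sensor": "s2", "temp": 19.0})
print(table.scan())
# [{'sensor': 's1', 'temp': 21.0}, {'sensor': 's2', 'temp': 19.0}]
```

After `roll_partition()`, correcting `s1` again would mean rewriting a frozen partition, which is the restatement difficulty listed on slide 22.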
  • 29. Kudu as a Lambda alternative
  • 30. Kudu: Fast Analytics on Fast-Changing Data. New storage engine enables new Hadoop use cases [Diagram with axes Pace of Data and Pace of Analysis (labels: Unchanging, Append-Only, Fast Changing, Frequent Updates, Real-Time). HDFS: fast scans, analytics and processing of stored data. HBase: fast on-line updates & data serving. Also labeled: arbitrary storage (active archive). Kudu: fast analytics on fast-changing or frequently updated data, filling the “Analytic Gap” between HDFS and HBase.] Kudu fills the gap: modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS
  • 31. Validation from CERN – Space Utilization
  • 32. Validation from CERN – Random Lookup
  • 33. Validation from CERN – Scan Rate
  • 34. Kudu Increases the Value of Time Series Data. Time Series workload: inserts, updates, scans, lookups. Examples: streaming market data, fraud detection & prevention, risk monitoring. Time series data is most valuable if you can analyze it to change outcomes in real time. Kudu simultaneously enables: • Time series data inserted/updated as it arrives • Analytic scans to find trends on fresh time series data • Lookups to quickly visit the point in time where an event occurred for further investigation
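The three capabilities listed on slide 34 can be illustrated with one sorted store. The Python sketch below is hypothetical: `bisect` over a sorted key list stands in for an index ordered by a (series, timestamp) primary key, and `TimeSeriesStore` and the `ACME` series are invented for the example; none of it is Kudu's client API. Inserts, in-place updates, range scans, and point lookups all operate on the same copy of the data, so there is no second pipeline to reconcile.

```python
import bisect


class TimeSeriesStore:
    """Toy store keyed by (series, timestamp) and kept sorted by that key.
    A hypothetical illustration of the access patterns, not the Kudu API."""

    def __init__(self):
        self._keys = []    # sorted list of (series, ts) primary keys
        self._values = {}  # (series, ts) -> value

    def upsert(self, series, ts, value):
        key = (series, ts)
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert as data arrives, keeping key order
        self._values[key] = value           # a late correction overwrites in place

    def lookup(self, series, ts):
        """Point lookup: jump to the moment an event occurred."""
        return self._values.get((series, ts))

    def scan(self, series, start, end):
        """Range scan of one series over [start, end), in time order."""
        lo = bisect.bisect_left(self._keys, (series, start))
        hi = bisect.bisect_left(self._keys, (series, end))
        return [(ts, self._values[(s, ts)]) for s, ts in self._keys[lo:hi]]


store = TimeSeriesStore()
for ts, price in [(1, 100.0), (2, 101.5), (3, 99.0), (4, 102.0)]:
    store.upsert("ACME", ts, price)
store.upsert("ACME", 3, 99.5)                  # correction to an existing point
window = store.scan("ACME", 2, 4)              # [(2, 101.5), (3, 99.5)]
trend = sum(p for _, p in window) / len(window)
print(window, trend, store.lookup("ACME", 3))  # [(2, 101.5), (3, 99.5)] 100.5 99.5
```

`bisect.insort` costs O(n) per insert, which is fine for a toy but is why real storage engines rely on tree- or log-structured indexes instead.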
  • 35. Kudu Keeps Your Business Operational. Machine Data Analytics workload: inserts, scans, lookups. Examples: network threat detection, IoT, predictive maintenance and failure detection. Kudu can help spot problems before they happen: combining real-time data inserts with the ability to analyze trends identifies potential problems. Kudu identifies trouble through: • Extreme scale, allowing better historic trend analysis • Fast inserts to enable an up-to-date view of your business • Fast scans to identify and flag undesired states for remedy
  • 36. More Versatility in Online Reporting. Online Reporting workload: inserts, updates, scans, lookups. Example: “Active” reporting. Online reporting has traditionally been limited by data volume and analytic capability, so it kept only recent data, designed for granular queries. Kudu adds online reporting versatility through: • Fast inserts and updates to keep data fresh • Fast lookups and analytic scans in one data store
  • 37. Xiaomi use case • World’s 4th largest smartphone maker (most popular in China) • Gathers important RPC tracing events from mobile apps and backend services • Service monitoring & troubleshooting tool. High write throughput: • >5 billion records/day and growing. Query latest data with quick response: • Identify and resolve issues quickly. Search for individual records: • Easy troubleshooting
  • 38. Xiaomi big data analytics pipeline. Large ETL pipeline delays: ● High data visibility latency (from 1 hour up to 1 day) ● Data format conversion woes. Ordering issues: ● Logs do not arrive in storage in exactly the correct order ● Must read 2 – 3 days of data to get all of the data points for a single day
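The ordering issue on slide 38 can be reproduced with a few hypothetical records. The Python sketch assumes storage is partitioned by arrival day, as in an append-only log layout, and contrasts it with a store that accepts a late record under its event day; the record IDs and day numbers are invented. With arrival-day partitions, assembling one event day means reading every partition a late record landed in.

```python
from collections import defaultdict

# Hypothetical records: (record_id, event_day, arrival_day).
# Some records reach storage one or two days after they happened.
records = [
    ("r1", 1, 1), ("r2", 1, 1), ("r3", 1, 2),
    ("r4", 2, 2), ("r5", 1, 3), ("r6", 2, 3),
]

# Append-only layout: each record lands in the partition for the day it arrived.
by_arrival = defaultdict(list)
for rec_id, event_day, arrival_day in records:
    by_arrival[arrival_day].append((rec_id, event_day))


def partitions_to_read(event_day):
    """Arrival-day partitions that hold at least one record for `event_day`."""
    return sorted(day for day, rows in by_arrival.items()
                  if any(ev == event_day for _rid, ev in rows))


# Updatable layout: a late record is written under its event day instead.
by_event = defaultdict(list)
for rec_id, event_day, _arrival_day in records:
    by_event[event_day].append(rec_id)

print(partitions_to_read(1))  # [1, 2, 3]: three partitions to rebuild one day
print(by_event[1])            # ['r1', 'r2', 'r3', 'r5']: one partition
```

A reader that cannot know in advance where late records landed has to scan a whole window of arrival partitions, not just the ones this toy identifies.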
  • 39. Xiaomi big data analytics pipeline simplified with Kudu. Low-latency ETL pipeline: ● ~10 s data latency ● For apps that need to avoid direct backpressure or need ETL for record enrichment. Direct zero-latency path: ● For apps that can tolerate backpressure and can use the NoSQL APIs ● For apps that don’t need ETL enrichment for storage/retrieval. [Diagram labels: OLAP scan, side table lookup, result store]
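The two paths on slide 39 differ mainly in who absorbs backpressure. The toy Python model below is hypothetical, not Xiaomi's code or a Kudu API: `SlowStore` is a sink with a fixed per-tick write budget, `direct_path` writes to it synchronously, and `etl_drain` lets an intermediate buffer absorb bursts and enrich records (upper-casing stands in for real enrichment) at the cost of added latency.

```python
from collections import deque


class SlowStore:
    """Hypothetical sink that accepts at most `capacity` writes per tick."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.budget = capacity
        self.rows = []

    def tick(self):
        self.budget = self.capacity  # a new time slice restores write capacity

    def try_write(self, row):
        if self.budget == 0:
            return False  # saturated: the caller has to wait or retry
        self.budget -= 1
        self.rows.append(row)
        return True


def direct_path(store, rows):
    """Zero-latency path: the app writes straight to the store and feels backpressure."""
    rejected = []
    for row in rows:
        if not store.try_write(row):
            rejected.append(row)
    return rejected


def etl_drain(buffer, store, enrich):
    """ETL path: the app only appended to `buffer`; this step enriches rows and
    writes them as capacity allows, trading latency for not pushing back on the app."""
    while buffer and store.budget > 0:
        store.try_write(enrich(buffer.popleft()))


store = SlowStore(capacity=2)
rejected = direct_path(store, ["a", "b", "c"])  # "c" is pushed back to the app
store.tick()
buffer = deque(["d", "e", "f"])                 # the app enqueued and moved on
etl_drain(buffer, store, enrich=str.upper)      # writes "D" and "E"; "f" waits
store.tick()
etl_drain(buffer, store, enrich=str.upper)      # writes "F"
print(rejected, store.rows)  # ['c'] ['a', 'b', 'D', 'E', 'F']
```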
  • 40. Conclusions • Lambda has a real place in big data architectures • Optimize as needed, but beware of the cost of premature optimization • Kudu is designed to be a simple solution for when you need a data store that’s updatable and provides “good enough” performance for analytic and real-time workloads simultaneously