Getting Started with Hadoop:
Operational Data Lake

Rich Reimer
VP, Product Management
rreimer@splicemachine.com
  
The Big Squeeze
Data growing much faster than IT budgets

Source: 2013 IBM Briefing Book
Source: Gartner, Worldwide IT Spending forecast, 3Q13 Update
  
Traditional RDBMS Giants Overwhelmed…
Scale-up becoming cost-prohibitive

Splice Machine | Proprietary & Confidential
  
Scale-Out: The Future of Databases
Dramatic improvement in price/performance

Scale Up (increase server size) vs. Scale Out (more small servers)

What is a Data Lake?
•  Scale-out technology based on Hadoop
•  Data stored in native formats
  
Schema on Ingest vs. Schema on Read
§  Even “schemaless” MongoDB requires “schema”
   -  10 Things You Should Know About Running MongoDB At Scale
      •  By Asya Kamsky, Principal Solutions Architect at MongoDB
      •  Item #1 – “have a good schema and indexing strategy”

•  Schema on Read if you only use data a few times a year
•  Structured data should always remain structured
•  Add schema if data used regularly

Diagram labels: Data Stream, Schema on Ingest, Schema on Read, Application

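The distinction between the two approaches can be sketched in a few lines of Python. This is an illustration only and is not taken from the presentation: the record fields (`order_id`, `amount`), the `Order` type, and both helper functions are hypothetical. Schema on ingest validates and types each record once, before it is stored; schema on read stores the raw text untouched and applies the same parsing every time the data is queried.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical schema: the fields are assumptions for illustration."""
    order_id: int
    amount: float


def parse_order(raw: str) -> Order:
    """Apply the schema to one raw JSON record; raises on malformed input."""
    doc = json.loads(raw)
    return Order(order_id=int(doc["order_id"]), amount=float(doc["amount"]))


def ingest_with_schema(raw_records):
    """Schema on ingest: validate once at load time, store typed rows."""
    stored, rejected = [], []
    for raw in raw_records:
        try:
            stored.append(parse_order(raw))
        except (KeyError, ValueError, TypeError):
            rejected.append(raw)  # bad data is caught before it is stored
    return stored, rejected


def read_with_schema(raw_store):
    """Schema on read: raw text is stored as-is and parsed on every query."""
    for raw in raw_store:
        try:
            yield parse_order(raw)
        except (KeyError, ValueError, TypeError):
            continue  # bad data surfaces (and is skipped) at each read


raw_records = [
    '{"order_id": 1, "amount": "19.99"}',
    '{"order_id": "two", "amount": 5}',  # malformed: order_id is not numeric
    '{"order_id": 3, "amount": 7.5}',
]

# Schema on ingest: the parsing cost is paid once, at load time.
typed_rows, rejected = ingest_with_schema(raw_records)
total_on_ingest = sum(order.amount for order in typed_rows)

# Schema on read: the records are stored untouched and parsed per query.
raw_store = list(raw_records)
total_on_read = sum(order.amount for order in read_with_schema(raw_store))
```

Both paths produce the same total here. The difference is where the parsing cost and the error handling live, which is the trade-off behind adding a schema once data is used regularly.
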

Who Are We?
THE ONLY HADOOP RDBMS
Replace your old RDBMS with a scale-out SQL database

Affordable, Scale-Out
ACID Transactions
No Application Rewrites
10x Better Price/Perf
  
Reference Architecture: Operational Data Lake
Offload real-time reporting and analytics from expensive OLTP and DW systems

Diagram labels:
•  OLTP Systems (ERP, CRM, Supply Chain, HR, …)
•  Stream or Batch Updates
•  Operational Data Lake
•  ETL
•  Data Warehouse
•  Datamart
•  Operational Reports & Analytics
•  Real-Time, Event-Driven Apps
•  Ad Hoc Analytics
•  Executive Business Reports

Streamlining the Structured Data Pipeline in Hadoop

Traditional Hadoop Pipeline
Diagram labels: Source Systems (ERP, CRM, …), Sqoop, Apply Inferred Schema, Stored as flat files, SQL Query Engines, BI Tools

vs.

Streamlined Hadoop Pipeline
Diagram labels: Source Systems (ERP, CRM, …), Existing ETL Tool, Stored in same schema, BI Tools

Advantages
•  Reduced operational costs with less complexity
•  Reduced processing time and errors with fewer translations
•  Real-time updates for data cleansing
•  Better SQL support
  
Streamlining and Hardening the ETL Processing Pipeline
Gracefully handle data quality issues and failed queries without full data reloads

Issue: Handle Data Quality Issues (e.g., duplicates)
   Hadoop Issues: Hours to correct
      ✗  Run slow MapReduce job to de-dupe
      ✗  Reload entire data set (hours)
   Splice Machine Solution: Seconds to correct
      ✓  Insert fails due to constraint violation
      ✓  Rollback flawed updates if necessary
      ✓  Reject, replace, or merge duplicates with incremental update (ms to sec)

Issue: Update/Delete Data
   Hadoop Issues: Hours to correct
      ✗  Reload entire data set (hours)
      ✗  Writers block readers
   Splice Machine Solution: Seconds to correct
      ✓  Correct data and do incremental update (ms to sec)
      ✓  Consistent view of data even with many concurrent updates
      ✓  Writers don’t block readers

Issue: ETL Failure
   Hadoop Issues: Hours to correct
      ✗  Reload entire data set (hours)
      ✗  Miss ETL window, leading to either delayed reports or stale data
   Splice Machine Solution: Seconds to correct
      ✓  Rollback failed step
      ✓  Retry failed step and continue

Issue: Fast Query Speeds
   Hadoop Issues:
      ✗  Results typically no faster than seconds because data stored in random formats
      ✗  MapReduce
   Splice Machine Solution:
      ✓  Results possible in milliseconds because data stored in highly optimized format
      ✓  No MapReduce
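
The "Splice Machine Solution" column relies on standard RDBMS behavior: constraints reject bad rows, a transaction rolls back a failed step, and a correction is an incremental update rather than a full reload. The sketch below illustrates those ideas using Python's built-in `sqlite3` module as a stand-in. It is not Splice Machine code, and the `customers` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com')")
conn.commit()

# 1. A duplicate is rejected at insert time by the UNIQUE constraint,
#    instead of being found later by a separate de-dupe job.
try:
    conn.execute("INSERT INTO customers VALUES (2, 'a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# 2. A failed ETL step is rolled back as a unit; committed data is untouched.
batch = [(3, "c@example.com"), (4, None)]  # second row violates NOT NULL
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO customers VALUES (?, ?)", batch)
    step_failed = False
except sqlite3.IntegrityError:
    step_failed = True
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# 3. Correct the data and retry only the failed step.
fixed_batch = [(3, "c@example.com"), (4, "d@example.com")]
with conn:
    conn.executemany("INSERT INTO customers VALUES (?, ?)", fixed_batch)

# 4. Update one existing row in place rather than reloading the table.
with conn:
    conn.execute(
        "UPDATE customers SET email = ? WHERE id = ?", ("a2@example.com", 1)
    )

final_rows = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
```

After the rollback in step 2 only the originally committed row remains, so the retry in step 3 starts from a clean state without reloading anything.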
  
Complementing Existing Hadoop-Based Data Lakes
Optimizing storage and querying of structured data as part of ELT or Hadoop query engines

Diagram labels:
•  OLTP Systems (ERP, CRM, Supply Chain, HR, …)
•  Structured Data
•  Unstructured Data
•  HCATALOG
•  Pig
•  SCHEMA ON INGEST: Streamlined, structured-to-structured integration
•  SCHEMA BEFORE READ: Repository for structured data or metadata from ELT process on unstructured data
•  SCHEMA ON READ: Ad-hoc Hadoop queries across structured and unstructured data

Case Study: Operational Data Lake

Overview
•  Computer technology corporation
•  Update database technology for:
   -  ODS layer replacement
   -  ETL processing and analysis of Omniture data
   -  Real-time OLTP for Global Tech Support app

Challenges
•  Oracle and Teradata too expensive to scale
•  Many Oracle queries couldn’t complete
•  Can only hold 7 days worth of data in Oracle
•  Missing ETL window with current Hadoop data lake

Solution Diagram
Diagram labels: OLTP Systems (ERP, CRM, Supply Chain), (400TB)

Benefits
•  75% less cost with commodity scale out
•  Incremental ETL processing: gracefully handle data quality issues
•  5x-10x faster, completing queries on which Oracle failed
  
Reference Architecture: Unified Customer Profile
Improve marketing ROI with deeper customer intelligence and better cross-channel coordination

Diagram labels:
•  Unified Customer Profile (aka DMP)
•  Stream or Batch Updates
•  Social Feeds
•  Web/eCommerce Clickstreams
•  1st Party/CRM Data
•  3rd Party Data (e.g., Acxiom)
•  Ad Perf. Data (e.g., DoubleClick)
•  Email Mktg Data
•  Call Center Data
•  POS Data
•  Website
•  Datamart
•  Demand Side Platform (DSP)
•  Ad Exchange
•  Email Marketing App
•  Operational Reports for Campaign Performance (BI Tools)
•  Ad Hoc Audience Segmentation (BI Tools)

Campaign Management: Harte-Hanks

Overview
•  Digital marketing services provider
•  Unified Customer Profile
•  Real-time campaign management
•  Complex OLTP and OLAP environment

Challenges
•  Oracle RAC too expensive to scale
•  Queries too slow – even up to ½ hour
•  Getting worse – expect 30-50% data growth
•  Looked for 9 months for a cost-effective solution

Solution Diagram
Diagram labels: Cross-Channel Campaigns, Real-Time Personalization, Real-Time Actions

Initial Results
•  ¼ cost with commodity scale out
•  3-7x faster through parallelized queries
•  10-20x price/perf with no application, BI or ETL rewrites

Proven Building Blocks: Hadoop and Derby

APACHE DERBY
§  ANSI SQL-99 RDBMS
§  Java-based
§  ODBC/JDBC Compliant

APACHE HBASE/HDFS
§  Auto-sharding
§  Real-time updates
§  Fault-tolerance
§  Scalability to 100s of PBs
§  Data replication
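
Auto-sharding in HBase means a table is divided into regions, each covering a contiguous range of row keys, and a region is split when it grows too large. The following is a simplified, in-memory Python sketch of that idea only. It does not use or mirror the HBase API, and the class name, the `max_rows` threshold, and the median split rule are assumptions chosen for illustration.

```python
import bisect


class RangeShardedTable:
    """Toy range-partitioned key/value table (illustrative, not HBase)."""

    def __init__(self, max_rows=4):
        self.max_rows = max_rows  # hypothetical split threshold
        self.start_keys = [""]    # sorted start key of each region
        self.regions = [{}]       # one dict of rows per region

    def _region_index(self, key):
        # The owning region is the last one whose start key is <= key.
        return bisect.bisect_right(self.start_keys, key) - 1

    def put(self, key, value):
        i = self._region_index(key)
        self.regions[i][key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def get(self, key):
        return self.regions[self._region_index(key)].get(key)

    def _split(self, i):
        # Split at the median key so each half holds about half the rows.
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]
        lower = {k: v for k, v in self.regions[i].items() if k < mid}
        upper = {k: v for k, v in self.regions[i].items() if k >= mid}
        self.regions[i] = lower
        self.start_keys.insert(i + 1, mid)
        self.regions.insert(i + 1, upper)


table = RangeShardedTable(max_rows=4)
for n in range(10):
    table.put(f"row{n:02d}", n)

region_count = len(table.regions)
region_sizes = [len(region) for region in table.regions]
```

Callers only ever use `put` and `get`; the splitting happens behind the interface, which is the sense in which the sharding is automatic.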
  	
  
	
  
	
  
Typical Database Workloads

Operational Applications
   Typical Databases: MySQL, Oracle, MongoDB
   Use Cases: OLTP - ERP, CRM; Websites
   Typical Users: Customers; Operational Employees
   Workload Strengths: High concurrency of small reads/writes; Range queries

Operational Reporting & Analytics
   Typical Databases: MySQL, Oracle
   Use Cases: Operational Datastores
   Typical Users: Operational Employees
   Workload Strengths: Parameterized reports against real-time data; Range queries

Ad-Hoc Analytics
   Typical Databases: Greenplum, Paraccel, Netezza
   Use Cases: Exploratory Analytics; Data Mining
   Typical Users: Analysts; Data Scientists
   Workload Strengths: Complex queries requiring full table scans

Enterprise Data Warehouses
   Typical Databases: Teradata, Oracle, Sybase IQ
   Use Cases: Enterprise Reporting
   Typical Users: Managers; Executives
   Workload Strengths: Parameterized reports against historical data
  
Use Cases
Internet of Things
Operational Data Lake
Digital Marketing
Personalized Medicine
Fraud Detection
  
Operational Data Lake: Great On-Ramp to Big Data

§  Clear Business Value Now
   §  Replace obsolete Operational Data Stores (ODSs)
   §  Existing use cases – not just a science project
   §  Hadoop RDBMS – inexpensive to store data
§  Incremental On-Ramp to Big Data
   §  Start with structured data and then expand to unstructured
   §  Add schema when needed
  
Getting Started with Hadoop:
Operational Data Lake

Rich Reimer
VP, Product Management
rreimer@splicemachine.com
  

Contenu connexe

Tendances

Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014Lin Qiao
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata StreamingZoomdata
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitDataWorks Summit
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopBigData Research
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldDataWorks Summit
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm DataWorks Summit/Hadoop Summit
 
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...DataWorks Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data InsightsDataWorks Summit
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph✔ Eric David Benari, PMP
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionDataWorks Summit/Hadoop Summit
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetDataWorks Summit
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBaseRohit Jain
 

Tendances (19)

Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
 
HDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and SupportabilityHDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and Supportability
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
 
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
 
Spark + HBase
Spark + HBase Spark + HBase
Spark + HBase
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase
 

En vedette

Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Zaloni
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsMichael Stack
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine OverviewKunal Gupta
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsInside Analysis
 
Crawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with HadoopCrawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with HadoopInside Analysis
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...Yahoo Developer Network
 

En vedette (11)

Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both Worlds
 
Crawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with HadoopCrawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with Hadoop
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
 

Similaire à Splice machine-bloor-webinar-data-lakes

Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Agile data warehousing
Agile data warehousingAgile data warehousing
Agile data warehousingSneha Challa
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
Etl with apache impala by athemaster
Etl with apache impala by athemasterEtl with apache impala by athemaster
Etl with apache impala by athemasterAthemaster Co., Ltd.
 
What's New in SAP Replication Server 15.7.1 SP100
What's New in SAP Replication Server 15.7.1 SP100What's New in SAP Replication Server 15.7.1 SP100
What's New in SAP Replication Server 15.7.1 SP100Dobler Consulting
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMichael Hiskey
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Connor McDonald
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Global Business Events
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongCeph Community
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockJeffrey T. Pollock
 
Oracle Database 11g Lower Your Costs
Oracle Database 11g Lower Your CostsOracle Database 11g Lower Your Costs
Oracle Database 11g Lower Your CostsMark Rabne
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...Spark Summit
 
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationNot Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationInside Analysis
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...Mark Rittman
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Precisely
 

Similaire à Splice machine-bloor-webinar-data-lakes (20)

Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Agile data warehousing
Agile data warehousingAgile data warehousing
Agile data warehousing
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Etl with apache impala by athemaster
Etl with apache impala by athemasterEtl with apache impala by athemaster
Etl with apache impala by athemaster
 
What's New in SAP Replication Server 15.7.1 SP100
What's New in SAP Replication Server 15.7.1 SP100What's New in SAP Replication Server 15.7.1 SP100
What's New in SAP Replication Server 15.7.1 SP100
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
Oracle Database 11g Lower Your Costs
Oracle Database 11g Lower Your CostsOracle Database 11g Lower Your Costs
Oracle Database 11g Lower Your Costs
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
 
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationNot Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 

Splice machine-bloor-webinar-data-lakes

  • 1. Getting Started with Hadoop: Operational Data Lake. Rich Reimer, VP, Product Management, rreimer@splicemachine.com
  • 2. The Big Squeeze: Data growing much faster than IT budgets. Sources: 2013 IBM Briefing Book; Gartner, Worldwide IT Spending Forecast, 3Q13 Update.
  • 3. Traditional RDBMS Giants Overwhelmed… Scale-up becoming cost-prohibitive. Splice Machine | Proprietary & Confidential
  • 4. Scale-Out: The Future of Databases. Dramatic improvement in price/performance. Scale Up (increase server size) vs. Scale Out (more small servers).
  • 5. What is a Data Lake? Scale-out technology based on Hadoop; data stored in native formats.
  • 6. Schema on Ingest vs. Schema on Read. Even “schemaless” MongoDB requires “schema”: in “10 Things You Should Know About Running MongoDB At Scale” by Asya Kamsky, Principal Solutions Architect at MongoDB, item #1 is “have a good schema and indexing strategy”. Guidance: use Schema on Read if you only use data a few times a year; structured data should always remain structured; add schema if data is used regularly. Diagram labels: Data Stream; Schema on Ingest; Schema on Read; Application.
  • 7. Who Are We? The only Hadoop RDBMS. Replace your old RDBMS with a scale-out SQL database. Affordable, scale-out; ACID transactions; no application rewrites; 10x better price/performance.
  • 8. Reference Architecture: Operational Data Lake. Offload real-time reporting and analytics from expensive OLTP and DW systems. Diagram labels: OLTP Systems (ERP, CRM, Supply Chain, HR, …); Stream or Batch Updates; Operational Data Lake; ETL; Data Warehouse; Datamart; Operational Reports & Analytics; Real-Time, Event-Driven Apps; Ad Hoc Analytics; Executive Business Reports.
  • 9. Streamlining the Structured Data Pipeline in Hadoop.
      Traditional Hadoop pipeline (diagram labels): Source Systems (ERP, CRM, …); Sqoop; Apply Inferred Schema; Stored as flat files; SQL Query Engines; BI Tools.
      Streamlined Hadoop pipeline (diagram labels): Source Systems (ERP, CRM, …); Existing ETL Tool; Stored in same schema; BI Tools.
      Advantages: reduced operational costs with less complexity; reduced processing time and errors with fewer translations; real-time updates for data cleansing; better SQL support.
  • 10. Streamlining and Hardening the ETL Processing Pipeline. Gracefully handle data quality issues and failed queries without full data reloads.
      Issue: Handle data quality issues (e.g., duplicates)
        Hadoop issues: Hours to correct. ✗ Run slow MapReduce job to de-dupe. ✗ Reload entire data set (hours).
        Splice Machine solution: Seconds to correct. ✓ Insert fails due to constraint violation. ✓ Roll back flawed updates if necessary. ✓ Reject, replace, or merge duplicates with incremental update (ms to sec).
      Issue: Update/delete data
        Hadoop issues: Hours to correct. ✗ Reload entire data set (hours). ✗ Writers block readers.
        Splice Machine solution: Seconds to correct. ✓ Correct data and do incremental update (ms to sec). ✓ Consistent view of data even with many concurrent updates. ✓ Writers don’t block readers.
      Issue: ETL failure
        Hadoop issues: Hours to correct. ✗ Reload entire data set (hours). ✗ Miss ETL window, leading to either delayed reports or stale data.
        Splice Machine solution: Seconds to correct. ✓ Roll back failed step. ✓ Retry failed step and continue.
      Issue: Fast query speeds
        Hadoop issues: ✗ Results typically no faster than seconds because data stored in random formats. ✗ MapReduce.
        Splice Machine solution: ✓ Results possible in milliseconds because data stored in highly optimized format. ✓ No MapReduce.
  • 11. Complementing Existing Hadoop-Based Data Lakes. Optimizing storage and querying of structured data as part of ELT or Hadoop query engines. Diagram labels: OLTP Systems (ERP, CRM, Supply Chain, HR, …); Structured Data; Unstructured Data; HCatalog; Pig. Three numbered modes: Schema on Ingest (streamlined, structured-to-structured integration); Schema Before Read (repository for structured data or metadata from ELT process on unstructured data); Schema on Read (ad hoc Hadoop queries across structured and unstructured data).
  • 12. Case Study: Operational Data Lake.
      Overview: Computer technology corporation. Update database technology for: ODS layer replacement; ETL processing and analysis of Omniture data; real-time OLTP for Global Tech Support app.
      Challenges: Oracle and Teradata too expensive to scale; many Oracle queries couldn’t complete; can only hold 7 days’ worth of data in Oracle; missing ETL window with current Hadoop data lake.
      Solution diagram (labels): OLTP Systems (ERP, CRM, Supply Chain); 400TB.
      Benefits: 75% less cost with commodity scale-out; incremental ETL processing to gracefully handle data quality issues; 5x-10x faster, completing queries on which Oracle failed.
  • 13. Reference Architecture: Unified Customer Profile. Improve marketing ROI with deeper customer intelligence and better cross-channel coordination. Diagram labels: Unified Customer Profile (aka DMP); Stream or Batch Updates; Social Feeds; Web/eCommerce Clickstreams; Website; Datamart; 1st Party/CRM Data; 3rd Party Data (e.g., Acxiom); Ad Perf. Data (e.g., DoubleClick); Email Mktg Data; Call Center Data; POS Data; Operational Reports for Campaign Performance; Ad Hoc Audience Segmentation; BI Tools; Demand Side Platform (DSP); Ad Exchange; Email Marketing App.
  • 14. Campaign Management: Harte-Hanks.
      Overview: Digital marketing services provider; Unified Customer Profile; real-time campaign management; complex OLTP and OLAP environment.
      Challenges: Oracle RAC too expensive to scale; queries too slow, even up to ½ hour; getting worse, expect 30-50% data growth; looked for 9 months for a cost-effective solution.
      Solution diagram (labels): Cross-Channel Campaigns; Real-Time Personalization; Real-Time Actions.
      Initial results: ¼ cost with commodity scale-out; 3-7x faster through parallelized queries; 10-20x price/perf with no application, BI or ETL rewrites.
  • 15. Proven Building Blocks: Hadoop and Derby.
      Apache Derby: ANSI SQL-99 RDBMS; Java-based; ODBC/JDBC compliant.
      Apache HBase/HDFS: auto-sharding; real-time updates; fault tolerance; scalability to 100s of PBs; data replication.
  • 16. Typical Database Workloads.
      Workloads (table columns): Operational Applications; Operational Reporting & Analytics; Ad-Hoc Analytics; Enterprise Data Warehouses.
      Typical databases (listed across the columns in order): MySQL; Oracle; MongoDB; MySQL; Oracle; Greenplum; Paraccel; Netezza; Teradata; Oracle; Sybase IQ.
      Use cases (in order): OLTP (ERP, CRM); websites; operational datastores; exploratory analytics; data mining; enterprise reporting.
      Typical users (in order): customers; operational employees; operational employees; analysts; data scientists; managers; executives.
      Workload strengths (in order): high concurrency of small reads/writes; range queries; parameterized reports against real-time data; range queries; complex queries requiring full table scans; parameterized reports against historical data.
  • 17. Use Cases: Internet of Things; Operational Data Lake; Digital Marketing; Personalized Medicine; Fraud Detection.
  • 18. Operational Data Lake: Great On-Ramp to Big Data.
      Clear business value now: replace obsolete Operational Data Stores (ODSs); existing use cases, not just a science project; Hadoop RDBMS, inexpensive to store data.
      Incremental on-ramp to Big Data: start with structured data and then expand to unstructured; add schema when needed.
  • 19. Getting Started with Hadoop: Operational Data Lake. Rich Reimer, VP, Product Management, rreimer@splicemachine.com
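
Slide 6 contrasts Schema on Ingest with Schema on Read. The difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Purchase` record, the sample events, and the function names are hypothetical and do not come from the deck or from any Splice Machine API.

```python
import json
from dataclasses import dataclass

# Hypothetical raw feed; the second event carries a malformed amount.
RAW_EVENTS = [
    '{"user": "ann", "amount": "19.99"}',
    '{"user": "bob", "amount": "oops"}',
]


@dataclass(frozen=True)
class Purchase:
    """Illustrative schema: the fields every downstream reader expects."""
    user: str
    amount: float


def parse(line: str) -> Purchase:
    """Apply the schema to one raw line; raises KeyError or ValueError if it does not fit."""
    record = json.loads(line)
    return Purchase(user=str(record["user"]), amount=float(record["amount"]))


def ingest_with_schema(lines):
    """Schema on ingest: validate once at load time and reject bad rows up front."""
    accepted, rejected = [], []
    for line in lines:
        try:
            accepted.append(parse(line))
        except (KeyError, ValueError):
            rejected.append(line)
    return accepted, rejected


def total_on_read(lines):
    """Schema on read: raw lines are stored untouched, so every query re-parses them."""
    total = 0.0
    for line in lines:
        try:
            total += parse(line).amount
        except (KeyError, ValueError):
            continue  # each reader has to decide how to treat malformed rows
    return total


accepted, rejected = ingest_with_schema(RAW_EVENTS)
```

With schema on ingest the parsing and validation cost is paid once and bad rows surface at load time; with schema on read that cost is repeated in every query, which fits the slide's advice to reserve it for data that is only used a few times a year.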
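
Slide 10 describes handling duplicates through constraint violations, rollback, and incremental updates rather than full reloads. The following is a minimal sketch of that pattern. It uses Python's built-in `sqlite3` module purely as a stand-in transactional SQL database (an assumption, not Splice Machine itself); the `customers` table and the helper names are hypothetical.

```python
import sqlite3


def load_batch(conn, rows):
    """Insert a batch in one transaction; a duplicate key rolls back the whole batch."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO customers (id, email) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


def merge_batch(conn, rows):
    """Incremental update: replace the stored row when the key already exists."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, email) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
)
load_batch(conn, [(1, "a@example.com"), (2, "b@example.com")])

# The second row duplicates key 2, so the primary-key constraint rejects the batch.
flawed = [(3, "c@example.com"), (2, "b-new@example.com")]
ok = load_batch(conn, flawed)
count_after_failure = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# Resolve the duplicate with an incremental merge instead of reloading everything.
merge_batch(conn, flawed)
rows = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
```

The failed batch leaves the table exactly as it was, so only the flawed rows need to be corrected and reapplied; nothing already loaded has to be reloaded.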