The Modern Analytics Architecture
Making Big Data Useful
Joseph D’Antoni, Solutions Architect
Anexinet
May 7-9, 2014 | San Jose, CA
Please silence
cell phones
Joey D’Antoni
Joey has over 15 years of experience with a wide variety of data platforms, at both
Fortune 50 companies and smaller organizations
He is a frequent speaker on database administration, big data, and career
management
He is the co-president of the Philadelphia SQL Server User’s Group
He wants you to make sure you can restore your data
Agenda
• Data Warehouses—how did we get here?
• Big Data—Hadoop and more
• Modern Analytic Tools
• Building Our New Architecture
Data Warehouses—A History
• Data warehousing had its origins in the 1970s, when A.C. Nielsen provided clients with data marts
• In 1988, Barry Devlin and Paul Murphy (IBM) published “An Architecture for a Business and Information System”
• In 1996, Ralph Kimball published “The Data Warehouse Toolkit”, which popularized dimensional modeling for OLAP-style analysis
Data Warehouse Models
• Star Schema
• Advantage is that the DW is easier
to use
• Facts and dimensions allow queries
to perform faster
• Loading and ETL become more
complicated
• Structure changes are very
expensive
Dimensional Model
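The star schema described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (dim_product, dim_date, fact_sales) and the sample rows are assumptions, not taken from the talk.

```python
import sqlite3

# In-memory database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- The fact table holds the measure plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Books"), (2, "Games")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(10, 2013), (11, 2014)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0)])


def sales_by_category(year):
    """Total sales per product category for one year."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        WHERE d.year = ?
        GROUP BY p.category
        ORDER BY p.category
    """, (year,)).fetchall()
    return dict(rows)


print(sales_by_category(2014))
```

Each query joins the central fact table to only the dimensions it needs, which is what makes the model easy to use; the cost is that loads must populate the dimensions before the facts.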
Data Warehouse Model
• Tables are grouped by subject area
(consumer, finance, products)
• Tables are linked by joins
• Very easy to add information into
the database
• Queries are harder to write, and joins can be very expensive performance-wise
Normalization
Data Warehousing Challenges
Data Quality
ETL
Performance and Scalability
Costs—Licensing and
Hardware
Data Quality
Extract, Transform, Load (ETL) Process
Some Database Business
Doesn’t Care
About
Process
Your
Some
Credit—Buck Woody, Microsoft
Performance and Scalability
Given the volume of data, DW queries can be very slow
We use techniques like data compression to make them faster
CPU was the older bottleneck; now it tends to be storage
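The talk does not say which compression scheme it has in mind. As one simple illustration, run-length encoding shrinks a sorted, low-cardinality column, so fewer bytes have to come off storage. The column contents below are made up for the example.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical low-cardinality column, sorted as a warehouse might store it.
region_column = ["East"] * 4 + ["North"] * 3 + ["West"] * 5
encoded = rle_encode(region_column)
print(encoded)
```

Twelve stored values become three pairs here, and decoding restores the column exactly.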
Costs
Data warehouses need large servers
Database systems are licensed by the size of the server (per core)
Data warehouses need a whole lot of fast storage
Large volumes of fast storage (SANs) are expensive
Traditional Solutions
Classic Data Analysis
Data Warehouse &
BI Solutions
ETL
…Uses Just a Subset
Common Technical Themes
There are a lot of “big data” solutions, but most of them
have a lot in common
• Built in HA/DR through multiple copies of the data
• Designed for analytics processing more than OLTP
• Derived from Open Source solutions
• Designed around local storage and commodity hardware
Components Of Modern Architecture
Hadoop
• (And its ecosystem)
EDW
Analytics Engine
Visualization Engine
Big Data Workflow for Combined Data and Analytics
Stages: Data → Acquire → Organize → Analyze → Decide
Data types: Structured, Semi-Structured, Un-Structured
• Data: Master and Reference; Transactions; Machine Generated (Logs); Web; Text, Image, Audio, Video
• Acquire: DBMS (OLTP); Files; NoSQL (Key Value Data Store); HDFS
• Organize: ETL/ELT; Change Data Capture; Real-Time, Message-Based; Hadoop MR
• Analyze: ODS; Data Warehouse; Streaming (CEP Engine); In-Database Analytics
• Decide (Analytics): Reporting and dashboards; Alerting and recommendations; EPM, Social Apps; Text analytics and search; Advanced analytics; Interactive discovery
• Hardware: Big Data Cluster; High Speed Network; RDBMS Cluster; In-Memory Analytics
Source: Gartner, Credit Suisse, 8/12
Are We Leaving the RDBMS?
[Chart: CPUs, annotated with “Hadoop Project Starts” and “Exadata Launched”]
Costs: Big Data versus Data Warehouse
[Chart: Hadoop and Data Warehouse Costs; categories: Server, Storage, Licensing, Total; series: Hadoop, Data Warehouse; axis from $0 to $350,000]
• For the same cost you can build a 15-node Hadoop cluster
• The Hadoop cluster would have 3840 GB of RAM versus the 1024 GB in the DW server
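The memory comparison above works out as follows. The three input figures come from the slide; the per-node number assumes RAM is spread evenly across the 15 nodes, which the slide does not state.

```python
# Figures from the slide: a 15-node cluster with 3840 GB of RAM in total,
# versus 1024 GB in the single data warehouse server.
HADOOP_NODES = 15
HADOOP_TOTAL_RAM_GB = 3840
DW_RAM_GB = 1024

# Assumes identical nodes (an assumption, not stated in the talk).
ram_per_node_gb = HADOOP_TOTAL_RAM_GB / HADOOP_NODES
# How many times more RAM the cluster has than the warehouse server.
ram_ratio = HADOOP_TOTAL_RAM_GB / DW_RAM_GB

print(ram_per_node_gb, ram_ratio)
```

That is 256 GB per node, and 3.75 times the memory of the data warehouse server for the same spend.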
Enter the Yellow Elephant
Hadoop
Hadoop is the leading Big Data platform (ecosystem)
Created by Doug Cutting and Mike Cafarella, with much of its early development at Yahoo
• Scales horizontally (2-socket x86 servers in massive clusters)
• Uses big, slow, local storage
• Extremely fault-tolerant
• In a nutshell: it’s a distributed file system (3 copies of data in the cluster) and a programming framework called MapReduce
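To see why three copies make the cluster fault-tolerant, here is a toy placement sketch. It is not HDFS's real placement policy, which is rack-aware; the round-robin rule, the block IDs, and the host names are assumptions for illustration only.

```python
from itertools import cycle, islice

REPLICATION_FACTOR = 3  # the "3 copies of data" from the slide


def place_blocks(block_ids, nodes, replicas=REPLICATION_FACTOR):
    """Assign each block to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    start = 0
    for block in block_ids:
        # Take `replicas` consecutive nodes starting at a rotating offset.
        placement[block] = list(islice(cycle(nodes), start, start + replicas))
        start = (start + 1) % len(nodes)
    return placement


def blocks_lost(placement, failed_nodes):
    """Blocks whose every replica sat on a failed node."""
    failed = set(failed_nodes)
    return [b for b, holders in placement.items() if set(holders) <= failed]


# Hypothetical data nodes and block IDs.
nodes = ["host3", "host4", "host5", "host6"]
placement = place_blocks(["blk_1", "blk_2", "blk_3", "blk_4"], nodes)
print(placement)
print(blocks_lost(placement, ["host3", "host4"]))
```

With three replicas on distinct nodes, any two simultaneous node failures leave at least one copy of every block.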
Introducing Hadoop
• Host 1: Name Node
• Host 2: Secondary Name Node
• Host 3: Data Node
• Host 4: Data Node
• Host 5: Data Node
• Host 6: Data Node
How Map Reduce Works
• Automatic parallelism
• Fault tolerance
Map Phase
Input File: foo.log (HDFS Block 1, HDFS Block 19, HDFS Block 105)
1) Read splits into records
2) Run Map
3) Write and Sort Output
• Split 1 (K:0 V…) → Map Task 1 → K:INFO V…
• Split 2 (K:123 V…) → Map Task 2 → K:INFO V:1, K:WARN V:1
• Split 3 (K:332 V…, K:368 V…) → Map Task 3 → K:Debug V:1, K:INFO V:1
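The map phase above, plus the sort and reduce that follow it, can be imitated in plain Python. This single-process sketch only mirrors the data flow; it is not the Hadoop API, and the log lines standing in for foo.log are invented.

```python
from collections import defaultdict


def map_log_line(line):
    """Map: emit (log_level, 1) for one record."""
    level = line.split(" ", 1)[0]
    yield level, 1


def shuffle(pairs):
    """Group mapped values by key and sort by key, as happens between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_counts(key, values):
    """Reduce: sum the counts for one key."""
    return key, sum(values)


def run_job(splits):
    """Run map over every record of every split, then shuffle and reduce."""
    mapped = [pair
              for split in splits
              for line in split
              for pair in map_log_line(line)]
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped))


# Hypothetical contents of foo.log, already divided into three splits.
splits = [
    ["INFO service started"],
    ["INFO request served", "WARN slow response"],
    ["DEBUG cache miss", "INFO request served"],
]
print(run_job(splits))
```

In a real cluster each split would go to a separate map task, ideally on the node that holds the block, which is where the automatic parallelism comes from.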
Hadoop Ecosystem
HDFS
MapReduce
Note: This is only a
subset of ecosystem!
YARN
Spark and Shark
• Hadoop 2
Enhancements
• Spark is in-memory
• Shark integrates Spark
with Hive
Hadoop Architectural Decisions
• Distribution
• Components
• Support
• Cloud vs On-Premises
Choosing Your Hadoop Distribution
Hadoop Vendors
Technology: Hadoop Distributions
Vendor: Description
• Apache: Completely open source software for distributed clusters and map/reduce
• Cloudera: Industry-leading commercial distribution, good management tools
• Hortonworks: Open source distribution, Apache compatible
• MapR: Multiple enhancements to Apache Hadoop (rewrite of HDFS), high performance, enterprise ready
• Pivotal HD: EMC spinoff with strong financial backing; this is a full high-performance RDBMS (with BI connectors) on top of Hadoop
Cloud vs On-Premises
Cloud
• Short Term Use
• Rapid Scale
• Test Use Cases
• Pay as you go
• Internet data source
On-Premises
• Large long term implementations
• Well known workloads
• Shared clusters
• Large initial investment
Analytics Engine
Analytics
Hadoop was not fast
Full scans of files
So How Do We Rapidly Analyze Data?
Columnar Databases
Microsoft SQL Server (2012 & 2014)
PDW
HP Vertica
HBase
ParAccel
InfiniDB
EMC Greenplum
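A toy comparison shows why column-oriented storage helps analytic queries avoid the full scans mentioned earlier. The data and both layouts are illustrative assumptions; real columnar engines also add compression and batch processing on top.

```python
# Hypothetical sales rows: (order_id, region, amount).
rows = [(1, "East", 20.0), (2, "West", 35.5), (3, "East", 14.5)]


def total_amount_row_store(rows):
    """Row layout: the query walks every whole row to reach one field."""
    return sum(row[2] for row in rows)


# Column layout: each column is stored contiguously on its own.
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount_column_store(columns):
    """Column layout: only the "amount" column is read."""
    return sum(columns["amount"])


print(total_amount_row_store(rows), total_amount_column_store(columns))
```

Both functions return the same total, but the column version never touches order_id or region, which matters when a table has hundreds of columns and billions of rows.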
In-Memory Databases
SQL Server 2014
SAP HANA
Oracle TimesTen
VoltDB
Apache Spark
Analytics Tools Past and Present
Data Visualization
Tools for Data Visualization
Excel (Power View and Power
Map)
Tableau
Qlik
Platfora
Pentaho
Bringing This All Together
Power Query (Excel)
Q & A ?
Session Evaluations
Submit by 5pm Friday May 9 to WIN prizes
Your feedback is important and valuable.
Ways to access:
• Go to passbac2014/evals
• Download the PASS EVENT App from your App Store and search: PASS BAC 2014
• Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
Thank You
for attending this session and the PASS Business Analytics Conference 2014

More Related Content

What's hot

Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher Tamir Dresher
 
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Dataconomy Media
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on HadoopTyler Mitchell
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architectureSudheer Kondla
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data SolutionsGuido Schmutz
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big DataInfochimps, a CSC Big Data Business
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016StampedeCon
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data SolutionJames Serra
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azureEyal Ben Ivri
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Jordan Chung
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesDataWorks Summit
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationZaloni
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntSteven Moy
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsDataWorks Summit
 

What's hot (20)

Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
 
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azure
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure Hunt
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
 

Viewers also liked

The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...Revolution Analytics
 
A big-data architecture for real-time analytics
A big-data architecture for real-time analyticsA big-data architecture for real-time analytics
A big-data architecture for real-time analyticsramikaurraminder
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural PatternsAmazon Web Services
 
Top Agile Metrics
Top Agile MetricsTop Agile Metrics
Top Agile MetricsXBOSoft
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
Real time data analytics - part 1 - backend infrastructure
Real time data analytics - part 1 - backend infrastructureReal time data analytics - part 1 - backend infrastructure
Real time data analytics - part 1 - backend infrastructureAmazon Web Services
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Neil Andrassy
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Tin Ho
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudDataWorks Summit
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageSandeep Patil
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflixCody Rioux
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...DB Tsai
 
2014 spark with elastic search
2014   spark with elastic search2014   spark with elastic search
2014 spark with elastic searchHenry Saputra
 
ElasticSearch on AWS
ElasticSearch on AWSElasticSearch on AWS
ElasticSearch on AWSPhilipp Garbe
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkItai Yaffe
 
Nested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchNested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchBeyondTrees
 
Real time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and ElasticsearchReal time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and ElasticsearchAbhishek Andhavarapu
 
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)Spark Summit
 
Real Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchReal Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchSigmoid
 

Viewers also liked (20)

The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
 
A big-data architecture for real-time analytics
A big-data architecture for real-time analyticsA big-data architecture for real-time analytics
A big-data architecture for real-time analytics
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Top Agile Metrics
Top Agile MetricsTop Agile Metrics
Top Agile Metrics
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Real time data analytics - part 1 - backend infrastructure
Real time data analytics - part 1 - backend infrastructureReal time data analytics - part 1 - backend infrastructure
Real time data analytics - part 1 - backend infrastructure
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
 
963
963963
963
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better Storage
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflix
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
 
2014 spark with elastic search
2014   spark with elastic search2014   spark with elastic search
2014 spark with elastic search
 
ElasticSearch on AWS
ElasticSearch on AWSElasticSearch on AWS
ElasticSearch on AWS
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using Spark
 
Nested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchNested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearch
 
Real time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and ElasticsearchReal time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and Elasticsearch
 
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
Real Time Fuzzy Matching with Spark and Elastic Search-(Sonal Goyal, Nube)
 
Real Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchReal Time search using Spark and Elasticsearch
Real Time search using Spark and Elasticsearch
 

Similar to The modern analytics architecture

Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraCloudera, Inc.
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraMongoDB
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin BĂ©m
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationAdaryl "Bob" Wakefield, MBA
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvewKunal Khanna
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overviewRohit Jain
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit MumbaiAnand Haridass
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)Cloudera, Inc.
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 

Similar to The modern analytics architecture (20)

Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 

The modern analytics architecture

  • 1. The Modern Analytics Architecture: Making Big Data Useful. Joseph D’Antoni, Solutions Architect, Anexinet. May 7-9, 2014 | San Jose, CA
  • 3. Joey D’Antoni. Joey has over 15 years of experience with a wide variety of data platforms, in both Fortune 50 companies and smaller organizations. He is a frequent speaker on database administration, big data, and career management. He is the co-president of the Philadelphia SQL Server User’s Group. He wants you to make sure you can restore your data.
  • 4. Agenda • Data Warehouses: how did we get here? • Big Data: Hadoop and more • Modern Analytic Tools • Building Our New Architecture
  • 5. Data Warehouses: A History • Data warehousing had its origins in the 1970s, when A.C. Nielsen provided clients with data marts • In 1988, Barry Devlin and Paul Murphy (IBM) published “An Architecture for a Business and Information System” • In 1996, Ralph Kimball published “The Data Warehouse Toolkit”, which showcased models for OLAP-style modelling
  • 6. Data Warehouse Models: Dimensional Model • Star Schema • Advantage is that the DW is easier to use • Facts and dimensions allow queries to perform faster • Loading and ETL become more complicated • Structure changes are very expensive
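A minimal sketch of the star-schema idea on the slide above, using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date) and the sample rows are hypothetical and not taken from the deck; the point is only that a fact table joined to small dimension tables makes an aggregate query short to write.

```python
import sqlite3

# In-memory database; all table names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date    VALUES (10, 2013), (11, 2014);
INSERT INTO fact_sales  VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0);
""")

# One fact table joined to its dimensions, then aggregated by dimension attributes.
rows = conn.execute("""
SELECT p.category, d.year, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date    AS d ON d.date_id    = f.date_id
GROUP BY p.category, d.year
ORDER BY p.category, d.year
""").fetchall()

for category, year, total in rows:
    print(category, year, total)
```

The trade-off named on the slide still applies: adding a new attribute means changing a dimension table and its load process, which is where the ETL cost shows up.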
  • 7. Data Warehouse Model: Normalization • Tables are grouped by subject area (consumer, finance, products) • Tables are linked by joins • Very easy to add information into the database • Queries are harder to write, and joins can be very expensive performance-wise
  • 8. Data Warehousing Challenges: Data Quality; ETL; Performance and Scalability; Costs (Licensing and Hardware)
  • 10. Extract, Transform, Load (ETL) Process. Some Database Business Doesn’t Care About Process Your Some. Credit: Buck Woody, Microsoft
  • 11. Performance and Scalability. Given the volume of data, DW queries can be very slow. We use techniques like data compression to make them faster. CPU was the older problem; now the bottleneck tends to be storage.
  • 12. Costs. Data Warehouses need large servers. Database systems are licensed by the size of the server (core). Data Warehouses need a whole lot of fast storage. Large volumes of fast storage (SANs) are expensive.
  • 14. Classic Data Analysis Data Warehouse & BI Solutions ETL …Uses Just a Subset
  • 15. Common Technical Themes. There are a lot of “big data” solutions, but most of them have a lot of things in common • Built-in HA/DR through multiple copies of the data • Designed for analytics processing more than OLTP • Derived from Open Source solutions • Designed around local storage and commodity hardware
  • 16. Components Of Modern Architecture: Hadoop (and its ecosystem); EDW; Analytics Engine; Visualization Engine
  • 17. Big Data Workflow for Combined Data and Analytics Data Acquire Organize Analyze Decide Structured Semi-Structured Un-Structured Master and Reference Transactions Machine Generated (Logs) Web Text, Image, Audio, Video DBMS (OLTP) Files NoSQL (Key Value Data Store) HDFS ETL/ELT Change Data Capture Real-Time Message-Based Hadoop MR ODS Data Warehouse Streaming (CEP Engine) In-Database Analytics Analytics • Reporting and dashboards • Alerting and recommendations • EPM, Social Apps • Text analytics and search • Advanced analytics • Interactive discovery Hardware Big Data Cluster High Speed Network RDBMS Cluster In-Memory Analytics Source: Gartner, Credit Suisse, 8/12
  • 18. Are We Leaving the RDBMS?
  • 20. Costs: Big Data versus Data Warehouse. [Chart: Hadoop and Data Warehouse Costs, comparing Hadoop and Data Warehouse across Server, Storage, Licensing, and Total, on an axis from $0 to $350,000] • For the same cost you can build a 15-node Hadoop cluster • The Hadoop cluster would have 3840 GB of RAM versus the 1024 GB in the DW server
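The RAM comparison on the slide above can be checked with a few lines of arithmetic. This is a sketch that uses only the three figures stated on the slide (15 nodes, 3840 GB, 1024 GB); the variable names are illustrative, and the per-node figure is derived here on the assumption that every node is identically sized, which the slide does not state.

```python
# Figures taken from the slide.
HADOOP_NODES = 15
HADOOP_TOTAL_RAM_GB = 3840
DW_SERVER_RAM_GB = 1024

# Derived values (assumes all Hadoop nodes have the same amount of RAM).
ram_per_node_gb = HADOOP_TOTAL_RAM_GB / HADOOP_NODES
ram_ratio = HADOOP_TOTAL_RAM_GB / DW_SERVER_RAM_GB

print(f"RAM per Hadoop node: {ram_per_node_gb:.0f} GB")
print(f"Cluster RAM vs. DW server RAM: {ram_ratio:.2f}x")
```

Under that assumption each node carries 256 GB, and the cluster as a whole has 3.75 times the memory of the single data warehouse server.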
  • 21. Enter the Yellow Elephant
  • 22. Hadoop. Hadoop is the leading Big Data platform (ecosystem). Developed largely at Yahoo • Scales horizontally (2-socket x86 servers in massive clusters) • Uses big, slow, local storage • Extremely fault-tolerant • In a nutshell, it’s a Distributed File System (3 copies of data in cluster) and a programming framework called MapReduce
  • 23. Introducing Hadoop. Host 1: Name Node; Host 2: Secondary Name Node; Host 3: Data Node; Host 4: Data Node; Host 5: Data Node; Host 6: Data Node
  • 24. How Map Reduce Works • Automatic parallelism • Fault tolerance
  • 25. Map Phase. Input File: foo.log (HDFS Block 1, HDFS Block 19, HDFS Block 105). 1) Read splits into records: Split 1 (K:0 V…), Split 2 (K:123 V…), Split 3 (K:332 V…, K:368 V…). 2) Run Map: Map Task 1 (K:INFO V…), Map Task 2 (K:INFO V:1, K:WARN V:1), Map Task 3 (K:Debug V:1, K:INFO V:1). 3) Write and Sort Output
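The map phase on the slide above emits a (log level, 1) pair per record. The following is a single-process Python sketch of that flow, not Hadoop's actual API: the sample log lines, the split boundaries, and the function names (map_record, shuffle, reduce_counts, run_job) are all assumptions made for illustration. On a real cluster each split would be handled by a separate map task, typically on the node holding that HDFS block.

```python
from collections import defaultdict

# Hypothetical contents of foo.log; the third field is the log level.
LOG_LINES = [
    "2014-05-07 10:00:01 INFO service started",
    "2014-05-07 10:00:02 INFO request received",
    "2014-05-07 10:00:03 WARN slow response",
    "2014-05-07 10:00:04 DEBUG cache miss",
    "2014-05-07 10:00:05 INFO request received",
]

def map_record(line):
    """Map step: emit (log_level, 1) for one input record."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1

def shuffle(pairs):
    """Group values by key and sort the keys, like the sort between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))

def reduce_counts(key, values):
    """Reduce step: sum the ones emitted for a single key."""
    return key, sum(values)

def run_job(splits):
    """Run map over every split, then shuffle, then reduce."""
    mapped = [pair for split in splits for line in split for pair in map_record(line)]
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())

# Three splits, mirroring the three map tasks on the slide.
splits = [LOG_LINES[0:1], LOG_LINES[1:3], LOG_LINES[3:5]]
counts = run_job(splits)
print(counts)
```

Because each call to map_record depends only on its own record, the splits can be processed independently, which is where the automatic parallelism on the previous slide comes from.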
  • 26. Hadoop Ecosystem HDFS MapReduce Note: This is only a subset of ecosystem!
  • 27. YARN
  • 28. Spark and Shark • Hadoop 2 Enhancements • Spark is in-memory • Shark integrates Spark with Hive
  • 29. Hadoop Architectural Decisions • Distribution • Components • Support • Cloud vs On-Premises
  • 30. Choosing Your Hadoop Distribution
  • 31. Hadoop Vendors (Technology: Hadoop Distributions; Vendor: Description) • Apache: Completely open source software for distributed clusters and map/reduce • Cloudera: Industry-leading commercial distribution, good management tools • Hortonworks: Open source distribution, Apache compatible • MapR: Multiple enhancements to Apache Hadoop (rewrite of HDFS), high performance, enterprise ready • Pivotal HD: EMC spinoff with strong financial backing; this is a full high-performance RDBMS (with BI connectors) on top of Hadoop
  • 32. Cloud vs On-Premises. Cloud: • Short Term Use • Rapid Scale • Test Use Cases • Pay as you go • Internet data source. On-Premises: • Large long term implementations • Well known workloads • Shared clusters • Large initial investment
  • 34. Analytics. Hadoop was not fast: full scans of files. So how do we rapidly analyze data?
  • 35. Columnar Databases: Microsoft SQL Server (2012 & 2014); PDW; HP Vertica; HBase; ParAccel; InfiniDB; EMC Greenplum
  • 36. In-Memory Databases: SQL Server 2014; SAP HANA; Oracle TimesTen; VoltDB; Apache Spark
  • 37. Analytics Tools Past and Present
  • 39. Tools for Data Visualization: Excel (Power View and Power Map); Tableau; Qlik; Platfora; Pentaho
  • 40. Bringing This All Together. Power Query (Excel). Some Database Business Doesn’t Care About Process Your Some
  • 41. Q & A ?
  • 42. Session Evaluations. Submit by 5pm Friday, May 9 to WIN prizes. Your feedback is important and valuable. Ways to access: go to passbac2014/evals; download the PASS EVENT App from your App Store and search for PASS BAC 2014; follow the QR code link displayed on session signage throughout the conference venue and in the program guide
  • 43. Thank You for attending this session and the PASS Business Analytics Conference 2014