SlideShare une entreprise Scribd logo
1  sur  26
BI on Big Data
Tomer Shiran - @tshiran
Co-Founder & CEO, Dremio
Strata + Hadoop World London, June 3, 2016
What are your options?
2 BI on Hadoop: What are your Options
Dremio Company Background
Jacques Nadeau
Founder & CTO
• Recognized SQL & NoSQL expert
• Founder of Apache Arrow & Drill
• Previously Quigo (AOL); Offermatica
(ADBE); aQuantive (MSFT)
Tomer Shiran
Founder & CEO
• Previously MapR (VP Product &
employee #5), MapR; Microsoft;
IBM Research
• Carnegie Mellon, Technion
Julien Le Dem
Architect
• Founder of Apache Parquet
• Apache Pig PMC Member
• Previously Twitter (Lead, Analytics
Data Pipeline); Yahoo! (Architect)
Top Silicon Valley VCs• Stealth data analytics startup
• Founded in 2015
• Led by experts in Big Data and open source including
the creators of Apache Arrow & Apache Parquet
3 BI on Hadoop: What are your Options
Recent changes to the BI landscape
Good ol’ Days
• Only a few databases (e.g.
Oracle, Teradata, SQL
Server)
• A few BI tools (MicroStrategy,
Cognos)
• Everything worked with
everything
• Things were easy!
Modern Reality
• Larger scale, less control and
less structure
• Lots of databases!
• Data Lake, not database
• HDFS: It’s a file system, folks!
• NoSQL: Let’s put the schema in
the application
• It can feel like the wild west!
4 BI on Hadoop: What are your Options
Major Approaches to BI on Big Data?
ETL to RDBMS
o “Make the new world look like the old world!”
o Load a transformed set of data into relational database
Monolithic (all-in-one) solutions
o Use BI tools that connect directly to Big Data
SQL-on-Big-Data
o Connect BI tools to a query engine sitting on top of Big Data
o Three main sub-categories
• Native SQL
• Batch SQL
• OLAP Cubes
5 BI on Hadoop: What are your Options
So how do we bring BI to Big Data?
Big Data
RDBMS
BI options
ETL tool
ETL to Data
Warehouse
Big Data
SQL
Engine
BI options
SQL-on-Big-DataBig Data
Monolithic tool with built-in BI
Monolithic All-in-one
Solutions
6 BI on Hadoop: What are your Options
ETL to RDBMS: Introduction
• ETL (Extract, Transform, and Load) a subset of the data into a
relational database
o Oracle, PostgreSQL, Teradata, Redshift, Vertica, …
• Connect any desired BI tool to the RDBMS
o Tableau, Qlik, …
• Two options:
o Commercial tools (Informatica, Talend, Pentaho,…)
o Custom development, scripts, etc.
Big Data
RDBMS
BI options
ETL tool
7 BI on Hadoop: What are your Options
ETL to RDBMS: Example
• Load web server logs from HDFS into RDBMS
• ETL software: Pentaho Data Integration (aka ‘Kettle’)
• RDBMS: MySQL
Connect ETL
to RDBMS
Add and Configure
Input/Output
Connect Input
and Output
Create and fill
RDBMS table
Connect BI tool
To RDBMS
0
50
100
150
200
250
April May June July
Source: http://wiki.pentaho.com/display/BAD/Extracting+Data+from+HDFS+to+Load+an+RDBMS
8 BI on Hadoop: What are your Options
ETL to RDBMS: Pros and Cons
Pros
• Relational databases and their BI integrations are very mature
• Use your favorite tools
o Tableau, Excel, R, …
Cons
• Traditional ETL tools don’t work well with modern data
o Changing schemas, complex or semi-structured data, …
o Hand-coded scripts are a common substitute
• Data freshness
o How often do you replicate/synchronize?
• Data resolution
o Can’t store all the raw data in the RDBMS (due to scalability and/or cost)
o Need to sample, aggregate or time-constrain the data
…and really, who wants to ETL?
9 BI on Hadoop: What are your Options
Monolithic (or All-in-One) Solutions: Introduction
• Single piece of software on top of Big Data
• Performs both data visualization (BI) and execution
• Utilize sampling or manual pre-aggregation to reduce
the data volume that the user is interacting with
• Examples:
o Datameer
o Platfora
o Zoomdata Big Data
Monolithic system
with built-in BI
Monolithic
Solutions
10 BI on Hadoop: What are your Options
Platfora Architecture Overview
• Constructs aggregates that are
loaded into an external database
o Aggregates provide fast
visualizations
o Aggregations must be created
before consumption
MapReduce/Spark
HDFS
Hadoop Cluster
Hadoop
Proprietary DB
Aggregates
Platfora Cluster
11 BI on Hadoop: What are your Options
Hadoop Cluster
Datameer Nodes
Datameer Architecture Overview
• Users interact with samples of the data
in an Excel-like interface
• Finished designs use the whole dataset
• Query router determines execution
engine based on data size
Single Node
Custom Execution
Tez MapReduce
Query Router
Sampling
Hadoop
HDFS
12 BI on Hadoop: What are your Options
Zoomdata Architecture Overview
• Queries on historical (ie, non-streaming)
data are split into many sampling queries
• This sampling provides a view of the data
that converges toward an accurate picture
o But adds load on the data source…
• Can handle streaming data sources
Stream Processing
Engine
Spark-based
cache
HDFS / MongoDB
Zoomdata Server
Incremental
Sampling
Streaming
Data
Source
Multiple
Data Clusters
13 BI on Hadoop: What are your Options
Monolithic Solutions: Pros and Cons
Pros
• Only one tool to learn and operate
• Easier than building and maintain ETL-to-RDBMS pipeline
• Integrated data preparation in some solutions
Cons
• Can’t analyze the raw data
o Rely on aggregation or sampling before primary analysis
• Can’t use your existing BI or analytics tools (Tableau, Qlik, R, …)
• Can’t run arbitrary SQL queries
14 BI on Hadoop: What are your Options
SQL-on-Big-Data: Introduction
• SQL queries against Big Data
o Hadoop
o NoSQL
• MongoDB, HBase, ...
o Cloud Storage
• S3, Azure Data Lake, GCS, …
• Use your existing BI tools
o Leverage standard ODBC/JDBC drivers
Tableau, Qlik, R, …
SQL Engine
Hadoop & NoSQL
15 BI on Hadoop: What are your Options
SQL-on-Big-Data: Introduction
Three major design philosophies:
• Native SQL
• Batch & Data Science SQL
• OLAP Cubes on Hadoop
16 BI on Hadoop: What are your Options
Native SQL
• Apache Drill
o Queries Hadoop, RDBMS, Files, NoSQL, Cloud (S3)
o Based on Apache Arrow
o Columnar in-memory execution
• Apache Impala (incubating)
o Utilizes the Hive metastore
o Focused on data in HDFS
• Presto
o Queries Hadoop, RDBMS, Files, NoSQL, Cloud (S3)
17 BI on Hadoop: What are your Options
Native SQL: Pros and Cons
Pros
• Highest performance for Big Data workloads
• Connect to Hadoop and also NoSQL systems
• Make Hadoop “look like a database”
Cons
• Queries may still be too slow for interactive analysis on many TB/PB
• Can’t defeat physics
18 BI on Hadoop: What are your Options
Batch & Data Science SQL
• Hive
o Enables SQL queries to be translated to
MapReduce/Tez
o Most commonly used for batch processing and ETL
workloads
• Spark SQL
o Provides a way to deliver SQL queries in Spark
programs (Scala/Java/Python)
o Excellent interleaving with data science work
19 BI on Hadoop: What are your Options
Batch & Data Science SQL: Pros and Cons
Pros
o Potentially simpler deployment (no daemons)
• New YARN job (MapReduce/Spark) for each query
o Check-pointing support enables very long-running queries
• Days to weeks (ETL work)
o Works well in tandem with machine learning (Spark)
Cons
o Latency prohibitive for for interactive analytics
• Tableau, Qlik Sense, …
o Slower than native SQL engines
20 BI on Hadoop: What are your Options
OLAP Cubes on Hadoop
• Kylin
o Hadoop-only
o Stores OLAP cubes in HBase
o Queries fail if not satisfied by cubes
o Open source
• AtScale
o Hadoop-only
o Leverages external SQL engine
• Hive, Impala, SparkSQL
o Collaborative cube creation
o Closed source
21 BI on Hadoop: What are your Options
OLAP Cubes on Hadoop: Pros and Cons
Pros
o Fast queries on pre-aggregated data
o Can use SQL and MDX tools
Cons
o Explicit cube definition/modeling phase
• Not “self-service”
• Frequent updates required due to dependency on business logic
o Aggregation create and maintenance can be long (and large)
o User connects to and interacts with the cube
• Can’t interact with the raw data
22 BI on Hadoop: What are your Options
SQL-on-Big-Data: Solution Comparison
Native SQL Batch & DS SQL OLAP Cubes
Technologies Drill, Impala, Presto Hive, Spark SQL Kylin, AtScale
Connectivity SQL and NoSQL SQL and NoSQL Hadoop-only
Primary Use Case Interactive
ETL or data-science
focused
Constrained
Interactive
Query Capability Raw data Raw data Aggregated data
Deployment Model
New daemons
collocated with existing
services
New MapReduce and/or
Spark job for each
query
Varies
23 BI on Hadoop: What are your Options
SQL-on-Big-Data: General Pros and Cons
Pros
• Continue using your favorite BI tools and SQL-based clients
o Tableau, Qlik, Power BI, Excel, R, SAS, …
• Technical analysts can write custom SQL queries
Cons
• Another layer in your data stack
• May need to pre-aggregate the data depending on your scale
• Need a separate data preparation tool (or custom scripts)
24 BI on Hadoop: What are your Options
Deciding what is right for you?
25 BI on Hadoop: What are your Options
ETL to
RDBMS
BI on Big data: Heuristic
Do you already have a favorite BI Tool
No
Is External Cluster Okay?
Does your schema change frequently?
No Yes
Yes
Platfora
Zoomdata
No
Yes
Do you want to be
able to write SQL
No
Datameer
No
Do you like Excel Metaphor?
Yes
Monolithic/All-in-one Solutions
No
Is your working data relatively small & static?
Yes
Yes
Yes
Do you have very predictable analysis needs?
OLAP Cubes
on Hadoop
No
Are you focused on interactive BI?
No
Do you need to query NoSQL?
No
Hive
Native SQL
Do you want to combine ML with SQL?
No Yes
SparkSQL
26 BI on Hadoop: What are your Options
Q&A
Tomer Shiran
tshiran@dremio.com
@tshiran
Reach out to learn what we’re up to at Dremio
(or to join the private beta…)

Contenu connexe

Tendances

Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...DataWorks Summit/Hadoop Summit
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataDataWorks Summit/Hadoop Summit
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azureDavid Giard
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014Eli Singer
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Spark Summit
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS CloudIdan Tohami
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real WorldMark Kromer
 
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...Dipti Borkar
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 

Tendances (20)

Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
The EDW Ecosystem
The EDW EcosystemThe EDW Ecosystem
The EDW Ecosystem
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS Cloud
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 

En vedette

Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsWes McKinney
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too muchJulian Hyde
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits allJulian Hyde
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowDremio Corporation
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeDremio Corporation
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memoryJulian Hyde
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overviewJulian Hyde
 

En vedette (10)

Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too much
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 

Similaire à Bi on Big Data - Strata 2016 in London

New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architectureJoseph D'Antoni
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Rittman Analytics
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelUwe Printz
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Thomas W. Dinsmore
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersAdaryl "Bob" Wakefield, MBA
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Andrew Brust
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehousehadoopsphere
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Dataconomy Media
 

Similaire à Bi on Big Data - Strata 2016 in London (20)

New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architecture
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Apache drill
Apache drillApache drill
Apache drill
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R Users
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouse
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 

Dernier

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Bi on Big Data - Strata 2016 in London

  • 1. BI on Big Data Tomer Shiran - @tshiran Co-Founder & CEO, Dremio Strata + Hadoop World London, June 3, 2016 What are your options?
  • 2. 2 BI on Hadoop: What are your Options Dremio Company Background Jacques Nadeau Founder & CTO • Recognized SQL & NoSQL expert • Founder of Apache Arrow & Drill • Previously Quigo (AOL); Offermatica (ADBE); aQuantive (MSFT) Tomer Shiran Founder & CEO • Previously MapR (VP Product & employee #5), MapR; Microsoft; IBM Research • Carnegie Mellon, Technion Julien Le Dem Architect • Founder of Apache Parquet • Apache Pig PMC Member • Previously Twitter (Lead, Analytics Data Pipeline); Yahoo! (Architect) Top Silicon Valley VCs• Stealth data analytics startup • Founded in 2015 • Led by experts in Big Data and open source including the creators of Apache Arrow & Apache Parquet
  • 3. 3 BI on Hadoop: What are your Options Recent changes to the BI landscape Good ol’ Days • Only a few databases (e.g. Oracle, Teradata, SQL Server) • A few BI tools (MicroStrategy, Cognos) • Everything worked with everything • Things were easy! Modern Reality • Larger scale, less control and less structure • Lots of databases! • Data Lake, not database • HDFS: It’s a file system, folks! • NoSQL: Let’s put the schema in the application • It can feel like the wild west!
  • 4. 4 BI on Hadoop: What are your Options Major Approaches to BI on Big Data? ETL to RDBMS o “Make the new world look like the old world!” o Load a transformed set of data into relational database Monolithic (all-in-one) solutions o Use BI tools that connect directly to Big Data SQL-on-Big-Data o Connect BI tools to a query engine sitting on top of Big Data o Three main sub-categories • Native SQL • Batch SQL • OLAP Cubes
  • 5. 5 BI on Hadoop: What are your Options So how do we bring BI to Big Data? Big Data RDBMS BI options ETL tool ETL to Data Warehouse Big Data SQL Engine BI options SQL-on-Big-DataBig Data Monolithic tool with built-in BI Monolithic All-in-one Solutions
  • 6. 6 BI on Hadoop: What are your Options ETL to RDBMS: Introduction • ETL (Extract, Transform, and Load) a subset of the data into a relational database o Oracle, PostgreSQL, Teradata, Redshift, Vertica, … • Connect any desired BI tool to the RDBMS o Tableau, Qlik, … • Two options: o Commercial tools (Informatica, Talend, Pentaho,…) o Custom development, scripts, etc. Big Data RDBMS BI options ETL tool
  • 7. 7 BI on Hadoop: What are your Options ETL to RDBMS: Example • Load web server logs from HDFS into RDBMS • ETL software: Pentaho Data Integration (aka ‘Kettle’) • RDBMS: MySQL Connect ETL to RDBMS Add and Configure Input/Output Connect Input and Output Create and fill RDBMS table Connect BI tool To RDBMS 0 50 100 150 200 250 April May June July Source: http://wiki.pentaho.com/display/BAD/Extracting+Data+from+HDFS+to+Load+an+RDBMS
  • 8. 8 BI on Hadoop: What are your Options ETL to RDBMS: Pros and Cons Pros • Relational databases and their BI integrations are very mature • Use your favorite tools o Tableau, Excel, R, … Cons • Traditional ETL tools don’t work well with modern data o Changing schemas, complex or semi-structured data, … o Hand-coded scripts are a common substitute • Data freshness o How often do you replicate/synchronize? • Data resolution o Can’t store all the raw data in the RDBMS (due to scalability and/or cost) o Need to sample, aggregate or time-constrain the data …and really, who wants to ETL?
  • 9. 9 BI on Hadoop: What are your Options Monolithic (or All-in-One) Solutions: Introduction • Single piece of software on top of Big Data • Performs both data visualization (BI) and execution • Utilize sampling or manual pre-aggregation to reduce the data volume that the user is interacting with • Examples: o Datameer o Platfora o Zoomdata Big Data Monolithic system with built-in BI Monolithic Solutions
  • 10. 10 BI on Hadoop: What are your Options Platfora Architecture Overview • Constructs aggregates that are loaded into an external database o Aggregates provide fast visualizations o Aggregations must be created before consumption MapReduce/Spark HDFS Hadoop Cluster Hadoop Proprietary DB Aggregates Platfora Cluster
  • 11. 11 BI on Hadoop: What are your Options Hadoop Cluster Datameer Nodes Datameer Architecture Overview • Users interact with samples of the data in an Excel-like interface • Finished designs use the whole dataset • Query router determines execution engine based on data size Single Node Custom Execution Tez MapReduce Query Router Sampling Hadoop HDFS
  • 12. 12 BI on Hadoop: What are your Options Zoomdata Architecture Overview • Queries on historical (ie, non-streaming) data are split into many sampling queries • This sampling provides a view of the data that converges toward an accurate picture o But adds load on the data source… • Can handle streaming data sources Stream Processing Engine Spark-based cache HDFS / MongoDB Zoomdata Server Incremental Sampling Streaming Data Source Multiple Data Clusters
  • 13. 13 BI on Hadoop: What are your Options Monolithic Solutions: Pros and Cons Pros • Only one tool to learn and operate • Easier than building and maintain ETL-to-RDBMS pipeline • Integrated data preparation in some solutions Cons • Can’t analyze the raw data o Rely on aggregation or sampling before primary analysis • Can’t use your existing BI or analytics tools (Tableau, Qlik, R, …) • Can’t run arbitrary SQL queries
  • 14. 14 BI on Hadoop: What are your Options SQL-on-Big-Data: Introduction • SQL queries against Big Data o Hadoop o NoSQL • MongoDB, HBase, ... o Cloud Storage • S3, Azure Data Lake, GCS, … • Use your existing BI tools o Leverage standard ODBC/JDBC drivers Tableau, Qlik, R, … SQL Engine Hadoop & NoSQL
  • 15. 15 BI on Hadoop: What are your Options SQL-on-Big-Data: Introduction Three major design philosophies: • Native SQL • Batch & Data Science SQL • OLAP Cubes on Hadoop
  • 16. 16 BI on Hadoop: What are your Options Native SQL • Apache Drill o Queries Hadoop, RDBMS, Files, NoSQL, Cloud (S3) o Based on Apache Arrow o Columnar in-memory execution • Apache Impala (incubating) o Utilizes the Hive metastore o Focused on data in HDFS • Presto o Queries Hadoop, RDBMS, Files, NoSQL, Cloud (S3)
  • 17. 17 BI on Hadoop: What are your Options Native SQL: Pros and Cons Pros • Highest performance for Big Data workloads • Connect to Hadoop and also NoSQL systems • Make Hadoop “look like a database” Cons • Queries may still be too slow for interactive analysis on many TB/PB • Can’t defeat physics
  • 18. 18 BI on Hadoop: What are your Options Batch & Data Science SQL • Hive o Enables SQL queries to be translated to MapReduce/Tez o Most commonly used for batch processing and ETL workloads • Spark SQL o Provides a way to deliver SQL queries in Spark programs (Scala/Java/Python) o Excellent interleaving with data science work
  • 19. 19 BI on Hadoop: What are your Options Batch & Data Science SQL: Pros and Cons Pros o Potentially simpler deployment (no daemons) • New YARN job (MapReduce/Spark) for each query o Check-pointing support enables very long-running queries • Days to weeks (ETL work) o Works well in tandem with machine learning (Spark) Cons o Latency prohibitive for for interactive analytics • Tableau, Qlik Sense, … o Slower than native SQL engines
  • 20. 20 BI on Hadoop: What are your Options OLAP Cubes on Hadoop • Kylin o Hadoop-only o Stores OLAP cubes in HBase o Queries fail if not satisfied by cubes o Open source • AtScale o Hadoop-only o Leverages external SQL engine • Hive, Impala, SparkSQL o Collaborative cube creation o Closed source
  • 21. 21 BI on Hadoop: What are your Options OLAP Cubes on Hadoop: Pros and Cons Pros o Fast queries on pre-aggregated data o Can use SQL and MDX tools Cons o Explicit cube definition/modeling phase • Not “self-service” • Frequent updates required due to dependency on business logic o Aggregation create and maintenance can be long (and large) o User connects to and interacts with the cube • Can’t interact with the raw data
  • 22. 22 BI on Hadoop: What are your Options SQL-on-Big-Data: Solution Comparison Native SQL Batch & DS SQL OLAP Cubes Technologies Drill, Impala, Presto Hive, Spark SQL Kylin, AtScale Connectivity SQL and NoSQL SQL and NoSQL Hadoop-only Primary Use Case Interactive ETL or data-science focused Constrained Interactive Query Capability Raw data Raw data Aggregated data Deployment Model New daemons collocated with existing services New MapReduce and/or Spark job for each query Varies
  • 23. 23 BI on Hadoop: What are your Options SQL-on-Big-Data: General Pros and Cons Pros • Continue using your favorite BI tools and SQL-based clients o Tableau, Qlik, Power BI, Excel, R, SAS, … • Technical analysts can write custom SQL queries Cons • Another layer in your data stack • May need to pre-aggregate the data depending on your scale • Need a separate data preparation tool (or custom scripts)
  • 24. 24 BI on Hadoop: What are your Options Deciding what is right for you?
  • 25. 25 BI on Hadoop: What are your Options ETL to RDBMS BI on Big data: Heuristic Do you already have a favorite BI Tool No Is External Cluster Okay? Does your schema change frequently? No Yes Yes Platfora Zoomdata No Yes Do you want to be able to write SQL No Datameer No Do you like Excel Metaphor? Yes Monolithic/All-in-one Solutions No Is your working data relatively small & static? Yes Yes Yes Do you have very predictable analysis needs? OLAP Cubes on Hadoop No Are you focused on interactive BI? No Do you need to query NoSQL? No Hive Native SQL Do you want to combine ML with SQL? No Yes SparkSQL
  • 26. 26 BI on Hadoop: What are your Options Q&A Tomer Shiran tshiran@dremio.com @tshiran Reach out to learn what we’re up to at Dremio (or to join the private beta…)