SQL on everything, in memory 
Page ‹#› © Hortonworks Inc. 2014 
Julian Hyde 
Strata, NYC 
October 16th, 2014 
Apache Calcite
About me 
Julian Hyde 
Architect at Hortonworks 
Open source: 
• Founder & lead, Apache Calcite (query optimization framework) 
• Founder & lead, Pentaho Mondrian (analysis engine) 
• Committer, Apache Drill 
• Contributor, Apache Hive 
• Contributor, Cascading Lingual (SQL interface to Cascading) 
Past: 
• SQLstream (streaming SQL) 
• Broadbase (data warehouse) 
• Oracle (SQL kernel development) 
SQL: in and out of fashion 
1969 — CODASYL (network database) 
1979 — First commercial SQL RDBMSs 
1990 — Acceptance — transaction processing on SQL 
1993 — Multi-dimensional databases 
1996 — SQL EDWs 
2006 — Hadoop and other “big data” technologies 
2008 — NoSQL 
2011 — SQL on Hadoop 
2014 — Interactive analytics on {Hadoop, NoSQL, DBMS}, using SQL 
SQL remains popular. 
But why? 
“SQL inside” 
Implementing SQL well is hard 
• System cannot just “run the query” as written 
• Requires relational algebra, query planner (optimizer) & metadata 
…but it’s worth the effort 
Algebra-based systems are more flexible 
• Add new algorithms (e.g. a better join) 
• Re-organize data 
• Choose access path based on statistics 
• Dumb queries (e.g. machine-generated) 
• Relational, schema-less, late-schema, non-relational (e.g. key-value, document) 
Apache Calcite 
Apache incubator project since May, 2014 
• Originally named Optiq 
Query planning framework 
• Relational algebra, rewrite rules, cost model 
• Extensible 
Packaging 
• Library (JDBC server optional) 
• Open source 
• Community-authored rules, adapters 
Adoption 
• Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Kylin OLAP 
• Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory data 
Conventional DB architecture 
Calcite architecture 
Demo 
{sqlline, apache-calcite-0.9.1, .csv} 
Expression tree 
SELECT p."product_name", COUNT(*) AS c 
FROM "splunk"."splunk" AS s 
JOIN "mysql"."products" AS p 
ON s."product_id" = p."product_id" 
WHERE s."action" = 'purchase' 
GROUP BY p."product_name" 
ORDER BY c DESC 
[Diagram: operator tree over two sources, Splunk (table: splunk) and MySQL (table: products). Operators: scan, scan, join (key: product_id), filter (condition: action = 'purchase'), group (key: product_name; agg: count), sort (key: c DESC)]
Expression tree (optimized) 
SELECT p."product_name", COUNT(*) AS c 
FROM "splunk"."splunk" AS s 
JOIN "mysql"."products" AS p 
ON s."product_id" = p."product_id" 
WHERE s."action" = 'purchase' 
GROUP BY p."product_name" 
ORDER BY c DESC 
[Diagram: optimized operator tree over two sources, Splunk (table: splunk) and MySQL (table: products). Operators: scan, scan, join (key: product_id), filter (condition: action = 'purchase'), group (key: product_name; agg: count), sort (key: c DESC)]
Calcite – APIs and SPIs 
Cost, statistics 
RelOptCost 
RelOptCostFactory 
RelMetadataProvider 
• RelMdColumnUniqueness 
• RelMdDistinctRowCount 
• RelMdSelectivity 
SQL parser 
SqlNode 
SqlParser 
SqlValidator 
Transformation rules 
RelOptRule 
• MergeFilterRule 
• PushAggregateThroughUnionRule 
• 100+ more 
Global transformations 
• Unification (materialized view) 
• Column trimming 
• De-correlation 
Relational algebra 
RelNode (operator) 
• TableScan 
• Filter 
• Project 
• Union 
• Aggregate 
• … 
RelDataType (type) 
RexNode (expression) 
RelTrait (physical property) 
• RelConvention (calling-convention) 
• RelCollation (sortedness) 
• TBD (bucketedness/distribution) 
JDBC driver 
Metadata 
Schema 
Table 
Function 
• TableFunction 
• TableMacro 
Lattice
Calcite Planning Process 
SqlNode → RelNode + RexNode 
[Diagram labels: SQL parse tree → Sql-to-Rel Converter → RelNode Graph (Logical Plan) → Planner → Best RelNode Graph → Translate to runtime] 
1. Plan Graph 
• Node for each node in Input Plan 
• Each node is a Set of alternate Sub Plans 
• Set further divided into Subsets: based on traits like sortedness 
2. Rules 
• Rule: specifies an Operator sub-graph to match and logic to generate equivalent ‘better’ sub-graph 
• New and original sub-graph both remain in contention 
3. Cost Model 
• RelNodes have Cost & Cumulative Cost 
4. Metadata Providers 
• Used to plug in Schema, cost formulas 
• Filter selectivity 
• Join selectivity 
• NDV calculations 
Rule Match Queue 
• Add Rule matches to Queue 
• Apply Rule match transformations to plan graph 
• Iterate for fixed iterations or until cost doesn’t change 
• Match importance based on cost of RelNode and height 
Based on “Volcano” & “Cascades” papers [G. Graefe]
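The rule-and-cost loop described on this slide can be illustrated with a toy model. This is not Calcite's API: the tuple-based plan nodes, the `push_filter_below_join` rule, the table sizes, and the selectivity of 0.1 are all assumptions invented for the sketch. It keeps the original and the rewritten plan in contention and lets a cumulative cost decide between them.

```python
# Toy illustration only; names and numbers are hypothetical, not Calcite's API.
# A plan is a nested tuple: ("scan", table), ("filter", side, child),
# ("join", left, right).

ROWS = {"splunk": 1_000_000, "products": 1_000}  # assumed table sizes
SELECTIVITY = 0.1                                 # assumed filter selectivity


def row_count(plan):
    """Estimated number of rows the plan produces."""
    op = plan[0]
    if op == "scan":
        return ROWS[plan[1]]
    if op == "filter":
        return row_count(plan[2]) * SELECTIVITY
    if op == "join":
        # Assume a many-to-one join: output size follows the larger input.
        return max(row_count(plan[1]), row_count(plan[2]))
    raise ValueError(op)


def cumulative_cost(plan):
    """Cost of a node (rows it reads) plus the cost of its inputs."""
    op = plan[0]
    if op == "scan":
        return ROWS[plan[1]]
    if op == "filter":
        return row_count(plan[2]) + cumulative_cost(plan[2])
    if op == "join":
        left, right = plan[1], plan[2]
        return (row_count(left) + row_count(right)
                + cumulative_cost(left) + cumulative_cost(right))
    raise ValueError(op)


def push_filter_below_join(plan):
    """Rule: match filter(join(l, r)) and emit join(filter(l), r).

    Valid here only because the filter is assumed to reference the left input.
    Returns None when the pattern does not match.
    """
    if plan[0] == "filter" and plan[2][0] == "join":
        _, side, (_, left, right) = plan
        if side == "left":
            return ("join", ("filter", side, left), right)
    return None


def best_plan(plan, rules):
    """Keep the original and every rewrite in contention; pick the cheapest."""
    candidates = [plan]
    queue = [plan]
    while queue:
        current = queue.pop()
        for rule in rules:
            rewritten = rule(current)
            if rewritten is not None and rewritten not in candidates:
                candidates.append(rewritten)
                queue.append(rewritten)
    return min(candidates, key=cumulative_cost)


original = ("filter", "left", ("join", ("scan", "splunk"), ("scan", "products")))
chosen = best_plan(original, [push_filter_below_join])
```

With these assumed numbers the rewritten plan wins, because filtering before the join shrinks the join's larger input. A real planner matches rules anywhere in the graph and draws its estimates from metadata providers rather than from constants.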
Demo 
{sqlline, apache-calcite-0.9.1, .csv, CsvPushProjectOntoTableRule} 
Analytics 
Mondrian OLAP (Saiku user interface)
Interactive queries on NoSQL 
Typical requirements 
NoSQL operational database (e.g. HBase, MongoDB, Cassandra) 
Analytic queries aggregate over full scan 
Speed-of-thought response (< 5 seconds first query, < 1 second second query) 
Data freshness (< 10 minutes) 
Other requirements 
Hybrid system (e.g. Hive + HBase) 
Star schema 
Star schema 
[Diagram: star schema with tables Sales, Time, Inventory, Customer, Warehouse, Product, Product class, Promotion. Legend: fact table, dimension table, many-to-one relationship]
Simple analytics problem? 
System 
100M US census records 
1KB each record, 100GB total 
4 SATA 3 disks, total read throughput 1.2GB/s 
Requirement 
Count all records in < 5s 
Solution #1 
It’s not possible! It takes 80s just to read the data 
Solution #2 
Cheat! 
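The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1KB = 1,000 bytes, 1GB = 10^9 bytes); the variable names are illustrative only.

```python
records = 100_000_000          # 100M census records
record_bytes = 1_000           # 1KB each (decimal units assumed)
throughput = 1.2e9             # 1.2GB/s total read throughput across the disks

total_bytes = records * record_bytes        # 100GB
scan_seconds = total_bytes / throughput     # a little over 80 seconds
required_speedup = scan_seconds / 5         # factor needed to meet the 5 s target
```

A full scan is therefore more than an order of magnitude too slow for the 5-second requirement, which is why the answer has to come from a differently organized copy of the data.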
How to cheat 
Multiple tricks 
Compress data 
Column-oriented storage 
Store data in sorted order 
Put data in memory 
Cache previous results 
Pre-compute (materialize) aggregates 
Common factors 
Make a copy of the data 
Organize it in a different way 
Optimizer chooses the most suitable data organization 
SQL query is unchanged 
Filter-join-aggregate query 
SELECT product.id, sum(sales.units), sum(sales.price), count(*) 
FROM sales … 
JOIN customer ON … 
JOIN time ON … 
JOIN product ON … 
JOIN product_class ON … 
WHERE time.year = 2014 
AND time.quarter = 'Q1' 
AND product.color = 'Red' 
GROUP BY … 
[Diagram: star schema with tables Time, Sales, Inventory, Product, Customer, Warehouse, ProductClass]
Materialized view, lattice, tile 
Materialized view 
A table whose contents are guaranteed to be the same as the result of executing a given query. 
Lattice 
Recommends, builds, and recognizes summary materialized views (tiles) based on a star schema. 
A query defines the tables and many:1 relationships in the star schema. 
Tile 
A summary materialized view that belongs to a lattice. 
A tile may or may not be materialized. 
Materialization methods: 
• Declare in lattice 
• Generate via recommender algorithm 
• Create in response to query 
(FAKE SYNTAX) 
CREATE MATERIALIZED VIEW t AS 
SELECT * FROM emps 
WHERE deptno = 10; 
CREATE LATTICE star AS 
SELECT * 
FROM sales_fact_1997 AS s 
JOIN product AS p ON … 
JOIN product_class AS pc ON … 
JOIN customer AS c ON … 
JOIN time_by_day AS t ON …; 
CREATE MATERIALIZED VIEW zg IN star 
SELECT gender, zipcode, 
COUNT(*), SUM(unit_sales) 
FROM star 
GROUP BY gender, zipcode;
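Since the statements above are marked as fake syntax, the idea behind a tile can be tried out with ordinary tables instead. The sketch below uses Python's built-in `sqlite3` module; the four sample rows and the hand-built summary table `zg` are assumptions for illustration, and nothing here is Calcite functionality. It shows that a query grouped by gender gets the same answer from the raw table as from the (gender, zipcode) summary, provided counts and sums are rolled up by summing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE star (gender TEXT, zipcode TEXT, unit_sales REAL);
    INSERT INTO star VALUES
        ('F', '10000', 2.0), ('F', '10000', 3.0),
        ('M', '10000', 1.0), ('M', '94000', 4.0);
    -- Hand-built stand-in for the tile "zg": one row per (gender, zipcode).
    CREATE TABLE zg AS
        SELECT gender, zipcode, COUNT(*) AS c, SUM(unit_sales) AS s
        FROM star GROUP BY gender, zipcode;
""")

# The query as the user writes it, against the raw data.
from_raw = conn.execute(
    "SELECT gender, COUNT(*), SUM(unit_sales) FROM star "
    "GROUP BY gender ORDER BY gender").fetchall()

# The same answer rolled up from the tile: counts are summed, sums are summed.
from_tile = conn.execute(
    "SELECT gender, SUM(c), SUM(s) FROM zg "
    "GROUP BY gender ORDER BY gender").fetchall()
```

In this sketch the rewrite from `star` to `zg` is done by hand; the point of a lattice is that the optimizer recognizes the opportunity and performs the rewrite while the user's SQL stays unchanged.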
Lattice 
[Diagram: tile () 1; raw 1m] 
> select count(*) as c, sum(unit_sales) as s 
> from star; 
+-----------+-----------+ 
| C         | S         | 
+-----------+-----------+ 
| 1,000,000 | 266,773.0 | 
+-----------+-----------+ 
1 row selected 

> select * from star; 
1,000,000 rows selected 
Lattice - top tiles 
[Diagram: tiles () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; raw 1m] 
Key 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 
> select zipcode, count(*) as c, 
> sum(unit_sales) as s 
> from star 
> group by zipcode; 
+---------+-----------+-----------+ 
| ZIPCODE | C | S | 
+---------+-----------+-----------+ 
| 10000 | 23 | 31.5 | 
… 
+---------+-----------+-----------+ 
43,000 rows selected 
> select state, count(*) as c, 
> sum(unit_sales) as s 
> from star 
> group by state; 
+-------+-----------+-----------+ 
| STATE | C | S | 
+-------+-----------+-----------+ 
| AL | 201,693 | 5,520.0 | 
… 
+-------+-----------+-----------+ 
50 rows selected
Lattice - more tiles 
[Diagram: tiles () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; (z, s) 43.4k; (g, y) 10; (y, m) 60; (g, y, m) 120; (s, g, y, m) 6k; (z, s, g, y, m) 912k; raw 1m] 
Key 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 
Callouts: 
• Fewer than you would expect, because 5m combinations cannot occur in 1m row table 
• Fewer than you would expect, because state depends on zipcode
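The note that 5m combinations cannot occur in a 1m-row table can be made concrete with a standard estimate. The sketch below assumes every row picks its combination uniformly at random, which is a simplifying assumption and not something the slides state; the function name `expected_distinct` is hypothetical. The cardinalities come from the slide's key.

```python
import math


def expected_distinct(combinations, rows):
    """Expected number of distinct combinations seen in `rows` rows,
    assuming each row picks one of `combinations` values uniformly at random."""
    # combinations * (1 - (1 - 1/combinations) ** rows), computed stably.
    return combinations * -math.expm1(rows * math.log1p(-1.0 / combinations))


RAW_ROWS = 1_000_000
# Cardinalities from the slide's key.
zipcode, state, gender, year, month = 43_000, 50, 2, 5, 12

# State depends on zipcode, so it adds no combinations beyond zipcode itself.
combos = zipcode * gender * year * month          # 5.16m possible combinations
estimate = expected_distinct(combos, RAW_ROWS)
```

Under that assumption the estimate comes out a little above 900k rows, the same order as the tile sizes shown on the slides, and necessarily below the 1m raw rows.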
Lattice - complete 
[Diagram: tiles () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; (z, s) 43.4k; (y, m) 60; (g, y) 10; (g, m) 24; (g, y, m) 120; (z, s, g) 87k; (s, g, y, m) 6k; (z, s, g, y) 391k; (z, s, g, m) 643k; (z, s, y, m) 830k; (z, g, y, m) 909k; (z, s, g, y, m) 910k; raw 1m] 
Key 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 
Lattice - optimized 
[Diagram: tiles () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; (z, s) 43.4k; (y, m) 60; (g, y) 10; (g, m) 24; (g, y, m) 120; (z, s, g) 87k; (s, g, y, m) 6k; (z, s, g, y) 392k; (z, s, g, m) 644k; (z, s, y, m) 831k; (z, g, y, m) 909k; (z, s, g, y, m) 912k; raw 1m] 
Key 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 

Aggregate    Cost (rows)   Benefit (query rows saved)   % queries 
s, g, y, m   6k            497k                         50% 
z, s, g      87k           304k                         33% 
g, y         10            1.5k                         25% 
g, m         24            1.5k                         25% 
s, g         100           1.5k                         25% 
y, m         60            1.5k                         25%
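One way to picture how materialized tiles get used: for each query, pick the smallest tile whose dimensions cover the ones the query groups by, and fall back to the raw table otherwise. The tile sizes below are taken from the table above; the selection logic and the name `cheapest_source` are assumptions for illustration and do not describe Calcite's actual algorithm.

```python
# Tiles and row counts taken from the slide; the selection logic is an assumption.
TILES = {
    frozenset("sgym"): 6_000,
    frozenset("zsg"): 87_000,
    frozenset("gy"): 10,
    frozenset("gm"): 24,
    frozenset("sg"): 100,
    frozenset("ym"): 60,
}
RAW_ROWS = 1_000_000


def cheapest_source(query_dims):
    """Return (dims, rows) of the smallest tile covering the query, else raw."""
    needed = frozenset(query_dims)
    covering = [(rows, tile) for tile, rows in TILES.items() if needed <= tile]
    if not covering:
        return None, RAW_ROWS
    rows, tile = min(covering, key=lambda pair: pair[0])
    return tile, rows
```

For example, `cheapest_source("g")` picks the 10-row (g, y) tile, while `cheapest_source("zy")` finds no covering tile and has to read the 1m raw rows.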
Demo 
{mysql-foodmart-lattice-model.json} 
Tiled, in-memory materializations 
Query: SELECT x, SUM(y) FROM t GROUP BY x 
[Diagram: in-memory materialized queries; tables on disk] 
Where we’re going… smart, distributed memory cache & compute framework 
http://hortonworks.com/blog/dmmq/
Kylin OLAP engine 
[Diagram callout: Calcite used here] 
Thank you! 
Apache Calcite 
@julianhyde 
http://calcite.incubator.apache.org 
http://www.kylin.io 

Contenu connexe

Tendances

Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Databricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesDatabricks
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overviewJulian Hyde
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySparkRussell Jurney
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioAlluxio, Inc.
 

Tendances (20)

Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 

En vedette

Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / OptiqJulian Hyde
 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsWes McKinney
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowDremio Corporation
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too muchJulian Hyde
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits allJulian Hyde
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketDremio Corporation
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeDremio Corporation
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 

En vedette (12)

Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too much
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 

Similaire à SQL on everything, in memory

Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Julian Hyde
 
Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in HiveDataWorks Summit
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Julian Hyde
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql databasePARIKSHIT SAVJANI
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaData Science Thailand
 
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteA Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteSalesforce Engineering
 
ASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataJohn Beresniewicz
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Lucidworks
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Julian Hyde
 
BDM25 - Spark runtime internal
BDM25 - Spark runtime internalBDM25 - Spark runtime internal
BDM25 - Spark runtime internalDavid Lauzon
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...DataStax Academy
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 

Similaire à SQL on everything, in memory (20)

Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14
 
Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in Hive
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 
Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql database
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
 
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteA Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
 
ASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH data
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)
 
BDM25 - Spark runtime internal
BDM25 - Spark runtime internalBDM25 - Spark runtime internal
BDM25 - Spark runtime internal
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 

SQL on everything, in memory

  • 1. SQL on everything, in memory. Julian Hyde, Strata, NYC, October 16th, 2014. Apache Calcite. © Hortonworks Inc. 2014
  • 2. About me
    Julian Hyde, Architect at Hortonworks
    Open source:
    • Founder & lead, Apache Calcite (query optimization framework)
    • Founder & lead, Pentaho Mondrian (analysis engine)
    • Committer, Apache Drill
    • Contributor, Apache Hive
    • Contributor, Cascading Lingual (SQL interface to Cascading)
    Past:
    • SQLstream (streaming SQL)
    • Broadbase (data warehouse)
    • Oracle (SQL kernel development)
  • 3. SQL: in and out of fashion
    • 1969: CODASYL (network database)
    • 1979: First commercial SQL RDBMSs
    • 1990: Acceptance (transaction processing on SQL)
    • 1993: Multi-dimensional databases
    • 1996: SQL EDWs
    • 2006: Hadoop and other “big data” technologies
    • 2008: NoSQL
    • 2011: SQL on Hadoop
    • 2014: Interactive analytics on {Hadoop, NoSQL, DBMS}, using SQL
    SQL remains popular. But why?
  • 4. “SQL inside”
    Implementing SQL well is hard
    • The system cannot just “run the query” as written
    • It requires relational algebra, a query planner (optimizer) & metadata
    …but it’s worth the effort
    Algebra-based systems are more flexible
    • Add new algorithms (e.g. a better join)
    • Re-organize data
    • Choose access path based on statistics
    • Cope with dumb queries (e.g. machine-generated)
    • Handle relational, schema-less, late-schema, and non-relational data (e.g. key-value, document)
  • 5. Apache Calcite
  • 6. Apache Calcite
    Apache incubator project since May 2014
    • Originally named Optiq
    Query planning framework
    • Relational algebra, rewrite rules, cost model
    • Extensible
    Packaging
    • Library (JDBC server optional)
    • Open source
    • Community-authored rules, adapters
    Adoption
    • Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Kylin OLAP
    • Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory data
  • 7. Conventional DB architecture
  • 8. Calcite architecture
  • 9. Demo {sqlline, apache-calcite-0.9.1, .csv}
  • 10. Expression tree
    SELECT p."product_name", COUNT(*) AS c
    FROM "splunk"."splunk" AS s
      JOIN "mysql"."products" AS p ON s."product_id" = p."product_id"
    WHERE s."action" = 'purchase'
    GROUP BY p."product_name"
    ORDER BY c DESC
    Diagram nodes:
    • scan (Splunk; Table: splunk)
    • scan (MySQL; Table: products)
    • join (Key: product_id)
    • filter (Condition: action = 'purchase')
    • group (Key: product_name; Agg: count)
    • sort (Key: c DESC)
  • 11. Expression tree (optimized)
    Same query as slide 10.
    Diagram nodes:
    • scan (Splunk; Table: splunk)
    • scan (MySQL; Table: products)
    • join (Key: product_id)
    • filter (Condition: action = 'purchase')
    • group (Key: product_name; Agg: count)
    • sort (Key: c DESC)
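The transcript loses the diagram layout, so the exact shape of the optimized tree is not recoverable here. A typical rewrite for a tree like this one, assumed for illustration, is to push the filter below the join so that only matching rows from the Splunk side reach the join. The sketch below is a minimal Python model of that single rewrite; the classes `Scan`, `Filter`, `Join` and the function `push_filter_into_join` are hypothetical names and are not Calcite's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str
    columns: frozenset


@dataclass(frozen=True)
class Filter:
    column: str   # column tested by the condition
    value: str    # literal it is compared with
    input: object


@dataclass(frozen=True)
class Join:
    key: str
    left: object
    right: object


def columns(node):
    """Return the set of column names produced by a node."""
    if isinstance(node, Scan):
        return node.columns
    if isinstance(node, Filter):
        return columns(node.input)
    return columns(node.left) | columns(node.right)


def push_filter_into_join(node):
    """Rewrite Filter(Join(l, r)) so the filter sits on the one input that owns its column."""
    if isinstance(node, Filter) and isinstance(node.input, Join):
        join = node.input
        in_left = node.column in columns(join.left)
        in_right = node.column in columns(join.right)
        if in_left and not in_right:
            return Join(join.key, Filter(node.column, node.value, join.left), join.right)
        if in_right and not in_left:
            return Join(join.key, join.left, Filter(node.column, node.value, join.right))
    return node  # no match: leave the tree unchanged


splunk = Scan("splunk", frozenset({"product_id", "action"}))
products = Scan("products", frozenset({"product_id", "product_name"}))
original = Filter("action", "purchase", Join("product_id", splunk, products))
optimized = push_filter_into_join(original)
print(optimized)
```

The rewrite only fires when the filter column belongs to exactly one join input, which is why `product_id` (present on both sides) would not be pushed by this simplified rule.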
  • 12. Calcite – APIs and SPIs
    Cost, statistics
    • RelOptCost
    • RelOptCostFactory
    • RelMetadataProvider: RelMdColumnUniqueness, RelMdDistinctRowCount, RelMdSelectivity
    SQL parser
    • SqlNode
    • SqlParser
    • SqlValidator
    Transformation rules
    • RelOptRule: MergeFilterRule, PushAggregateThroughUnionRule, 100+ more
    Global transformations
    • Unification (materialized view)
    • Column trimming
    • De-correlation
    Relational algebra
    • RelNode (operator): TableScan, Filter, Project, Union, Aggregate, …
    • RelDataType (type)
    • RexNode (expression)
    • RelTrait (physical property): RelConvention (calling-convention), RelCollation (sortedness), TBD (bucketedness/distribution)
    JDBC driver
    Metadata
    • Schema
    • Table
    • Function: TableFunction, TableMacro
    • Lattice
  • 13. Calcite Planning Process
    Flow: SQL parse tree (SqlNode) → Sql-to-Rel Converter → Logical Plan (RelNode + RexNode) → Planner → Best RelNode Graph → Translate to runtime
    1. Plan Graph
    • Node for each node in Input Plan
    • Each node is a Set of alternate Sub Plans
    • Set further divided into Subsets: based on traits like sortedness
    2. Rules
    • Rule: specifies an Operator sub-graph to match and logic to generate equivalent ‘better’ sub-graph
    • New and original sub-graph both remain in contention
    3. Cost Model
    • RelNodes have Cost & Cumulative Cost
    4. Metadata Providers
    • Used to plug in Schema, cost formulas
    • Filter selectivity
    • Join selectivity
    • NDV calculations
    Rule Match Queue
    • Add Rule matches to Queue
    • Apply Rule match transformations to plan graph
    • Iterate for fixed iterations or until cost doesn’t change
    • Match importance based on cost of RelNode and height
    Based on “Volcano” & “Cascades” papers [G. Graefe]
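The planning loop on this slide can be illustrated with a toy search, sketched below in Python. It is not the Volcano algorithm and not Calcite code: it keeps every equivalent plan it discovers (so new and original plans both stay in contention), applies rules from a queue until nothing new appears, and then picks the cheapest plan under a made-up cost model. The plan encoding, the row counts, the 1% selectivity, and the many-to-one join assumption are all hypothetical.

```python
from collections import deque

# Plans are nested tuples (hypothetical encoding):
#   ("scan", table_name, row_count)
#   ("filter", table_name, selectivity, child)   # condition on a column of table_name
#   ("join", left, right)


def tables(plan):
    """Names of the tables scanned below a plan node."""
    if plan[0] == "scan":
        return {plan[1]}
    if plan[0] == "filter":
        return tables(plan[3])
    return tables(plan[1]) | tables(plan[2])


def rows(plan):
    """Estimated output rows; the join is assumed many-to-one from left to right."""
    if plan[0] == "scan":
        return plan[2]
    if plan[0] == "filter":
        return rows(plan[3]) * plan[2]
    return rows(plan[1])


def cost(plan):
    """Cumulative cost: rows read by each operator, summed over the tree."""
    if plan[0] == "scan":
        return plan[2]
    if plan[0] == "filter":
        return cost(plan[3]) + rows(plan[3])
    return cost(plan[1]) + cost(plan[2]) + rows(plan[1]) + rows(plan[2])


def push_filter_below_join(plan):
    """Rule: Filter(Join(l, r)) -> Join with the filter on the side that owns its table."""
    if plan[0] == "filter" and plan[3][0] == "join":
        _, table, selectivity, (_, left, right) = plan
        if table in tables(left):
            yield ("join", ("filter", table, selectivity, left), right)
        elif table in tables(right):
            yield ("join", left, ("filter", table, selectivity, right))


def rewrites(plan, rules):
    """Yield every plan obtained by applying one rule at one node of the plan."""
    for rule in rules:
        yield from rule(plan)
    if plan[0] == "filter":
        for child in rewrites(plan[3], rules):
            yield ("filter", plan[1], plan[2], child)
    elif plan[0] == "join":
        for left in rewrites(plan[1], rules):
            yield ("join", left, plan[2])
        for right in rewrites(plan[2], rules):
            yield ("join", plan[1], right)


def optimize(plan, rules):
    """Explore equivalent plans breadth-first, then return the cheapest one."""
    seen = {plan}
    queue = deque([plan])
    while queue:
        current = queue.popleft()
        for alternative in rewrites(current, rules):
            if alternative not in seen:
                seen.add(alternative)
                queue.append(alternative)
    return min(seen, key=cost)


plan = ("filter", "splunk", 0.01,
        ("join", ("scan", "splunk", 1_000_000), ("scan", "products", 1_000)))
best = optimize(plan, [push_filter_below_join])
print(cost(plan), cost(best))
```

A real planner avoids enumerating whole plans by sharing equivalent sub-plans in sets and subsets, as the slide describes; the exhaustive set used here is only workable for tiny examples.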
  • 14. Demo {sqlline, apache-calcite-0.9.1, .csv, CsvPushProjectOntoTableRule}
  • 15. Analytics
  • 16. Mondrian OLAP (Saiku user interface)
  • 17. Interactive queries on NoSQL
    Typical requirements
    • NoSQL operational database (e.g. HBase, MongoDB, Cassandra)
    • Analytic queries aggregate over full scan
    • Speed-of-thought response (< 5 seconds first query, < 1 second second query)
    • Data freshness (< 10 minutes)
    Other requirements
    • Hybrid system (e.g. Hive + HBase)
    • Star schema
  • 18. Star schema
    Diagram tables: Sales, Inventory, Time, Customer, Warehouse, Product, Product class, Promotion
    Key: fact table, dimension table, many-to-one relationship
  • 19. Simple analytics problem?
    System
    • 100M US census records
    • 1KB each record, 100GB total
    • 4 SATA 3 disks, total read throughput 1.2GB/s
    Requirement
    • Count all records in < 5s
    Solution #1
    • It’s not possible! It takes 80s just to read the data
    Solution #2
    • Cheat!
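The "80s" figure on this slide follows from dividing the data size by the read throughput. A small Python check of that arithmetic, assuming 1KB means 1,000 bytes and that the 1.2GB/s figure is the combined throughput of all four disks:

```python
RECORDS = 100_000_000        # 100M census records
RECORD_BYTES = 1_000         # "1KB each record", taken here as 1,000 bytes
THROUGHPUT = 1.2e9           # bytes per second across all disks
TARGET_SECONDS = 5           # requirement from the slide

total_bytes = RECORDS * RECORD_BYTES
scan_seconds = total_bytes / THROUGHPUT
speedup_needed = scan_seconds / TARGET_SECONDS

print(f"full scan: {scan_seconds:.0f}s, speedup needed: {speedup_needed:.1f}x")
```

Under these assumptions a full scan takes a little over 80 seconds, so meeting the 5-second target needs the query to touch well over ten times less data than the raw table holds.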
  • 20. How to cheat
    Multiple tricks
    • Compress data
    • Column-oriented storage
    • Store data in sorted order
    • Put data in memory
    • Cache previous results
    • Pre-compute (materialize) aggregates
    Common factors
    • Make a copy of the data
    • Organize it in a different way
    • Optimizer chooses the most suitable data organization
    • SQL query is unchanged
  • 21. Filter-join-aggregate query
    SELECT product.id, sum(sales.units), sum(sales.price), count(*)
    FROM sales …
      JOIN customer ON …
      JOIN time ON …
      JOIN product ON …
      JOIN product_class ON …
    WHERE time.year = 2014
      AND time.quarter = 'Q1'
      AND product.color = 'Red'
    GROUP BY …
    Diagram tables: Time, Sales, Inventory, Product, Customer, Warehouse, ProductClass
  • 22. Materialized view, lattice, tile
    Materialized view
    • A table whose contents are guaranteed to be the same as the result of executing a given query.
    Lattice
    • Recommends, builds, and recognizes summary materialized views (tiles) based on a star schema.
    • A query defines the tables and many:1 relationships in the star schema.
    Tile
    • A summary materialized view that belongs to a lattice.
    • A tile may or may not be materialized.
    • Materialization methods: declare in lattice; generate via recommender algorithm; create in response to a query
    (FAKE SYNTAX)
    CREATE MATERIALIZED VIEW t AS
      SELECT * FROM emps WHERE deptno = 10;
    CREATE LATTICE star AS
      SELECT * FROM sales_fact_1997 AS s
        JOIN product AS p ON …
        JOIN product_class AS pc ON …
        JOIN customer AS c ON …
        JOIN time_by_day AS t ON …;
    CREATE MATERIALIZED VIEW zg IN star
      SELECT gender, zipcode, COUNT(*), SUM(unit_sales)
      FROM star
      GROUP BY gender, zipcode;
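A tile such as `zg` is useful because a coarser query can be answered from it without touching the raw rows: counts are summed and sums are summed. The Python sketch below illustrates that roll-up on four invented rows (the data and the helper `aggregate` are hypothetical); it shows the idea only and says nothing about how Calcite implements it.

```python
from collections import defaultdict


def aggregate(rows, keys):
    """GROUP BY keys, returning {key_tuple: (count, sum_of_unit_sales)}.

    Raw rows count as 1 each; tile rows carry a pre-computed "cnt".
    """
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group][0] += row.get("cnt", 1)
        totals[group][1] += row["unit_sales"]
    return {group: (cnt, total) for group, (cnt, total) in totals.items()}


# Invented sample data standing in for the star schema's fact rows.
raw = [
    {"gender": "F", "zipcode": "10000", "unit_sales": 1.0},
    {"gender": "F", "zipcode": "10000", "unit_sales": 2.5},
    {"gender": "M", "zipcode": "10000", "unit_sales": 3.0},
    {"gender": "F", "zipcode": "94105", "unit_sales": 0.5},
]

# The tile: GROUP BY gender, zipcode with COUNT(*) and SUM(unit_sales).
tile_zg = [
    {"gender": g, "zipcode": z, "cnt": cnt, "unit_sales": total}
    for (g, z), (cnt, total) in aggregate(raw, ["gender", "zipcode"]).items()
]

# A query grouped by gender alone, answered two ways.
from_tile = aggregate(tile_zg, ["gender"])
from_raw = aggregate(raw, ["gender"])
print(from_tile == from_raw, from_tile)
```

The roll-up works for COUNT and SUM because both can be combined from partial results; an aggregate like COUNT(DISTINCT …) could not be recomputed from this tile in the same way.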
  • 23. Lattice
    Lattice nodes: () 1; raw 1m
    > select count(*) as c, sum(unit_sales) as s
    > from star;
    +-----------+-----------+
    | C         | S         |
    +-----------+-----------+
    | 1,000,000 | 266,773.0 |
    +-----------+-----------+
    1 row selected
    > select * from star;
    1,000,000 rows selected
  • 24. Lattice - top tiles
    Lattice nodes: () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; raw 1m
    Key: z zipcode (43k), s state (50), g gender (2), y year (5), m month (12)
    > select zipcode, count(*) as c,
    >   sum(unit_sales) as s
    > from star
    > group by zipcode;
    +---------+-----------+-----------+
    | ZIPCODE | C         | S         |
    +---------+-----------+-----------+
    | 10000   | 23        | 31.5      |
    …
    +---------+-----------+-----------+
    43,000 rows selected
    > select state, count(*) as c,
    >   sum(unit_sales) as s
    > from star
    > group by state;
    +-------+-----------+-----------+
    | STATE | C         | S         |
    +-------+-----------+-----------+
    | AL    | 201,693   | 5,520.0   |
    …
    +-------+-----------+-----------+
    50 rows selected
  • 25. Lattice - more tiles
    Lattice nodes: () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; (z, s) 43.4k; (g, y) 10; (y, m) 60; (g, y, m) 120; (s, g, y, m) 6k; (z, s, g, y, m) 912k; raw 1m
    Key: z zipcode (43k), s state (50), g gender (2), y year (5), m month (12)
    • (z, s, g, y, m): fewer than you would expect, because 5m combinations cannot occur in 1m row table
    • (z, s): fewer than you would expect, because state depends on zipcode
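The slide does not say how the tile sizes were estimated. One standard estimate, assumed here purely for illustration, treats each row as landing uniformly at random in one of d possible key combinations; the expected number of distinct combinations seen after n rows is then d · (1 − (1 − 1/d)^n). The Python sketch below applies it with 43,400 zipcodes (taken from the "(z, s) 43.4k" node, since state is determined by zipcode).

```python
import math


def expected_distinct(combinations, rows):
    """Expected number of distinct keys when `rows` rows fall uniformly
    at random into `combinations` possible keys."""
    # Equivalent to combinations * (1 - (1 - 1/combinations) ** rows),
    # written with log1p/expm1 for numerical stability.
    return -combinations * math.expm1(rows * math.log1p(-1.0 / combinations))


ROWS = 1_000_000
ZIPCODES = 43_400            # assumption: value of the (z, s) node; state adds nothing
GENDERS, YEARS, MONTHS = 2, 5, 12

possible = ZIPCODES * GENDERS * YEARS * MONTHS   # a little over 5 million
estimate = expected_distinct(possible, ROWS)
print(f"{possible:,} possible combinations, about {estimate:,.0f} expected in {ROWS:,} rows")
```

With these inputs the estimate comes out somewhat above 900k, the same order as the figures on the slides, though the uniformity assumption is unlikely to hold exactly for real census or sales data.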
  • 26. Lattice - complete
    Lattice nodes: () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; (z, s) 43.4k; (g, y) 10; (g, m) 24; (y, m) 60; (z, s, g) 87k; (g, y, m) 120; (z, s, g, y) 391k; (z, s, g, m) 643k; (z, s, y, m) 830k; (z, g, y, m) 909k; (s, g, y, m) 6k; (z, s, g, y, m) 910k; raw 1m
    Key: z zipcode (43k), s state (50), g gender (2), y year (5), m month (12)
  • 27. Lattice - optimized
    Lattice nodes: () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; (z, s) 43.4k; (g, y) 10; (g, m) 24; (y, m) 60; (z, s, g) 87k; (g, y, m) 120; (z, s, g, y) 392k; (z, s, g, m) 644k; (z, s, y, m) 831k; (z, g, y, m) 909k; (s, g, y, m) 6k; (z, s, g, y, m) 912k; raw 1m
    Key: z zipcode (43k), s state (50), g gender (2), y year (5), m month (12)
  • 28. Lattice - optimized
    Lattice nodes: () 1; (z) 43k; (s) 50; (g) 2; (y) 5; (m) 12; (z, s) 43.4k; (g, y) 10; (g, m) 24; (y, m) 60; (z, s, g) 87k; (g, y, m) 120; (z, s, g, y) 392k; (z, s, g, m) 644k; (z, s, y, m) 831k; (z, g, y, m) 909k; (s, g, y, m) 6k; (z, s, g, y, m) 912k; raw 1m
    Key: z zipcode (43k), s state (50), g gender (2), y year (5), m month (12)
    Aggregate    Cost (rows)   Benefit (query rows saved)   % queries
    s, g, y, m   6k            497k                         50%
    z, s, g      87k           304k                         33%
    g, y         10            1.5k                         25%
    g, m         24            1.5k                         25%
    s, g         100           1.5k                         25%
    y, m         60            1.5k                         25%
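The cost and benefit columns suggest choosing tiles by how many rows they save across the expected queries. The slide does not describe the recommender, so the Python sketch below uses a plain greedy selection as a stand-in: at each step it materializes the tile that saves the most rows, assuming one equally likely query per grouping. Row counts for the (g, y, m) sub-lattice come from the slides; the benefit formula and the function names are assumptions, and the numbers it produces are not meant to match the table above.

```python
RAW_ROWS = 1_000_000

# Row counts for tiles over gender (g), year (y), month (m), from the lattice slides.
TILE_ROWS = {
    frozenset(): 1,
    frozenset("g"): 2,
    frozenset("y"): 5,
    frozenset("m"): 12,
    frozenset("gy"): 10,
    frozenset("gm"): 24,
    frozenset("ym"): 60,
    frozenset("gym"): 120,
}
QUERIES = list(TILE_ROWS)    # assumption: one equally likely query per grouping


def query_cost(query, materialized):
    """Rows read for a query: the smallest materialized tile whose columns
    include the query's columns, or the raw table if none does."""
    covering = [TILE_ROWS[tile] for tile in materialized if query <= tile]
    return min(covering, default=RAW_ROWS)


def benefit(tile, materialized):
    """Total rows saved over all queries if `tile` were added."""
    return sum(
        max(0, query_cost(query, materialized) - TILE_ROWS[tile])
        for query in QUERIES
        if query <= tile
    )


def greedy(budget):
    """Pick up to `budget` tiles, always taking the one with the largest benefit."""
    chosen = []
    for _ in range(budget):
        candidates = [tile for tile in TILE_ROWS if tile not in chosen]
        best = max(candidates, key=lambda tile: benefit(tile, chosen))
        if benefit(best, chosen) == 0:
            break
        chosen.append(best)
    return chosen


picks = greedy(3)
print([sorted(tile) for tile in picks])
```

The first pick dominates because any tile is far smaller than the raw table; later picks only shave rows off queries that an already chosen tile can answer, which is why their benefit drops so sharply.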
  • 29. Demo {mysql-foodmart-lattice-model.json}
  • 30. Tiled, in-memory materializations
    Query: SELECT x, SUM(y) FROM t GROUP BY x
    Diagram layers: in-memory materialized queries; tables on disk
    Where we’re going… smart, distributed memory cache & compute framework
    http://hortonworks.com/blog/dmmq/
  • 31. Kylin OLAP engine (Calcite used here)
  • 32. Thank you!
    Apache Calcite
    @julianhyde
    http://calcite.incubator.apache.org
    http://www.kylin.io