© 2014 MapR Technologies
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
VP Incubator
Email tdunning@apache.org
Twitter @ApacheMahout @ApacheDrill
Credit for slides to Luke Han and the Kylin dev team
Kylin Committers
ankur Ankur Bansal
jiangxu Jiang Xu
liyang Li Yang
lukehan Luke Han*
mahongbin Hongbin Ma
xduo Xiaodong Duo
yisong George Song
jhyde Julian Hyde
The real deal
Calcite plenipotentiary
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
What is Kylin?
Extreme OLAP Engine for Big Data
Kylin is an open source distributed analytics engine (originally from eBay) that
provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for
extremely large datasets
kylin / ˈkiːˈlɪn / 麒麟
--n. (in Chinese art) a mythical animal of composite form
• Open Sourced on Oct 1st, 2014
• Accepted into incubation November, 2014
• Preparing for first Apache release
Big Data Obligatory Slide
• More and more data becoming available on Hadoop
• Limitations in existing Business Intelligence (BI) Tools
– Limited support for Hadoop
– Data size growing exponentially
– High latency of interactive queries
– Scale-Up architecture
• Challenges in adopting Hadoop as an interactive analysis system
– Majority of analyst groups are SQL savvy
– No mature SQL interface on Hadoop
– OLAP capability on Hadoop ecosystem not ready yet
Goals
• Sub-second query latency on billions of rows
• ANSI SQL for both analysts and engineers
• Full OLAP capability to offer advanced functionality
• Seamless Integration with BI Tools
• Support for high cardinality and dimensionality
• High concurrency – thousands of end users
• Distributed and scale out architecture for large data volume
Possible Strategies
• Build from scratch
– A grand tradition
– Large-scale SQL support is much harder than it looks
– Huge level of distraction
• Patch Hive
– Not feasible due to design assumptions in Hive
– Weak optimizer
– Hive isn’t standard SQL anyway
– (but isn’t Hive moving to Calcite?)
Kylin’s Strategy
• Use Calcite as SQL core
– Real SQL
– Real cost-based optimizer
– Already in Apache
– Provides linkage to Apache Drill and future of Hive
• Build cubes externally
– Don’t care which tools, currently Hive, soon Spark
• Use Calcite’s Rex interpreter
– Assumes final aggregations fit on one machine
• Possibly integrate with Drill at some point for parallel execution
Analytics Query Taxonomy
[Figure: pyramid of query levels with side labels Strategy, Operation and Transaction, and regions marked OLAP and OLTP]
• High Level Aggregation
– Very high level, e.g. GMV by site, by vertical, by week
• Analysis Query
– Mid-level, e.g. GMV by site, by vertical, by category (level x), past 12 weeks
• Drill Down to Detail
– Detail level (summary table)
• Low Level Aggregation
– First-level aggregation
• Transaction Level
– Transaction data
Kylin is designed to accelerate 80+% of analytics queries on Hadoop
Technical Challenges
• Huge data volume
– Table scan
• Big table joins
– Data shuffling
• Analysis on different granularity
– Runtime aggregation expensive
• MapReduce jobs
– Batch processing
How Cubes Work
• Start with a simple table
– revenue,time,item,location,supplier
• Build a table of aggregates for every combination of fields
select time, item, location, sum(revenue), max(revenue) from tbl group by time, item, location;
select time, item, sum(revenue), max(revenue) from tbl group by time, item;
select time, item, supplier, sum(revenue), max(revenue) from tbl group by time, item, supplier;
…
• Then transform queries using appropriate magic
select sum(revenue), city from tbl join location_details
where state = 'MN' group by city
→ select … from (select sum(), location from cube) join location_details
where state = 'MN' group by city
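A minimal sketch of the idea above, using Python's built-in sqlite3 module rather than Kylin itself. The table name `cuboid_item_location`, the sample rows and the supplier "Acme" are made up for illustration. It shows that a query grouped by location can be answered from a pre-aggregated (item, location) table and give the same result as scanning the raw table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The simple table from the slide: revenue, time, item, location, supplier.
cur.execute(
    "create table tbl (revenue real, time text, item text, location text, supplier text)"
)
cur.executemany(
    "insert into tbl values (?, ?, ?, ?, ?)",
    [
        (10.0, "9/15", "milk", "Urbana", "Dairy_land"),
        (4.0, "9/15", "milk", "Urbana", "Acme"),      # hypothetical supplier
        (7.0, "9/16", "milk", "Chicago", "Dairy_land"),
        (3.0, "9/16", "eggs", "Urbana", "Acme"),
    ],
)

# One pre-built aggregate table (a cuboid) over (item, location).
cur.execute(
    """create table cuboid_item_location as
       select item, location,
              sum(revenue) as sum_revenue,
              max(revenue) as max_revenue
       from tbl group by item, location"""
)

# The query as the analyst writes it, against the raw table.
raw = cur.execute(
    "select location, sum(revenue), max(revenue) from tbl "
    "group by location order by location"
).fetchall()

# The same query rewritten to read the much smaller cuboid instead.
# SUM rolls up as a sum of sums, MAX as a max of maxes.
from_cuboid = cur.execute(
    "select location, sum(sum_revenue), max(max_revenue) from cuboid_item_location "
    "group by location order by location"
).fetchall()

print(raw)
print(from_cuboid)
```

The rewrite is only valid for measures that can be rolled up from partial results, which is true for SUM and MAX here.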
How Cubes Don’t Work
• Total number of aggregate tables is exponential in the number of columns
• High cardinality can result in large cubes
• Skewed data can make cubes larger than the original data
• Magic may be insufficient to recognize cubable queries
• Keeping cubes up to date can be hard
• Forget OLTP thoughts like pervasive transactions
OLAP Cube – Balance between Space and Time
Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
2. (9/15, milk, Urbana, *) - <time, item, location>
3. (*, milk, Urbana, *) - <item, location>
4. (*, milk, Chicago, *) - <item, location>
5. (*, milk, *, *) - <item>
Cuboid = one combination of dimensions
Cube = all combinations of dimensions
1111
0111 1011 1101 1110
0011 0101 0110 1001 1010 1100
0001 0010 0100 1000
0000
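The lattice above can be generated mechanically. The sketch below assumes that each bit, read left to right, marks whether the corresponding dimension in <time, item, location, supplier> is kept; the slides do not state the bit order, so that mapping is an assumption, and the function names are illustrative.

```python
from itertools import combinations

DIMENSIONS = ["time", "item", "location", "supplier"]


def cuboid_lattice(dimensions):
    """Return the cuboid bitmasks level by level, from all dimensions down to none."""
    n = len(dimensions)
    levels = []
    for kept_count in range(n, -1, -1):
        level = []
        for kept in combinations(range(n), kept_count):
            # '1' means the dimension is kept, '0' means it is aggregated away.
            level.append("".join("1" if i in kept else "0" for i in range(n)))
        levels.append(sorted(level))
    return levels


def dims_of(mask, dimensions):
    """Translate a bitmask such as '0110' into the dimension names it keeps."""
    return [name for bit, name in zip(mask, dimensions) if bit == "1"]


lattice = cuboid_lattice(DIMENSIONS)
for level in lattice:
    print(" ".join(level))

print(dims_of("0110", DIMENSIONS))
```

With four dimensions the lattice has 16 cuboids, and under the assumed bit order `0110` corresponds to the <item, location> cuboid in examples 3 and 4.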
From Relational to Key-Value
Kylin Architecture Overview
[Figure: architecture diagram showing the online analysis data flow and the offline data flow]
• Clients: SQL Tools (BI Tools: Tableau…), 3rd Party App (Web App, Mobile…)
• Access: REST API, JDBC/ODBC, SQL
• Kylin components: REST Server, Query Engine, Routing, Metadata, Cube Build Engine (MapReduce…)
• Data: Star Schema Data (Hadoop, Hive), Key Value Data (OLAP Cube in HBase)
• Latency labels: Low Latency - Seconds, Mid Latency - Minutes
• Clients/users interact with Kylin via SQL
• OLAP Cube is transparent to users
Kylin Depends on Hadoop Eco-system
• Hive
– Input source, pre-join star schema during cube building
• MapReduce
– Aggregate metrics during cube building
• HDFS
– Store intermediate files during cube building
• HBase
– Store and query data cubes
• Calcite
– SQL parsing, code generation, optimization
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
Kylin Highlights
• Extremely Fast OLAP Engine at Scale
Kylin is designed to reduce query latency on Hadoop to seconds for 10+ billion rows of data
• ANSI SQL Interface on Hadoop
Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions
• Seamless Integration with BI Tools
Kylin currently offers integration capability with BI Tools like Tableau.
• Interactive Query Capability
Users can interact with Hadoop data via Kylin at sub-second latency
• MOLAP Cube
Users can define a data model and pre-build it in Kylin over 10+ billion raw data
records
More Highlights
• Compression and Encoding Support
• Incremental Refresh of Cubes
• Approximate Query Capability for distinct Count (HyperLogLog)
• Leverages HBase coprocessors to reduce query latency
• Job Management and Monitoring
• Easy Web interface to manage, build, monitor and query cubes
• Security capability to set ACL at Cube/Project Level
• Support LDAP Integration
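To make the approximate distinct count item more concrete, here is a small, self-contained HyperLogLog sketch. It is a textbook-style illustration, not Kylin's implementation: the class name, the SHA-1 based hash and the precision `p=10` are all assumptions. The `merge` method shows why such sketches fit pre-aggregation: two sketches combine by taking the register-wise maximum, so per-cell sketches can be rolled up without revisiting raw rows.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with 2**p registers (not Kylin's code)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(str(item).encode("utf-8")).digest()[:8], "big")
        index = h >> (64 - self.p)                  # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Return a sketch of the union: register-wise maximum."""
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


week1 = HyperLogLog()
for i in range(10000):
    week1.add(f"user-{i}")

week2 = HyperLogLog()
for i in range(5000, 15000):
    week2.add(f"user-{i}")

both_weeks = week1.merge(week2)
print(round(week1.count()), round(both_weeks.count()))
```

With 1,024 registers the expected relative error is a few percent, which is the trade-off the word "approximate" refers to.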
Cube Designer
Job Management
Query and Visualization
Tableau Integration
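Data Modeling Points of View
[Figure: mapping from a source star schema to target HBase storage through cube metadata]
• Source: Star Schema (Fact table with Dim tables)
• Target: HBase Storage (Row Key, Column Family, Column; rows A, B, C with values Val 1, Val 2, Val 3)
• Mapping: Cube Metadata
– Cube: …
– Fact Table: …
– Dimensions: …
– Measures: …
– Storage (HBase): …
• End User, Cube Modeler, Admin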
Process Flow
[Figure: Source tables (Hive) → Joined tables → Build dict → Dimension dictionaries]
Process Flow
[Figure: Joined tables and Dimension dictionaries → MR → n cuboid → MR → n-1 cuboids → MR → Apex cuboid]
Process Flow
[Figure: n cuboid, n-1 cuboids, Apex cuboid → MR → H-files → HBase]
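The layered build in the process flow (n cuboid, then n-1 cuboids, down to the apex cuboid) can be sketched in plain Python. This is an illustration under assumptions: ordinary loops stand in for the MapReduce rounds, the only measure is a revenue sum, and the function names are made up. Each child cuboid is computed from an already-built parent rather than from the raw rows.

```python
from collections import defaultdict


def drop_dimension(cuboid, position):
    """Aggregate one dimension away, marking it with '*' as in the cell examples."""
    rolled_up = defaultdict(float)
    for key, total in cuboid.items():
        rolled_up[key[:position] + ("*",) + key[position + 1:]] += total
    return dict(rolled_up)


def build_cube(rows, n_dims):
    """Build every cuboid layer by layer; keys of the result are 0/1 masks."""
    base = defaultdict(float)
    for *dims, revenue in rows:
        base[tuple(dims)] += revenue

    full_mask = (1,) * n_dims
    cube = {full_mask: dict(base)}          # the n cuboid
    layer = [full_mask]
    while layer:                            # one pass per layer (one "MR" round)
        next_layer = []
        for mask in layer:
            for pos, bit in enumerate(mask):
                if bit == 0:
                    continue
                child = mask[:pos] + (0,) + mask[pos + 1:]
                if child in cube:           # already built from another parent
                    continue
                cube[child] = drop_dimension(cube[mask], pos)
                next_layer.append(child)
        layer = next_layer
    return cube


# Columns: time, item, location, revenue (sample values are illustrative).
rows = [
    ("9/15", "milk", "Urbana", 10.0),
    ("9/15", "milk", "Chicago", 7.0),
    ("9/16", "milk", "Urbana", 4.0),
    ("9/16", "eggs", "Urbana", 3.0),
]
cube = build_cube(rows, n_dims=3)
print(len(cube))                 # number of cuboids
print(cube[(0, 1, 1)])           # the <item, location> cuboid
print(cube[(0, 0, 0)])           # the apex cuboid
```

Building each layer from the previous one means only the first pass touches the full joined data; later passes read progressively smaller inputs.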
How To Store Cube? – HBase Schema
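The schema diagram for this slide did not survive extraction, so the following is only one plausible way to turn a cuboid row into a key-value pair, not Kylin's confirmed format. It assumes a row key made of a one-byte cuboid id followed by fixed-width dictionary ids for the dimensions that cuboid keeps; the widths, function names and sample values are all hypothetical.

```python
def build_dictionary(values):
    """Map each distinct dimension value to a small integer id."""
    return {value: i for i, value in enumerate(sorted(set(values)))}


def encode_row_key(mask, key, dictionaries):
    """Hypothetical layout: 1 byte of cuboid id, then 2 bytes per kept dimension."""
    cuboid_id = int("".join(str(bit) for bit in mask), 2)
    encoded = bytearray([cuboid_id])
    for bit, value, dictionary in zip(mask, key, dictionaries):
        if bit:  # dimensions that were aggregated away take no space in the key
            encoded += dictionary[value].to_bytes(2, "big")
    return bytes(encoded)


# Dimension order: time, item, location, supplier.
dictionaries = [
    build_dictionary(["9/15", "9/16"]),
    build_dictionary(["milk", "eggs"]),
    build_dictionary(["Urbana", "Chicago"]),
    build_dictionary(["Dairy_land"]),
]

# The cell (*, milk, Urbana, *) from the <item, location> cuboid.
row_key = encode_row_key((0, 1, 1, 0), ("*", "milk", "Urbana", "*"), dictionaries)
print(row_key.hex())
```

Putting the cuboid id first keeps all rows of one cuboid adjacent in a sorted key-value store, so a query routed to that cuboid becomes a range scan; dictionary ids keep the keys short.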
Query Engine – Kylin Explain Plan
SELECT test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name,
  test_sites.site_name, SUM(test_kylin_fact.price) AS GMV, COUNT(*) AS TRANS_CNT
FROM test_kylin_fact
LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
LEFT JOIN test_category ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category.site_id
LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id
WHERE test_kylin_fact.seller_id = 123456 OR test_kylin_fact.lstg_format_name = 'New'
GROUP BY test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name,
  test_kylin_fact.lstg_format_name, test_sites.site_name

OLAPToEnumerableConverter
  OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8])
    OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()])
      OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0])
        OLAPFilterRel(condition=[OR(=($3, 123456), =($5, 'New'))])
          OLAPJoinRel(condition=[=($2, $25)], joinType=[left])
            OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left])
              OLAPJoinRel(condition=[=($4, $12)], joinType=[left])
                OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
                OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]])
              OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
            OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
Now Let’s Make it Really Work
• Full Cube
– Pre-aggregate all dimension combinations
– “Curse of dimensionality”: an N-dimension cube has $2^N$ cuboids.
• Partial Cube
– To avoid dimension explosion, we divide the dimensions into different
aggregation groups
• $2^{N+M+L} \rightarrow 2^N + 2^M + 2^L$
– For a cube with 30 dimensions, if we divide these dimensions into 3
groups, the cuboid count is reduced from 1 billion to 3 thousand
• $2^{30} \rightarrow 2^{10} + 2^{10} + 2^{10}$
– Tradeoff between online aggregation and offline pre-aggregation
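The arithmetic behind the 30-dimension example is easy to check. The helper names below are illustrative; the counts follow directly from the full-cube and aggregation-group formulas on this slide.

```python
def full_cube_cuboids(n_dimensions):
    """A full cube pre-aggregates every subset of the dimensions."""
    return 2 ** n_dimensions


def partial_cube_cuboids(group_sizes):
    """With aggregation groups, each group contributes its own small lattice."""
    return sum(2 ** size for size in group_sizes)


full = full_cube_cuboids(30)
partial = partial_cube_cuboids([10, 10, 10])

print(f"full cube:    {full:,} cuboids")
print(f"partial cube: {partial:,} cuboids")
```

The price of the smaller count is that a query mixing dimensions from different groups has no exact pre-built cuboid and needs more aggregation at query time, which is the tradeoff named in the last bullet.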
Incremental Cube Building
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
Kylin vs. Hive
Based on 12+B records
# | Query Type | Return Dataset | Query on Kylin (s) | Query on Hive (s) | Comments
1 | High Level Aggregation | 4 | 0.129 | 157.437 | 1,217 times
2 | Analysis Query | 22,669 | 1.615 | 109.206 | 68 times
3 | Drill Down to Detail | 325,029 | 12.058 | 113.123 | 9 times
4 | Drill Down to Detail | 524,780 | 22.42 | 6383.21 | 278 times
5 | Data Dump | 972,002 | 49.054 | N/A |
[Figure: bar chart comparing Hive and Kylin for SQL #1, SQL #2 and SQL #3; vertical axis from 0 to 200]
[Figure: query taxonomy levels: High Level Aggregation, Analysis Query, Drill Down to Detail, Low Level Aggregation, Transaction Level]
Performance Scaleout
Linear scale out with more nodes
Performance - Query Latency
99 %-ile
95 %-ile
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
Kylin History and Roadmap
[Figure: timeline across 2013, 2014 and 2015 with markers at Sep, 2013; Jan, 2014; Sep, 2014; Q1, 2015; and TBD]
• Initial Prototype for MOLAP
– Basic end-to-end POC
• MOLAP
– Incremental Refresh
– ANSI SQL
– ODBC Driver
– Web GUI
– ACL
– Open Source
• HOLAP
– Streaming OLAP
– JDBC Driver
– New UI
– Excel Support
– … more
• Next Gen
– Automation
– Capacity Management
– In-Memory Analysis (TBD)
– Spark (TBD)
– … more
• Future…
Kylin Ecosystem
• Kylin Core
– Fundamental framework of the Kylin OLAP engine
• Extension
– Plugins to support additional functions and features
• Integration
– Lifecycle management support to integrate with other applications
• Interface
– Allows third-party users to build more features via a user interface atop the Kylin core
• Driver
– ODBC and JDBC drivers
[Figure: Kylin OLAP Core surrounded by Extension, Interface and Integration components]
• Extension: Security, Redis Storage, Spark Engine, Docker
• Interface: Web Console, Customized BI, Ambari/Hue Plugin
• Integration: ODBC Driver, ETL, Drill, SparkSQL
If you want to go fast, go alone.
If you want to go far, go together.
--African Proverb
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
Q&A
@mapr maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

More Related Content

What's hot

Apache kylin (china hadoop summit 2015 shanghai)
Apache kylin (china hadoop summit 2015 shanghai)Apache kylin (china hadoop summit 2015 shanghai)
Apache kylin (china hadoop summit 2015 shanghai)qhzhou
 
Apache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesApache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesYang Li
 
Apache Kylin Streaming
Apache Kylin Streaming Apache Kylin Streaming
Apache Kylin Streaming hongbin ma
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Kylin olap part 1- getting started
Kylin olap   part 1- getting startedKylin olap   part 1- getting started
Kylin olap part 1- getting startedShubham Shirude
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine TourLuke Han
 
Kylin Engineering Principles
Kylin Engineering PrinciplesKylin Engineering Principles
Kylin Engineering PrinciplesXu Jiang
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Seshu Adunuthula
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanLuke Han
 
Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylininovex GmbH
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanLuke Han
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupLuke Han
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @ShanghaiLuke Han
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinTyler Wishnoff
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin IntroductionLuke Han
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingLuke Han
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingLuke Han
 
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...Luke Han
 

What's hot (20)

Apache kylin (china hadoop summit 2015 shanghai)
Apache kylin (china hadoop summit 2015 shanghai)Apache kylin (china hadoop summit 2015 shanghai)
Apache kylin (china hadoop summit 2015 shanghai)
 
The Evolution of Apache Kylin
The Evolution of Apache KylinThe Evolution of Apache Kylin
The Evolution of Apache Kylin
 
Apache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesApache Kylin 1.5 Updates
Apache Kylin 1.5 Updates
 
Apache Kylin Streaming
Apache Kylin Streaming Apache Kylin Streaming
Apache Kylin Streaming
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Kylin olap part 1- getting started
Kylin olap   part 1- getting startedKylin olap   part 1- getting started
Kylin olap part 1- getting started
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine Tour
 
Kylin Engineering Principles
Kylin Engineering PrinciplesKylin Engineering Principles
Kylin Engineering Principles
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
 
Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylin
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke Han
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark Meetup
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin Introduction
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 Beijing
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 Beijing
 
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
 

Viewers also liked

Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidTony Ng
 
Low Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseLow Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseDataWorks Summit
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?DataWorks Summit
 
OLAP options on Hadoop
OLAP options on HadoopOLAP options on Hadoop
OLAP options on HadoopYuta Imai
 
Introduzione al Data Warehousing ed alla Progettazione di Data Warehouse Dime...
Introduzione al Data Warehousing ed alla Progettazione di Data Warehouse Dime...Introduzione al Data Warehousing ed alla Progettazione di Data Warehouse Dime...
Introduzione al Data Warehousing ed alla Progettazione di Data Warehouse Dime...Davide Ciambelli
 
Real-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidReal-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidJan Graßegger
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerMichael Spector
 
Druid at Hadoop Ecosystem
Druid at Hadoop EcosystemDruid at Hadoop Ecosystem
Druid at Hadoop EcosystemSlim Bouguerra
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseHBaseCon
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @ShanghaiLuke Han
 
ТФРВС - весна 2014 - лекция 1
ТФРВС - весна 2014 - лекция 1ТФРВС - весна 2014 - лекция 1
ТФРВС - весна 2014 - лекция 1Alexey Paznikov
 

Viewers also liked (16)

Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
 
Low Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseLow Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBase
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?
 
OLAP options on Hadoop
OLAP options on HadoopOLAP options on Hadoop
OLAP options on Hadoop
 
Introduzione al Data Warehousing ed alla Progettazione di Data Warehouse Dime...
Introduzione al Data Warehousing ed alla Progettazione di Data Warehouse Dime...Introduzione al Data Warehousing ed alla Progettazione di Data Warehouse Dime...
Introduzione al Data Warehousing ed alla Progettazione di Data Warehouse Dime...
 
Real-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidReal-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and Druid
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
 
Druid at Hadoop Ecosystem
Druid at Hadoop EcosystemDruid at Hadoop Ecosystem
Druid at Hadoop Ecosystem
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBase
 
Scalable Real-time analytics using Druid
Scalable Real-time analytics using DruidScalable Real-time analytics using Druid
Scalable Real-time analytics using Druid
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
 
Xml4js pentaho
Xml4js pentahoXml4js pentaho
Xml4js pentaho
 
ТФРВС - весна 2014 - лекция 1
ТФРВС - весна 2014 - лекция 1ТФРВС - весна 2014 - лекция 1
ТФРВС - весна 2014 - лекция 1
 
NEA Innovation Physics - Part 1
NEA Innovation Physics - Part 1NEA Innovation Physics - Part 1
NEA Innovation Physics - Part 1
 

Similar to Apache Kylin - OLAP Cubes for SQL on Hadoop

Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop BigDataEverywhere
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014John Berns
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series DatabaseDataWorks Summit
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonMapR Technologies
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus TechnologiesImpetus Technologies
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleIan Downard
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Mathieu Dumoulin
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Dataconomy Media
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Mats Uddenfeldt
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 

Similar to Apache Kylin - OLAP Cubes for SQL on Hadoop (20)

Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop

  • 1. © 2014 MapR Technologies
  • 2. Who I am Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning VP Incubator Email tdunning@apache.org Twitter @ApacheMahout @ApacheDrill Credit for slides to Luke Han and the Kylin dev team
  • 3. Kylin Committers ankur Ankur Bansal jiangxu Jiang Xu liyang Li Yang lukehan Luke Han* mahongbin Hongbin Ma xduo Xiaodong Duo yisong George Song jhyde Julian Hyde The real deal Calcite plenipotentiary
  • 4. Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 5. What is Kylin? Extreme OLAP Engine for Big Data. Kylin is an open source Distributed Analytics Engine (originally from eBay) that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets. kylin / ˈkiːˈlɪn / 麒麟 --n. (in Chinese art) a mythical animal of composite form • Open Sourced on Oct 1st, 2014 • Accepted into incubation November, 2014 • Preparing for first Apache release
  • 6. Big Data Obligatory Slide • More and more data becoming available on Hadoop • Limitations in existing Business Intelligence (BI) Tools – Limited support for Hadoop – Data size growing exponentially – High latency of interactive queries – Scale-Up architecture • Challenges to adopt Hadoop as interactive analysis system – Majority of analyst groups are SQL savvy – No mature SQL interface on Hadoop – OLAP capability on Hadoop ecosystem not ready yet
  • 7. Goals • Sub-second query latency on billions of rows • ANSI SQL for both analysts and engineers • Full OLAP capability to offer advanced functionality • Seamless Integration with BI Tools • Support for high cardinality and dimensionality • High concurrency – thousands of end users • Distributed and scale out architecture for large data volume
  • 8. Possible Strategies • Build from scratch – A grand tradition – Large-scale SQL support is much harder than it looks – Huge level of distraction • Patch Hive – Not feasible due to design assumptions in Hive – Weak optimizer – Hive isn’t standard SQL anyway – (but isn’t Hive moving to Calcite?)
  • 9. Kylin’s Strategy • Use Calcite as SQL core – Real SQL – Real cost-based optimizer – Already in Apache – Provides linkage to Apache Drill and future of Hive • Build cubes externally – Don’t care which tools, currently Hive, soon Spark • Use Calcite’s Rex interpreter – Assumes final aggregations fit on one machine • Possibly integrate with Drill at some point for parallel execution
  • 10. Analytics Query Taxonomy (pyramid from Strategy through Operation to Transaction, OLAP at the top and OLTP at the bottom) • High Level Aggregation: very high level, e.g. GMV by site by vertical by weeks • Analysis Query: mid-level, e.g. GMV by site by vertical, by category (level x) past 12 weeks • Drill Down to Detail: detail level (summary table) • Low Level Aggregation: first level aggregation • Transaction Level: transaction data • Kylin is designed to accelerate 80+% of analytics queries on Hadoop
  • 11. Technical Challenges • Huge volume data – Table scan • Big table joins – Data shuffling • Analysis on different granularity – Runtime aggregation expensive • Map Reduce job – Batch processing
  • 12. How Cubes Work • Start with a simple table – revenue, time, item, location, supplier • Build a table of aggregates for every combination of fields: select time, item, location, sum(revenue), max(revenue) from tbl group by time, item, location; select time, item, sum(revenue), max(revenue) from tbl group by time, item; select time, item, supplier, sum(revenue), max(revenue) from tbl group by time, item, supplier; … • Then transform queries using appropriate magic: select sum(revenue), city from tbl join location_details where state = 'MN' group by city → select … from (select sum(), location from cube) join location_details where state = 'MN' group by city
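The cube construction on slide 12 can be sketched in a few lines of Python. This is only an illustration of the idea, not Kylin's implementation: the column names come from the slide, while the sample rows and the helper names `build_cube` and `query` are made up for the example.

```python
from itertools import combinations

# Toy fact table. Column names follow the slide; the values are invented.
ROWS = [
    {"time": "9/15", "item": "milk", "location": "Urbana", "supplier": "Dairy_land", "revenue": 10.0},
    {"time": "9/15", "item": "milk", "location": "Chicago", "supplier": "Dairy_land", "revenue": 7.0},
    {"time": "9/16", "item": "milk", "location": "Urbana", "supplier": "Farm_co", "revenue": 4.0},
    {"time": "9/16", "item": "eggs", "location": "Urbana", "supplier": "Farm_co", "revenue": 3.0},
]
DIMENSIONS = ("time", "item", "location", "supplier")


def build_cube(rows, dimensions):
    """Pre-aggregate (sum, max) of revenue for every subset of the dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, k):
            cuboid = {}
            for row in rows:
                key = tuple(row[d] for d in group_by)
                total, biggest = cuboid.get(key, (0.0, float("-inf")))
                cuboid[key] = (total + row["revenue"], max(biggest, row["revenue"]))
            cube[group_by] = cuboid
    return cube


def query(cube, group_by):
    """Answer a 'group by' query with a lookup instead of a table scan."""
    ordered = tuple(d for d in DIMENSIONS if d in group_by)
    return cube[ordered]


CUBE = build_cube(ROWS, DIMENSIONS)
by_location = query(CUBE, {"location"})
for key, (total, biggest) in sorted(by_location.items()):
    print(key, total, biggest)
```

The trade-off is visible even at this size: four dimensions already produce 16 pre-aggregated tables, which is the space cost that the next slide discusses.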
  • 13. How Cubes Don’t Work • Total number of cubes is exponential in columns • High cardinality can result in large cubes • Skewed data can make cubes larger than the original data • Magic may be insufficient to recognize cubable queries • Keeping cubes up to date can be hard • Forget OLTP thoughts like pervasive transactions
  • 14. OLAP Cube – Balance between Space and Time. Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells 1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier> 2. (9/15, milk, Urbana, *) - <time, item, location> 3. (*, milk, Urbana, *) - <item, location> 4. (*, milk, Chicago, *) - <item, location> 5. (*, milk, *, *) - <item> Cuboid = one combination of dimensions. Cube = all combinations of dimensions. Cuboid lattice (one bit per dimension): 1111; 0111 1011 1101 1110; 0011 0101 0110 1001 1010 1100; 0001 0010 0100 1000; 0000
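The bit patterns on slide 14 can be read as one bit per dimension, with a 1 meaning the dimension is kept and a 0 meaning it is aggregated away (shown as * in the cell examples). The sketch below enumerates that lattice; the bit order (leftmost bit is time) and the function names are assumptions made for illustration.

```python
DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_dimensions(mask, dimensions=DIMENSIONS):
    """Translate a bitmask such as 0b1110 into the dimensions it keeps."""
    bits = format(mask, f"0{len(dimensions)}b")
    return tuple(d for d, b in zip(dimensions, bits) if b == "1")


def children(mask, n=len(DIMENSIONS)):
    """Cuboids reached by aggregating away exactly one more dimension."""
    return [mask & ~(1 << i) for i in range(n) if mask & (1 << i)]


def lattice(n=len(DIMENSIONS)):
    """All 2^n cuboids, ordered from the base cuboid down to the apex."""
    return sorted(range(2 ** n), key=lambda m: (-bin(m).count("1"), m))


for mask in lattice():
    print(format(mask, "04b"), cuboid_dimensions(mask))
```

Under this reading, the base cuboid 1111 holds the most detailed cells and the apex 0000 holds a single grand total.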
  • 15. From Relational to Key-Value
  • 16. Kylin Architecture Overview. Diagram labels: SQL-Based Tools (BI Tools: Tableau…); 3rd Party App (Web App, Mobile…); REST API; JDBC/ODBC; REST Server; Query Engine; Routing; Metadata; Cube Build Engine (MapReduce…); Hadoop, Hive (Star Schema Data); OLAP Cube in HBase (Key Value Data); Low Latency - Seconds; Mid Latency - Minutes. Legend: Online Analysis Data Flow; Offline Data Flow. Notes: Clients/Users interact with Kylin via SQL; OLAP Cube is transparent to users
  • 17. Kylin Depends on Hadoop Eco-system • Hive – Input source, pre-join star schema during cube building • MapReduce – Aggregate metrics during cube building • HDFS – Store intermediate files during cube building • HBase – Store and query data cubes • Calcite – SQL parsing, code generation, optimization
  • 18. Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 19. Kylin Highlights • Extremely Fast OLAP Engine at Scale: Kylin is designed to reduce query latency on Hadoop to seconds for 10+ billion rows of data • ANSI SQL Interface on Hadoop: Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions • Seamless Integration with BI Tools: Kylin currently offers integration capability with BI Tools like Tableau • Interactive Query Capability: Users can interact with Hadoop data via Kylin at sub-second latency • MOLAP Cube: Users can define a data model and pre-build it in Kylin with 10+ billion raw data records
  • 20. More Highlights • Compression and Encoding Support • Incremental Refresh of Cubes • Approximate Query Capability for Distinct Count (HyperLogLog) • Leverage HBase Coprocessor for query latency • Job Management and Monitoring • Easy Web interface to manage, build, monitor and query cubes • Security capability to set ACL at Cube/Project Level • Support LDAP Integration
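Slide 20 mentions HyperLogLog for approximate distinct counts. The following is a minimal textbook-style sketch of the technique, not Kylin's code: the class name, the choice of SHA-1 as the hash, and the register count are all assumptions made for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2^p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = x >> (64 - self.p)                   # top p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Union of two sketches: element-wise maximum of the registers."""
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)        # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


sketch = HyperLogLog()
for i in range(50000):
    sketch.add(f"user-{i}")
print(round(sketch.count()))
```

Because two sketches merge by taking the register-wise maximum, distinct counts of this kind can be combined across separately built pieces of data, which is one reason the structure fits pre-aggregated storage well.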
  • 21. Cube Designer
  • 22. Job Management
  • 23. Query and Visualization
  • 24. Tableau Integration
  • 25. Data Modeling. Points of view: End User, Cube Modeler, Admin. Source Star Schema: Fact, Dim, Dim, Dim. Cube Metadata (mapping): Cube: … Fact Table: … Dimensions: … Measures: … Storage(HBase): … Target HBase Storage: Row Key (row A, row B, row C), Column Family, Column (Val 1, Val 2, Val 3)
  • 26. Process Flow: Source → Joined tables (Hive) → Build dict → Dimension dictionaries
  • 27. Process Flow: Joined tables + Dimension dictionaries → (MR) n cuboid → (MR) n-1 cuboids → (MR) Apex cuboid
  • 28. Process Flow: n cuboid, n-1 cuboids, Apex cuboid → (MR) H-files → HBase
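The process flow on slides 26 to 28 (build dimension dictionaries, build the n-dimension cuboid, then derive each lower layer down to the apex) can be mimicked in memory. The sketch below is a single-process stand-in for what the slides describe as MapReduce jobs; the sample rows and all function names are hypothetical.

```python
from collections import defaultdict

ROWS = [
    {"item": "milk", "location": "Urbana", "supplier": "Dairy_land", "revenue": 10.0},
    {"item": "milk", "location": "Chicago", "supplier": "Dairy_land", "revenue": 7.0},
    {"item": "milk", "location": "Urbana", "supplier": "Farm_co", "revenue": 4.0},
    {"item": "eggs", "location": "Urbana", "supplier": "Farm_co", "revenue": 3.0},
]


def build_dictionaries(rows, dimensions):
    """Map each distinct dimension value to a small integer ID."""
    return {
        d: {v: i for i, v in enumerate(sorted({row[d] for row in rows}))}
        for d in dimensions
    }


def base_cuboid(rows, dimensions, dictionaries, measure):
    """Aggregate the measure by all dimensions, using dictionary-encoded keys."""
    cuboid = defaultdict(float)
    for row in rows:
        key = tuple(dictionaries[d][row[d]] for d in dimensions)
        cuboid[key] += row[measure]
    return dict(cuboid)


def drop_dimension(cuboid, position):
    """Derive a child cuboid by aggregating away one key position."""
    child = defaultdict(float)
    for key, value in cuboid.items():
        child[key[:position] + key[position + 1:]] += value
    return dict(child)


def build_by_layer(rows, dimensions, measure):
    """Build every cuboid layer by layer, from n dimensions down to the apex."""
    dims = tuple(dimensions)
    dictionaries = build_dictionaries(rows, dims)
    cuboids = {dims: base_cuboid(rows, dims, dictionaries, measure)}
    frontier = [dims]
    while frontier:
        next_frontier = []
        for parent in frontier:
            for pos in range(len(parent)):
                child = parent[:pos] + parent[pos + 1:]
                if child not in cuboids:
                    cuboids[child] = drop_dimension(cuboids[parent], pos)
                    next_frontier.append(child)
        frontier = next_frontier
    return dictionaries, cuboids


DICTS, CUBOIDS = build_by_layer(ROWS, ("item", "location", "supplier"), "revenue")
print(len(CUBOIDS), CUBOIDS[()])
```

Each layer is computed from the layer above it rather than from the raw rows, so the input to every step after the first is already aggregated.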
  • 29. How To Store Cube? – HBase Schema
  • 30. Query Engine – Kylin Explain Plan
      Query:
        SELECT test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name, test_sites.site_name, SUM(test_kylin_fact.price) AS GMV, COUNT(*) AS TRANS_CNT
        FROM test_kylin_fact
        LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
        LEFT JOIN test_category ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category.site_id
        LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id
        WHERE test_kylin_fact.seller_id = 123456 OR test_kylin_fact.lstg_format_name = 'New'
        GROUP BY test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name, test_sites.site_name
      Plan:
        OLAPToEnumerableConverter
          OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8])
            OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()])
              OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0])
                OLAPFilterRel(condition=[OR(=($3, 123456), =($5, 'New'))])
                  OLAPJoinRel(condition=[=($2, $25)], joinType=[left])
                    OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left])
                      OLAPJoinRel(condition=[=($4, $12)], joinType=[left])
                        OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
                        OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]])
                      OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
                    OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
  • 31. Now Let’s Make it Really Work • Full Cube – Pre-aggregate all dimension combinations – “Curse of dimensionality”: an N-dimension cube has 2^N cuboids • Partial Cube – To avoid dimension explosion, we divide the dimensions into different aggregation groups • 2^(N+M+L) → 2^N + 2^M + 2^L – For a cube with 30 dimensions, if we divide these dimensions into 3 groups, the cuboid count is reduced from 1 billion to 3 thousand • 2^30 → 2^10 + 2^10 + 2^10 – Tradeoff between online aggregation and offline pre-aggregation
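The arithmetic behind slide 31 is easy to check. The snippet below simply evaluates the slide's two formulas; the function names are hypothetical, and the partial-cube count follows the slide's formula as written, without trying to model how a real build would deduplicate cuboids shared between groups.

```python
def full_cube_cuboids(n_dimensions):
    """A full cube over N dimensions has 2^N cuboids."""
    return 2 ** n_dimensions


def partial_cube_cuboids(group_sizes):
    """With aggregation groups of sizes N, M, L the count is 2^N + 2^M + 2^L."""
    return sum(2 ** size for size in group_sizes)


full = full_cube_cuboids(30)                  # 1,073,741,824 (about 1 billion)
partial = partial_cube_cuboids([10, 10, 10])  # 3,072 (about 3 thousand)
print(full, partial, full // partial)
```

The price of the smaller cube is that a query mixing dimensions from different groups can no longer be answered from a single pre-built cuboid, which is the online versus offline trade-off the slide names.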
  • 32. Incremental Cube Building
  • 33. Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 34. Kylin vs. Hive (based on 12+B records)
      # | Query Type | Return Dataset | Query On Kylin (s) | Query On Hive (s) | Comments
      1 | High Level Aggregation | 4 | 0.129 | 157.437 | 1,217 times
      2 | Analysis Query | 22,669 | 1.615 | 109.206 | 68 times
      3 | Drill Down to Detail | 325,029 | 12.058 | 113.123 | 9 times
      4 | Drill Down to Detail | 524,780 | 22.42 | 6383.21 | 278 times
      5 | Data Dump | 972,002 | 49.054 | N/A |
      [Chart: Hive vs. Kylin query time for SQL #1, SQL #2 and SQL #3, axis 0 to 200. Query taxonomy levels shown: High Level Aggregation, Analysis Query, Drill Down to Detail, Low Level Aggregation, Transaction Level]
  • 35. Performance Scaleout: Linear scale out with more nodes
  • 36. Performance - Query Latency (99th percentile, 95th percentile)
  • 37. Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 38. Kylin History and Roadmap (timeline 2013, 2014, 2015; markers: Sep, 2013; Jan, 2014; Sep, 2014; Q1, 2015; Future… TBD) • Initial Prototype for MOLAP: Basic end to end POC • MOLAP: Incremental Refresh, ANSI SQL, ODBC Driver, Web GUI, ACL, Open Source • HOLAP: Streaming OLAP, JDBC Driver, New UI, Excel Support, … more • Next Gen: Automation, Capacity Management, In-Memory Analysis (TBD), Spark (TBD), … more
  • 39. Kylin Ecosystem • Kylin Core – Fundamental framework of Kylin OLAP Engine • Extension – Plugins to support additional functions and features • Integration – Lifecycle Management Support to integrate with other applications • Interface – Allows third party users to build more features via user-interface atop Kylin core • Driver – ODBC and JDBC Drivers. Diagram: Kylin OLAP Core; Extension (Security, Redis Storage, Spark Engine, Docker); Interface (Web Console, Customized BI, Ambari/Hue Plugin); Integration (ODBC Driver, ETL, Drill, SparkSQL)
  • 40. If you want to go fast, go alone. If you want to go far, go together. --African Proverb
  • 41. Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 42. Q&A @mapr maprtech tdunning@mapr.com Engage with us! MapR maprtech mapr-technologies