http://kylin.io
Apache Kylin Deep Dive
Streaming & Plugin Architecture
Oct 10, 2015 | @ApacheKylin
Yang Li, Architect & Tech Leader | yangli9@ebay.com
Agenda
• What's Apache Kylin?
• Plugin Architecture
• Fast Cubing
• Streaming Cubing
• Summary
Extreme OLAP Engine for Big Data
Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. It was originally contributed by eBay Inc.
What's Kylin
kylin / ˈkiːˈlɪn / 麒麟
n. (in Chinese art) a mythical animal of composite form
• Open sourced on Oct 1st, 2014
• Accepted as an Apache Incubator project on Nov 25th, 2014
Big Data @ eBay
• $28B GMV via mobile (2014)
• 266M mobile app globally
• 1B listings created via mobile
• 157M active buyers
• 25M active sellers
• 800M active listings
• 8.8M new listings every week
n eBay
n Adoptions
n Baidu	
  Map,	
  China	
  Mobile,	
  明略数据,	
  京东,	
  美团,	
  唯品会…
n Expedia,	
  Microsoft,	
  Tableau,	
  Infoworks.io…
Feature – Big Data
Case                   Cube Size   Raw Records
Session Analysis       20 TB       81+ billion rows
Traffic Analysis       30 TB       28+ billion rows
Transaction Analysis   560 GB      1.2+ billion rows
Feature – SQL Interface
Feature – BI Integration via ODBC, JDBC
Feature – Low Latency
90% of queries < 5 s
[Chart: query latency; dark-blue line: 90th-percentile queries; light-blue line: 95th-percentile queries. Annotation: 90% of queries return in 3 seconds]
Feature – Scalable Throughput
Linear scale-out with more nodes
n A	
  query	
  may	
  consider	
  only	
  3	
  dimensions
How	
  it	
  works	
  – Materialized	
  View
n Base	
  vs.	
  aggregate	
  cells;	
  ancestor	
  vs.	
  descendant	
  cells;	
  parent	
  vs.	
  child	
  cells
1. (9/15,	
  milk,	
  Urbana,	
  Dairy_land)	
  	
  -­‐ <time, item, location, supplier>
2. (9/15,	
  milk,	
  Urbana,	
  *)	
  	
  -­‐ <time, item, location>
3. (*,	
  milk,	
  Urbana,	
  *)	
  	
  -­‐ <item, location>
4. (*,	
  milk,	
  Chicago,	
  *)	
  -­‐ <item, location>
5. (*,	
  milk,	
  *,	
  *)	
  	
  -­‐ <item>
How	
  it	
  works	
  – OLAP	
  Cube,	
  space	
  for	
  time
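To make the cell terminology concrete, here is a minimal Python sketch (not Kylin code) that encodes the five example cells with "*" for an aggregated dimension. It assumes the usual definitions: a cell is an ancestor of another if it agrees on every dimension where it is not "*", and a parent if it generalizes exactly one more dimension. The names `is_ancestor`, `is_parent` and `is_base` are illustrative.

```python
# Cells over <time, item, location, supplier>; "*" means "all values" (aggregated).
ALL = "*"

CELLS = {
    1: ("9/15", "milk", "Urbana", "Dairy_land"),
    2: ("9/15", "milk", "Urbana", ALL),
    3: (ALL, "milk", "Urbana", ALL),
    4: (ALL, "milk", "Chicago", ALL),
    5: (ALL, "milk", ALL, ALL),
}


def is_ancestor(a, b):
    """True if cell a aggregates over cell b: each position of a is "*" or equals b's value."""
    return a != b and all(x == ALL or x == y for x, y in zip(a, b))


def is_parent(a, b):
    """True if a is an ancestor of b that aggregates exactly one additional dimension."""
    extra = sum(1 for x, y in zip(a, b) if x == ALL and y != ALL)
    return is_ancestor(a, b) and extra == 1


def is_base(cell):
    """A base cell has a concrete value in every dimension."""
    return ALL not in cell


if __name__ == "__main__":
    for i, a in CELLS.items():
        for j, b in CELLS.items():
            if is_parent(a, b):
                print(f"cell {i} is the parent of cell {j}")
```

Running it reports that cell 2 is the parent of cell 1, cell 3 of cell 2, and cell 5 of cells 3 and 4; cell 5 is an ancestor, but not the parent, of cell 1, and cells 3 and 4 are unrelated.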
• Cuboid = one combination of dimensions
• Cube = all combinations of dimensions (all cuboids)
[Figure: the cuboid lattice for time, item, location, supplier]
0-D (apex) cuboid: ()
1-D cuboids: (time); (item); (location); (supplier)
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
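The lattice grows as 2^n: with n dimensions there are C(n, k) cuboids at level k, so 16 in total for the four dimensions above. A small Python sketch (illustrative names, not Kylin code) that enumerates it with `itertools.combinations`:

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def cuboids_by_level(dims):
    """Map each level k (number of dimensions kept) to the list of k-D cuboids."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


def all_cuboids(dims):
    """A cube is the set of all cuboids, i.e. every combination of dimensions."""
    return [c for level in cuboids_by_level(dims).values() for c in level]


if __name__ == "__main__":
    for k, level in cuboids_by_level(DIMENSIONS).items():
        print(f"{k}-D: {len(level)} cuboid(s)")
    print("total:", len(all_cuboids(DIMENSIONS)))
```

It prints 1, 4, 6, 4 and 1 cuboids per level and a total of 16, which is why each added dimension doubles the precomputation ("space for time").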
Kylin Architecture Overview
Diagram components:
– Clients: 3rd-party apps (web app, mobile…) via REST API; SQL-based tools (BI tools: Tableau…) via JDBC/ODBC
– REST Server and Query Engine, with routing and metadata; low latency (seconds)
– Cube Builder (MapReduce…) reading star schema data from Hive on Hadoop
– OLAP cubes stored as key-value data in HBase
– Data Source Abstraction, Engine Abstraction, Storage Abstraction
• Online analysis data flow
• Offline data flow
• Clients/users interact with Kylin via SQL
• The OLAP cube is transparent to users
Plugin Architecture
[Diagram: IN (Hive) → Engine → OUT (HBase), with Cube Meta]
Plugin Architecture
[Diagram: Hive → Hive Adapter → MapRed engine → HBase Adapter → HBase, with Cube Meta]
n Engine
n MR	
  V1
n MR	
  V2
n Spark
n Streaming
n Source
n Hive
n Kafka
n Spark	
  SQL	
  &	
  DataFrames
n Storage
n HBase
n ?	
  Kudu	
  (Cloudera)
n ?	
  Cassandra
2.x	
  Developing	
  Modules
n The	
  freedom
n Zoo	
  break,	
  not	
  bound	
  to	
  Hadoop	
  any	
  more
n Free	
  to	
  go	
  to	
  a	
  better	
  engine	
  or	
  storage
n Extensibility
n Accept	
  any	
  input,	
  e.g.	
  Kafka
n Embrace	
  next-­‐gen	
  distributed	
  platform,	
  e.g.	
  Spark
n Flexibility
n Choose	
  different	
  engine	
  for	
  different	
  data	
  set
The	
  Freedom,	
  Extensibility,	
  Flexibility
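One way to picture the plugin boundary is an engine that only sees abstract source and storage interfaces, with each cube's metadata naming the implementations to use, so a different engine can be chosen per data set. The Python sketch below is hypothetical: `Source`, `Engine`, `Storage`, `CubeDesc` and the registries are illustrative names, not Kylin's Java interfaces, and the toy engine builds only the base cuboid.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeDesc:
    """Hypothetical cube metadata: dimensions, measure, and the plugin names to use."""
    name: str
    dims: tuple
    measure: str
    source: str
    engine: str
    storage: str


class Source(ABC):
    """Input side of the plugin boundary (think: a Hive or Kafka adapter)."""

    @abstractmethod
    def read(self, cube):
        """Yield flat records (dicts) for the cube."""


class Storage(ABC):
    """Output side of the plugin boundary (think: an HBase adapter)."""

    @abstractmethod
    def write(self, cube, rows):
        """Persist (key, value) rows for the cube."""


class Engine(ABC):
    """A cubing engine that only talks to the Source and Storage interfaces."""

    @abstractmethod
    def build(self, cube, source, storage):
        """Read from source, compute aggregates, write to storage."""


class ListSource(Source):
    def __init__(self, records):
        self.records = records

    def read(self, cube):
        return iter(self.records)


class DictStorage(Storage):
    def __init__(self):
        self.tables = {}

    def write(self, cube, rows):
        self.tables[cube.name] = dict(rows)


class BaseCuboidEngine(Engine):
    """Toy engine: sums the measure over the cube's dimensions (base cuboid only)."""

    def build(self, cube, source, storage):
        agg = defaultdict(int)
        for rec in source.read(cube):
            agg[tuple(rec[d] for d in cube.dims)] += rec[cube.measure]
        storage.write(cube, agg.items())


# Registries: swapping an implementation means registering a different object under a name.
SOURCES, ENGINES, STORAGES = {}, {}, {}


def build_cube(cube):
    """Resolve the plugins named in the cube metadata, then run the chosen engine."""
    ENGINES[cube.engine].build(cube, SOURCES[cube.source], STORAGES[cube.storage])


if __name__ == "__main__":
    SOURCES["list"] = ListSource([
        {"item": "milk", "loc": "Urbana", "gmv": 3},
        {"item": "milk", "loc": "Urbana", "gmv": 2},
        {"item": "milk", "loc": "Chicago", "gmv": 4},
    ])
    ENGINES["toy"] = BaseCuboidEngine()
    STORAGES["dict"] = DictStorage()
    build_cube(CubeDesc("sales", ("item", "loc"), "gmv", "list", "toy", "dict"))
    print(STORAGES["dict"].tables["sales"])
```

The engine never imports a concrete source or storage class; replacing Hive with Kafka, or HBase with another store, only changes what is registered and which names the cube metadata points to.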
Layered Cubing (MR Engine V1)
[Diagram: Full Data → MR → 4-D cuboid (A,B,C,D) → MR → 3-D cuboids (A,B,C; A,B,D; A,C,D; B,C,D) → MR → 2-D cuboids → MR → 1-D cuboids → MR → 0-D cuboid]
n Pros
n Simple	
  implementation,	
  depends	
  on	
  MR	
  shuffle	
  to	
  
merge	
  sort	
  and	
  then	
  aggregate
n Little	
  requirement	
  on	
  memory
n Cons
n Aggregation	
  happens	
  at	
  reducer	
  side
n Mapper	
  outputs	
  raw	
  data	
  thus	
  shuffle	
  is	
  huge
n Multiple	
  rounds	
  of	
  MR	
  overhead
n Shuffle	
  can	
  be	
  100x	
  of	
  cube	
  size,	
  big	
  I/O	
  pressure
Layered	
  Cubing (MR	
  Engine	
  V1)
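The layer-by-layer flow can be simulated in a few lines of Python. This is a single-process sketch, not Kylin's MR job code: each loop iteration stands in for one MR round, the dict sums stand in for the reducers, and a child cuboid is derived from any one of its parents in the previous layer.

```python
from collections import defaultdict


def project(parent_dims, parent_rows, child_dims):
    """One later MR round: map parent rows to the child's key, then sum per key (reducer)."""
    idx = [parent_dims.index(d) for d in child_dims]
    out = defaultdict(int)
    for key, value in parent_rows.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def layered_cubing(records, dims, measure):
    """Build every cuboid layer by layer; returns (cube, number_of_rounds)."""
    dims = tuple(dims)
    # Round 1: full data -> base (N-D) cuboid; mappers emit every raw record.
    base = defaultdict(int)
    for rec in records:
        base[tuple(rec[d] for d in dims)] += rec[measure]
    cube = {dims: dict(base)}
    layer, rounds = [dims], 1
    while layer[0]:  # stop once the 0-D (apex) cuboid has been built
        next_layer = {}
        for parent in layer:
            for i in range(len(parent)):
                child = parent[:i] + parent[i + 1:]
                if child not in next_layer:  # a child has several parents; build it once
                    next_layer[child] = project(parent, cube[parent], child)
        cube.update(next_layer)
        layer, rounds = list(next_layer), rounds + 1
    return cube, rounds
```

For four dimensions this takes five rounds, one per MR job in the diagram, and every round re-reads and re-shuffles the previous layer, which is where the scheduling and I/O overhead listed above comes from.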
Fast Cubing (MR Engine V2)
[Diagram: each data split → mapper → cube segment; … → merge sort (shuffle) → reducer → final cube]
n One	
  round	
  MR	
  calculates	
  the	
  whole	
  cube
n Minimize	
  scheduling	
  overhead
n Aggregation	
  happens	
  at	
  mapper	
  side
n 1M	
  raw	
  records	
  becomes	
  10K	
  at	
  base	
  level
n Reduced	
  shuffles	
  size,	
  20x	
  total	
  cube	
  size
n Memory	
  eater
Fast	
  Cubing	
  (MR	
  Engine	
  V2)
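The key difference from layered cubing is where aggregation happens. In this simplified Python simulation (not Kylin's implementation), each "mapper" aggregates its split into all cuboids in memory, the emitted pairs are sorted by key to mimic the shuffle's merge sort, and a single "reducer" pass sums equal keys. For brevity each record is expanded into every cuboid directly rather than with the top-down in-memory algorithm.

```python
from collections import defaultdict
from itertools import combinations, groupby
from operator import itemgetter


def cube_segment(split, dims, measure):
    """Mapper: aggregate one data split into every cuboid in memory (map-side aggregation)."""
    seg = defaultdict(int)
    for rec in split:
        for k in range(len(dims) + 1):
            for cuboid in combinations(dims, k):
                seg[(cuboid, tuple(rec[d] for d in cuboid))] += rec[measure]
    return seg


def fast_cubing(splits, dims, measure):
    """One MR round; returns (final cube, number of records shuffled)."""
    shuffled = []
    for split in splits:  # map phase: each mapper emits its pre-aggregated segment
        shuffled.extend(cube_segment(split, dims, measure).items())
    shuffled.sort(key=itemgetter(0))  # shuffle: merge sort by (cuboid, key)
    final = {}
    for key, group in groupby(shuffled, key=itemgetter(0)):  # reduce phase
        final[key] = sum(value for _, value in group)
    return final, len(shuffled)


if __name__ == "__main__":
    records = [{"A": i % 2, "B": i % 3, "m": 1} for i in range(1000)]
    cube, shuffled = fast_cubing([records[:500], records[500:]], ("A", "B"), "m")
    print(len(records), "raw records ->", shuffled, "shuffled records")
```

With 1,000 records over two low-cardinality dimensions split across two mappers, only 24 pre-aggregated records reach the shuffle (12 keys per split), at the cost of holding each segment in mapper memory.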
n A	
  simplified	
  star	
  cubing	
  algorithm
n Xin,	
  Dong,	
  et	
  al.	
  "Star-­‐cubing:	
  Computing	
  iceberg	
  cubes	
  by	
  top-­‐down	
  and	
  bottom-­‐up	
  integration." Proceedings	
  of	
  
the	
  29th	
  international	
  conference	
  on	
  Very	
  large	
  data	
  bases-­‐Volume	
  29.	
  VLDB	
  Endowment,	
  2003.
n Top-­‐down;	
  Free	
  resource	
  on	
  branch	
  complete
n Multi-­‐threading	
  if	
  mem	
  available;	
  Ordered	
  output
In-­‐Mem	
  Cubing
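A minimal Python sketch of the top-down, depth-first idea, under simplifying assumptions: it is neither Star-Cubing nor Kylin's in-memory cuber, and it omits multi-threading and ordered output. The base cuboid is aggregated once, each child cuboid is summed from its parent's already-aggregated rows, and dropping only dimensions at positions at or after the one last dropped reaches every cuboid exactly once. `emit` is a hypothetical callback standing in for the output writer.

```python
from collections import defaultdict


def in_mem_cubing(records, dims, measure, emit):
    """Aggregate records into the base cuboid, then derive all other cuboids top-down."""
    dims = tuple(dims)
    base = defaultdict(int)
    for rec in records:
        base[tuple(rec[d] for d in dims)] += rec[measure]
    _visit(dims, dict(base), 0, emit)


def _visit(cuboid, rows, start, emit):
    """Emit one cuboid, then depth-first build each child from this parent's rows."""
    emit(cuboid, rows)
    # Only drop dimensions at positions >= start, so each cuboid is reached exactly once.
    for i in range(start, len(cuboid)):
        child_rows = defaultdict(int)
        for key, value in rows.items():
            child_rows[key[:i] + key[i + 1:]] += value
        _visit(cuboid[:i] + cuboid[i + 1:], dict(child_rows), i, emit)
        # The finished branch's rows are replaced on the next iteration, so their
        # memory can be reclaimed (unless emit kept a reference).
```

For dimensions (A, B, C) the visiting order is ABC, BC, C, (), B, AC, A, AB: at most one root-to-leaf branch of intermediate results is held in memory at a time.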
n Pros
n Lesser	
  network	
  pressure
n Independent	
  cubing	
  algorithm	
  that	
  can	
  be	
  
reused	
  by	
  Streaming,	
  Spark	
  etc.
n Seems	
  30%-­‐50%	
  faster
n Cons
n Code	
  complexity
n High	
  mapper	
  CPU/Mem	
  consumption
Fast	
  Cubing	
  Summary
Comparison on ~500 GB cubes
Fast cubing is 30%-50% faster
[Bar chart: Case 1 and Case 2, Layered Cubing vs. Fast Cubing]
Incremental Build
n Do	
  micro	
  batch	
  at	
  minutes	
  interval
n Source	
  data	
  from	
  streaming	
  input
n Fast	
  cubing
Xin,	
  Dong,	
  et	
  al.	
  "Star-­‐cubing:	
  Computing	
  iceberg	
  cubes	
  by	
  top-­‐down	
  
and	
  bottom-­‐up	
  integration."Proceedings	
  of	
  the	
  29th	
  international	
  
conference	
  on	
  Very	
  large	
  data	
  bases-­‐Volume	
  29.	
  VLDB	
  Endowment,	
  
2003.
n Cube	
  auto	
  merge	
  and	
  garbage	
  collection
Push	
  the	
  Idea	
  to	
  Near	
  Realtime
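Micro-batching can be sketched as bucketing each streaming event by time into aligned intervals and cubing each bucket as its own segment. This Python sketch assumes a 60-second interval and events given as (timestamp in seconds, record) pairs; `build_segment` is a hypothetical stand-in that only builds the base cuboid, and Kafka consumption and offsets are not modelled.

```python
from collections import defaultdict

BATCH_SECONDS = 60  # assumed micro-batch interval; the slide only says "minutes"


def micro_batches(events, batch_seconds=BATCH_SECONDS):
    """Group (timestamp_sec, record) events into batches keyed by [start, end) interval."""
    batches = defaultdict(list)
    for ts, rec in events:
        start = ts - ts % batch_seconds
        batches[(start, start + batch_seconds)].append(rec)
    return dict(sorted(batches.items()))


def build_segment(records, dims, measure):
    """Stand-in for fast cubing of one batch: base-cuboid aggregation only."""
    agg = defaultdict(int)
    for rec in records:
        agg[tuple(rec[d] for d in dims)] += rec[measure]
    return dict(agg)


if __name__ == "__main__":
    events = [(0, {"site": "us", "v": 1}), (59, {"site": "us", "v": 2}),
              (60, {"site": "de", "v": 5}), (185, {"site": "us", "v": 1})]
    for rng, recs in micro_batches(events).items():
        print(rng, build_segment(recs, ("site",), "v"))
```

Each interval becomes one small cube segment, which is why the many resulting segments later need to be merged.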
Streaming Setup
[Diagram: Kafka → Kafka Adapter → Fast Cubing → HBase Adapter → HBase, with the Streaming Cube metadata]
Stream Data Consuming
Cube Auto Merge
[Diagram: In-Memory Cube Building → Auto Cube Merge with MR]
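Auto merge can be illustrated as replacing all segments that exactly tile an aligned time window with one merged segment, dropping the originals (the garbage collection step). This Python sketch is hypothetical: the one-hour and one-day windows, the contiguity rule and the function names are assumptions, since the slide does not specify Kylin's actual merge policy or thresholds.

```python
from collections import defaultdict

MERGE_WINDOWS = (3600, 86400)  # hypothetical: minute segments -> hours, hours -> days


def merge_segments(segments):
    """Sum the aggregates of several segments (same cube, disjoint time ranges)."""
    merged = defaultdict(int)
    for seg in segments:
        for key, value in seg.items():
            merged[key] += value
    return dict(merged)


def auto_merge(segments, windows=MERGE_WINDOWS):
    """segments: {(start, end): aggregates}. Replace each complete, aligned window by one segment."""
    for w in windows:  # smallest window first, so hours exist before days are formed
        groups = defaultdict(list)
        for start, end in segments:
            if end - start < w and start // w == (end - 1) // w:
                groups[start // w].append((start, end))
        for k, ranges in groups.items():
            ranges.sort()
            complete = (ranges[0][0] == k * w and ranges[-1][1] == (k + 1) * w
                        and all(a[1] == b[0] for a, b in zip(ranges, ranges[1:])))
            if complete:
                # Pop the small segments (garbage collection) and store the merged one.
                segments[(k * w, (k + 1) * w)] = merge_segments(segments.pop(r) for r in ranges)
    return segments
```

A window that is still filling up (for example the current hour) is left as small segments until it is complete.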
Use Case: SEO Operational Dashboard
Dimensions
• eBay Site – ebay.com, ebay.co.uk, ebay.de
• Buyer Country – US, CN, RU
• Search Engine – Google, Bing, Yahoo!
• Referrer – google.com, google.co.uk
• Page – Search, View Item, Product
• User Experience – Desktop, Mobile App, mWeb
Measurements
• Visits, GMB $, GMB share, conversion rate, bounce rate, # of view items, # of bought items, etc.
Future Lambda Architecture for Realtime
[Diagram: Kafka streaming feeds minute batches into Cube storage and the latest seconds into a Real-time In-Mem Store (inverted index); SQL queries go through a Hybrid Storage Interface over both]
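The hybrid storage idea can be sketched as two stores behind one query method: minute batches roll older events into pre-aggregated cube rows, newer events stay raw in memory, and a query sums both without double counting. This Python sketch is hypothetical throughout: `HybridStorage` and its methods are illustrative names, a plain list stands in for the inverted index, query bounds are assumed to be minute-aligned, and late events are not handled.

```python
from collections import defaultdict


class HybridStorage:
    """Hypothetical hybrid interface: older data from the cube, newest events from memory."""

    def __init__(self):
        self.cube = defaultdict(int)  # (minute_start, site) -> sum, built by minute batches
        self.realtime = []            # raw (ts, site, value) events not yet batched
        self.watermark = 0            # everything before this timestamp is in the cube

    def ingest(self, ts, site, value):
        self.realtime.append((ts, site, value))

    def run_minute_batch(self, up_to):
        """Move events with ts < up_to (a minute boundary) from memory into the cube."""
        keep = []
        for ts, site, value in self.realtime:
            if ts < up_to:
                self.cube[(ts - ts % 60, site)] += value
            else:
                keep.append((ts, site, value))
        self.realtime = keep
        self.watermark = up_to

    def sum_by_site(self, since):
        """Like SELECT site, SUM(value) ... WHERE ts >= since GROUP BY site, over both stores."""
        result = defaultdict(int)
        for (minute, site), value in self.cube.items():
            if minute >= since:
                result[site] += value
        for ts, site, value in self.realtime:
            if ts >= since:
                result[site] += value
        return dict(result)
```

Each event lives in exactly one store at a time, so the union is exact, while the cube side keeps most of the data pre-aggregated.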
TopN Support
  select dt, loc, item, sum(gmv)
  from test_kylin_fact
  where dt = '2015-10-1' and loc = 'CN'
  group by dt, loc, item
  order by sum(gmv) desc
  limit 100
Cube pre-calculation:
DT, LOC        TopN
2015-10-1, CN  Item A, $500; Item B, $300; …
• TopN as a measure
  – Answer TopN queries directly from the pre-calculation
• Approximate algorithm
  – SpaceSaving TopN: Ahmed Metwally, et al. "Efficient Computation of Frequent and Top-k Elements in Data Streams." ICDT'05: Proceedings of the 10th International Conference on Database Theory, 2005.
  – A parallel version: Massimo Cafaro, et al. "A Parallel Space Saving Algorithm for Frequent Items and the Hurwitz Zeta Distribution." arXiv:1401.0702v12 [cs.DS], 19 Sep 2015.
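Space-Saving keeps at most k counters: a tracked item's counter is incremented, and a new item evicts the item with the smallest counter and inherits that value as its error bound. Because the query above ranks items by sum(gmv), this Python sketch uses a weighted variant that adds each row's GMV instead of 1 (with unit weights it is the original algorithm). It uses a linear scan for the minimum rather than the paper's Stream-Summary structure, omits the parallel merge of Cafaro et al., and is an illustration, not Kylin's TopN measure.

```python
def space_saving(stream, k):
    """Space-Saving with at most k counters over (item, weight) pairs.

    Returns {item: (estimate, error)}; a tracked item's true total lies in
    [estimate - error, estimate].
    """
    counters = {}
    for item, weight in stream:
        if item in counters:
            estimate, error = counters[item]
            counters[item] = (estimate + weight, error)
        elif len(counters) < k:
            counters[item] = (weight, 0)
        else:
            # Evict the smallest counter; the newcomer inherits its value as error.
            victim = min(counters, key=lambda x: counters[x][0])
            floor = counters.pop(victim)[0]
            counters[item] = (floor + weight, floor)
    return counters


def top_n(counters, n):
    """The n items with the largest estimates, largest first."""
    return sorted(counters.items(), key=lambda kv: kv[1][0], reverse=True)[:n]


if __name__ == "__main__":
    gmv = [("A", 500), ("B", 300), ("C", 10), ("A", 100), ("D", 5), ("B", 50), ("E", 1)]
    print(top_n(space_saving(gmv, k=3), 2))
```

Memory stays at k counters per cell regardless of item cardinality; heavy items are ranked reliably, while items near the tail carry large error bounds, which is why the result is approximate.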
n Coming	
  soon…
n Plugin	
  Architecture
n Replaceable	
  engine,	
  storage,	
  source
n Fast	
  Cubing
n 30%-­‐50%	
  faster
n Streaming	
  Cubing
n Support	
  NRT	
  analysis
n Lightening	
  fast	
  TopN
New	
  features	
  in	
  2.x
n Kylin Site:
n http://kylin.io
n Twitter/微博:
n @ApacheKylin
n 微信公众号
n ApacheKylin
We	
  are	
  hiring

 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Streamlining Your Application Builds with Cloud Native Buildpacks
Streamlining Your Application Builds  with Cloud Native BuildpacksStreamlining Your Application Builds  with Cloud Native Buildpacks
Streamlining Your Application Builds with Cloud Native BuildpacksVish Abrams
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptkinjal48
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmonyelliciumsolutionspun
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...OnePlan Solutions
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionsNirav Modi
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfTobias Schneck
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 

Dernier (20)

Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retries
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS Calculator
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptx
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-Council
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Inc
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdf
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Streamlining Your Application Builds with Cloud Native Buildpacks
Streamlining Your Application Builds  with Cloud Native BuildpacksStreamlining Your Application Builds  with Cloud Native Buildpacks
Streamlining Your Application Builds with Cloud Native Buildpacks
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.ppt
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspections
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 

Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin Meetup @Shanghai

  • 1. Apache Kylin Deep Dive: Streaming & Plugin Architecture
    - http://kylin.io | @ApacheKylin | Oct 10, 2015
    - Yang Li, Architect & Tech Leader | yangli9@ebay.com
  • 2. Agenda
    - What’s Apache Kylin?
    - Plugin Architecture
    - Fast Cubing
    - Streaming Cubing
    - Summary
  • 3. What’s Kylin: Extreme OLAP Engine for Big Data
    - Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. It was originally contributed by eBay Inc.
    - kylin /ˈkiːˈlɪn/ 麒麟, n. (in Chinese art) a mythical animal of composite form
    - Open sourced on Oct 1st, 2014
    - Accepted as an Apache Incubator project on Nov 25th, 2014
  • 4. Big Data @ eBay
    - $28B GMV via mobile (2014)
    - 266M mobile app globally
    - 1B listings created via mobile
    - 157M active buyers
    - 25M active sellers
    - 800M active listings
    - 8.8M new listings every week
  • 5. Feature – Big Data
    - eBay
    - Adoptions
      - Baidu Map, China Mobile, 明略数据 (Mininglamp), 京东 (JD.com), 美团 (Meituan), 唯品会 (Vipshop)…
      - Expedia, Microsoft, Tableau, Infoworks.io…
    - Case                 | Cube Size | Raw Records
      Session Analysis     | 20 TB     | 81+ billion rows
      Traffic Analysis     | 30 TB     | 28+ billion rows
      Transaction Analysis | 560 GB    | 1.2+ billion rows
  • 7. Feature – BI Integration via ODBC and JDBC
  • 8. Feature – Low Latency
    - 90% of queries < 5 s
    - [Chart: dark-blue line = 90th-percentile queries; light-blue line = 95th-percentile queries]
    - 90% of queries return within 3 seconds
  • 9. Feature – Scalable Throughput
    - Linear scale-out with more nodes
  • 10. How it works – Materialized View
    - A query may consider only 3 dimensions
  • 11. How it works – OLAP Cube, space for time
    - Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
      1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
      2. (9/15, milk, Urbana, *) - <time, item, location>
      3. (*, milk, Urbana, *) - <item, location>
      4. (*, milk, Chicago, *) - <item, location>
      5. (*, milk, *, *) - <item>
    - Cuboid = one combination of dimensions
    - Cube = all combinations of dimensions (all cuboids)
    - [Diagram: cuboid lattice]
      - 0-D (apex) cuboid: (no dimensions)
      - 1-D cuboids: time | item | location | supplier
      - 2-D cuboids: time, item | time, location | time, supplier | item, location | item, supplier | location, supplier
      - 3-D cuboids: time, item, location | time, item, supplier | time, location, supplier | item, location, supplier
      - 4-D (base) cuboid: time, item, location, supplier
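The cube on this slide is every subset of the dimensions. The Python sketch below is illustrative only and is not Kylin code. It enumerates the 2^4 = 16 cuboids, writes a base cell in the slide's `*` notation, and picks the smallest cuboid that covers a query touching three dimensions, which is the materialized-view idea from slide 10. The helper names `cuboid_lattice`, `cell`, and `covering_cuboid` are hypothetical.

```python
from itertools import combinations

# Dimension names taken from the slide's example; the lattice works for any tuple.
DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_lattice(dims):
    """Map each level k to the list of k-D cuboids (tuples of dimension names)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


def cell(row, cuboid, dims=DIMENSIONS):
    """Project a base cell onto a cuboid, writing '*' for rolled-up dimensions."""
    return tuple(value if dim in cuboid else "*" for dim, value in zip(dims, row))


def covering_cuboid(query_dims, dims=DIMENSIONS):
    """Smallest precomputed cuboid containing every dimension the query uses."""
    wanted = set(query_dims)
    candidates = [c for level in cuboid_lattice(dims).values() for c in level if wanted <= set(c)]
    return min(candidates, key=len)


lattice = cuboid_lattice(DIMENSIONS)
for level, cuboids in lattice.items():
    print(f"{level}-D:", cuboids)
print("total cuboids:", sum(len(c) for c in lattice.values()))  # 2**4 = 16
print(cell(("9/15", "milk", "Urbana", "Dairy_land"), ("item", "location")))
print(covering_cuboid({"item", "location", "time"}))
```

The cuboid count doubles with every added dimension. That growth is the "space" traded for query time in the slide title.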
  • 13. Kylin Architecture Overview
    - Online analysis data flow
    - Offline data flow
    - Clients/users interact with Kylin via SQL
    - The OLAP cube is transparent to users
    - [Architecture diagram. Components: Hadoop/Hive (star schema data); Cube Builder (MapReduce…); OLAP cubes in HBase (key-value data); REST Server with REST API and JDBC/ODBC; Query Engine with routing (low latency: seconds); Metadata; 3rd-party apps (web app, mobile…) and SQL-based tools (BI tools: Tableau…); Data Source, Engine, and Storage abstractions]
  • 16. 2.x Developing Modules
    - Engine
      - MR V1
      - MR V2
      - Spark
      - Streaming
    - Source
      - Hive
      - Kafka
      - Spark SQL & DataFrames
    - Storage
      - HBase
      - Kudu (Cloudera)?
      - Cassandra?
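The split into Source, Engine, and Storage abstractions can be pictured as three independent plugin slots. Each cube names one implementation per slot. The Python sketch below is a minimal illustration under that assumption. `REGISTRY`, `register`, `CubeDesc`, `build_cube`, and the plugin classes are hypothetical names and are not Kylin's actual interfaces or factories.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical plugin slots mirroring the slide's Source / Engine / Storage split.
REGISTRY: Dict[str, Dict[str, Callable[[], object]]] = {"source": {}, "engine": {}, "storage": {}}


def register(kind: str, name: str):
    """Class decorator that records an implementation under (kind, name)."""
    def wrap(cls):
        REGISTRY[kind][name] = cls
        return cls
    return wrap


@register("source", "hive")
class HiveSource:
    def read(self) -> str:
        return "rows from a Hive star schema"


@register("source", "kafka")
class KafkaSource:
    def read(self) -> str:
        return "messages from a Kafka topic"


@register("engine", "mr_v2")
class FastCubingEngine:
    def build(self, data: str) -> str:
        return f"cube built by MR V2 from {data}"


@register("engine", "streaming")
class StreamingEngine:
    def build(self, data: str) -> str:
        return f"micro-batch cube from {data}"


@register("storage", "hbase")
class HBaseStorage:
    def save(self, cube: str) -> str:
        return f"stored in HBase: {cube}"


@dataclass
class CubeDesc:
    """Each cube names its own source, engine, and storage plugin."""
    source: str
    engine: str
    storage: str


def build_cube(desc: CubeDesc) -> str:
    # Look up each plugin by name; the pipeline itself never hard-codes Hive, MR, or HBase.
    source = REGISTRY["source"][desc.source]()
    engine = REGISTRY["engine"][desc.engine]()
    storage = REGISTRY["storage"][desc.storage]()
    return storage.save(engine.build(source.read()))


print(build_cube(CubeDesc("hive", "mr_v2", "hbase")))
print(build_cube(CubeDesc("kafka", "streaming", "hbase")))
```

Adding a new storage such as Kudu would mean registering one more class. Existing cubes keep their own choices, which matches the "choose a different engine for each data set" point on slide 17.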
  • 17. The Freedom, Extensibility, Flexibility
    - The freedom
      - Zoo break: no longer bound to Hadoop
      - Free to move to a better engine or storage
    - Extensibility
      - Accept any input, e.g. Kafka
      - Embrace next-gen distributed platforms, e.g. Spark
    - Flexibility
      - Choose a different engine for each data set
  • 19. Layered Cubing (MR Engine V1)
    - [Diagram: starting from the full data, five rounds of MR compute the 4-D cuboid (A,B,C,D), then the 3-D cuboids (A,B,C; A,B,D; A,C,D; B,C,D), then the 2-D, 1-D, and 0-D cuboids, one level per round]
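The layer-by-layer idea can be simulated in memory. In the Python sketch below, each "round" is a plain group-by standing in for one MapReduce job. The first round aggregates the full data into the base cuboid, and each later round derives every (n-1)-D cuboid from an n-D cuboid computed in the round before. This is an illustration, not Kylin's MR job code. The helpers `aggregate` and `layered_cubing` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("A", "B", "C", "D")  # dimension names from the slide's diagram


def aggregate(parent_cells, parent_dims, child_dims):
    """One simulated MR step for one cuboid: map parent keys to child keys, reduce by sum."""
    positions = [parent_dims.index(d) for d in child_dims]
    out = defaultdict(int)
    for key, measure in parent_cells.items():
        out[tuple(key[i] for i in positions)] += measure
    return dict(out)


def layered_cubing(rows):
    """rows: iterable of (dimension-value tuple in DIMS order, measure).

    Returns {cuboid: {key: sum}}. Round 1 builds the base cuboid from the full
    data; each later round builds all (n-1)-D cuboids from the n-D cuboids."""
    base = defaultdict(int)
    for key, measure in rows:
        base[tuple(key)] += measure
    cube = {DIMS: dict(base)}
    for level in range(len(DIMS) - 1, -1, -1):
        for child in combinations(DIMS, level):
            # Any parent with exactly one extra dimension works; take the first.
            parent = next(p for p in combinations(DIMS, level + 1) if set(child) < set(p))
            cube[child] = aggregate(cube[parent], parent, child)
    return cube


rows = [(("a1", "b1", "c1", "d1"), 3), (("a1", "b2", "c1", "d2"), 5), (("a2", "b1", "c2", "d1"), 2)]
cube = layered_cubing(rows)
print(cube[("A",)])
print(cube[()])
```

Five rounds run here: the base plus four levels, matching the five MR boxes in the diagram. In a real V1 build, each round is a separate job that re-reads and re-shuffles the previous level.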
  • 20. Layered Cubing (MR Engine V1)
    - Pros
      - Simple implementation; relies on the MR shuffle to merge-sort and then aggregate
      - Little memory requirement
    - Cons
      - Aggregation happens on the reducer side
      - Mappers output raw data, so the shuffle is huge
      - Overhead of multiple MR rounds
      - Shuffle can be 100x the cube size, putting heavy pressure on I/O
  • 21. Fast Cubing (MR Engine V2)
    - [Diagram: each mapper turns a data split into a cube segment; the segments go through a merge sort (shuffle) to a reducer, which produces the final cube]
  • 22. Fast Cubing (MR Engine V2)
    - One round of MR calculates the whole cube
    - Minimizes scheduling overhead
    - Aggregation happens on the mapper side
    - 1M raw records become 10K at the base level
    - Reduced shuffle size: 20x the total cube size (down from up to 100x with V1)
    - Memory eater
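The key change in V2 is that each mapper aggregates its split into a complete cube segment before anything is shuffled. The reducer then only sums partial aggregates. The Python sketch below simulates this with dictionaries. For brevity, the mapper computes every cuboid directly from its rows; the in-memory algorithm on slide 23 derives children from parents instead. The names `cube_segment` and `merge_segments` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

N_DIMS = 4  # dimensions A, B, C, D, addressed by position


def cube_segment(split):
    """Mapper side: aggregate one data split into every cuboid, in memory.

    Keys are (cuboid, values), where cuboid is a tuple of dimension positions."""
    segment = defaultdict(int)
    for values, measure in split:
        for level in range(N_DIMS + 1):
            for cuboid in combinations(range(N_DIMS), level):
                segment[(cuboid, tuple(values[i] for i in cuboid))] += measure
    return dict(segment)


def merge_segments(segments):
    """Shuffle + reducer: bring equal keys together and sum the partial aggregates."""
    final = defaultdict(int)
    for segment in segments:
        for key, measure in segment.items():
            final[key] += measure
    return dict(sorted(final.items()))  # sorted to mimic merge-sorted reducer input


splits = [
    [(("a1", "b1", "c1", "d1"), 3), (("a1", "b1", "c1", "d1"), 4)],
    [(("a1", "b2", "c1", "d2"), 5), (("a2", "b1", "c2", "d1"), 2)],
]
segments = [cube_segment(s) for s in splits]
cube = merge_segments(segments)
print("records shuffled:", sum(len(s) for s in segments))
print("grand total:", cube[((), ())])
```

In the first split, the two identical rows collapse into one cell per cuboid before the shuffle. This is the mapper-side reduction the slide describes, shown at toy scale.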
  • 23. In-Mem Cubing
    - A simplified star-cubing algorithm
      - Xin, Dong, et al. "Star-cubing: Computing iceberg cubes by top-down and bottom-up integration." Proceedings of the 29th International Conference on Very Large Data Bases, Volume 29. VLDB Endowment, 2003.
    - Top-down; free resources when a branch completes
    - Multi-threading if memory is available; ordered output
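Two ideas on this slide can be shown in a few lines: "top-down" and "free resources when a branch completes". The Python sketch below walks a spanning tree of the cuboid lattice depth-first. It builds each child cuboid from its parent by summing out one dimension, and it emits each cuboid as soon as that cuboid is complete. It is a simplified illustration under those assumptions. It does not reproduce the star-tree from the cited paper, does not use multithreading, and is not Kylin's implementation. `rollup` and `in_mem_cubing` are hypothetical names.

```python
from collections import defaultdict

DIMS = ("A", "B", "C", "D")


def rollup(parent_dims, parent_cells, drop):
    """Build a child cuboid by summing the parent over dimension `drop`."""
    pos = parent_dims.index(drop)
    child = defaultdict(int)
    for key, measure in parent_cells.items():
        child[key[:pos] + key[pos + 1:]] += measure
    child_dims = parent_dims[:pos] + parent_dims[pos + 1:]
    return child_dims, dict(child)


def in_mem_cubing(rows, emit):
    """Depth-first, top-down cubing over a spanning tree of the cuboid lattice."""
    base = defaultdict(int)
    for key, measure in rows:
        base[tuple(key)] += measure

    def visit(dims, cells, first_droppable):
        emit(dims, cells)  # output each cuboid as soon as it is complete
        # Only drop dimensions at or after `first_droppable`, so every subset
        # of DIMS is reached along exactly one path.
        for i in range(first_droppable, len(dims)):
            child_dims, child_cells = rollup(dims, cells, dims[i])
            visit(child_dims, child_cells, i)
        # Returning releases this frame's reference to `cells`, so a branch's
        # memory can be reclaimed once the whole branch is complete.

    visit(DIMS, dict(base), 0)


results = {}
rows = [(("a1", "b1", "c1", "d1"), 3), (("a1", "b2", "c1", "d2"), 5), (("a2", "b1", "c2", "d1"), 2)]
in_mem_cubing(rows, lambda dims, cells: results.__setitem__(dims, cells))
print(len(results), results[()])
```

The demo's `emit` keeps every cuboid in a dict so the output can be inspected. A real builder would write each cuboid out instead, so that only the current root-to-leaf path stays in memory.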
  • 24. Fast Cubing Summary
    - Pros
      - Less network pressure
      - Independent cubing algorithm that can be reused by Streaming, Spark, etc.
      - Appears to be 30%-50% faster
    - Cons
      - Code complexity
      - High mapper CPU/memory consumption
  • 25. Comparison on ~500 GB cubes
    - Fast cubing is 30%-50% faster
    - [Bar chart: Layered Cubing vs. Fast Cubing for Case 1 and Case 2]
  • 28. Push the Idea to Near Realtime
    - Run micro-batches at minute intervals
    - Source data from streaming input
    - Fast cubing (star-cubing; Xin, Dong, et al., VLDB 2003)
    - Cube auto merge and garbage collection
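Micro-batching means cutting the stream into fixed time windows and pre-aggregating each window into its own small segment. The Python sketch below assumes a five-minute interval; the slide only says "minutes". It also assumes events arrive as `(timestamp, dimension tuple, measure)`. `BATCH`, `batch_start`, and `micro_batches` are hypothetical names, and the dimension values are borrowed from the SEO use case on slide 32.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BATCH = timedelta(minutes=5)  # hypothetical interval; the slide only says "minutes"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def batch_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its micro-batch window."""
    return ts - (ts - EPOCH) % BATCH


def micro_batches(events):
    """Group (timestamp, dimension tuple, measure) events into one pre-aggregated
    segment per window, the unit a streaming build would cube and store."""
    segments = defaultdict(lambda: defaultdict(int))
    for ts, dims, measure in events:
        segments[batch_start(ts)][dims] += measure
    return {start: dict(cells) for start, cells in sorted(segments.items())}


t0 = datetime(2015, 10, 1, 12, 0, tzinfo=timezone.utc)
events = [
    (t0 + timedelta(minutes=1), ("US", "Google"), 10),
    (t0 + timedelta(minutes=4), ("US", "Google"), 5),
    (t0 + timedelta(minutes=7), ("CN", "Bing"), 3),
]
segs = micro_batches(events)
for start, cells in segs.items():
    print(start.isoformat(), cells)
```

Each window produces a small segment. This is why the slide pairs micro-batching with automatic merging and garbage collection of segments.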
  • 29. Streaming Setup
    - [Diagram: Kafka → Kafka Adapter → Fast Cubing → HBase Adapter → HBase (Streaming Cube)]
  • 31. Cube Auto Merge
    - [Diagram: In-Memory Cube Building; Auto Cube Merge with MR]
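The slide does not spell out the merge policy. The Python sketch below shows one plausible bookkeeping scheme, purely as an assumption. Contiguous small segments that exactly fill an aligned window are replaced by a single segment: first hourly, then daily. The replaced segments are what garbage collection would later drop. The window sizes, `Segment`, and `auto_merge` are hypothetical, and the sketch ignores the MR job that would actually merge the segment data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int  # inclusive, in minutes
    end: int    # exclusive, in minutes


MERGE_RANGES = (60, 24 * 60)  # hypothetical: first into hours, then into days


def auto_merge(segments, ranges=MERGE_RANGES):
    """Replace contiguous segments that exactly fill an aligned window with one segment.

    Windows are tried from smallest to largest, so hourly merges can later feed
    daily merges. Replaced segments are the ones garbage collection would drop."""
    segs = sorted(segments, key=lambda s: s.start)
    for size in ranges:
        windows = {}
        for s in segs:
            windows.setdefault(s.start // size, []).append(s)
        merged = []
        for w, group in sorted(windows.items()):
            lo, hi = w * size, (w + 1) * size
            contiguous = all(a.end == b.start for a, b in zip(group, group[1:]))
            if len(group) > 1 and contiguous and group[0].start == lo and group[-1].end == hi:
                merged.append(Segment(lo, hi))
            else:
                merged.extend(group)  # window not full yet: keep the small segments
        segs = merged
    return segs


five_minute = [Segment(m, m + 5) for m in range(0, 75, 5)]
merged = auto_merge(five_minute)
print(merged)
```

Here the twelve segments of the first hour collapse into one. The three segments of the still-open second hour stay as they are until their window fills.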
  • 32. Use Case: SEO Operational Dashboard
    - Dimensions
      - eBay site: ebay.com, ebay.co.uk, ebay.de
      - Buyer country: US, CN, RU
      - Search engine: Google, Bing, Yahoo!
      - Referrer: google.com, google.co.uk
      - Page: Search, View Item, Product
      - User experience: Desktop, Mobile APP, mWeb
    - Measurements
      - Visits, GMB $, GMB share, conversion rate, bounce rate, # of view items, # of bought items, etc.
  • 33. Future Lambda Architecture for Realtime
    - [Diagram: Kafka → streaming → real-time in-memory store (latest second, inverted index); minute batch → cube storage (cube); SQL queries go through a hybrid storage interface over both stores]
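A hybrid storage interface answers a query from two places. Pre-aggregated cube cells cover everything up to the last sealed batch, and a raw in-memory store covers the newest events. The Python sketch below illustrates that split under simple assumptions. Integer timestamps are used, a minute batch is modelled as moving events older than a boundary into the cube, and a linear scan stands in for the inverted index. `HybridStore` and its methods are hypothetical and not a Kylin API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HybridStore:
    boundary: int  # events with ts < boundary are already in the cube
    cube: Dict[Tuple[int, str], int] = field(default_factory=dict)  # (batch_start, key) -> sum
    realtime: List[Tuple[int, str, int]] = field(default_factory=list)  # raw (ts, key, value)

    def ingest(self, ts: int, key: str, value: int) -> None:
        """Streaming path: new events land in the in-memory store."""
        self.realtime.append((ts, key, value))

    def seal_batch(self, new_boundary: int) -> None:
        """Minute batch: aggregate events before new_boundary into the cube."""
        keep = []
        for ts, key, value in self.realtime:
            if ts < new_boundary:
                cell = (self.boundary, key)
                self.cube[cell] = self.cube.get(cell, 0) + value
            else:
                keep.append((ts, key, value))
        self.realtime, self.boundary = keep, new_boundary

    def query_sum(self, key: str) -> int:
        """Hybrid read: pre-aggregated cube cells plus a scan of the latest events."""
        cubed = sum(v for (_, k), v in self.cube.items() if k == key)
        fresh = sum(v for _, k, v in self.realtime if k == key)
        return cubed + fresh


store = HybridStore(boundary=0)
store.ingest(10, "US", 5)
store.ingest(50, "US", 7)
store.ingest(70, "CN", 1)
store.seal_batch(60)  # events before t=60 move into the cube
store.ingest(65, "US", 2)
print(store.query_sum("US"))
```

The query sees the sealed batch (12) and the not-yet-cubed event (2) together. This is the point of putting one interface in front of both stores.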
  • 34. TopN Support
    - Example query:
        select dt, loc, item, sum(gmv)
        from test_kylin_fact
        where dt = '2015-10-1' and loc = 'CN'
        group by dt, loc, item
        order by sum(gmv) desc
        limit 100
    - Cube pre-calculation: (DT, LOC) = (2015-10-1, CN) → TopN = Item A, $500; Item B, $300; …
    - TopN as a measure
      - Answer TopN queries directly from the pre-calculation
    - Approximate algorithm
      - SpaceSaving TopN: Ahmed Metwally, et al. "Efficient computation of frequent and top-k elements in data streams." Proceedings of the 10th International Conference on Database Theory (ICDT '05), 2005.
      - A parallel version: Massimo Cafaro, et al. "A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution." arXiv:1401.0702v12 [cs.DS], 19 Sept 2015.
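The slide names Space-Saving as the approximate TopN algorithm. The Python sketch below is a minimal version of it, assuming each item's counter is incremented by its GMV rather than by 1. That is a weighted variant; the slide does not say how Kylin weights counts. Each counter keeps an estimated sum and the maximum overestimation inherited from the item it evicted. `space_saving` and `top_n` are illustrative names, not Kylin's API.

```python
def space_saving(stream, capacity):
    """Weighted Space-Saving: track at most `capacity` items.

    Returns {item: (estimated_sum, max_overestimation)}. For every tracked item,
    estimated_sum - max_overestimation <= true_sum <= estimated_sum."""
    counters = {}  # item -> [estimated_sum, error]
    for item, weight in stream:
        if item in counters:
            counters[item][0] += weight
        elif len(counters) < capacity:
            counters[item] = [weight, 0]
        else:
            # Evict the smallest counter; the newcomer inherits its count as error.
            victim = min(counters, key=lambda k: counters[k][0])
            floor = counters.pop(victim)[0]
            counters[item] = [floor + weight, floor]
    return {k: tuple(v) for k, v in counters.items()}


def top_n(counters, n):
    """Highest estimated sums first, like ORDER BY sum(gmv) DESC LIMIT n."""
    return sorted(counters.items(), key=lambda kv: kv[1][0], reverse=True)[:n]


stream = [("Item A", 500), ("Item B", 300), ("Item C", 10), ("Item A", 100), ("Item D", 5)]
counters = space_saving(stream, capacity=3)
print(top_n(counters, 2))
```

With room for only three counters, "Item C" is evicted when "Item D" arrives. "Item D" then reports an estimate of 15 with up to 10 of overestimation, while the heavy hitters at the top stay exact.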
  • 36. New features in 2.x
    - Coming soon…
    - Plugin Architecture
      - Replaceable engine, storage, source
    - Fast Cubing
      - 30%-50% faster
    - Streaming Cubing
      - Supports NRT analysis
    - Lightning-fast TopN
  • 37. We are hiring
    - Kylin site: http://kylin.io
    - Twitter/微博 (Weibo): @ApacheKylin
    - 微信公众号 (WeChat official account): ApacheKylin