1. Apache Kylin & Use Cases
Luke Han | luke.han@kyligence.io
2018 Big Data Spain
2. Luke Han
• Co-founder & CEO at Kyligence
• Co-creator and PMC Chair of Apache Kylin
• Apache Software Foundation Member
• Microsoft Regional Director & MVP
• Former eBay Big Data Product Manager Lead
© Kyligence Inc. 2018.
About Luke Han
3. Kyligence = Kylin + Intelligence
- Kyligence was founded by the team who created Apache Kylin, the leading open source OLAP for Big
Data. Kyligence provides an intelligent data warehouse built for data cognitive analytics at web
scale.
- Funded by leading VCs:
- Redpoint Ventures, Cisco,
- CBC Capital and Shunwei Capital,
- Eight Roads Ventures (Fidelity International Arm)
- CRN Top 10 Big Data Startups 2018
About Kyligence
5. About Apache Kylin
• Leading Open Source OLAP for Big Data
• Open sourced by eBay in 2014
• Graduated to an Apache Top-Level Project in 2015
• 1,000+ adoptions worldwide
• 2015 InfoWorld Bossie Awards
• 2016 InfoWorld Bossie Awards
6. 1000+ Global Users
Apache Kylin - Leading Open Source OLAP for Big Data
7. OLAP: The Missing Part of Big Data
o Too many options
o Low performance
o Long learning curve
o Compatibility issue
o Technology vs Data
[Diagram layers: Presentation / Visualization; SQL engines (Hive, Impala, Spark SQL, Drill); MapReduce, Spark, …; Data Lake; Data Source]
8. Apache Kylin: Bring OLAP back to Big Data
o SQL Acceleration for Big Data
o Semantic Layer
o Speed up Analytics
o ANSI SQL Interface
o High Performance and High Concurrency
[Diagram layers: Presentation / Visualization; OLAP / Data Mart; SQL engines (Hive, Impala, Spark SQL, Drill); MapReduce, Spark, …; Data Lake; Data Source]
10. OLAP and OLAP Cube
Online analytical processing, or OLAP, is an approach to answering multi-dimensional analytical (MDA) queries swiftly in computing. – Wikipedia
Basic operations
– Roll-up
– Drill-down
– Slice and dice
– Pivot
An OLAP cube is a data structure optimized for very quick data analysis.
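The basic operations listed on this slide can be sketched in a few lines of Python. This is only an illustration: the fact rows, the dimension names (year, region, product) and the helper names are hypothetical and are not taken from Apache Kylin.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales)
FACTS = [
    (2017, "EU", "phone", 10),
    (2017, "EU", "laptop", 20),
    (2017, "US", "phone", 30),
    (2018, "EU", "phone", 40),
    (2018, "US", "laptop", 50),
]
DIMS = ("year", "region", "product")


def aggregate(rows, group_by):
    """Sum the sales measure grouped by the given dimension names."""
    idx = [DIMS.index(d) for d in group_by]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimension values fall in the allowed sets."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


def pivot(cells):
    """Pivot: turn {(row, col): value} into {row: {col: value}}."""
    table = defaultdict(dict)
    for (r, c), v in cells.items():
        table[r][c] = v
    return {r: dict(cols) for r, cols in table.items()}


roll_up = aggregate(FACTS, ["year"])               # coarser view: per year
drill_down = aggregate(FACTS, ["year", "region"])  # finer view: per year and region
eu_by_year = aggregate(slice_(FACTS, "region", "EU"), ["year"])
year_by_region = pivot(drill_down)                 # years as rows, regions as columns
```

Roll-up and drill-down are the same aggregation at different granularities, which is why a cube can serve both from pre-computed results.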
11. Cube: balance between space and time
OLAP Cube -- Key-Value
Multidimensional Model -- Relational
Classification, aggregation, and sorting
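To make the space side of this trade-off concrete: a full cube over N dimensions contains one cuboid per subset of the dimensions, that is 2^N cuboids. The sketch below simply enumerates them. The function name is hypothetical, and the dimension names are borrowed from the sample query used later in this deck.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every subset of the dimensions; each subset is one cuboid."""
    result = []
    for k in range(len(dimensions), -1, -1):
        result.extend(combinations(dimensions, k))
    return result


dims = ["l_returnflag", "o_orderstatus", "l_shipdate"]
all_cuboids = cuboids(dims)

# 3 dimensions give 2**3 = 8 cuboids, from the base cuboid (all
# dimensions) down to the empty cuboid (grand total).
for c in all_cuboids:
    print(c)
```

Each added dimension doubles the number of cuboids, so more pre-computation (space) buys faster queries (time) only up to a point.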
12. Apache Kylin Architecture Overview
[Diagram labels: Data Analyst, BI Tools, Web App…; SQL; Apache Kylin; Online calculation; Offline calculation; Scan & filter; Extract; Compute; Load; Optimize & Rewrite]
13. SQL execution plan without Cube
Sample: check the relationship between order returns and order status in a time range
select
    l_returnflag,
    o_orderstatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price
from
    v_lineitem
    inner join v_orders on l_orderkey = o_orderkey
where
    l_shipdate <= '1998-09-16'
group by
    l_returnflag,
    o_orderstatus
order by
    l_returnflag,
    o_orderstatus;
Execution plan (top to bottom): Sort, Aggr., Filter, Join, Tables; cost O(N)
Without a cube, everything has to be calculated online, which is CPU and IO intensive, and latency is noticeable.
14. SQL execution plan with Cube
Cube technology speeds up query performance with pre-calculation
Without cube (top to bottom): Sort, Aggr., Filter, Join, Tables; cost O(N)
With cube (top to bottom): Sort, Filter, Cube (aggregated data); cost O(flag x status x days) = O(1), independent of the number of source rows N
The table join and aggregation are completed offline.
Queries are answered directly from aggregated data (cube) with an index: much less CPU and IO, and low latency.
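The difference between the two plans can be sketched with plain Python dictionaries. This is a toy model under stated assumptions, not Kylin's implementation: the tiny LINEITEM and ORDERS data sets and the function names build_cube and query_cube are hypothetical, and the column names follow the sample query from the previous slide.

```python
from collections import defaultdict

# Hypothetical miniature versions of v_lineitem and v_orders.
LINEITEM = [
    # (l_orderkey, l_returnflag, l_shipdate, l_quantity, l_extendedprice)
    (1, "N", "1998-09-01", 5, 100.0),
    (1, "R", "1998-09-10", 2, 40.0),
    (2, "N", "1998-09-16", 7, 70.0),
    (3, "N", "1998-10-01", 1, 10.0),
]
ORDERS = {1: "F", 2: "O", 3: "O"}  # o_orderkey -> o_orderstatus


def build_cube(lineitem, orders):
    """Offline step: join and aggregate once, keyed by (flag, status, day)."""
    cube = defaultdict(lambda: [0, 0.0])
    for orderkey, flag, shipdate, qty, price in lineitem:
        cell = cube[(flag, orders[orderkey], shipdate)]
        cell[0] += qty
        cell[1] += price
    return dict(cube)


def query_cube(cube, max_shipdate):
    """Online step: filter cube cells, re-aggregate and sort.

    No join and no raw rows are touched; the work is bounded by the
    number of (flag, status, day) cells, not by the number of line items.
    """
    result = defaultdict(lambda: [0, 0.0])
    for (flag, status, day), (qty, price) in cube.items():
        if day <= max_shipdate:  # ISO dates compare correctly as strings
            result[(flag, status)][0] += qty
            result[(flag, status)][1] += price
    return sorted((k[0], k[1], v[0], v[1]) for k, v in result.items())


cube = build_cube(LINEITEM, ORDERS)
rows = query_cube(cube, "1998-09-16")
```

Here rows holds (l_returnflag, o_orderstatus, sum_qty, sum_base_price) tuples, ordered like the sample query's result.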
15. Multidimensional Schema
Apache Kylin supports star schema and snowflake schema
[Diagrams: joins across ORDERS, CUSTOMER, SUPPLIER, PART, LINEITEM, PARTSUPP, NATION, REGION; and joins across ORDERS, CUSTOMER, PART, LINEITEM, PARTSUPP]
16. Persisting the cube in HBase
From relational to key-value store
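As a rough illustration of moving an aggregated relational row into a key-value store, the sketch below packs the cuboid id and dimension values into a key and the measures into a value. The byte layout (8-byte cuboid id, 0x00 separators, 8-byte doubles) and the function names are assumptions made for this example only; Kylin's real HBase row key format is more involved.

```python
import struct


def encode_row(cuboid_id, dim_values, measures):
    """Encode one aggregated row as (key bytes, value bytes).

    Key   = 8-byte big-endian cuboid id + dimension values joined by 0x00.
    Value = measures packed as 8-byte big-endian doubles.
    This layout is hypothetical and chosen only for illustration.
    """
    key = struct.pack(">Q", cuboid_id) + b"\x00".join(
        v.encode("utf-8") for v in dim_values
    )
    value = struct.pack(">%dd" % len(measures), *measures)
    return key, value


def decode_row(key, value):
    """Reverse encode_row: recover cuboid id, dimensions and measures."""
    cuboid_id = struct.unpack(">Q", key[:8])[0]
    tail = key[8:]
    dim_values = [v.decode("utf-8") for v in tail.split(b"\x00")] if tail else []
    measures = list(struct.unpack(">%dd" % (len(value) // 8), value))
    return cuboid_id, dim_values, measures


key, value = encode_row(0b110, ["N", "F"], [5.0, 100.0])
decoded = decode_row(key, value)
```

Because the cuboid id leads the key, all rows of one cuboid sort next to each other, which is what makes range scans over a single cuboid possible.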
17. How to query the cube
Translate cube query into HBase table scan
– Columns, Group by -> Cuboid ID
– Filters -> Scan Range (Row Key)
– Aggregations -> Measure Columns (Row Values)
Scan HBase table and translate HBase result into cube result
– HBase Result (key + value) -> Cube Result (dimensions + measures)
Hive is not touched and no MapReduce job runs at query time
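The translation steps above can be sketched as follows, with a sorted Python list standing in for the HBase table. Everything here is a simplified assumption: the cuboid id is modeled as a bitmask over the dimensions, row keys are tuples rather than bytes, and the names cuboid_id, scan and query_by_returnflag are hypothetical.

```python
import bisect

DIMENSIONS = ["l_returnflag", "o_orderstatus", "l_shipdate"]  # row key order


def cuboid_id(columns):
    """Columns / group by -> cuboid id: one bit per dimension used."""
    mask = 0
    for i, dim in enumerate(DIMENSIONS):
        if dim in columns:
            mask |= 1 << (len(DIMENSIONS) - 1 - i)
    return mask


# Sorted (row key, measures) pairs standing in for an HBase table.
# Row key = (cuboid id, dimension values...); measures = (sum_qty, sum_base_price)
TABLE = sorted([
    ((6, "N", "F"), (5, 100.0)),
    ((6, "N", "O"), (8, 80.0)),
    ((6, "R", "F"), (2, 40.0)),
    ((4, "N"), (13, 180.0)),
    ((4, "R"), (2, 40.0)),
])


def scan(table, start_key, stop_key):
    """Filters -> scan range: return rows with start_key <= key < stop_key."""
    keys = [k for k, _ in table]
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_left(keys, stop_key)
    return table[lo:hi]


def query_by_returnflag(table, flag):
    """Answer: group by l_returnflag, o_orderstatus where l_returnflag = flag."""
    cid = cuboid_id({"l_returnflag", "o_orderstatus"})
    rows = scan(table, (cid, flag), (cid, flag + "\x00"))
    # HBase result (key + value) -> cube result (dimensions + measures)
    return [
        {"l_returnflag": k[1], "o_orderstatus": k[2],
         "sum_qty": m[0], "sum_base_price": m[1]}
        for k, m in rows
    ]


result = query_by_returnflag(TABLE, "N")
```

The query only touches the contiguous key range for one cuboid and one filter value, so no join or aggregation over source rows happens at query time.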
18. High performance & High concurrency together
Sub-second latency on PB-scale dataset
Star schema benchmark: http://www.cs.umb.edu/~poneil/StarSchemaB.PDF
[Charts: SQL latency (lower is better); data volume scale (lower is better)]
19. Seamless integration with BI tools
From open source to commercial BI
21. Apache Kylin Use Cases
Solution
• Behavior Analytics
• Log Analysis
• Data Mart/DW
• Self-service Data Service
• Retail Analytics
• Financial Asset
• Advertising Analytics
• Real-time Analytics
• Gaming Analysis
Apache Kylin fits various scenarios
1,000+ adoptions all over the world
22. Use Case: Insight on Trillion-Scale Data
Top 1 news feed app in China
23. Use Case: PB-level Analytics Platform
Cube storage: 971 TB (almost 1 PB)
Number of cubes: 973
Data records: 8.9 trillion rows
90th percentile latency: < 1.2 s
Query volume: 3.8 million queries / day
Top O2O services provider in China
Supporting all critical business lines including E-Takeaways, Hotel, Movie, LBS, Tickets…
Last updated: 2018-08
24. Use Case: Online Shopping Reporting
https://techblog.yahoo.co.jp/oss/apache-kylin/
▪ Our reporting system previously used Impala as its backend database.
- It took a long time (about 60 sec) to show the Web UI.
▪ In order to lower the latency, we moved to Apache Kylin.
- Average latency < 1 sec in most cases
▪ Thanks to Kylin's low latency, we are now able to focus on adding functions for users.
▪ We provide a reporting system that shows statistics for store owners.
- e.g. impressions, clicks and sales.
The most visited website in Japan
Yahoo! Japan
25. Use Case: Data Factory for Business
• Serving 18 business lines as the engine for Mi's “data factory”
• Daily increment: 17 billion
• 95% of queries < 500 ms
Leading smartphone and smart device manufacturer
26. Use Case: Website Traffic Analytics
A company providing convenient, one-stop website building solutions.
"The data platform based on Apache Kylin solved the problem of massive user queries excellently."
-- Chase Zhang, Data Platform Engineer of Strikingly
Performance
• Use Apache Kylin to speed up analytics with Keen.io, and support high concurrency
Containerizing
• Apache Kylin runs on AWS ECS
Integration
• Developed a scheduler system to manage all kinds of jobs
28. Apache Kylin Roadmap
• New storage support
– Parquet
• Real-time support
• Containerization
From the community
32. Augmented Analytics Platform
[Diagram labels: SQL Query Log, Analytic Behavior, Data Schema, Data Profile; ML-based Discovery of Analytic Pattern; Proprietary Data Modeling Automation; Self-directed Storage Layer Optimization; Intelligent Query Push-down & Routing; BI, Real-time Analysis, Data-as-a-Service; Local Deployment, Cloud Platform, Container; Data Services]
34. Kyligence Cloud
Transforming Big Data Analytics to Cloud
[Architecture diagram labels: client; cloud; Dashboard; OLAP; Hadoop; ANSI SQL; Kyligence Cloud; Customer Cloud Account; Kyligence Enterprise Platform; streaming; Cluster Deploy; Cluster Management; Account Management; Diagnosis & Optimization; Queries & Reporting; cloud storage; tables, logs, files; RDBMS (metadata); Cloud Data Warehouse]
35. Kyligence Cloud
Speed up mission-critical analytics in the cloud
Available: AWS, Azure, Google Cloud, Alibaba Cloud, Huawei Cloud
• One-click provisioning: deploy globally in 30 minutes
• Auto Scaling: scale cluster automatically for different workloads
• High Performance: powered by Kyligence Analytics Platform
• Seamless Integration: connect to cloud data sources; enterprise ODBC driver for BI
• Intelligent Ops: online diagnosis and continuous optimization
36. Use Case: Replaced IBM Cognos
1 Kyligence cube replaced 800+ IBM Cognos cubes
• Self-Service Big Data Warehouse: PB-level (300B records) big data warehouse serving both self-service aggregation queries and raw data queries by business analysts
• Efficient IT Operation: significantly increased IT operation efficiency, as 1 Kyligence cube replaces 800 Cognos cubes with unified data access management
• Better Flexibility of Architecture: Kyligence's scale-out architecture provides the best flexibility for IT infrastructure when faced with increasing analytics and concurrency demands
• Merchant or Card Multi-dimensional Analytics: supports analysis on high-granularity dimensions such as Merchant (10M cardinality) and Card (10B cardinality)
37. Use Case: Customer 360 for FMCG
Azure + Kyligence
➢ 360-degree view of user profile
➢ Empowering analysts to gain insight into data without IT
➢ HDInsight + Kyligence + Power BI
38. Global Partners
Kyligence Open Ecosystem
Microsoft Azure Partner
AWS Technology Partner
Tableau Technology Partner
Cloudera Silver Partner
MapR Converge Partner
Hortonworks Community Partner
Huawei Solution Partner