Refactoring your EDW with Mobile Analytics Products

Zhi Zhu @CCB FinTech
Luke Han @Kyligence

Strata New York 2018
About China Construction Bank (CCB)
CCB - 2nd biggest bank in China.
Big data – How big is big?
• EDW core data: 1 PB
• Incremental data: 4 TB/day
• Online data storage: 5 PB
• >600M customers
• >2,000M accounts

1st Generation EDW (2004)
[Architecture diagram]
• Users: Strategic Decision Makers, Tactical Decision Makers, Operational Decision Makers, General Business Users
• Headquarters Source Systems; Branches Source Systems
• System labels: ALS, CLPM, CCMIS, ERPF, CCBS, SMIS, OCRM…, CMIS, CCDA…1104
• EDW: Teradata 5450 (6 nodes), 18T
• Other components: Material DSS Database, Cube, Operational Data Storage (ODS), Historical data
• 100+ reports, 100+ Users

Big Data Blueprint (2012)
Dining Room
• Readily Accessible to End Users (and BI Developers)
• Safe, Hospitable Environment
• Data Assets “Ready for Primetime”
• Dimensionally Structured
Kitchen
• Off Limits to End Users
• Data Professionals Only Please
• Dangerous / Inhospitable Environment
• Data Assets “Not Ready for Primetime”
• Structured Variably For Data Processing
Diagram labels
• Dimensional Semantic Layer
• Dimensional Tier
• [Physical or Virtual (CIF or Data Vault)]
• (Virtual or Physical)
• Un/Semi-Structured Data Movement
• Un/Semi-Structured Source Data
• Persistent Un/Semi-Structured Staging Area
• Unstructured -> Structured Data Discovery Processing
• Structured Data Movement
• Structured Source Data
• Persistent Structured Data Repository
• Insight Generation / Data Mining

Big Data Transformation (2016)
[Architecture diagram]
• Users: Strategic Decision Makers, Tactical Decision Makers, Operational Decision Makers, General Business Users
• Presentation Layer
• Headquarters Source System; Branches Source Systems
• System labels: ALS, CLPM, CCMIS, OCRM…, CMIS, ERPF, CCBS, CCDA…1104
• EDW: Teradata 6650 (10+10 nodes), 600T
• EDW: Teradata 2750 (32 nodes), 750T
• Big Data Analytics Platform; Hadoop; MPP DB
• Other components: Legacy Data Marts, 1000+ Cube, Historical Data, SOR, Operational Data Storage, Branch ODSB, Performance, Marketing
• 25,000+ reports; 2,000+ Data Mining Themes
• 100,000+ Users
SQL translation between different databases was a big lesson.
User Experience Challenges:
• Data latency
• High performance
EDW Challenges:
• System I/O
• Maintenance and data lineage

Mobile product brought an opportunity
Intelligent Eyes (1st version, Sept 2016)
• 100,000+ users
• 1,200+ million records
• PB-level data storage
• Millisecond-level response
• Metrics can be published by sub-organizations and subscribed to by end users with a touch

Benefits
1 TCO
• Teradata no longer expanded
• Cost of unit storage ↓ 66%
• Delivery cycle time ↓ from 6 months to 1 month
2 Performance
• Mobile users ↑ from 0 to 100,000+
• Active PC users ↓ 90%
• Page views (PV) up to 1,000,000 daily
• Real-time applications emerged
• Data latency ↓ from 48 hours to 7 hours
• Millisecond-level response
3 User Experience
• Access data anywhere and anytime
• 25,000+ reports ↓ to 5,000 and 800 mobile data metrics
• Eliminating data silo ("vertical shaft") problems

How to re-engineer a legacy EDW into a Data Lake
• Discover what users value by collecting their usage records.
• Enable end users to join the data game.
• Build a data conformance bus on Hive.
• Rebuild the analytics layer with Apache Kylin.
• Test-driven development.
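
The first bullet, collecting usage records, can be illustrated with a minimal sketch. It assumes usage records arrive as (user_id, report_id) pairs; that format, the report names, and the `min_users` threshold are hypothetical and not taken from the CCB system. Reports are ranked by how many distinct users opened them, which is one possible way to decide what to carry over and what to retire.

```python
from collections import Counter

def rank_reports(usage_records, min_users=2):
    """Split reports into keep/retire lists by distinct-user count.

    usage_records: iterable of (user_id, report_id) pairs (hypothetical format).
    min_users: hypothetical threshold below which a report is a retire candidate.
    """
    users_per_report = {}
    for user_id, report_id in usage_records:
        users_per_report.setdefault(report_id, set()).add(user_id)
    counts = Counter({r: len(u) for r, u in users_per_report.items()})
    # Most-used reports first; rarely used ones become retire candidates.
    keep = [r for r, n in counts.most_common() if n >= min_users]
    retire = sorted(r for r, n in counts.items() if n < min_users)
    return keep, retire

records = [
    ("u1", "daily_deposits"), ("u2", "daily_deposits"), ("u3", "daily_deposits"),
    ("u1", "branch_loans"), ("u2", "branch_loans"),
    ("u4", "legacy_report_17"),
]
keep, retire = rank_reports(records)
print(keep, retire)
```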

EDW has evolved to Data Ecosystem
[Architecture diagram]
• Sources: AUDIO & VIDEO, IMAGES, TEXT, WEB & SOCIAL, MACHINE LOGS, CRM, SCM, ERP
• CAPTURE | STORE | REFINE
• ETL
• L1 Cache Redis
• L2 Cache HBase
• L2 Cache Oracle Database
• TD 66XX; TD 2700; GP
• DATA MARTS; TEST/DEV; ANALYTICAL ARCHIVE
• MDX; RESTFUL SERVICE
• DATA LAB; INDEPENDENT DATA MART; DUAL SYSTEMS
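
The diagram labels an L1 cache (Redis) and L2 caches (HBase, Oracle Database) but does not spell out the lookup order. The sketch below assumes a conventional read-through pattern: check L1, then L2, then fall back to the slow store, promoting values on the way back. Plain dictionaries stand in for Redis and HBase, and `TieredCache` and the keys are hypothetical names used only for illustration.

```python
class TieredCache:
    """Read-through lookup: L1 first, then L2, then the backing store."""

    def __init__(self, l1, l2, backing_query):
        self.l1 = l1                        # stand-in for a small, fast cache (e.g. Redis)
        self.l2 = l2                        # stand-in for a larger cache (e.g. HBase)
        self.backing_query = backing_query  # slow path, e.g. a warehouse query
        self.hits = {"l1": 0, "l2": 0, "backing": 0}

    def get(self, key):
        if key in self.l1:
            self.hits["l1"] += 1
            return self.l1[key]
        if key in self.l2:
            self.hits["l2"] += 1
            value = self.l2[key]
        else:
            self.hits["backing"] += 1
            value = self.backing_query(key)
            self.l2[key] = value
        self.l1[key] = value  # promote so the next read is served from L1
        return value

cache = TieredCache(l1={}, l2={"metric:deposits": 42}, backing_query=lambda k: len(k))
first = cache.get("metric:deposits")   # served from L2, promoted to L1
second = cache.get("metric:deposits")  # served from L1
third = cache.get("metric:loans")      # falls through to the backing query
```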

About Apache Kylin
• Leading Open Source OLAP for Big Data
• Open-sourced by eBay in 2014
• Graduated to Apache Top-Level Project in 2015
• 1000+ adoptions worldwide
• 2015 InfoWorld Bossie Awards
• 2016 InfoWorld Bossie Awards

OLAP: The Missing Part of Big Data
Presentation
Visualization
Data Lake
Hive Impala Spark SQL Drill
MapReduce …Spark
Data Source
o Too many options
o Low performance
o Long learning curve
o Compatibility issue
o Technology vs Data

Apache Kylin: Bring OLAP back to Big Data
Presentation
Visualization
OLAP
Data Mart
Data Lake
Hive Impala Spark SQL Drill
MapReduce …Spark
Data Source
o SQL Acceleration for Big Data
o Semantic Layer
o Speed up Analytics
o ANSI SQL Interface
o High Performance and High Concurrency
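
The OLAP layer on this slide rests on pre-aggregation: aggregates are computed ahead of time for combinations of dimensions, so a GROUP BY becomes a lookup instead of a scan. The sketch below is a toy in-memory illustration of that idea, not Kylin's implementation; `build_cube`, `query`, and the sample rows are hypothetical.

```python
from itertools import combinations
from collections import defaultdict

def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of dimensions (every cuboid)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube

def query(cube, group_by):
    """Answer a GROUP BY from the precomputed cuboid, without scanning raw rows.

    group_by must list dimensions in the same order used when building the cube.
    """
    return cube[tuple(group_by)]

rows = [
    {"region": "north", "product": "card", "amount": 10.0},
    {"region": "north", "product": "loan", "amount": 5.0},
    {"region": "south", "product": "card", "amount": 7.0},
]
cube = build_cube(rows, dimensions=("region", "product"), measure="amount")
by_region = query(cube, ["region"])
grand_total = query(cube, [])
```

With two dimensions the cube holds four cuboids. The number doubles with each added dimension, which is why choosing what to precompute matters at scale.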

Kyligence = Kylin + Intelligence

About Us
Kyligence = Kylin + Intelligence
- Kyligence was formed by the team who created Apache Kylin, the leading open source OLAP for Big Data. Kyligence provides an intelligent data warehouse built for data cognitive analytics at web scale.
- Funded by leading VCs: Redpoint Ventures, Cisco, CBC Capital, Shunwei Capital and Eight Roads Ventures (Fidelity International Arm)
- CRN Top 10 Big Data Startups 2018
© Kyligence Inc. 2018, Confidential.

Featured Customers
Trusted by Fortune 500
• Lenovo: #226 of Fortune 500
• OPPO: #4 Smart Phone Vendor Global
• Lufax: #1 Fintech in China
• CPIC: #252 of Fortune 500
• SAIC: #41 of Fortune 500
• #47 of Fortune 500
• Huawei: #83 of Fortune 500
• Huatai Securities: Top Securities in China
• Top 3 Telecom in China
• McDonald’s: #436 of Fortune 500
• China UnionPay: #3 Payment Network
• #33 of Fortune 500
Data from Fortune Global 500 year 2017: http://fortune.com/global500/list/

Partners
Global Ecosystem
Microsoft Azure Partner
Amazon Web Services Technology Partner
Tableau Technology Partner
Cloudera Silver Partner
MapR Converge Partner
Hortonworks Community Partner
Huawei Solution Partner

Evolution of Data Warehousing
[Diagram] Stages: Data Mart; Data Warehouse; Data Lake; Cloud; Next Generation ?
Data sets shown: Orders, Payments, Contacts, Products, Customers

Traditional Data Warehousing
Enormous Manual Efforts and Repeated Work

Intelligence and Automation
The future of Data Analytics
Human Intelligence VS Artificial Intelligence

Fusional DW Architecture
• Fusion of Historical & Real-time Data: Historical, Real time
• Fusion of Local and Cloud: On-premises, Cloud
• Fusion of Traditional DW & Big Data: EDW, Data Lake
Kyligence Enterprise
Product Screenshot

Augmented Analytics Platform
• SQL Query Log; Analytic Behavior; Data Schema; Data Profile
• ML-based Discovery of Analytic Pattern
• Proprietary Data Modeling Automation
• Self-directed Storage Layer Optimization
• Intelligent Query Push-down & Routing
• BI; Real-time Analysis; Data-as-a-Service
• Local Deployment; Cloud Platform; Container
• Data Services

Kyligence Position in Big Data Ecosystem
Fill the gap between business and technology
powered by Apache Kylin
BI
Visualization
OLAP
Data Mart
Data Lake
Source
Data
HDFS YARN MapReduce Spark Kafka …Spark SQL
• Fusional: Unified EDW & Data Lake; Unified Realtime and Historical; Unified On-Prem and Cloud
• Intelligent: Machine Learning-augmented modeling
• High Performance: Sub-second query speed on massive datasets
• High Concurrency: Web-scale OLAP query

Evolution of Data Warehousing
[Diagram] Stages: Data Mart; Data Warehouse; Data Lake; Cloud; Fusional & Intelligent DW
Data sets shown: Orders, Payments, Contacts, Products, Customers

Kyligence Cloud
Transforming Big Data Analytics to Cloud
[Architecture diagram] Labels: Kyligence Cloud; Customer Cloud Account; Kyligence Enterprise Platform; client; Dashboard; OLAP; ANSI SQL; Hadoop; streaming; cloud storage (tables, logs, files); RDBMS (metadata); Cloud Data Warehouse; Cluster Deploy; Account Management; Diagnosis & Optimization; Queries & Reporting; Cluster Management

Kyligence Cloud
Transforming Big Data Analytics to Cloud
• One-click provisioning: Deploy globally in 30 minutes
• Auto Scaling: Scale cluster automatically for different workloads
• High Performance: Powered by Kyligence Analytics Platform
• Seamless Integration: Connect to cloud data sources; Enterprise ODBC driver for BI
• Intelligent Ops: Online diagnosis and continuous optimization
Speed up OLAP analysis and mission-critical queries to interactive speed

SQL Acceleration for Big Data
Powered by Apache Kylin
[Diagram] Labels: ANSI SQL; Kyligence; Query; Analytics; Storage; Ingestion; SQL Pushdown; Hadoop Platform; T-SQL; Oracle SQL; PostgreSQL; Impala

[Diagram] SQL against DB tables (line_orders, buyer_accounts, seller_accounts, product_items, …) answered in < 1s

Intelligent Cubing
[Diagram] Labels: ANSI SQL; Aggregation & Index query; Pushdown For Ad-Hoc; SQL on Hadoop
Solution
• Speed up SQL on Hadoop automatically
• Supports Hive, Impala and Spark SQL, with more to come
• High performance and high concurrency OLAP
Benefits
• Unified analytics platform for aggregation and ad-hoc query
• Self-service enables analysts to work without IT
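
The split between cube-served aggregation queries and pushdown for ad-hoc queries can be pictured as a coverage check. The sketch below is an assumption about how such routing might look, not Kyligence's actual logic: a query goes to the cube only when every requested dimension and measure was precomputed, and otherwise is pushed down to the SQL-on-Hadoop engine. `route_query` and the column names are hypothetical.

```python
def route_query(group_by, measures, cube_dimensions, cube_measures):
    """Return "cube" when the precomputed cube covers the query, else "pushdown"."""
    covered = (set(group_by) <= set(cube_dimensions)
               and set(measures) <= set(cube_measures))
    return "cube" if covered else "pushdown"

cube_dims = {"region", "product", "month"}
cube_meas = {"sum_amount", "count_orders"}

# Aggregation over modeled dimensions: answered from the cube.
r1 = route_query(["region", "month"], ["sum_amount"], cube_dims, cube_meas)
# Ad-hoc query on a column the cube does not model: pushed down.
r2 = route_query(["customer_id"], ["sum_amount"], cube_dims, cube_meas)
print(r1, r2)
```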

Powering Excel for Big Data
Extend big data analytics to every analyst's desktop
Analyze Your Big Data LIVE with Excel
MDX/ANSI SQL Interface
Self-service Big Data from On-Prem to Cloud

LIVE
No data import is needed
Slice and dice your big data
Your Excel can fully leverage Kyligence Cube capability

Anywhere
Desktop
Website
Mobile
Kyligence currently supports pinning your Excel report to Power BI mobile

Kyligence Acceleration Solution for Greenplum
[Diagram] Labels: SQL; Build Cube; SQL Pushdown (~ minutes); Cube Access (~ sub-seconds)
• Change data source connection
• Intelligently build cubes from Greenplum
• Accelerate mission-critical analytics
• Pushdown flexible queries to Greenplum for data exploration

Kyligence Acceleration Solution for Greenplum
100x faster
SQL Pushdown to Greenplum: minutes latency, min duration > 20s
After acceleration: sub-second latency, max duration < 1s
Seamlessly migrated
Query performance ~100x
Report rendering ~14x
Same Tableau reports
100x faster!

Streaming OLAP for near real time

Streaming OLAP
• Consume Streaming Data via Kafka
• MDX/ANSI SQL Interface
• Batch & Streaming together
[Diagram] Labels: Data Source; Kafka Topic; Loading; Processing; Build Cube; HDFS (Recent data); Cube (Full history data); Pushdown; Cube Access; Near Real-time (On recent data); Historical (On full history data); BI; MOLAP …; Monitor; Prediction; Alerts …
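
"Batch & Streaming together" can be read as combining cube totals over full history with events that arrived after the last cube build. The sketch below shows one way to do that under stated assumptions: events are (timestamp, key, amount) tuples and `cube_built_until` marks the last timestamp already folded into the cube. These names and the data are hypothetical, and the code does not reflect the product's internals.

```python
from collections import Counter

def combined_totals(cube_totals, recent_events, cube_built_until):
    """Add events newer than the last cube build to the cube's totals."""
    totals = Counter(cube_totals)
    for ts, key, amount in recent_events:
        if ts > cube_built_until:  # older events are already in the cube
            totals[key] += amount
    return dict(totals)

cube_totals = {"north": 100, "south": 40}                      # historical, from the cube
recent = [(9, "north", 5), (11, "north", 3), (12, "east", 7)]  # recent stream events
result = combined_totals(cube_totals, recent, cube_built_until=10)
print(result)
```

The cutoff check is what keeps an event from being counted twice when it sits in both the recent data and the cube.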
