Refactoring your EDW
With Mobile Analytics Products
Zhi Zhu @CCB FinTech
Luke Han @Kyligence
Strata New York 2018
EDW core data: 1PB
Incremental data: 4TB/day
On-line data storage: 5PB
>600M customers
>2,000M accounts
Big data – How big is big?
CCB - 2nd biggest bank in China.
About China Construction Bank (CCB)
Tactical Decision Makers
General Business Users
Strategic Decision Makers
Operational Decision Makers
Headquarters
Source Systems
ALS
CLPM
CCMIS
EDW
Teradata 5450
(6 nodes), 18T
ERPF
CCBS
SMIS
Material DSS Database
OCRM
…
Cube
CMIS
CMIS
CCDA
…1104
Operational Data Storage
ODS
Historical data
Branches
Source Systems
100+ reports
100+ Users
1st Generation EDW (2004)
Dining Room
Readily Accessible to End Users
(and BI Developers)
Safe, Hospitable Environment
Data Assets “Ready for Primetime”
Dimensionally Structured
Kitchen
Off Limits to End Users
Data Professionals Only Please
Dangerous / Inhospitable Environment
Data Assets “Not Ready for Primetime”
Structured Variably For Data Processing
Dimensional Semantic Layer
Dimensional Tier
[Physical or Virtual (CIF or Data Vault)]
(Virtual or Physical)
Un/Semi-Structured Data Movement
Un/Semi-Structured Source Data
Persistent Un/Semi-Structured Staging Area
Unstructured -> Structured Data Discovery Processing
Structured Data Movement
Structured Source Data
Persistent Structured Data Repository
Insight Generation / Data Mining
Big Data Blueprint (2012)
Tactical Decision Makers
General Business Users
Strategic Decision Makers
Operational Decision Makers
Presentation Layer
Headquarters
Source System
ALS
CLPM
CCMIS
EDW
Teradata 6650
(10+10 nodes), 600T
Big Data Analytics Platform
Hadoop
Legacy Data Marts
OCRM…
1000+
Cube
CMIS
Historical
Data
SOR
MPP DB
ERPF
CCBS CCDA…1104
Operational
Data Storage
Branch
ODSB
Performance
Marketing
EDW
Teradata 2750
(32 nodes), 750T
Branches
Source Systems
25,000+ reports
2,000+ Data Mining Themes
SQL Translation between different databases is a big lesson.
100,000+ Users
User Experience Challenges:
Data latency
High-performance
EDW Challenges:
System I/O
Maintenance and data lineage
Big Data Transformation (2016)
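The slide above notes that SQL translation between different databases was a big lesson. As a rough illustration of why, the sketch below rewrites one narrow dialect difference: Teradata-style `SELECT TOP n` (and the `SEL` abbreviation) into the `LIMIT n` form used by Hive and Spark SQL. The function name and the sample query are hypothetical, and a real migration would need a proper SQL parser rather than a regular expression.

```python
import re

# Matches a whole statement of the form "SEL[ECT] TOP <n> <rest>[;]".
# Assumption: one statement per string, no nested TOP clauses.
_TOP = re.compile(
    r"^\s*SEL(?:ECT)?\s+TOP\s+(\d+)\s+(.*?);?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def teradata_top_to_limit(sql):
    """Rewrite 'SELECT TOP n ...' into 'SELECT ... LIMIT n'; leave other SQL unchanged."""
    match = _TOP.match(sql)
    if not match:
        return sql
    n, rest = match.groups()
    return f"SELECT {rest} LIMIT {n}"


# Hypothetical query, used only for illustration.
translated = teradata_top_to_limit(
    "SEL TOP 10 cust_id, balance FROM accounts ORDER BY balance DESC;"
)
untouched = teradata_top_to_limit("SELECT cust_id FROM accounts")
```

Even this single rule shows how quickly hand-written translation grows: every function, date format, and join hint that differs between engines needs its own tested rewrite.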
100,000+ users
1,200+ million records
PB-level data storage
Millisecond-level response
Metrics can be published by sub-organizations
and subscribed to by end users with a touch
Intelligent Eyes (1st version, Sept 2016)
A mobile product brought an opportunity
Benefits
1 TCO
• Teradata no longer increased
• Cost of unit storage ↓ 66%
• Delivery cycle time ↓ from 6 months to 1 month
2 Performance
• Mobile users ↑ from 0 to 100,000+
• Active PC users ↓ 90%
• Page views (PV) up to 1,000,000 daily
• Real-time applications emerged
• Data latency ↓ from 48 hours to 7 hours
• Millisecond-level response
3 User Experience
• Access data anywhere and anytime
• 25,000+ reports ↓ to 5,000 and 800 mobile data metrics
• Eliminated data silo (“vertical shaft”) problems
How to re-engineer a legacy EDW into a Data Lake
• Discover users’ values by collecting their usage records.
• Enable end users to join the data game.
• Build a data conformance bus on Hive.
• Rebuild the analytics layer with Apache Kylin.
• Test-driven development.
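The first step in the list, collecting usage records to discover what users value, can be sketched as a simple access count per report. This is a minimal illustration only: the log format, the report names, and the `min_hits` threshold are all assumptions, not details from the talk.

```python
from collections import Counter


def rank_reports_by_usage(usage_log):
    """Count accesses per report from (user_id, report_id) records."""
    return Counter(report_id for _user, report_id in usage_log)


def split_reports(all_reports, usage_log, min_hits=1):
    """Partition reports into those worth keeping and retirement candidates."""
    hits = rank_reports_by_usage(usage_log)
    keep = sorted(r for r in all_reports if hits[r] >= min_hits)
    retire = sorted(r for r in all_reports if hits[r] < min_hits)
    return keep, retire


# Hypothetical usage records and report catalogue.
usage_log = [("u1", "loan_daily"), ("u2", "loan_daily"), ("u1", "branch_kpi")]
all_reports = ["loan_daily", "branch_kpi", "legacy_dump"]
keep, retire = split_reports(all_reports, usage_log)
```

Reports that nobody opens become candidates for retirement, which is one plausible way a large report catalogue could be cut down before it is rebuilt on a new platform.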
AUDIO & VIDEO IMAGES TEXT WEB & SOCIAL MACHINE LOGS CRM SCM ERP
L2 Cache Oracle Database
DATA MARTS
TEST/
DEV
ANALYTICAL
ARCHIVE
CAPTURE | STORE | REFINE
MDX RESTFUL
SERVICE
DATA LAB
INDEPENDENT
DATA MART
DUAL
SYSTEMS
TD 66XX
TD 2700
L2 Cache HBase
GP
L1 Cache Redis
ETL
EDW has evolved to Data Ecosystem
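The diagram above labels Redis as an L1 cache and HBase as an L2 cache in front of the data stores. The sketch below shows one common way such tiers are combined, a read-through lookup that promotes values into the faster tier. The slide does not describe the actual caching policy, so the class, the promotion rule, and the keys are assumptions; plain dictionaries stand in for Redis and HBase so the example runs on its own.

```python
class TieredMetricStore:
    """Read-through lookup: L1 (in-memory) -> L2 (key-value store) -> backend query."""

    def __init__(self, l1, l2, backend):
        self.l1 = l1            # dict standing in for Redis (assumption)
        self.l2 = l2            # dict standing in for HBase (assumption)
        self.backend = backend  # callable standing in for a warehouse query
        self.trace = []         # records which tier served each request

    def get(self, key):
        if key in self.l1:
            self.trace.append("L1")
            return self.l1[key]
        if key in self.l2:
            self.trace.append("L2")
            value = self.l2[key]
        else:
            self.trace.append("backend")
            value = self.backend(key)
            self.l2[key] = value  # fill L2 on a full miss
        self.l1[key] = value      # promote to L1 for the next read
        return value


# Hypothetical metric keys; the backend here just returns the key length.
store = TieredMetricStore(l1={}, l2={"deposits:2018-09": 42}, backend=len)
first = store.get("deposits:2018-09")   # served from L2, then promoted
second = store.get("deposits:2018-09")  # served from L1
third = store.get("loans")              # full miss, computed by the backend
```

A design like this keeps the hottest metrics in memory while the larger key-value tier absorbs the long tail, which fits the millisecond-level response goal stated earlier in the deck.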
Apache Kylin
About Apache Kylin
• Leading Open Source OLAP for Big Data
• Open-sourced by eBay in 2014
• Graduated to Apache Top-Level Project in 2015
• 1,000+ adoptions worldwide
• 2015 InfoWorld Bossie Awards
• 2016 InfoWorld Bossie Awards
Presentation
Visualization
Data
Lake
Data
Source
o Too many options
o Low performance
o Long learning curve
o Compatibility issue
o Technology vs Data
OLAP: The Missing Part of Big Data
Hive Impala Spark SQL Drill
MapReduce …Spark
Presentation
Visualization
Data
Lake
Data
Source
o SQL Acceleration for Big Data
o Semantic Layer
o Speed up Analytics
o ANSI SQL Interface
o High Performance and High Concurrency
Apache Kylin: Bring OLAP back to Big Data
OLAP
Data Mart
Hive Impala Spark SQL Drill
MapReduce …Spark
Kyligence = Kylin + Intelligence
About Us
Kyligence = Kylin + Intelligence
- Kyligence was founded by the team that created Apache Kylin, the leading open source OLAP engine for Big Data. Kyligence provides an intelligent data warehouse built for data cognitive analytics at web scale.
- Funded by leading VCs: Redpoint Ventures, Cisco, CBC Capital, Shunwei Capital, and Eight Roads Ventures (Fidelity International arm)
- CRN Top 10 Big Data Startups 2018
© Kyligence Inc. 2018, Confidential.
Featured Customers
Trusted by Fortune 500
Lenovo
#226 of Fortune 500
OPPO
#4 Smartphone Vendor Globally
Lufax
#1 Fintech in China
CPIC
#252 of Fortune 500
SAIC
#41 of Fortune 500
#47 of Fortune 500
Huawei
#83 of Fortune 500
Huatai Securities
Top Securities in China
Top 3 Telecom in China
McDonald’s
#436 Fortune 500
China UnionPay
#3 Payment Network
Data from Fortune Global 500 year 2017:
http://fortune.com/global500/list/
#33 of Fortune 500
Partners
Global Ecosystem
Microsoft Azure Partner
Amazon Web Service Technology Partner
Tableau Technology Partner
Cloudera Silver Partner
MapR Converge Partner
Hortonworks Community Partner
Huawei Solution Partner
Evolution of Data Warehousing
Data Mart
Orders
Payments
Contacts
Products
Customers
Data Warehouse
Contacts
Orders
Payments
Products
Data
Warehouse
Data Lake
Contacts
Orders
Payments
Products
Data
Warehouse
Contacts
Orders
Payments
Products
Next Generation
Cloud
Contacts
Orders
Payments
Products
Data
Warehouse
Products
Contacts
Orders
Payments ?
Traditional Data Warehousing
Enormous Manual Efforts and Repeated Work
Intelligence and Automation
The future of Data Analytics
Human Intelligence VS Artificial Intelligence
Fusion of Historical & Real-time Data: Historical, Real time
Fusion of Local and Cloud: On-premises, Cloud
Fusion of Traditional DW & Big Data: EDW, Data Lake
Fusional DW Architecture
Kyligence Enterprise
Product Screenshot
Intelligent DW Architecture
Augmented Analytics Platform
SQL Query Log
Analytic Behavior
Data Schema
Data Profile
ML-based Discovery of Analytic Pattern
Proprietary Data Modeling Automation
Self-directed Storage Layer Optimization
Intelligent Query Push-down & Routing
BI
Real-time Analysis
Data-as-a-Service
Local Deployment
Cloud Platform
Container
Data Services
Kyligence Position in Big Data Ecosystem
Fill the gap between business and technology
Kyligence Enterprise
powered by Apache Kylin
BI
Visualization
OLAP
Data Mart
Data Lake
Source
Data
HDFS YARN MapReduce Spark Kafka …Spark SQL
• Fusional
  • Unified EDW & Data Lake
  • Unified Real-time and Historical
  • Unified On-Prem and Cloud
• Intelligent
  • Machine Learning-augmented modeling
• High Performance
  • Sub-second query speed on massive datasets
• High Concurrency
  • Web-scale OLAP query
Evolution of Data Warehousing
Data Mart
Orders
Payments
Contacts
Products
Customers
Data Warehouse
Contacts
Orders
Payments
Products
Data
Warehouse
Data Lake
Contacts
Orders
Payments
Products
Data
Warehouse
Contacts
Orders
Payments
Products
Fusional &
Intelligent DW
Cloud
Contacts
Orders
Payments
Products
Data
Warehouse
Products
Contacts
Orders
Payments
Kyligence Cloud
Transforming Big Data Analytics to Cloud
Kyligence Cloud
ANSI SQL
Dashboard OLAP
Hadoop
Customer Cloud Account
client
cloud
Kyligence Enterprise Platform
streaming
Cluster Deploy
Account Management
Diagnosis &
Optimization
Queries & Reporting
cloud
storage
tables, logs, files
RDBMS
(metadata)
ANSI SQL
Cloud Data
Warehouse
Cluster Management
Kyligence Cloud
Transforming Big Data Analytics to Cloud
• One-click provisioning: Deploy globally in 30 minutes
• Auto Scaling: Scale cluster automatically for different workloads
• High Performance: Powered by Kyligence Analytics Platform
• Seamless Integration: Connect to cloud data sources; Enterprise ODBC driver for BI
• Intelligent Ops: Online diagnosis and continuous optimization
Speed Up OLAP analysis and mission-critical queries to interactive speed
Solutions
SQL Acceleration
for Big Data
SQL Acceleration for Big Data
Kyligence Enterprise
Powered by Apache Kylin
ANSI SQL
Kyligence
Storage
Hadoop Platform
T-SQL Oracle SQL PostgreSQL
Ingestion SQL Pushdown
Impala
Query
Analytics
SQL Acceleration for Big Data
< 1s
DB
line_orders
buyer_accounts
seller_accounts
product_items
…
√
√
√
SQL SQL
SQL Acceleration for Big Data
Intelligent Cubing
Kyligence Enterprise
ANSI SQL
Pushdown
For Ad-Hoc
Aggregation
& Index query
Solution
• Speed up SQL on Hadoop automatically
• Supports Hive, Impala, and Spark SQL, with more coming
• High performance and high concurrency OLAP
Benefits
• Unified analytics platform for aggregation and ad-hoc queries
• Self-service enables analysts to work without IT
SQL on
Hadoop
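The slide above pairs cube-based answers for aggregation queries with pushdown for ad-hoc ones. A minimal sketch of that routing decision follows, assuming a query can be described by the dimensions and measures it touches. The `Cube` class, the `route` function, and the column names are hypothetical and do not reflect the Kyligence or Apache Kylin API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cube:
    """A pre-built cube described by the columns it can answer for."""
    name: str
    dimensions: frozenset
    measures: frozenset


def route(query_dims, query_measures, cubes):
    """Return the name of the first cube that covers the query, else 'pushdown'."""
    for cube in cubes:
        covers_dims = set(query_dims) <= cube.dimensions
        covers_measures = set(query_measures) <= cube.measures
        if covers_dims and covers_measures:
            return cube.name
    return "pushdown"


# Hypothetical cube definition and two sample queries.
cubes = [Cube("sales_cube", frozenset({"region", "day"}), frozenset({"sum_amount"}))]
covered = route(["region"], ["sum_amount"], cubes)      # answered from the cube
ad_hoc = route(["customer_id"], ["sum_amount"], cubes)  # falls back to pushdown
```

The point of the design is that callers issue the same SQL either way; only the latency differs depending on whether a cube covers the query.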
Powering Excel
for Big Data
Powering Excel for Big Data
Extend big data analytics to every analyst’s desktop
Analyze Your Big Data LIVE with Excel
MDX/ANSI SQL Interface
Self-service Big Data from On-Prem to Cloud
LIVE
No data import is needed
Slice and dice your big data
Your Excel can fully leverage
Kyligence Cube capability
Anywhere
Desktop
Website
Mobile
Kyligence currently supports pinning your Excel report to Power BI mobile
Migrating EDW
to Data Lake
Kyligence Acceleration Solution for Greenplum
Kyligence Enterprise
Build Cube
SQL
SQL Pushdown
~ minutes
Cube Access
~ sub-seconds
• Change data source connection
• Intelligently build cubes from Greenplum
• Accelerate mission-critical analytics
• Push down flexible queries to Greenplum for data exploration
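To make "build cubes" concrete, the sketch below pre-aggregates a small fact table for every combination of dimensions, so that a later query becomes a dictionary lookup instead of a scan. It illustrates the general OLAP cube idea only: the function name and sample rows are made up, and production engines prune the set of combinations rather than materializing all of them.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of the given dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


# Hypothetical fact rows standing in for a source table.
rows = [
    {"region": "east", "product": "loan", "amount": 10.0},
    {"region": "east", "product": "card", "amount": 5.0},
    {"region": "west", "product": "loan", "amount": 7.0},
]
cube = build_cube(rows, ("region", "product"), "amount")
by_region = cube[("region",)]   # totals grouped by region
grand_total = cube[()][()]      # total over all rows
```

With two dimensions there are four pre-computed groupings; the count doubles with each added dimension, which is why choosing what to pre-build matters in practice.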
Kyligence Acceleration Solution for Greenplum
100x faster
SQL Pushdown to Greenplum: latency of minutes, min duration > 20s
After acceleration: sub-second latency, max duration < 1s
Seamlessly migrated
Query Performance ~ 100x
Report rendering ~14x
Same Tableau reports
100x faster!
Streaming OLAP
for near real time
Streaming OLAP
Consume Streaming Data via Kafka
MDX/ANSI SQL Interface
Batch & Streaming together
Data Source
HDFS
(Recent data)
Kyligence Enterprise
Pushdown Cube Access
Build Cube
Loading
Processing
Kafka Topic
Monitor
Prediction
Alerts …
BI
MOLAP …
Cube
(Full history data)
Near Real-time
(On recent data)
Historical
(On full history data)
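The "Batch & Streaming together" idea above can be sketched as combining an aggregate over full history with an aggregate over recent streamed data. The example assumes the two sources cover non-overlapping time ranges and that the measure is additive (a sum); the function and the sample figures are hypothetical.

```python
def merge_results(historical, recent):
    """Combine per-key sums from a historical cube and a recent streaming segment."""
    merged = dict(historical)
    for key, value in recent.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Hypothetical per-region totals: full history plus the most recent window.
historical = {"east": 100, "west": 80}
recent = {"east": 3, "north": 1}
combined = merge_results(historical, recent)
```

Non-additive measures such as distinct counts cannot be merged this simply and would need a mergeable representation instead.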
Q & A
luke.han@kyligence.io

Contenu connexe

Tendances

How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?Jeraldine Phneah
 
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...Kai Wähner
 
Solving the Industry 4.0 challenges on the logistics domain using Apache Meso...
Solving the Industry 4.0 challenges on the logistics domain using Apache Meso...Solving the Industry 4.0 challenges on the logistics domain using Apache Meso...
Solving the Industry 4.0 challenges on the logistics domain using Apache Meso...Big Data Spain
 
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)Spark Summit
 
Pipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and GraphPipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and Graphconfluent
 
Javaedge 2010-cschalk
Javaedge 2010-cschalkJavaedge 2010-cschalk
Javaedge 2010-cschalkChris Schalk
 
Data to Drive Decision-Making - CaliStream Meetup
Data to Drive Decision-Making - CaliStream MeetupData to Drive Decision-Making - CaliStream Meetup
Data to Drive Decision-Making - CaliStream MeetupJerome Boulon
 
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...Codemotion
 
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...Yahoo Developer Network
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey ResultsCarole Gunst
 
Data Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopData Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopMichelle Ufford
 
Ingesting IoT data in Food Processing
Ingesting IoT data in Food ProcessingIngesting IoT data in Food Processing
Ingesting IoT data in Food Processingconfluent
 
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonData Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonDatabricks
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Amazon Web Services
 
Successful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringSuccessful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringDatabricks
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Analytics at the Speed of Thought: Actian Express Overview
Analytics at the Speed of Thought: Actian Express Overview Analytics at the Speed of Thought: Actian Express Overview
Analytics at the Speed of Thought: Actian Express Overview Actian Corporation
 

Tendances (20)

How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?
 
Spark meets Smart Meters
Spark meets Smart MetersSpark meets Smart Meters
Spark meets Smart Meters
 
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...
 
Solving the Industry 4.0 challenges on the logistics domain using Apache Meso...
Solving the Industry 4.0 challenges on the logistics domain using Apache Meso...Solving the Industry 4.0 challenges on the logistics domain using Apache Meso...
Solving the Industry 4.0 challenges on the logistics domain using Apache Meso...
 
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
 
Pipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and GraphPipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and Graph
 
Javaedge 2010-cschalk
Javaedge 2010-cschalkJavaedge 2010-cschalk
Javaedge 2010-cschalk
 
Data to Drive Decision-Making - CaliStream Meetup
Data to Drive Decision-Making - CaliStream MeetupData to Drive Decision-Making - CaliStream Meetup
Data to Drive Decision-Making - CaliStream Meetup
 
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
 
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...
Apache Hadoop India Summit 2011 talk "Data Integration on Hadoop" by Sanjay K...
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results
 
Data Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopData Warehousing Patterns for Hadoop
Data Warehousing Patterns for Hadoop
 
Ingesting IoT data in Food Processing
Ingesting IoT data in Food ProcessingIngesting IoT data in Food Processing
Ingesting IoT data in Food Processing
 
The API Lie
The API LieThe API Lie
The API Lie
 
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonData Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
 
Successful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringSuccessful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data Engineering
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Analytics at the Speed of Thought: Actian Express Overview
Analytics at the Speed of Thought: Actian Express Overview Analytics at the Speed of Thought: Actian Express Overview
Analytics at the Speed of Thought: Actian Express Overview
 

Similaire à Refactoring your EDW with Mobile Analytics Products

Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsArcadia Data
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics WebinarBill Wong
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2Joe_F
 
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"MDS ap
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017SingleStore
 
Denodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with OktopusDenodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with OktopusDenodo
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
Digitalising the Core – How Analytics is Shaping the Energy Industry Daniel J...
Digitalising the Core – How Analytics is Shaping the Energy Industry Daniel J...Digitalising the Core – How Analytics is Shaping the Energy Industry Daniel J...
Digitalising the Core – How Analytics is Shaping the Energy Industry Daniel J...Spark Summit
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondSingleStore
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016StampedeCon
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
 
Customer migration to Azure SQL database, December 2019
Customer migration to Azure SQL database, December 2019Customer migration to Azure SQL database, December 2019
Customer migration to Azure SQL database, December 2019George Walters
 
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Denodo
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesDenodo
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?Denodo
 
Architecting a Modern Data Warehouse: Enterprise Must-Haves
Architecting a Modern Data Warehouse: Enterprise Must-HavesArchitecting a Modern Data Warehouse: Enterprise Must-Haves
Architecting a Modern Data Warehouse: Enterprise Must-HavesYellowbrick Data
 
Data architecture for modern enterprise
Data architecture for modern enterpriseData architecture for modern enterprise
Data architecture for modern enterprisekayalvizhi kandasamy
 

Similaire à Refactoring your EDW with Mobile Analytics Products (20)

Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time Analytics
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics Webinar
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Denodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with OktopusDenodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with Oktopus
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Refactoring your EDW with Mobile Analytics Products

  • 1. Refactoring your EDW With Mobile Analytics Products Zhi Zhu @CCB FinTech Luke Han @Kyligence
  • 2. Strata New York 2018. About China Construction Bank (CCB): CCB is the 2nd biggest bank in China. Big data: how big is big? EDW core data: 1 PB; incremental data: 4 TB/day; online data storage: 5 PB; >600M customers; >2,000M accounts.
  • 3. 1st Generation EDW (2004). Tactical Decision Makers; General Business Users; Strategic Decision Makers; Operational Decision Makers. Headquarters; Source Systems: ALS, CLPM, CCMIS. EDW: Teradata 5450 (6 nodes), 18T. ERPF, CCBS, SMIS. Material DSS Database. OCRM … Cube. CMIS, CMIS, CCDA … 1104. Operational Data Storage (ODS). Historical data. Branches; Source Systems. 100+ reports; 100+ users.
  • 4. Big Data Blueprint (2012). Dining Room: Readily Accessible to End Users (and BI Developers); Safe, Hospitable Environment; Data Assets “Ready for Primetime”; Dimensionally Structured. Kitchen: Off Limits to End Users; Data Professionals Only Please; Dangerous / Inhospitable Environment; Data Assets “Not Ready for Primetime”; Structured Variably For Data Processing. Dimensional Semantic Layer; Dimensional Tier [Physical or Virtual (CIF or Data Vault)] (Virtual or Physical). Un/Semi-Structured Data Movement; Un/Semi-Structured Source Data; Persistent Un/Semi-Structured Staging Area; Unstructured -> Structured Data Discovery Processing. Structured Data Movement; Structured Source Data; Persistent Structured Data Repository. Insight Generation / Data Mining.
  • 5. Big Data Transformation (2016). Tactical Decision Makers; General Business Users; Strategic Decision Makers; Operational Decision Makers. Presentation Layer. Headquarters; Source System: ALS, CLPM, CCMIS. EDW: Teradata 6650 (10+10 nodes), 600T. Big Data Analytics Platform; Hadoop. Legacy Data Marts; OCRM … 1000+ Cube; CMIS. Historical Data; SOR; MPP DB; ERPF, CCBS, CCDA … 1104. Operational Data Storage; Branch ODSB. Performance Marketing EDW: Teradata 2750 (32 nodes), 750T. Branches; Source Systems. 25,000+ reports; 2,000+ Data Mining Themes. SQL Translation between different databases is a big lesson. 100,000+ Users. User Experience Challenges: Data latency; High-performance. EDW Challenges: System I/O; Maintenance and data lineage.
  • 9. Mobile product brought an opportunity. Intelligent Eyes (1st version, Sept 2016): 100,000+ users; 1,200+ million records; PB-level data storage; millisecond-level response. Metrics can be published by sub-organizations and subscribed to by end users with a touch.
  • 10. Benefits. (1) TCO: Teradata no longer increased; cost of unit storage ↓ 66%; delivery cycle time ↓ from 6 months to 1 month. (2) Performance: mobile users ↑ from 0 to 100,000+; active PC users ↓ 90%; page views (PV) up to 1,000,000 daily; real-time applications emerged; data latency ↓ from 48 hours to 7 hours; millisecond-level response. (3) User Experience: access data anywhere and anytime; 25,000+ reports ↓ to 5,000 and 800 mobile data metrics; eliminating vertical shaft data problems.
  • 11. How to re-engineer a legacy EDW into a Data Lake: discover users’ values by collecting their usage records; enable end users to join the data game; build a data conformance bus on Hive; rebuild the Analytics layer with Apache Kylin; test-driven development.
  • 12. EDW has evolved to Data Ecosystem. AUDIO & VIDEO; IMAGES; TEXT; WEB & SOCIAL; MACHINE LOGS; CRM; SCM; ERP. L2 Cache; Oracle Database; DATA MARTS; TEST/DEV; ANALYTICAL ARCHIVE; CAPTURE | STORE | REFINE; MDX; RESTFUL SERVICE; DATA LAB; INDEPENDENT DATA MART; DUAL SYSTEMS: TD 66XX, TD 2700. L2 Cache; HBase; GP. L1 Cache; Redis. ETL.
  • 16. About Apache Kylin: leading open source OLAP for Big Data; open-sourced by eBay in 2014; graduated to Apache Top-Level Project in 2015; 1000+ adoptions worldwide; 2015 InfoWorld Bossie Awards; 2016 InfoWorld Bossie Awards.
  • 17. OLAP: The Missing Part of Big Data. Presentation; Visualization; Data Lake; Data Source. Too many options; low performance; long learning curve; compatibility issue; Technology vs Data. Hive, Impala, Spark SQL, Drill, MapReduce, Spark, …
  • 18. Apache Kylin: Bring OLAP back to Big Data. Presentation; Visualization; OLAP; Data Mart; Data Lake; Data Source. SQL Acceleration for Big Data; Semantic Layer; Speed up Analytics; ANSI SQL Interface; High Performance and High Concurrency. Hive, Impala, Spark SQL, Drill, MapReduce, Spark, …
  • 20. About Us. Kyligence = Kylin + Intelligence. Kyligence was formed by the team who created Apache Kylin, the leading open source OLAP for Big Data. Kyligence provides an intelligent data warehouse built for data cognitive analytics at web scale. Funded by leading VCs: Redpoint Ventures, Cisco, CBC Capital, Shunwei Capital, and Eight Roads Ventures (Fidelity International Arm). CRN Top 10 Big Data Startups 2018. © Kyligence Inc. 2018, Confidential.
  • 21. Featured Customers: Trusted by Fortune 500. Lenovo: #226 of Fortune 500; OPPO: #4 Smart Phone Vendor Global; Lufax: #1 Fintech in China; CPIC: #252 of Fortune 500; SAIC: #41 of Fortune 500; (name not in transcript): #47 of Fortune 500; Huawei: #83 of Fortune 500; Huatai Securities: Top Securities in China; (name not in transcript): Top 3 Telecom in China; McDonald’s: #436 of Fortune 500; China UnionPay: #3 Payment Network; (name not in transcript): #33 of Fortune 500. Data from Fortune Global 500 year 2017: http://fortune.com/global500/list/
  • 22. Partners: Global Ecosystem. Microsoft Azure Partner; Amazon Web Services Technology Partner; Tableau Technology Partner; Cloudera Silver Partner; MapR Converge Partner; Hortonworks Community Partner; Huawei Solution Partner.
  • 23. Evolution of Data Warehousing Data Mart Orders Payments Contacts Products Customers Data Warehouse Contacts Orders Payments Products Data Warehouse Data Lake Contacts Orders Payments Products Data Warehouse Contacts Orders Payments Products Next Generation Cloud Contacts Orders Payments Products Data Warehouse Products Contacts Orders Payments ?
  • 24. Traditional Data Warehousing: Enormous Manual Efforts and Repeated Work.
  • 25. The future of Data Analytics: Intelligence and Automation. Human Intelligence VS Artificial Intelligence.
  • 26. Fusional DW Architecture. Fusion of Historical & Real-time Data (Historical; Real time). Fusion of Local and Cloud (On-premises; Cloud). Fusion of Traditional DW & Big Data (EDW; Data Lake). Kyligence Enterprise Product Screenshot.
  • 28. Augmented Analytics Platform. SQL Query Log; Analytic Behavior; Data Schema; Data Profile. ML-based Discovery of Analytic Pattern; Proprietary Data Modeling Automation; Self-directed Storage Layer Optimization; Intelligent Query Push-down & Routing. BI; Real-time Analysis; Data-as-a-Service. Local Deployment; Cloud Platform; Container. Data Services.
  • 29. Kyligence Position in Big Data Ecosystem: Fill the gap between business and technology. Kyligence Enterprise, powered by Apache Kylin. BI; Visualization; OLAP; Data Mart; Data Lake; Source Data. HDFS, YARN, MapReduce, Spark, Kafka, Spark SQL, … Fusional: Unified EDW & Data Lake; Unified Realtime and Historical; Unified On-Prem and Cloud. Intelligent: Machine Learning-augmented modeling. High Performance: Sub-second query speed on massive datasets. High Concurrency: Web-scale OLAP query.
  • 30. Evolution of Data Warehousing Data Mart Orders Payments Contacts Products Customers Data Warehouse Contacts Orders Payments Products Data Warehouse Data Lake Contacts Orders Payments Products Data Warehouse Contacts Orders Payments Products Fusional & Intelligent DW Cloud Contacts Orders Payments Products Data Warehouse Products Contacts Orders Payments
  • 31. Kyligence Cloud: Transforming Big Data Analytics to Cloud. Kyligence Cloud; ANSI SQL; Dashboard; OLAP; Hadoop; Customer Cloud Account; client; cloud; Kyligence Enterprise Platform; streaming; Cluster Deploy; Account Management; Diagnosis & Optimization; Queries & Reporting; cloud storage (tables, logs, files); RDBMS (metadata); ANSI SQL; Cloud Data Warehouse; Cluster Management.
  • 32. Kyligence Cloud: Transforming Big Data Analytics to Cloud. One-click provisioning: deploy globally in 30 minutes. Auto Scaling: scale cluster automatically for different workloads. High Performance: powered by Kyligence Analytics Platform. Seamless Integration: connect to cloud data sources; Enterprise ODBC driver for BI. Intelligent Ops: online diagnosis and continuous optimization. Speed up OLAP analysis and mission-critical queries to interactive speed.
  • 35. SQL Acceleration for Big Data. Kyligence Enterprise, powered by Apache Kylin. ANSI SQL; Kyligence Storage; Hadoop Platform; T-SQL; Oracle SQL; PostgreSQL; Ingestion; SQL Pushdown; Impala; Query; Analytics.
  • 36. SQL Acceleration for Big Data. < 1s. DB: line_orders, buyer_accounts, seller_accounts, product_items, … √ √ √ SQL SQL
  • 37. SQL Acceleration for Big Data. Intelligent Cubing; Kyligence Enterprise; ANSI SQL; Pushdown; SQL on Hadoop. For Ad-Hoc, Aggregation & Index query. Solution: speed up SQL on Hadoop automatically; supports Hive, Impala, and Spark SQL, with more coming; high performance and high concurrency OLAP. Benefits: unified analytics platform for aggregation and ad-hoc query; self-service enables analysts without IT.
  • 39. Powering Excel for Big Data: Extend big data analytics to every analyst's desktop. Analyze Your Big Data LIVE with Excel; MDX/ANSI SQL Interface; Self-service; Big Data from On-Prem to Cloud.
  • 40. LIVE: No data import is needed. Slice and dice your big data. Your Excel can fully leverage Kyligence Cube capability.
  • 42. Anywhere: Desktop; Website; Mobile. Kyligence currently supports pinning your Excel report to Power BI mobile.
  • 44. Kyligence Acceleration Solution for Greenplum. Kyligence Enterprise; Build Cube; SQL. SQL Pushdown: ~ minutes. Cube Access: ~ sub-seconds. Change data source connection; intelligently build cubes from Greenplum; accelerate mission-critical analytics; push down flexible queries to Greenplum for data exploration.
  • 45. Kyligence Acceleration Solution for Greenplum: 100x faster. SQL Pushdown to Greenplum: minutes of latency, min duration > 20s. After acceleration: sub-second latency, max duration < 1s. Seamlessly migrated. Query performance ~100x; report rendering ~14x. Same Tableau reports, 100x faster!
  • 47. Streaming OLAP. Consume Streaming Data via Kafka; MDX/ANSI SQL Interface; Batch & Streaming together. Data Source; HDFS; Kafka Topic; Loading; Processing; Build Cube. Kyligence Enterprise: Pushdown (recent data); Cube Access; Cube (full history data). Monitor; Prediction; Alerts; …; BI; MOLAP; … Near Real-time (on recent data); Historical (on full history data).
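
The slides on SQL acceleration (18, 36, 37, 44) describe answering queries from a pre-built cube instead of scanning the source tables. The sketch below illustrates that general idea only; it is not Apache Kylin's or Kyligence's implementation or API. The fact rows, the dimension names (`branch`, `product`, `day`), and the helpers `build_cube` and `query` are all hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (branch, product, day, amount). Illustrative data only.
FACTS = [
    ("Beijing", "loan", "2018-09-01", 120.0),
    ("Beijing", "card", "2018-09-01", 30.0),
    ("Shanghai", "loan", "2018-09-01", 200.0),
    ("Shanghai", "loan", "2018-09-02", 50.0),
    ("Beijing", "loan", "2018-09-02", 80.0),
]
DIMENSIONS = ("branch", "product", "day")


def build_cube(rows):
    """Precompute SUM(amount) for every subset of dimensions (one cuboid per subset)."""
    cube = {}
    n = len(DIMENSIONS)
    for mask in range(1 << n):
        # The dimensions kept by this cuboid, in their canonical order.
        dims = tuple(d for i, d in enumerate(DIMENSIONS) if mask & (1 << i))
        cuboid = defaultdict(float)
        for row in rows:
            key = tuple(row[i] for i in range(n) if mask & (1 << i))
            cuboid[key] += row[n]  # row[n] is the amount measure
        cube[dims] = dict(cuboid)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY ... SUM(amount) query by cuboid lookup, without scanning facts."""
    dims = tuple(d for d in DIMENSIONS if d in group_by)
    return cube[dims]


cube = build_cube(FACTS)
print(query(cube, {"branch"}))         # totals per branch
print(query(cube, {"branch", "day"}))  # totals per branch and day
```

The build step costs time and storage up front (2^n cuboids for n dimensions in this naive form), which is the trade made for fast lookups at query time; queries that no cuboid covers would have to fall back to the source engine, which is the role the slides give to pushdown.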

Editor's notes

  1. Good afternoon everyone! I’m Zhu Zhi from China Construction Bank. The topic I would like to share today is about the big data migration. We started this work in 2012. It was really passive to complete it with thoughts of data warehouses. About two years ago, we started to transform the data warehouse to our data lake driven by a mobile data app ([æp]), and it worked well. Then encouraged by Mr. Han, I fortunately have an opportunity to share the experience over here.
  2. China Construction Bank is known as the second largest bank in China, serving more than 600 million retail customers and more than 10 million corporate customers. 4 Trillionbyte (TB: 单位量级一般写作缩写TB,读作Trillionbyte,如果不方便记忆,只读TB缩写也可) of incremental data is generated daily by more than 2 billion accounts. Up to date, we have stored more than 5 Petabyte (PB同理) online-data, 20% of it belonging to the data warehouse.
  3. Many companies, especially Internet companies, can adopt a new big data technology stack such as Hadoop from the very beginning. Our company cannot simply replace traditional applications with new technologies, because we have a long history of building enterprise data warehouses (EDW). In 2004 we built our first-generation Teradata data warehouse, which held only 16 terabytes (TB) of data and served more than 100 users and more than 100 reports. By 2012, the data volume had grown about 35 times, to as much as 600 TB; the number of users had grown nearly 500 times and the number of apps 250 times, and the number of reports had reached 25,000.
  4. When the concept of big data took hold in 2011, our company faced major challenges. First, the total cost of ownership (TCO) remained very high. Second, more and more semi-structured and unstructured data was coming into use. Third, data apps based on business intelligence (BI) needed to be upgraded with advanced analysis algorithms to satisfy users’ growing requirements. We therefore put forward our big data blueprint. As you can see, the blueprint is divided into two parts. The upper part is the dining room, aimed at the many business users, who we hoped could use the data in a self-service way. The lower part is the kitchen, for technical engineers; compared with a traditional data warehouse, it also handles unstructured data and advanced data insights. At that point we were aware of the rapid rise of the Hadoop and Spark ecosystems, but because our technical team could not easily make that transition, we kept the Perl+SQL development model. Given that, we tried to work through the problems by choosing an open MPP database and migrating the data and programs to it.
  5-8. In 2016 we finally realized that this approach could not solve the core problem. Our development speed could not keep up with business users’ changing requirements. In addition, the open MPP database could neither cope with the growing data volume nor replace Teradata completely. The code became harder to maintain and the problems multiplied. For example, the picture shows a very common, automatically translated SQL statement that runs to five pages. Statements like this flooded our entire data warehouse. When we scanned the code into this data graph, I finally understood what a spaghetti-like system was. It was a very painful experience.
  9. To make things more convenient for users, my team developed a mobile data app in 2016. Initially we just wanted to simplify users’ access to data. After meeting Luke, I suddenly realized that this app was the key to transforming the data warehouse into a data lake. This picture shows the earliest version of the app (the MVP). It is similar to Straight Flush, a well-known app in China. We thought outside the box and gradually released data cubes from the data warehouse onto mobile phones. End users subscribed on their phones to the data they cared about, so by counting their clicks we could tell which data mattered most. We then moved the most valuable items from the traditional architecture to a data lake using Kylin and MapReduce. The app not only met the requirements of a large number of users but also made the overall architecture highly maintainable and reduced TCO.
  10. It turned out that this mobile data app outperformed our previous design. On TCO, the Teradata system stopped growing and the unit cost of storage fell by 66%. The delivery cycle for new apps was cut from 6 months to 1 month. On performance, mobile users grew from 0 to 100,000 while active PC users dropped by 90%, and page views (PV) reached one million a day. Real-time applications also emerged, because data latency fell from 48 hours to 7 hours and responses came back in milliseconds. On user experience, users could access data at any time and from anywhere: 25,000 reports were reduced to 5,000 reports and 800 mobile data metrics, and the data silo problems were eliminated.
  11. What we learned through the process: First, discover what users value by collecting their usage records, so that the logic of 25,000 reports can be transformed into about 1,000 cube apps. Second, Kylin’s self-service system lets end users join the data game and share, which greatly reduces development costs. Third, build the data conformance bus on Hive to eliminate the data silo problem. Fourth, rebuild the analytics layer with Apache Kylin to make the programs highly maintainable. Fifth, apply test-driven development to refactor the calculation logic.
  12-14. Two years later, the data warehouse has evolved into a vibrant data ecosystem, in which Hadoop gathers all structured data into the data lake, and Kylin’s self-service system, together with our data service system, continuously delivers the most valuable data to various apps. I appreciate Mr. Han’s support for our project. Next, Mr. Han will introduce some ideas about the enterprise intelligent data warehouse.
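
The notes above describe deciding what to migrate first by counting users' clicks on the data they subscribed to. A minimal sketch of that ranking step might look like the following. The log format (pairs of user ID and metric name), the metric names, and the `rank_metrics` helper are all assumptions for illustration; the talk does not describe how the usage records were actually stored or processed.

```python
from collections import Counter


def rank_metrics(click_log, top_n):
    """Rank metrics by how often users clicked on them.

    click_log is an iterable of (user_id, metric_name) pairs, a hypothetical
    format chosen for this sketch. Returns the top_n (metric, clicks) pairs,
    most clicked first.
    """
    counts = Counter(metric for _user, metric in click_log)
    return counts.most_common(top_n)


# Made-up sample data, for illustration only.
click_log = [
    ("u1", "deposit_balance"),
    ("u2", "deposit_balance"),
    ("u1", "loan_balance"),
    ("u3", "deposit_balance"),
    ("u2", "card_spend"),
    ("u3", "loan_balance"),
]

top = rank_metrics(click_log, 2)
print(top)  # [('deposit_balance', 3), ('loan_balance', 2)]
```

Under this assumption, the metrics at the top of the ranking would be the first candidates to rebuild as cubes on the new platform, while rarely clicked reports could be retired or left for later.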