15. Intro | ? Data Pipeline (2016-02, Phase III)
[Diagram: Batch Processing System, Phase III]
Sources: API Logs, MySQL, Google Analytics
Storage: S3
Processing: Python Batch, scheduled by CRON
Output: Dashboard (serves batched results)
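The Phase III setup appears to be a cron-scheduled Python batch job whose results are served to a dashboard. The sketch below illustrates that shape under stated assumptions. The tab-separated log format, the function names `aggregate_api_logs` and `run_batch`, and the use of local files in place of S3 are all hypothetical and are not taken from the original pipeline.

```python
"""Hypothetical hourly batch job modelled on the Phase III pipeline.

Assumed to be run by cron, e.g. with a crontab entry such as:
    0 * * * * /usr/bin/python3 batch_job.py api.log result.json
Local files stand in for S3; the log line format is also an assumption.
"""
import json
import sys
from collections import Counter
from typing import Iterable


def aggregate_api_logs(lines: Iterable[str]) -> dict:
    """Count requests per endpoint and 5xx responses.

    Each line is assumed to be 'timestamp<TAB>endpoint<TAB>status'.
    """
    per_endpoint: Counter = Counter()
    server_errors = 0
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines rather than failing the whole batch
        _, endpoint, status = parts
        per_endpoint[endpoint] += 1
        if status.startswith("5"):
            server_errors += 1
    return {"requests_per_endpoint": dict(per_endpoint), "server_errors": server_errors}


def run_batch(log_path: str, result_path: str) -> dict:
    """Aggregate one batch of logs and write the result for the dashboard to serve."""
    with open(log_path, encoding="utf-8") as f:
        result = aggregate_api_logs(f)
    with open(result_path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result


if __name__ == "__main__" and len(sys.argv) == 3:
    run_batch(sys.argv[1], sys.argv[2])
```

Because each run reads a fixed input and overwrites its output file, a failed cron run in this sketch can simply be rerun.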
16. Intro | ! Data Pipeline (2018-08, Phase VI)
[Diagram: Data pipeline, Phase VI]
Sources: EventLogs, MySQL, 3rdParty (Firebase / Adjust)
Storage: S3, HBase
Realtime Processing System: Kafka Cluster, Spark Streaming, ElasticSearch/Kibana, API/Redis
Batch Processing System: Airflow, Spark Batch (Spark Cluster), Hive Cluster, Presto Cluster
Outputs: Realtime Dashboard, Batch Dashboard, Analytics with Queries, Automated Personalized Operation
Query / BI tools: Superset, Zeppelin, Redash
Edge labels: Update Lifetime Storage; Update / Use Lifetime Storage; Serve Batched Results
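In the Phase VI diagram, both the realtime path (Kafka, Spark Streaming) and the batch path (Airflow, Spark Batch) appear to write to a shared lifetime store. Below is a minimal sketch of that dual-path idea. It assumes events are `(user_id, amount)` pairs and uses a plain dict as a stand-in for HBase; `LifetimeStore`, `streaming_update` and `batch_recompute` are hypothetical names. The streaming path folds each micro-batch into the existing rows, while the batch path rebuilds every row from the full history.

```python
"""Hypothetical sketch of the 'lifetime storage' pattern in the Phase VI diagram.

Assumptions: events are (user_id, amount) pairs, and a plain dict stands in
for the HBase table. The real schema and update semantics are not on the slide.
"""
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, float]


class LifetimeStore:
    """Per-user lifetime aggregates: event count and total amount."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, float]] = {}

    def apply_increment(self, user_id: str, count: int, total: float) -> None:
        row = self.rows.setdefault(user_id, {"count": 0, "total": 0.0})
        row["count"] += count
        row["total"] += total

    def replace_all(self, rows: Dict[str, Dict[str, float]]) -> None:
        self.rows = {k: dict(v) for k, v in rows.items()}


def streaming_update(store: LifetimeStore, micro_batch: Iterable[Event]) -> None:
    """Realtime path: pre-aggregate one micro-batch, then add it to existing rows."""
    partial = defaultdict(lambda: [0, 0.0])
    for user_id, amount in micro_batch:
        partial[user_id][0] += 1
        partial[user_id][1] += amount
    for user_id, (count, total) in partial.items():
        store.apply_increment(user_id, count, total)


def batch_recompute(store: LifetimeStore, full_history: Iterable[Event]) -> None:
    """Batch path: rebuild every row from the complete event history."""
    rows: Dict[str, Dict[str, float]] = {}
    for user_id, amount in full_history:
        row = rows.setdefault(user_id, {"count": 0, "total": 0.0})
        row["count"] += 1
        row["total"] += amount
    store.replace_all(rows)
```

The periodic batch rebuild gives a way to correct any drift left by the incremental updates (for example, a duplicated micro-batch), which is one common reason to keep both paths.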
20. Phase I (~ 50G) | ? Pipeline
[Diagram: Sources MySQL (+ServiceLog) and Google Analytics (+Goal, Ecommerce); the pipeline itself is marked Unknown]
GA: + Ecommerce, + Goal = Conversion
MySQL: + Service Monitoring Log (table)
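The note "+Goal = Conversion" reflects that goal completions are counted as conversions. The following small sketch shows how such rates could be computed from session records. The `Session` fields and the definitions used (share of sessions with at least one goal completion, and transactions per session) are assumptions for illustration. They are not Google Analytics' exact metric definitions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Session:
    session_id: str
    goals_completed: List[str]  # hypothetical field: names of goals reached in the session
    transactions: int           # hypothetical field: ecommerce transactions in the session


def conversion_rates(sessions: Iterable[Session]) -> dict:
    """Return goal and ecommerce conversion rates over a set of sessions."""
    sessions = list(sessions)
    if not sessions:
        return {"goal_conversion_rate": 0.0, "ecommerce_conversion_rate": 0.0}
    n = len(sessions)
    converted = sum(1 for s in sessions if s.goals_completed)  # sessions with >= 1 goal
    transactions = sum(s.transactions for s in sessions)
    return {
        "goal_conversion_rate": converted / n,
        "ecommerce_conversion_rate": transactions / n,
    }
```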