The Bitter Lesson of ML Pipelines
1. The Bitter Lesson of ML Pipelines
jim_dowling
CEO @ Logical Clocks
Assoc Prof @ KTH
Senior Researcher @ RISE
WASP4ALL – Future Computing Platforms for X
Nov 2019
2. “Methods that scale with computation are the future of AI”*
Rich Sutton (Founding Father of Reinforcement Learning), May 2018
* https://www.youtube.com/watch?v=EeMCEQa85tw
3. Massive Increase in Compute for AI*
[Chart: compute used in the largest AI training runs, showing a 3.5-month doubling time]
*https://blog.openai.com/ai-and-compute
4. Distributed Systems are important for Deep Learning
[Diagram: Distributed Deep Learning at the center, connected to:]
● Distributed Training
● Hyperparameter Optimization
● Parallel Experiments
● AutoML
● Larger Training Datasets
● Elastic Model Serving
● (Commodity) GPU Clusters
5. The Bitter Lesson
“The biggest lesson … is that general methods that leverage computation are ultimately the most effective, and by a large margin…
The two (general purpose) methods that seem to scale … are search and learning.”
Rich Sutton, March 2019
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
7. Learning needs structure
● In learning theory, the No Free Lunch theorem* tells us that without structure (innate priors), it is very difficult to learn anything.
● Warning! Structure is not free - it adds assumptions about the data that may not hold for all of your data.
*Free lunch today for all WASP4ALL attendees
8. What do you mean by Structure?
● By structure, we mean prior knowledge
○ Not just a prior probability
1. Innate priors
○ A linear model assumes the data is linear.
○ The convolution/pooling assumption in Convolutional Neural Nets (see the parameter-count sketch after this list).
2. Some structure can be computed dynamically
○ Semi-/self-supervised learning
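To make the innate-prior idea concrete, here is a back-of-the-envelope sketch (plain Python, illustrative layer sizes, not from the talk): the convolution/pooling assumption of locality and weight sharing cuts a layer's parameter count by roughly two orders of magnitude relative to a fully connected layer on the same input.

```python
# Illustrative parameter counts for a 32x32x3 input (e.g. CIFAR-10).
H, W, C = 32, 32, 3   # input height, width, channels
units = 64            # output units / feature maps
k = 3                 # convolution kernel size

# Fully connected layer: every input value connects to every output unit.
dense_params = H * W * C * units        # 196,608 weights
# Convolutional layer: one small k x k kernel shared across all positions.
conv_params = k * k * C * units         # 1,728 weights

print(dense_params, conv_params)        # 196608 vs 1728
```

The smaller hypothesis space is exactly the structure being traded: the layer learns from far less data, but only where the locality and translation-invariance assumptions actually hold.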
9. The Trend: Less Structure and More Data/Compute
● There is a trade off between the
amount of structure you need to put in
your learning systems and the amount
of training data and compute available.
● Recent increases in the amount of
available training data for supervised
ML and decreasing sample complexity
for some reinforcement learning
domains means you need less
structure.
[Figure: the trade-off between Structure and Data/Compute]
10. Self-supervised is SoTA in Image Classification
Pre-trained with 3.5B weakly labeled Instagram images, using 256 TPU v3s for 3.5 days.
13. Not all Structure can be learned…
● We need meta-methods that can find and capture complexity
● For Deep Learning, these meta-methods must scale on GPUs
○ Convolutional Neural Network
○ Transformer
14. Structure that doesn’t scale (yet): Capsule Networks*
Algorithmic bottlenecks for GPUs*:
“votes are ‘routed’ using the Expectation-Maximization algorithm”
ML Framework limitations*:
“[ml frameworks] are structured around calls to large monolithic kernels”
*Machine Learning Systems are Stuck in a Rut, Barham P. and Isard M, HotOS’19
[Diagram: the TensorFlow/CUDA stack — user programs on top of TensorFlow and XLA, compiled down to cuDNN kernels, warp threads, and SIMD lanes (SMs). ConvNets map cleanly onto this stack; CapsuleNets do not.]
16. Searching for Structure
● We can also search for better hyperparameters with genetic algorithms, reinforcement learning, etc. (a toy version is sketched below)
ImageNet SoTA, March 2018 (Quoc Le et al.)
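A minimal sketch of such a search loop (a toy genetic algorithm; `train_and_score` is a hypothetical stand-in for a real training run, not the neural architecture search from the cited work):

```python
import random

def train_and_score(lr, dropout):
    # Placeholder objective with a peak near lr=0.01, dropout=0.3;
    # in practice this would be a full training run.
    return -1e4 * (lr - 0.01) ** 2 - (dropout - 0.3) ** 2

def mutate(parent):
    # Perturb a parent's hyperparameters within sane bounds.
    return {"lr": max(1e-5, parent["lr"] * random.uniform(0.5, 2.0)),
            "dropout": min(0.9, max(0.0, parent["dropout"] + random.uniform(-0.1, 0.1)))}

population = [{"lr": 10 ** random.uniform(-5, -1),
               "dropout": random.uniform(0.0, 0.9)} for _ in range(8)]

for generation in range(10):
    # Keep the fittest half as parents, refill with mutated offspring.
    parents = sorted(population, key=lambda p: train_and_score(**p), reverse=True)[:4]
    population = parents + [mutate(random.choice(parents)) for _ in range(4)]

best = max(population, key=lambda p: train_and_score(**p))
print(best)
```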
17. The Bitter Lesson as a Research Roadmap
1. Scale out data and computation to reduce the amount of structure.
○ Learn as much structure as possible.
2. Structure we introduce should be minimal meta-methods that scale out on both accelerators and distributed systems.
18. Distributed Systems Research on ML at KTH/RISE
● Continuous Deep Analytics
○ ARCON (RISE, KTH – P. Carbone, S. Haridi)
● Distributed Deep Learning
○ Hopsworks (Logical Clocks AB, KTH – J. Dowling, V. Vlassov, A. Payberah)
● Scalable Data Management for ML
○ HopsFS and the Feature Store (Logical Clocks AB)
https://dcatkth.github.io/
26. Problem: PySpark is inefficient with Early Stopping
● PySpark’s bulk-synchronous execution model prevents efficient use of early stopping for hyperparameter optimization.
New Framework? Fix PySpark?
27. Solution: Long-Running Tasks and an RPC Framework
[Diagram: the Driver runs the Optimizer; long-running tasks execute Trial 1 … Trial N behind a Barrier, streaming Metrics back to the Driver and receiving a New Trial as soon as one finishes or is early-stopped]
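The pattern, sketched with plain Python threads standing in for long-running Spark executors (all names here, e.g. `Optimizer` and `should_stop`, are illustrative, not the Maggy API):

```python
import threading

class Optimizer:
    """Driver-side optimizer: hands out trials and early-stops stragglers."""
    def __init__(self, trials):
        self._lock = threading.Lock()
        self._pending = list(trials)
        self.results = []

    def next_trial(self):
        with self._lock:
            return self._pending.pop() if self._pending else None

    def report(self, metric):
        with self._lock:
            self.results.append(metric)

    def should_stop(self, metric):
        # Early-stop any trial that falls well behind the best seen so far.
        with self._lock:
            return bool(self.results) and metric < max(self.results) - 0.5

def worker(opt):
    # Long-running task: loops over trials instead of running exactly one,
    # so capacity freed by early stopping is reused immediately.
    while (trial := opt.next_trial()) is not None:
        metric = 0.0
        for epoch in range(1, 11):
            metric = 1.0 - 1.0 / (100 * trial["lr"] * epoch)  # fake learning curve
            if opt.should_stop(metric):    # driver-side decision, per epoch
                break
        opt.report(metric)

opt = Optimizer([{"lr": lr} for lr in (0.1, 0.05, 0.01, 0.005, 0.001)])
workers = [threading.Thread(target=worker, args=(opt,)) for _ in range(2)]
for w in workers: w.start()
for w in workers: w.join()
print(f"best metric: {max(opt.results):.3f}")
```

In the one-task-per-trial, bulk-synchronous model, an early-stopped slot would sit idle until the whole stage finishes; here the metrics channel and trial hand-out replace that barrier.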
31. Parallel Ablation Studies
[Diagram: Titanic-style features (Pclass, name, sex, survived) selected from the Feature Store for ablation]
Replacing the Maggy Optimizer with an Ablator:
● Feature Ablation using the Feature Store (sketched below)
● Leave-One-Layer-Out Ablation
● Leave-One-Component-Out (LOCO)
Sina Sheikholeslami https://castor-software-days-2019.github.io/sina
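A minimal sketch of leave-one-feature-out ablation (scikit-learn on a Titanic-style DataFrame `df`; illustrative only, not the Maggy Ablator API):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def feature_ablation(df, features, label):
    """Retrain once per dropped feature and compare to the full baseline."""
    def score(cols):
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        return cross_val_score(model, df[cols], df[label], cv=3).mean()

    baseline = score(features)
    for feature in features:
        s = score([f for f in features if f != feature])
        print(f"without {feature!r}: {s:.3f} (baseline {baseline:.3f})")

# Usage (assuming numeric, encoded Titanic features):
# feature_ablation(df, ["pclass", "sex", "age", "fare"], "survived")
```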
34. Hopsworks End-to-End ML Pipelines
[Diagram: Data Pipelines (Ingest & Prep) → Feature Store → Machine Learning Experiments (Hyperparameter Optimization, Ablation Studies, Data-Parallel Training) → Model Serving]
Machine Learning Experiments are the bottleneck, due to
• their iterative nature
• the human-in-the-loop
35. DataPrep Pipelines and Model Training Pipelines
Dataprep Pipeline (Airflow): Feature Engineering → Feature Store
Training and Deployment Pipeline (Airflow): Select Features → Experiment, Train Model → Validate & Deploy Model
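As a minimal sketch, the training-and-deployment half expressed as an Airflow DAG (task names and callables are illustrative, not the Hopsworks pipeline definitions; the operator import path is the Airflow 1.x one current when this talk was given):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x path

# Hypothetical step functions; in Hopsworks these would typically launch
# PySpark jobs rather than run inside the Airflow worker itself.
def train_model():
    print("select features, run experiment, train model")

def validate_and_deploy():
    print("validate model, deploy to model serving")

with DAG("training_and_deployment_pipeline",
         start_date=datetime(2019, 11, 1),
         schedule_interval="@daily") as dag:
    train = PythonOperator(task_id="experiment_train_model",
                           python_callable=train_model)
    deploy = PythonOperator(task_id="validate_and_deploy_model",
                            python_callable=validate_and_deploy)
    train >> deploy  # deploy only after training succeeds
```

The dataprep DAG is analogous, ending in a write to the Feature Store instead of a deployment.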
36. www.hops.site
RISE Data Center: 1 PB storage, 24 GPUs, 2000 CPUs, 1500+ users
Register for a free account with your student/work email address:
www.hops.site
37. Hopsworks
Development & Operations
● Development Environment: first-class Python support
● Version Everything: code, infrastructure, data
● Model Serving on Kubernetes: TF Serving, SkLearn
● End-to-End ML Pipelines: orchestrated by Airflow
Security & Governance
● Secure Multi-Tenancy: project-based restricted access
● Encryption At-Rest, In-Motion: TLS/SSL everywhere
● AI-Asset Governance: models, experiments, data, GPUs
● Data/Model/Feature Lineage: discover/track dependencies
Efficiency & Performance
● Feature Store: data warehouse for ML
● Distributed Deep Learning: faster with more GPUs
● HopsFS: NVMe speed with Big Data
● Horizontally Scalable: ingestion, dataprep, training, serving
38. Acknowledgements and References
Slides and Diagrams from colleagues:
● Maggy: Moritz Meister, Sina Sheikholeslami, Robin Andersson, Kim Hammar
References
● HopsFS: Scaling hierarchical file system metadata …, USENIX FAST 2017.
● Size matters: Improving the performance of small files …, ACM Middleware 2018.
● ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, CCGrid, 2019.
● Hopsworks Demo, SysML 2019.
40. WASP Course on Large Scale Machine Learning
● http://wasp-sweden.org/large-scale-machine-learning-6-credits/
○ Dr. Raazesh Sainudiin and Dr. Amir Payberah
○ Autumn 2020
41. Thank you!
Register for a free account at
www.hops.site
Twitter
@logicalclocks
@hopsworks
GitHub
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops