The Bitter Lesson of
ML Pipelines
Jim Dowling
CEO @ Logical Clocks
Assoc Prof @ KTH
Senior Researcher @ RISE
WASP4ALL – Future Computing Platforms for X
Nov 2019
“Methods that scale with computation
are the future of AI”*
Rich Sutton (Founding Father of Reinforcement Learning), May 2018
* https://www.youtube.com/watch?v=EeMCEQa85tw
Massive Increase in Compute for AI*
Distributed Systems
3.5-month doubling time
*https://blog.openai.com/ai-and-compute
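A quick worked calculation of what a 3.5-month doubling time implies. The time spans printed below are illustrative choices, not figures from the talk, and `growth_factor` is a hypothetical helper name.

```python
def growth_factor(months: float, doubling_months: float = 3.5) -> float:
    """Multiplicative increase in compute after `months` at a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Illustrative horizons only; the slide states just the doubling time.
    for months in (3.5, 12, 24):
        print(f"{months:>4} months -> x{growth_factor(months):,.1f}")
```

At this rate, one year corresponds to roughly a tenfold increase.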
Distributed Systems are important for Deep Learning
Distributed
Deep Learning
Hyper
Parameter
Optimization
Distributed
Training
Larger
Training
Datasets
Elastic
Model
Serving
Parallel
Experiments
(Commodity)
GPU Clusters
Auto
ML
The Bitter Lesson
“The biggest lesson ... is that general methods that
leverage computation are ultimately the most
effective, and by a large margin…
The two (general purpose) methods that seem to
scale ... are search and learning.”
Rich Sutton, March 2019
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
LEARNING
Learning needs structure
● In learning theory, the No Free
Lunch theorem* tells us that
without structure (innate priors), it
is very difficult to learn anything.
● Warning! Structure is not free - it
adds assumptions about the data
that may not hold for all of your
data.
*Free lunch today
for all WASP4ALL
attendees
What do you mean by Structure?
● By structure, we mean prior knowledge
○ Not just a prior probability
1. Innate priors
○ A linear model assumes the data is linear.
○ The convolution/pooling assumption in Convolutional
Neural Nets.
2. Some structure can be computed dynamically
○ Semi-/self-supervised learning
The Trend: Less Structure and More Data/Compute
● There is a trade-off between the
amount of structure you need to put in
your learning systems and the amount
of training data and compute available.
● Recent increases in the amount of
available training data for supervised
ML, and decreasing sample complexity
for some reinforcement learning
domains, mean you need less
structure.
Structure
Data
Compute
Self-supervised is SoTA in Image Classification
Pre-trained with 3.5B
weakly labeled
Instagram images
using 256 TPUs v3 for
3.5 days.
Self-supervised is SoTA in NLP
https://arxiv.org/pdf/1907.11692.pdf
Self-supervised is SoTA in AML
● 40 TB of data
● Massive class imbalance
● Semi-supervised
Not all Structure can be learned…
● We need the meta-methods that can find and capture complexity
● For Deep Learning, these meta-methods must scale on GPUs
○ Convolutional Neural Network
○ Transformer
Structure that doesn’t scale (yet): Capsule Networks*
Algorithmic bottlenecks for GPUs*:
“votes are ‘routed’ using the Expectation-Maximization algorithm”
ML Framework limitations*:
“[ML frameworks] are structured around calls to large monolithic kernels”
*Machine Learning Systems are Stuck in a Rut, Barham, P. and Isard, M., HotOS’19
ConvNet
CapsuleNet
cuDNN Kernels
TensorFlow
XLA
WARP Threads
SIMD Lanes (SMs)
User Programs
The TensorFlow/CUDA Stack
SEARCH
Searching for Structure
● We can also search for better
hyperparameters with genetic algorithms,
reinforcement learning, etc.
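A minimal sketch of hyperparameter search as a black-box loop. It uses plain random search rather than the genetic algorithms or reinforcement learning named above, because it is the simplest search method to show. `toy_objective` and its optimum are hypothetical stand-ins for a real validation metric.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials configurations uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter from its (low, high) range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical stand-in for validation accuracy; peaks at lr=0.1, dropout=0.3.
    return -((params["lr"] - 0.1) ** 2) - (params["dropout"] - 0.3) ** 2


if __name__ == "__main__":
    space = {"lr": (0.0, 1.0), "dropout": (0.0, 0.9)}
    print(random_search(toy_objective, space, n_trials=200))
```

Each trial is independent of the others, which is what makes this kind of search easy to run in parallel.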
ImageNet SoTA, March 2018 (Quoc Le et al.)
The Bitter Lesson as a Research Roadmap
1. Scale out data and computation to reduce the amount of structure.
○ Learn as much structure as possible.
2. Structure we introduce should be minimal meta-methods that scale
out on both accelerators and distributed systems.
Distributed Systems Research on ML at KTH/RISE
● Continuous Deep Analytics
○ ARCON (RISE, KTH – P. Carbone, S. Haridi)
● Distributed Deep Learning
○ Hopsworks (Logical Clocks AB, KTH – J. Dowling, V. Vlassov, A. Payberah)
● Scalable Data Management for ML
○ HopsFS and the Feature Store (Logical Clocks AB)
https://dcatkth.github.io/
Hopsworks – Award Winning AI Platform
SEARCH ON HOPSWORKS
Search: Parallel Hyperparameter Tuning with Maggy
[Diagram labels: Learning Black Box; Metric; Meta-level learning & optimization; Parallel Workers; Queue; Trial; Search space]
Moritz Meister
https://databricks.com/session_eu19/asynchronous-hyperparameter-optimization-with-apache-spark
Synchronous Parallel Trials with PySpark
[Diagram labels: Driver; HDFS; three stages of trials (Trial11 … Trial1N, Trial21 … Trial2N, Trial31 … Trial3N), each followed by a Barrier; Metrics1, Metrics2, Metrics3]
Synchronous Parallel Trials with Early Stopping
[Diagram labels: Driver; HDFS; three stages of trials (Trial11 … Trial1N, Trial21 … Trial2N, Trial31 … Trial3N), each followed by a Barrier; Metrics1, Metrics2, Metrics3; Early Stop; Wasted Compute marked in each stage]
Problem: PySpark is inefficient with Early Stopping
● PySpark’s bulk-synchronous execution model prevents efficient use of
early-stopping for hyperparameter optimization.
New Framework? Fix PySpark?
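A toy model of the problem, assuming hypothetical trial durations in which early-stopped trials finish quickly. It does not use PySpark; it only compares a stage-with-barrier schedule against one where a free worker takes the next trial immediately.

```python
import heapq


def synchronous_makespan(durations, workers):
    """Run trials in stages of `workers` trials; a barrier ends each stage."""
    total, idle = 0.0, 0.0
    for i in range(0, len(durations), workers):
        stage = durations[i:i + workers]
        stage_time = max(stage)  # the barrier waits for the slowest trial
        total += stage_time
        idle += sum(stage_time - d for d in stage)
        idle += (workers - len(stage)) * stage_time  # unused slots in a short stage
    return total, idle


def asynchronous_makespan(durations, workers):
    """Each worker takes the next trial as soon as it becomes free."""
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + d)
    makespan = max(free_at)
    idle = workers * makespan - sum(durations)
    return makespan, idle


if __name__ == "__main__":
    # Hypothetical durations: 10 = trial runs to completion, 2 = stopped early.
    durations = [10, 2, 2, 2, 10, 2, 2, 2]
    print("synchronous :", synchronous_makespan(durations, workers=4))
    print("asynchronous:", asynchronous_makespan(durations, workers=4))
```

With a barrier, every early-stopped trial leaves its worker idle until the slowest trial in the stage finishes.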
Solution: Long Running Tasks and an RPC framework
[Diagram labels: Driver (Optimizer); Trial11 … Trial1N; Barrier; Metrics; New Trial]
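A rough sketch of the long-running-task idea, under stated assumptions: Python threads stand in for Spark tasks, an in-process queue and callback stand in for the RPC channel, and `toy_metric` and the stopping threshold are hypothetical. This is not Maggy's API or implementation.

```python
import queue
import threading


def toy_metric(lr, epoch):
    # Hypothetical learning curve: better for lr near 0.1, improving with epochs.
    return (1 - abs(lr - 0.1)) * (1 - 0.5 ** epoch)


def worker(trial_queue, results, should_stop):
    """Long-running task: keeps pulling trials until it receives the None sentinel."""
    while True:
        trial = trial_queue.get()
        if trial is None:
            return
        trial_id, lr = trial
        metric, epochs_run = 0.0, 0
        for epoch in range(1, 11):
            metric = toy_metric(lr, epoch)
            epochs_run = epoch
            if should_stop(epoch, metric):  # stands in for an RPC to the driver
                break
        results.put((trial_id, lr, metric, epochs_run))


def run_experiment(learning_rates, num_workers=3):
    trial_queue, results = queue.Queue(), queue.Queue()

    def should_stop(epoch, metric):
        # Hypothetical early-stopping rule applied by the driver.
        return epoch >= 3 and metric < 0.5

    threads = [
        threading.Thread(target=worker, args=(trial_queue, results, should_stop))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for trial_id, lr in enumerate(learning_rates):
        trial_queue.put((trial_id, lr))
    for _ in threads:
        trial_queue.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    finished = []
    while not results.empty():
        finished.append(results.get())
    return sorted(finished)


if __name__ == "__main__":
    for row in run_experiment([0.1, 0.9, 0.3, 0.8]):
        print(row)
```

A worker whose trial is stopped early moves straight on to the next trial instead of waiting at a stage barrier. For brevity the sketch queues all trials up front; an optimizer could instead create each new trial from the metrics reported so far.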
Maggy User API
Develop your own Optimizer
Results
[Plots: Hyperparameter Optimization Trial and ASHA Validation Trial, each comparing ASHA, RS-ES, and RS-NS]
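ASHA is an asynchronous variant of successive halving. The sketch below shows only the simpler synchronous form, to illustrate the promotion rule; `toy_evaluate` and the candidate values are hypothetical, and this is not the code behind the results above.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the top 1/eta of configs at each rung, multiplying the budget by eta."""
    survivors, budget = list(configs), min_budget
    rungs = []  # (budget, number of configs evaluated) per rung
    while len(survivors) > 1:
        rungs.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], rungs


def toy_evaluate(lr, budget):
    # Hypothetical score: higher is better, best at lr = 0.1 for any budget.
    return -abs(lr - 0.1) * (1 + 1 / budget)


if __name__ == "__main__":
    candidates = [0.01, 0.05, 0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
    print(successive_halving(candidates, toy_evaluate))
```

Most of the budget goes to the few configurations that survive the early, cheap rungs.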
Parallel Ablation Studies
[Figure: example feature tables with columns PClass, name, sex, survive]
Replacing the Maggy Optimizer with an Ablator:
● Feature Ablation using
the Feature Store
● Leave-One-Layer-Out Ablation
● Leave-One-Component-Out (LOCO)
Sina Sheikholeslami https://castor-software-days-2019.github.io/sina
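A minimal sketch of feature ablation as a leave-one-out loop. The feature names echo the example columns above, while `toy_score` and its per-feature contributions are hypothetical. It does not use the Feature Store or Maggy APIs.

```python
def leave_one_out_ablation(features, train_and_score):
    """Score the full feature set, then each set with one feature removed."""
    baseline = train_and_score(features)
    impact = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        # Positive impact means the model got worse without this feature.
        impact[feature] = baseline - train_and_score(reduced)
    return baseline, impact


# Hypothetical per-feature contributions to a validation score.
CONTRIBUTION = {"pclass": 0.10, "sex": 0.20, "name": 0.0}


def toy_score(features):
    # Stand-in for training a model on `features` and returning its accuracy.
    return 0.5 + sum(CONTRIBUTION[f] for f in features)


if __name__ == "__main__":
    baseline, impact = leave_one_out_ablation(["pclass", "sex", "name"], toy_score)
    print(baseline, impact)
```

Each reduced feature set is an independent trial, so the ablations can run in parallel in the same way as hyperparameter trials.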
Experiments on Hopsworks
Production ML Applications
are Pipelines.
Hopsworks End-to-End ML Pipelines
Data
Pipelines
Ingest & Prep
Feature
Store
Machine Learning Experiments
Data Parallel
Training
Model
Serving
Ablation
Studies
Hyperparameter
Optimization
Bottleneck, due to
• iterative nature
• human-in-the-loop
DataPrep Pipelines and Model Training Pipelines
Select
Features
Feature
Engineering
Validate &
Deploy Model
Experiment,
Train Model
Dataprep Pipeline
Training and Deployment Pipeline
Feature
Store
Airflow Airflow
www.hops.site
RISE Data Center
1 PB storage,
24 GPUs
2000 CPUs
1500+ Users
Register for a free account with your student/work email address:
www.hops.site
Hopsworks
Security & Governance
● Secure Multi-Tenancy: project-based restricted access
● Encryption At-Rest, In-Motion: TLS/SSL everywhere
● AI-Asset Governance: models, experiments, data, GPUs
● Data/Model/Feature Lineage: discover/track dependencies
Development & Operations
● Development Environment: first-class Python support
● Version Everything: code, infrastructure, data
● Model Serving on Kubernetes: TF Serving, SkLearn
● End-to-End ML Pipelines: orchestrated by Airflow
Efficiency & Performance
● Feature Store: data warehouse for ML
● Distributed Deep Learning: faster with more GPUs
● HopsFS: NVMe speed with Big Data
● Horizontally Scalable: ingestion, DataPrep, training, serving
Acknowledgements and References
Slides and Diagrams from colleagues:
● Maggy: Moritz Meister, Sina Sheikholeslami, Robin Andersson, Kim Hammar
References
● HopsFS: Scaling hierarchical file system metadata …, USENIX FAST 2017.
● Size matters: Improving the performance of small files …, ACM Middleware 2018.
● ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, CCGrid, 2019.
● Hopsworks Demo, SysML 2019.
Systems Conferences for Machine Learning
WASP Course on Large Scale Machine Learning
● http://wasp-sweden.org/large-scale-machine-learning-6-credits/
○ Dr. Raazesh Sainudiin and Dr. Amir Payberah
○ Autumn 2020
Thank you!
Register for a free account at
www.hops.site
Twitter
@logicalclocks
@hopsworks
GitHub
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops

Contenu connexe

Tendances

Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Kim Hammar
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingJim Dowling
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)Simba Khadder
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine LearningLogical Clocks
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020Jim Dowling
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsAndrzej Michałowski
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKJan Wiegelmann
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineJan Wiegelmann
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJim Dowling
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Jim Dowling
 
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...Databricks
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingDatabricks
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureData Science Milan
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingJan Wiegelmann
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 

Tendances (20)

Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowling
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine Learning
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systems
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACK
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / Pipeline
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21
 
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
Big Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on SparkBig Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on Spark
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
Deploying Machine Learning Models to Production
Deploying Machine Learning Models to ProductionDeploying Machine Learning Models to Production
Deploying Machine Learning Models to Production
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous Driving
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 

Similaire à The Bitter Lesson of ML Pipelines

Asynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkAsynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkDatabricks
 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoMLArpitha Gurumurthy
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsNick Pentreath
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101QuantUniversity
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...Alok Singh
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Ahmed Kamal
 
RNNs for Recommendations and Personalization
RNNs for Recommendations and PersonalizationRNNs for Recommendations and Personalization
RNNs for Recommendations and PersonalizationNick Pentreath
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTuri, Inc.
 
Deep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreathDeep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreathDatabricks
 
Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-finalsupportlogic
 
Automated machine learning - Global AI night 2019
Automated machine learning - Global AI night 2019Automated machine learning - Global AI night 2019
Automated machine learning - Global AI night 2019Marco Zamana
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningEdunomica
 
Proposal for google summe of code 2016
Proposal for google summe of code 2016 Proposal for google summe of code 2016
Proposal for google summe of code 2016 Mahesh Dananjaya
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
HopsML Meetup: Maggy - 2019-06-04
HopsML Meetup: Maggy - 2019-06-04HopsML Meetup: Maggy - 2019-06-04
HopsML Meetup: Maggy - 2019-06-04Moritz Meister
 
Fms invited talk_2018 v5
Fms invited talk_2018 v5Fms invited talk_2018 v5
Fms invited talk_2018 v5Nisha Talagala
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code OptimizationIRJET Journal
 
flowchart ON DEEP LEARNING SPP
flowchart ON DEEP LEARNING SPPflowchart ON DEEP LEARNING SPP
flowchart ON DEEP LEARNING SPPDishaAnkar
 

Similaire à The Bitter Lesson of ML Pipelines (20)

Asynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkAsynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache Spark
 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoML
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?
 
RNNs for Recommendations and Personalization
RNNs for Recommendations and PersonalizationRNNs for Recommendations and Personalization
RNNs for Recommendations and Personalization
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
 
Deep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreathDeep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreath
 
Ai in finance
Ai in financeAi in finance
Ai in finance
 
Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-final
 
Automated machine learning - Global AI night 2019
Automated machine learning - Global AI night 2019Automated machine learning - Global AI night 2019
Automated machine learning - Global AI night 2019
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
Proposal for google summe of code 2016
Proposal for google summe of code 2016 Proposal for google summe of code 2016
Proposal for google summe of code 2016
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
HopsML Meetup: Maggy - 2019-06-04
HopsML Meetup: Maggy - 2019-06-04HopsML Meetup: Maggy - 2019-06-04
HopsML Meetup: Maggy - 2019-06-04
 
Fms invited talk_2018 v5
Fms invited talk_2018 v5Fms invited talk_2018 v5
Fms invited talk_2018 v5
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code Optimization
 
flowchart ON DEEP LEARNING SPP
flowchart ON DEEP LEARNING SPPflowchart ON DEEP LEARNING SPP
flowchart ON DEEP LEARNING SPP
 

Plus de Jim Dowling

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfJim Dowling
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfJim Dowling
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022Jim Dowling
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money LaunderingJim Dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityJim Dowling
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019Jim Dowling
 
Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsJim Dowling
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIJim Dowling
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Jim Dowling
 
End-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceEnd-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceJim Dowling
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraJim Dowling
 
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsScaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsJim Dowling
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsJim Dowling
 

Plus de Jim Dowling (17)

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdf
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdf
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
 
Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on Hops
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AI
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)
 
End-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceEnd-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in Finance
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
 
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsScaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on Hops
 



The Bitter Lesson of ML Pipelines

  • 1. The Bitter Lesson of ML Pipelines jim_dowling CEO @ Logical Clocks Assoc Prof @ KTH Senior Research @ RISE WASP4ALL – Future Computing Platforms for X Nov 2019
  • 2. “Methods that scale with computation are the future of AI”* Rich Sutton (Founding Father of Reinforcement Learning), May 2018 * https://www.youtube.com/watch?v=EeMCEQa85tw
  • 3. Massive Increase in Compute for AI* Distributed Systems 3.5-month doubling time *https://blog.openai.com/ai-and-compute
  • 4. Distributed Systems are important for Deep Learning Distributed Deep Learning Hyper Parameter Optimization Distributed Training Larger Training Datasets Elastic Model Serving Parallel Experiments (Commodity) GPU Clusters Auto ML
  • 5. The Bitter Lesson “The biggest lesson .. is that general methods that leverage computation are ultimately the most effective, and by a large margin… The two (general purpose) methods that seem to scale ... are search and learning.” Rich Sutton, March 2019 http://www.incompleteideas.net/IncIdeas/BitterLesson.html
  • 7. Learning needs structure ● In learning theory, the No Free Lunch theorem* tells us that without structure (innate priors), it is very difficult to learn anything. ● Warning! Structure is not free - it adds assumptions about the data that may not hold for all of your data. *Free lunch today for all WASP4ALL attendees
  • 8. What do you mean by Structure? ● By structure, we mean prior knowledge ○ Not just a prior probability 1. Innate priors ○ A linear model assumes the data is linear. ○ The convolution/pooling assumption in Convolutional Neural Nets. 2. Some structure can be computed dynamically ○ Semi-/self-supervised learning
  • 9. The Trend: Less Structure and More Data/Compute ● There is a trade-off between the amount of structure you need to put in your learning systems and the amount of training data and compute available. ● Recent increases in the amount of available training data for supervised ML, and decreasing sample complexity for some reinforcement learning domains, mean you need less structure. Structure Data Compute
  • 10. Self-supervised is SoTA in Image Classification Pre-trained with 3.5B weakly labeled Instagram images using 256 TPUs v3 for 3.5 days.
  • 11. Self-supervised is SoTA in NLP https://arxiv.org/pdf/1907.11692.pdf
  • 12. Self-supervised is SoTA in AML ● 40 TB of data ● Massive class imbalance ● Semi-supervised
  • 13. Not all Structure can be learned…. ● We need the meta-methods that can find and capture complexity ● For Deep Learning, these meta-methods must scale on GPUs ○ Convolutional Neural Network ○ Transformer
  • 14. Structure that doesn’t scale (yet): Capsule Networks* Algorithmic bottlenecks for GPUs*: “votes are ‘routed’ using the Expectation-Maximization algorithm” ML Framework limitations*: “[ml frameworks] are structured around calls to large monolithic kernels” *Machine Learning Systems are Stuck in a Rut, Barham P. and Isard M., HotOS’19 [Diagram: The TensorFlow/CUDA Stack, with labels ConvNet, CapsuleNet, CuDNN Kernels, TensorFlow, XLA, WARP Threads, SIMD Lanes (SMs), User Programs]
  • 16. Searching for Structure ● We can also search for better hyperparameters with genetic algorithms, reinforcement learning, etc. ImageNet SoTA, March 2018 (Quoc Le et al.)
  • 17. The Bitter Lesson as a Research Roadmap 1. Scale out data and computation to reduce the amount of structure. ○ Learn as much structure as possible. 2. Structure we introduce should be minimal meta-methods that scale out on both accelerators and distributed systems.
  • 18. Distributed Systems Research on ML at KTH/RISE ● Continuous Deep Analytics ○ ARCON (RISE, KTH – P. Carbone, S. Haridi) ● Distributed Deep Learning ○ Hopsworks (Logical Clocks AB, KTH – J. Dowling, V. Vlassov, A. Payberah) ● Scalable Data Management for ML ○ HopsFS and the Feature Store (Logical Clocks AB) https://dcatkth.github.io/
  • 19. Hopsworks – Award Winning AI Platform
  • 20.
  • 21.
  • 23. Search: Parallel Hyperparameter Tuning with Maggy Learning Black Box Metric Meta-level learning & optimization Parallel Workers Queue Trial Trial Search space https://databricks.com/session_eu19/asynchronous-hyperparameter-optimization-with-apache-spark Moritz Meister
  • 24. Synchronous Parallel Trials with PySpark [Diagram: a Driver launches N trials per stage (Trial 1.1 … Trial 1.N, then Trial 2.1 … Trial 2.N, then Trial 3.1 … Trial 3.N); each stage ends at a Barrier, with Metrics 1, Metrics 2 and Metrics 3 exchanged via HDFS]
  • 25. Synchronous Parallel Trials with Early Stopping [Diagram: the same staged layout of a Driver, N trials per stage, a Barrier after each stage, and Metrics 1, Metrics 2 and Metrics 3 exchanged via HDFS; the time between a trial's Early Stop and the next Barrier is marked Wasted Compute]
  • 26. Problem: PySpark is inefficient with Early Stopping ● PySpark’s bulk-synchronous execution model prevents efficient use of early stopping for hyperparameter optimization. New Framework? Fix PySpark?
  • 27. Solution: Long Running Tasks and an RPC framework [Diagram: a Driver (Optimizer) and long-running tasks (Trial 1.1 … Trial 1.N) with a single Barrier; tasks send Metrics to the driver and receive a New Trial]
  • 29. Develop your own Optimizer
  • 30. Results [Plots: Hyperparameter Optimization Trial and ASHA Validation Trial, comparing ASHA, RS-ES and RS-NS]
  • 31. Parallel Ablation Studies [Diagram: feature columns PClass, name, sex, survive, shown with individual features removed] Replacing the Maggy Optimizer with an Ablator: ● Feature Ablation using the Feature Store ● Leave-One-Layer-Out Ablation ● Leave-One-Component-Out (LOCO) Sina Sheikholeslami https://castor-software-days-2019.github.io/sina
  • 34. Hopsworks End-to-End ML Pipelines Data Pipelines Ingest & Prep Feature Store Machine Learning Experiments Data Parallel Training Model Serving Ablation Studies Hyperparameter Optimization Bottleneck, due to: iterative nature, human-in-the-loop
  • 35. DataPrep Pipelines and Model Training Pipelines Select Features Feature Engineering Validate & Deploy Model Experiment, Train Model Dataprep Pipeline Training and Deployment Pipeline Feature Store Airflow Airflow
  • 36. www.hops.site RISE Data Center 1 PB storage, 24 GPUs 2000 CPUs 1500+ Users Register for a free account with your student/work email address: www.hops.site
  • 37. Hopsworks. Security & Governance: Secure Multi-Tenancy (project-based restricted access); Encryption At-Rest, In-Motion (TLS/SSL everywhere); AI-Asset Governance (models, experiments, data, GPUs); Data/Model/Feature Lineage (discover/track dependencies). Development & Operations: Development Environment (first-class Python support); Version Everything (code, infrastructure, data); Model Serving on Kubernetes (TF Serving, SkLearn); End-to-End ML Pipelines (orchestrated by Airflow). Efficiency & Performance: Feature Store (data warehouse for ML); Distributed Deep Learning (faster with more GPUs); HopsFS (NVMe speed with Big Data); Horizontally Scalable (ingestion, DataPrep, training, serving)
  • 38. Acknowledgements and References Slides and Diagrams from colleagues: ● Maggy: Moritz Meister, Sina Sheikholeslami, Robin Andersson, Kim Hammar References ● HopsFS: Scaling hierarchical file system metadata …, USENIX FAST 2017. ● Size matters: Improving the performance of small files …, ACM Middleware 2018. ● ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, CCGrid, 2019. ● Hopsworks Demo, SysML 2019.
  • 39. Systems Conferences for Machine Learning
  • 40. WASP Course on Large Scale Machine Learning ● http://wasp-sweden.org/large-scale-machine-learning-6-credits/ ○ Dr. Raazesh Sainudiin and Dr. Amir Payberah ○ Autumn 2020
  • 41. Thank you! Register for a free account at www.hops.site Twitter @logicalclocks @hopsworks GitHub https://github.com/logicalclocks/hopsworks https://github.com/hopshadoop/hops
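
Slides 24 to 27 contrast bulk-synchronous trials, where every stage waits at a barrier, with long-running tasks that fetch a new trial as soon as they finish. The following is a minimal sketch of that difference as a scheduling simulation. It is not Maggy's or PySpark's API: the function names, the trial durations, and the choice of 4 workers are all hypothetical, and early-stopped trials are simply modeled as short durations.

```python
import heapq


def synchronous_makespan(durations, n_workers):
    """Bulk-synchronous model: trials run in stages of n_workers, and each
    stage ends at a barrier when its slowest trial finishes."""
    total = 0
    for i in range(0, len(durations), n_workers):
        total += max(durations[i:i + n_workers])
    return total


def asynchronous_makespan(durations, n_workers):
    """Long-running-task model: a worker takes the next trial from the queue
    as soon as it becomes free, so there is no per-stage barrier."""
    free_at = [0] * n_workers  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)  # busy until the trial ends
    return max(free_at)


def idle_time(makespan, durations, n_workers):
    """Worker time not spent running any trial (the 'wasted compute')."""
    return makespan * n_workers - sum(durations)


# Hypothetical workload: a full trial takes 10 time units, an early-stopped
# trial takes 1 unit.
durations = [10, 1, 1, 1, 10, 1, 1, 1, 10, 1, 1, 1]
n_workers = 4

sync_time = synchronous_makespan(durations, n_workers)
async_time = asynchronous_makespan(durations, n_workers)

print("synchronous :", sync_time, "idle:", idle_time(sync_time, durations, n_workers))
print("asynchronous:", async_time, "idle:", idle_time(async_time, durations, n_workers))
```

Under these assumed durations, the barrier forces three workers in every stage to sit idle while the one full-length trial completes, which is the wasted compute marked on slide 25. The queue-based model lets those workers move on to the next trial immediately.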