SlideShare une entreprise Scribd logo
1  sur  74
Télécharger pour lire hors ligne
Hopsworks on GCP
Jim Dowling, CEO
Logical Clocks
8 August 2019
Google Sunnyvale
Hopsworks Technical Milestones
“If you’re working with big data and Hadoop, this one paper could repay your investment in
the Morning Paper many times over.... HopsFS is a huge win.”
- Adrian Colyer, The Morning Paper
World’s first Hadoop
platform to support
GPUs-as-a-Resource
World’s fastest
HDFS Published at
USENIX FAST with
Oracle and Spotify
World’s First
Open Source Feature
Store for Machine
Learning
World’s First
Distributed Filesystem to
store small files in
metadata on NVMe disks
Winner of IEEE
Scale
Challenge 2017
with HopsFS -
1.2m ops/sec
2017
World’s most scalable
POSIX-like Hierarchical
Filesystem with
Multi Data Center Availability
with 1.6m ops/sec on GCP
2018 2019
First non-Google ML
Platform with
TensorFlow Extended
(TFX) support through
Beam/Flink
World’s First
Unified Hyperparam
and Ablation Study
Parallel Prog.`
Framework
Evolution of Distributed Filesystems
POSIXPast
Present
NFS, HDFS S3 GCS HopsFS
Single DC,
Strongly
Consistent
Metadata
Multi-DC,
Eventually
Consistent
Metadata
Multi-DC,
Strongly
Consistent
Metadata
Object
Store
POSIX-like
Filesystem
Why HopsFS?
● Distributed FS
○ Needed for Parallel ML Experiments / Dist Training / FeatureStore
● Provenance/Free-text-search
○ Change Data Capture API
● Performance
○ > 1.6m ops/sec over 3 Azes on GCP using Spotify’s Hadoop workload
○ NVMe for small files stored in metadata
● HDFS API
○ TensorFlow/Keras/PySpark/Beam/Flink/PyTorch (Petastorm)
Hopsworks –
a platform for Data Intensive AI
built on HopsFS
Data validation
Distributed
Training
Model
Serving
A/B
Testing
Monitoring
Pipeline
Management
HyperParameter
Tuning
Feature Engineering
Data
Collection
Hardware
Management
Data Model Prediction
φ(x)
Hopsworks hides the Complexity of Deep Learning
Hopsworks
REST API
Hopsworks
Feature Store
[Adapted from Schulley et Al “Technical Debt of ML” ]
What is Hopsworks?
Elasticity & Performance Governance & ComplianceDevelopment & Operations
Secure Multi-Tenancy
Project-based restricted access
Encryption At-Rest, In-Motion
TLS/SSL everywhere
AI-Asset Governance
Models, experiments, data, GPUs
Data/Model/Feature Lineage
Discover/track dependencies
Notebooks for Development
First-class Python Support
Version Everything
Code, Infrastructure, Data
Model Serving on Kubernetes
TF Serving, MLeap, SkLearn
End-to-End ML Pipelines
Orchestrated by Airflow
Feature Store
Data warehouse for ML
Distributed Deep Learning
Faster with more GPUs
HopsFS
NVMe speed with Big Data
Horizontally Scalable
Ingestion, DataPrep,
Training, Serving
FS
Which services require Distributed Metadata (HopsFS)?
Elasticity & Performance Governance & ComplianceDevelopment & Operations
Secure Multi-Tenancy
Project-based restricted access
Encryption At-Rest, In-Motion
TLS/SSL everywhere
AI-Asset Governance
Models, experiments, data, GPUs
Data/Model/Feature Lineage
Discover/track dependencies
Notebooks for Development
First-class Python Support
Version Everything
Code, Infrastructure, Data
Model Serving on Kubernetes
TF Serving, MLeap, SkLearn
End-to-End ML Pipelines
Orchestrated by Airflow
Feature Store
Data warehouse for ML
Distributed Deep Learning
Faster with more GPUs
HopsFS
NVMe speed with Big Data
Horizontally Scalable
Ingestion, DataPrep,
Training, Serving
FS
End-to-End ML Pipelines
End-to-End ML Pipelines in Hopsworks
ML Pipelines with a
Feature Store
ML Pipelines with a Feature Store
Feature Store
These should be based on the same
feature engineering code.
Features
Training
Labels Model
Features
Inference
Model Labels
Feature Store
Application
Developer
ML
Developer
It’s not always trivial to ensure features are engineered
consistently between training and inference
Features
Training
Labels Model
Features
Inference
Model Labels
Feature Store
Feature
StorePut
Get
Get
Features
Training
Labels Model
Features
Inference
Model Labels
Feature Store
Feature
StorePut
Get
Get
Features
Training
Labels Model
Features
Inference
Model Labels
Batch App
Online App
Online or Offline Features?
On-Demand or Cached Features?
Hopsworks Feature Store
Feature Mgmt Storage Access
Statistics
Online
Features
Discovery
Offline
Features
Data Scientist
Online Apps
Data Engineer
MySQL Cluster
(Metadata,
Online Features)
Apache Hive
Columnar DB
(Offline Features)
Feature Data
Ingestion
Hopsworks Feature Store
Training Data
(S3, HDFS)
Batch Apps
Discover features,
create training data,
save models,
read online/offline/on-
demand features,
historical feature values.
Models
HopsFS
JDBC
(SAS, R, etc)
Feature
CRUD
Add/remove features,
access control,
feature data validation.
Access
Control
Time Travel
Data
Validation
Pandas or
PySpark
DataFrame
External DB
Feature Defn
select ..
FeatureStore Abstractions
Titanic
Passenger List
Feature
Groups
Features
Train/Test
Datasets
SexName PClass
Passenger
Bank Account
BalanceName
Sex BalancePClass survivename .tfrecords
.csv
.numpy
.hdf5, .petastorm, etc
Features, FeatureGroups, and Train/Test Datasets are versioned
Survive
Register a Feature Group with the Feature Store
titanic_df = # Spark or Pandas Dataframe
# Do feature engineering on ‘titanic_data’
# Register Dataframe as FeatureGroup
featurestore.create_featuregroup(titanic_df,
"titanic_data_passengers“)
Create Training Datasets using the Feature Store
sample_data = featurestore.get_features([“name”, “Pclass”,
“Sex”, “balance”])
featurestore.create_training_dataset(sample_data,
“titanic_training_dataset", data_format="tfrecords“,
training_dataset_version=1)
# Use the training dataset
dataset_dir = featurestore.get_training_dataset_path(
"titanic_training_dataset")
s = featurestore.get_training_dataset_tf_record_schema(
“titanic_training_dataset”)
End-to-End ML Pipelines in Hopsworks
TensorFlow Extended (TFX)
https://www.tensorflow.org/tfx
TensorFlow Extended Components in Hopsworks
TFX on a Flink Cluster with Portable RunnerTensorFlow Extended Components in Hopsworks
Apache Airflow to Orchestrate ML Pipelines
Apache Airflow to Orchestrate ML Pipelines
Airflow
Jobs REST API
Hopsworks Jobs:
PySpark, Spark,
Flink, Beam/Flink
What about TFX
metadata?
Explicit vs Implicit Provenance
● Explicit: TFX, MLFlow
○ Wrap existing code in components
that execute a stage in the pipeline
○ Interceptors in components inject
metadata to a metadata store as
data flows through the pipeline
○ Store metadata about artefacts and
executions
● Implicit: Hopsworks with ePipe*
DataPrep Train
HopsFS
Experiment
/Training_
Datasets
/Experiments /Models
Elasticsearch
ePipe: ChangeDataCapture API
*ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, CCGrid, 2019.
Provenance in Hopsworks
● Implicit Provenance
○ Applications read/write files to HopsFS – infer artefacts from pathname conventions and xattrs
● Explicit Provenance in Hops API
○ Executions:
hops.experiment.grid_search(…), hops.experiment.collective_allreduce(…)
○ FeatureStore:
hops.featurestore.create_training_dataset(….)
○ Saving and deploying models:
hops.serving.export(export_path, “model_name", 1)
hops.serving.deploy(“…/model.pb”, destination)
/Experiments
● Executions add entries in /Experiments:
experiment.launch(…)
experiment.grid_search(…)
experiment.collective_allreduce(…)
experiment.lagom(…)
● /Experiments contains:
○ logs (application, tensorboard)
○ executed notebook file
○ conda environment used
○ checkpoints
/Projects/MyProj
└ Experiments
└ <app_id>
└ <type>
├─ checkpoints
├─ tensorboard_logs
├─ logfile
└─ versioned_resources
├─ notebook.ipynb
└─ conda_env.yml
Experiments
Individual Experiment Overview
/Models
● Named/versioned model
management for:
TensorFlow/Keras
Scikit Learn
● A Models dataset can be
securely shared with other
projects or the whole cluster
● The provenance API returns the
conda.yml and execution
used to train a given model
/Projects/MyProj
└ Models
└ <name>
└ <version>
├─ saved_model.pb
└─ variables/
...
Model Serving
● Model Serving on Google
Cloud AI Platform
○ Export models to Cloud
Storage buckets
● On-Premise
○ Serve models on
Kuberenetes
○ TensorFlow Serving,
Scikit-Learn
Jupyter Notebooks as
Jobs in Airflow Pipelines
Principles of Development with Notebooks
● No throwaway code
● Code can be run either in Notebooks or as Jobs (in Pipelines)
● Notebooks/jobs should be parameterizable
● No external configuration for program logic
○ HPARAMs are part of the program
● Core training loop should be the same for all stages of development
○ Data exploration, hparam tuning, dist training,
Problem in ML Pipeline Development Process?
Explore Data,
Train model
Hyperparam
Opt.
Distributed
Training
Notebook Python + YML in Gitport/rewrite code
Iteration is hard/impossible/a-bad-idea
Notebooks offer more than Prototyping
● Familiar web-based development
environment
● Interactive development/debugging
● Reporting platform for ML
applications (Papermill by Netflix)
● Parameterizable as part of Pipelines
(Papermill by Netflix)
Disclaimer: Notebooks are not for everyone
ML Pipelines of Jupyter Notebooks with Airflow
Select
Features,
File Format
Feature
Engineering
Validate &
Deploy Model
Experiment,
Train Model
Airflow
Dataprep Pipeline Training and Deployment Pipeline
Feature
Store
PySpark Notebooks as Jobs in ML Pipelines
Running
TensorFlow/Keras/PyTorch
Apps in PySpark
Warning: micro-exposure to PySpark may cure you of distributed programming phobia
End-to-End ML Pipelines in Hopsworks
GPU(s) in PySpark Executor, Driver coordinates
PySpark makes it easier to
write TensorFlow/Keras/
PyTorch code that can either
be run on a single GPU or
scale to run on lots of GPUS
for Parallel Experiments or
Distributed Training.
Executor Executor
Driver
Executor 1 Executor N Driver
HopsFS
• Training/Test Datasets
• Model checkpoints, Trained Models
• Experiment run data
• Provenance data
• Application logs
Need Distributed Filesystem for Coordination
Model
Serving
TensorBoard
PySpark Hello World
1
*
Executor
print(“Hello from GPU”)
Driver
experiment.launch(..)
PySpark – Hello World
Leave code unchanged, but configure 4 Executors
print(“Hello
from GPU”)
Driver
print(“Hello
from GPU”)
print(“Hello
from GPU”)
print(“Hello
from GPU”)
Driver with 4 Executors
Same/Replica Conda Environment on all Executors
conda_env
conda_envconda_envconda_envconda_env
A Conda Environment Per Project in Hopsworks
Use Pip or Conda to install Python libraries
TensorFlow Distributed Training with PySpark
def train():
# Separate shard of dataset per worker
# create Estimator w/ DistribStrategy
# as CollectiveAllReduce
# train model, evaluate
return loss
# Driver code below here
# builds TF_CONFIG and shares to workers
from hops import experiment
experiment.collective_allreduce(train)
More details: http//github.com/logicalclocks/hops-examples
Undirected Hyperparam Search with PySpark
def train(dropout):
# Same dataset for all workers
# create model and optimizer
# add this worker’s value of dropout
# train model and evaluate
return loss
# Driver code below here
from hops import experiment
args={“dropout”:[0.1, 0.4, 0.8]}
experiment.grid_search(train,args)
More details: http//github.com/logicalclocks/hops-examples
Directed Hyperparam Search with PySpark
def train(dropout):
# Same dataset for all workers
# create model and optimizer
optimizer.apply(dropout)
# train model and evaluate
return loss
from hops import experiment
args={“dropout”: “0.1-0.8”}
experiment.diff_ev(train,args)
More details: http//github.com/logicalclocks/hops-examples
Wasted Compute!
Parallel ML Experiments with
PySpark and Maggy
PClass Sexname survive
Iterative Model Development
Dataset
Machine
Learning
Model
Optimizer
Evaluate
Problem Definition
Data Preparation
Model Selection
Hyperparameters
Repeat if
needed
Model Training
Maggy: Unified Hparam Opt & Ablation Programming
Machine
Learning
System
Hyperparameter
Optimizer
New Hyperparameter Values
Evaluate
New Dataset/Model-Architecture
Ablation Study
Controller
Synchronous or
Asynchronous Trials
Directed or
Undirected Search
User-Defined
Search/Optimizers
Maggy – Async, Parallel ML experiments using PySpark
● Experiment-Driven, interactive
development of ML applications
● Parallel Experimentation
○ Hyperparameter Optimization
○ Ablation Studies
○ Leave-one-Feature-out
● Interactive Debugging
○ Experiment/executor logs shown both
in Jupyter notebooks and logging
Feature Ablation
● Uses the Feature Store to access the dataset metadata
● Generates Python callables that once called will create a dataset
○ Removes one-feature-at-a-time
Sexname survivePClass Sexname survive
Layer Ablation
● Uses a base model function
● Generates Python callables that once called with create a modified model
○ Uses the model configuration to find and remove layer(s)
○ Removes one-layer-at-a-time (or one-layer-group-at-a-time)
User API: Initialize the Study and Add Features
Driver Code
to setup Ablation
Experiment
User API: Base Model and Ablation Exp. Setup
User API: Wrap the Training Function
Interactive Debugging: Print Executor Logs in Jupyter
Notebooks
Printing Worker Logs in Jupyter Notebooks
Hopsworks on GCP
● Hopsworks available as an Image for GCP
○ Blog post coming soon with details
● Next Steps
○ Cloud Native Features
■ Elastic add/remove resources (compute/GPU)
■ API integration with Google Model serving
That was Hopsworks
Efficiency & Performance Security & GovernanceDevelopment & Operations
Secure Multi-Tenancy
Project-based restricted access
Encryption At-Rest, In-Motion
TLS/SSL everywhere
AI-Asset Governance
Models, experiments, data, GPUs
Data/Model/Feature Lineage
Discover/track dependencies
Development Environment
First-class Python Support
Version Everything
Code, Infrastructure, Data
Model Serving on Kubernetes
TF Serving, SkLearn
End-to-End ML Pipelines
Orchestrated by Airflow
Feature Store
Data warehouse for ML
Distributed Deep Learning
Faster with more GPUs
HopsFS
NVMe speed with Big Data
Horizontally Scalable
Ingestion, DataPrep,
Training, Serving
FS
Hopsworks 1.0 coming soon
Hopsworks-1.0
● Beam 2.14.0
● Flink 1.8.1
● Spark 2.4.3
● TensorFlow 1.14.0
● TFX 0.13
● TensorFlow Model Analysis 0.13.2
● PyTorch 1.1
● ROCm 2.6, Cuda 10.X
Acknowledgements and References
Slides and Diagrams from colleagues:
● Maggy: Moritz Meister and Sina Sheikholeslami
● Feature Store: Kim Hammar
● Beam/Flink on Hopsworks: Theofilos Kakantousis
References
● HopsFS: Scaling hierarchical file system metadata …, USENIX FAST 2017.
● Size matters: Improving the performance of small files …, ACM Middleware 2018.
● ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, CCGrid, 2019.
● Hopsworks Demo, SysML 2019.
Thank you!
470 Ramona St
Palo Alto
https://www.logicalclocks.com
Register for a free account at
www.hops.site
Twitter
@logicalclocks
@hopsworks
GitHub
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops

Contenu connexe

Tendances

Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Kim Hammar
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingJim Dowling
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020Jim Dowling
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxLex Avstreikh
 
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...Srivatsan Ramanujam
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineJan Wiegelmann
 
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...Kim Hammar
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJim Dowling
 
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal Srivatsan Ramanujam
 
Functional Programming and Big Data
Functional Programming and Big DataFunctional Programming and Big Data
Functional Programming and Big DataDataWorks Summit
 
Parallel Linear Regression in Interative Reduce and YARN
Parallel Linear Regression in Interative Reduce and YARNParallel Linear Regression in Interative Reduce and YARN
Parallel Linear Regression in Interative Reduce and YARNDataWorks Summit
 
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDistributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDatabricks
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Jim Dowling
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKJan Wiegelmann
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine LearningLogical Clocks
 
Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)Srivatsan Ramanujam
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...Srivatsan Ramanujam
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 

Tendances (20)

Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowling
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
 
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / Pipeline
 
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
 
Functional Programming and Big Data
Functional Programming and Big DataFunctional Programming and Big Data
Functional Programming and Big Data
 
Parallel Linear Regression in Interative Reduce and YARN
Parallel Linear Regression in Interative Reduce and YARNParallel Linear Regression in Interative Reduce and YARN
Parallel Linear Regression in Interative Reduce and YARN
 
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDistributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)
 
20160908 hivemall meetup
20160908 hivemall meetup20160908 hivemall meetup
20160908 hivemall meetup
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACK
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine Learning
 
Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 

Similaire à Hopsworks at Google AI Huddle, Sunnyvale

Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIQAware GmbH
 
End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.
End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.
End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.Theofilos Kakantousis
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsTensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsStijn Decubber
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Jason Dai
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...Bill Liu
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in ProductionMatthias Feys
 
Building an MLOps Stack for Companies at Reasonable Scale
Building an MLOps Stack for Companies at Reasonable ScaleBuilding an MLOps Stack for Companies at Reasonable Scale
Building an MLOps Stack for Companies at Reasonable ScaleMerelda
 
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systemsdairsie
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302Timothy Spann
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigmJim Dowling
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Hail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open sourceHail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open sourceTimothy Spann
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDataWorks Summit
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixBill Liu
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowKaxil Naik
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Akash Tandon
 
Data Science und Machine Learning im Kubernetes-Ökosystem
Data Science und Machine Learning im Kubernetes-ÖkosystemData Science und Machine Learning im Kubernetes-Ökosystem
Data Science und Machine Learning im Kubernetes-Ökosysteminovex GmbH
 

Similaire à Hopsworks at Google AI Huddle, Sunnyvale (20)

Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AI
 
End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.
End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.
End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsTensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in Production
 
Building an MLOps Stack for Companies at Reasonable Scale
Building an MLOps Stack for Companies at Reasonable ScaleBuilding an MLOps Stack for Companies at Reasonable Scale
Building an MLOps Stack for Companies at Reasonable Scale
 
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigm
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Hail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open sourceHail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open source
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at Netflix
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
 
Data Science und Machine Learning im Kubernetes-Ökosystem
Data Science und Machine Learning im Kubernetes-ÖkosystemData Science und Machine Learning im Kubernetes-Ökosystem
Data Science und Machine Learning im Kubernetes-Ökosystem
 

Plus de Jim Dowling

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfJim Dowling
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfJim Dowling
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022Jim Dowling
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Jim Dowling
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money LaunderingJim Dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityJim Dowling
 
Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsJim Dowling
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIJim Dowling
 
End-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceEnd-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceJim Dowling
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraJim Dowling
 
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsScaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsJim Dowling
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsJim Dowling
 

Plus de Jim Dowling (17)

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdf
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdf
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
 
Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on Hops
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AI
 
End-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceEnd-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in Finance
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
 
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsScaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on Hops
 

Dernier

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Dernier (20)

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Hopsworks at Google AI Huddle, Sunnyvale

  • 1. Hopsworks on GCP Jim Dowling, CEO Logical Clocks 8 August 2019 Google Sunnyvale
  • 2. Hopsworks Technical Milestones “If you’re working with big data and Hadoop, this one paper could repay your investment in the Morning Paper many times over.... HopsFS is a huge win.” - Adrian Colyer, The Morning Paper World’s first Hadoop platform to support GPUs-as-a-Resource World’s fastest HDFS Published at USENIX FAST with Oracle and Spotify World’s First Open Source Feature Store for Machine Learning World’s First Distributed Filesystem to store small files in metadata on NVMe disks Winner of IEEE Scale Challenge 2017 with HopsFS - 1.2m ops/sec 2017 World’s most scalable POSIX-like Hierarchical Filesystem with Multi Data Center Availability with 1.6m ops/sec on GCP 2018 2019 First non-Google ML Platform with TensorFlow Extended (TFX) support through Beam/Flink World’s First Unified Hyperparam and Ablation Study Parallel Prog.` Framework
  • 3. Evolution of Distributed Filesystems POSIXPast Present NFS, HDFS S3 GCS HopsFS Single DC, Strongly Consistent Metadata Multi-DC, Eventually Consistent Metadata Multi-DC, Strongly Consistent Metadata Object Store POSIX-like Filesystem
  • 4. Why HopsFS? ● Distributed FS ○ Needed for Parallel ML Experiments / Dist Training / FeatureStore ● Provenance/Free-text-search ○ Change Data Capture API ● Performance ○ > 1.6m ops/sec over 3 Azes on GCP using Spotify’s Hadoop workload ○ NVMe for small files stored in metadata ● HDFS API ○ TensorFlow/Keras/PySpark/Beam/Flink/PyTorch (Petastorm)
  • 5. Hopsworks – a platform for Data Intensive AI built on HopsFS
  • 6. Data validation Distributed Training Model Serving A/B Testing Monitoring Pipeline Management HyperParameter Tuning Feature Engineering Data Collection Hardware Management Data Model Prediction φ(x) Hopsworks hides the Complexity of Deep Learning Hopsworks REST API Hopsworks Feature Store [Adapted from Schulley et Al “Technical Debt of ML” ]
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. What is Hopsworks? Elasticity & Performance Governance & ComplianceDevelopment & Operations Secure Multi-Tenancy Project-based restricted access Encryption At-Rest, In-Motion TLS/SSL everywhere AI-Asset Governance Models, experiments, data, GPUs Data/Model/Feature Lineage Discover/track dependencies Notebooks for Development First-class Python Support Version Everything Code, Infrastructure, Data Model Serving on Kubernetes TF Serving, MLeap, SkLearn End-to-End ML Pipelines Orchestrated by Airflow Feature Store Data warehouse for ML Distributed Deep Learning Faster with more GPUs HopsFS NVMe speed with Big Data Horizontally Scalable Ingestion, DataPrep, Training, Serving FS
  • 12. Which services require Distributed Metadata (HopsFS)? Elasticity & Performance Governance & ComplianceDevelopment & Operations Secure Multi-Tenancy Project-based restricted access Encryption At-Rest, In-Motion TLS/SSL everywhere AI-Asset Governance Models, experiments, data, GPUs Data/Model/Feature Lineage Discover/track dependencies Notebooks for Development First-class Python Support Version Everything Code, Infrastructure, Data Model Serving on Kubernetes TF Serving, MLeap, SkLearn End-to-End ML Pipelines Orchestrated by Airflow Feature Store Data warehouse for ML Distributed Deep Learning Faster with more GPUs HopsFS NVMe speed with Big Data Horizontally Scalable Ingestion, DataPrep, Training, Serving FS
  • 14. End-to-End ML Pipelines in Hopsworks
  • 15. ML Pipelines with a Feature Store
  • 16. ML Pipelines with a Feature Store
  • 17. Feature Store These should be based on the same feature engineering code. Features Training Labels Model Features Inference Model Labels
  • 18. Feature Store Application Developer ML Developer It’s not always trivial to ensure features are engineered consistently between training and inference Features Training Labels Model Features Inference Model Labels
  • 20. Feature Store Feature StorePut Get Get Features Training Labels Model Features Inference Model Labels Batch App Online App Online or Offline Features? On-Demand or Cached Features?
  • 21. Hopsworks Feature Store Feature Mgmt Storage Access Statistics Online Features Discovery Offline Features Data Scientist Online Apps Data Engineer MySQL Cluster (Metadata, Online Features) Apache Hive Columnar DB (Offline Features) Feature Data Ingestion Hopsworks Feature Store Training Data (S3, HDFS) Batch Apps Discover features, create training data, save models, read online/offline/on- demand features, historical feature values. Models HopsFS JDBC (SAS, R, etc) Feature CRUD Add/remove features, access control, feature data validation. Access Control Time Travel Data Validation Pandas or PySpark DataFrame External DB Feature Defn select ..
  • 22. FeatureStore Abstractions Titanic Passenger List Feature Groups Features Train/Test Datasets SexName PClass Passenger Bank Account BalanceName Sex BalancePClass survivename .tfrecords .csv .numpy .hdf5, .petastorm, etc Features, FeatureGroups, and Train/Test Datasets are versioned Survive
  • 23. Register a Feature Group with the Feature Store titanic_df = # Spark or Pandas Dataframe # Do feature engineering on ‘titanic_data’ # Register Dataframe as FeatureGroup featurestore.create_featuregroup(titanic_df, "titanic_data_passengers“)
  • 24. Create Training Datasets using the Feature Store sample_data = featurestore.get_features([“name”, “Pclass”, “Sex”, “balance”]) featurestore.create_training_dataset(sample_data, “titanic_training_dataset", data_format="tfrecords“, training_dataset_version=1) # Use the training dataset dataset_dir = featurestore.get_training_dataset_path( "titanic_training_dataset") s = featurestore.get_training_dataset_tf_record_schema( “titanic_training_dataset”)
  • 25. End-to-End ML Pipelines in Hopsworks
  • 27. TFX on a Flink Cluster with Portable RunnerTensorFlow Extended Components in Hopsworks
  • 28. Apache Airflow to Orchestrate ML Pipelines
  • 29. Apache Airflow to Orchestrate ML Pipelines Airflow Jobs REST API Hopsworks Jobs: PySpark, Spark, Flink, Beam/Flink
  • 31. Explicit vs Implicit Provenance ● Explicit: TFX, MLFlow ○ Wrap existing code in components that execute a stage in the pipeline ○ Interceptors in components inject metadata to a metadata store as data flows through the pipeline ○ Store metadata about artefacts and executions ● Implicit: Hopsworks with ePipe* DataPrep Train HopsFS Experiment /Training_ Datasets /Experiments /Models Elasticsearch ePipe: ChangeDataCapture API *ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, CCGrid, 2019.
  • 32. Provenance in Hopsworks ● Implicit Provenance ○ Applications read/write files to HopsFS – infer artefacts from pathname conventions and xattrs ● Explicit Provenance in Hops API ○ Executions: hops.experiment.grid_search(…), hops.experiment.collective_allreduce(…) ○ FeatureStore: hops.featurestore.create_training_dataset(….) ○ Saving and deploying models: hops.serving.export(export_path, “model_name", 1) hops.serving.deploy(“…/model.pb”, destination)
  • 33. /Experiments ● Executions add entries in /Experiments: experiment.launch(…) experiment.grid_search(…) experiment.collective_allreduce(…) experiment.lagom(…) ● /Experiments contains: ○ logs (application, tensorboard) ○ executed notebook file ○ conda environment used ○ checkpoints /Projects/MyProj └ Experiments └ <app_id> └ <type> ├─ checkpoints ├─ tensorboard_logs ├─ logfile └─ versioned_resources ├─ notebook.ipynb └─ conda_env.yml
  • 36. /Models ● Named/versioned model management for: TensorFlow/Keras Scikit Learn ● A Models dataset can be securely shared with other projects or the whole cluster ● The provenance API returns the conda.yml and execution used to train a given model /Projects/MyProj └ Models └ <name> └ <version> ├─ saved_model.pb └─ variables/ ...
  • 37. Model Serving ● Model Serving on Google Cloud AI Platform ○ Export models to Cloud Storage buckets ● On-Premise ○ Serve models on Kuberenetes ○ TensorFlow Serving, Scikit-Learn
  • 38. Jupyter Notebooks as Jobs in Airflow Pipelines
  • 39. Principles of Development with Notebooks ● No throwaway code ● Code can be run either in Notebooks or as Jobs (in Pipelines) ● Notebooks/jobs should be parameterizable ● No external configuration for program logic ○ HPARAMs are part of the program ● Core training loop should be the same for all stages of development ○ Data exploration, hparam tuning, dist training,
  • 40. Problem in ML Pipeline Development Process? Explore Data, Train model Hyperparam Opt. Distributed Training Notebook Python + YML in Gitport/rewrite code Iteration is hard/impossible/a-bad-idea
  • 41. Notebooks offer more than Prototyping ● Familiar web-based development environment ● Interactive development/debugging ● Reporting platform for ML applications (Papermill by Netflix) ● Parameterizable as part of Pipelines (Papermill by Netflix) Disclaimer: Notebooks are not for everyone
  • 42. ML Pipelines of Jupyter Notebooks with Airflow Select Features, File Format Feature Engineering Validate & Deploy Model Experiment, Train Model Airflow Dataprep Pipeline Training and Deployment Pipeline Feature Store
  • 43. PySpark Notebooks as Jobs in ML Pipelines
  • 44. Running TensorFlow/Keras/PyTorch Apps in PySpark Warning: micro-exposure to PySpark may cure you of distributed programming phobia
  • 45. End-to-End ML Pipelines in Hopsworks
  • 46. GPU(s) in PySpark Executor, Driver coordinates PySpark makes it easier to write TensorFlow/Keras/ PyTorch code that can either be run on a single GPU or scale to run on lots of GPUS for Parallel Experiments or Distributed Training. Executor Executor Driver
  • 47. Executor 1 Executor N Driver HopsFS • Training/Test Datasets • Model checkpoints, Trained Models • Experiment run data • Provenance data • Application logs Need Distributed Filesystem for Coordination Model Serving TensorBoard
  • 50. Leave code unchanged, but configure 4 Executors print(“Hello from GPU”) Driver print(“Hello from GPU”) print(“Hello from GPU”) print(“Hello from GPU”)
  • 51. Driver with 4 Executors
  • 52. Same/Replica Conda Environment on all Executors conda_env conda_envconda_envconda_envconda_env
  • 53. A Conda Environment Per Project in Hopsworks
  • 54. Use Pip or Conda to install Python libraries
  • 55. TensorFlow Distributed Training with PySpark def train(): # Separate shard of dataset per worker # create Estimator w/ DistribStrategy # as CollectiveAllReduce # train model, evaluate return loss # Driver code below here # builds TF_CONFIG and shares to workers from hops import experiment experiment.collective_allreduce(train) More details: http//github.com/logicalclocks/hops-examples
  • 56. Undirected Hyperparam Search with PySpark def train(dropout): # Same dataset for all workers # create model and optimizer # add this worker’s value of dropout # train model and evaluate return loss # Driver code below here from hops import experiment args={“dropout”:[0.1, 0.4, 0.8]} experiment.grid_search(train,args) More details: http//github.com/logicalclocks/hops-examples
  • 57. Directed Hyperparam Search with PySpark def train(dropout): # Same dataset for all workers # create model and optimizer optimizer.apply(dropout) # train model and evaluate return loss from hops import experiment args={“dropout”: “0.1-0.8”} experiment.diff_ev(train,args) More details: http//github.com/logicalclocks/hops-examples
  • 59. Parallel ML Experiments with PySpark and Maggy
  • 60. PClass Sexname survive Iterative Model Development Dataset Machine Learning Model Optimizer Evaluate Problem Definition Data Preparation Model Selection Hyperparameters Repeat if needed Model Training
  • 61. Maggy: Unified Hparam Opt & Ablation Programming Machine Learning System Hyperparameter Optimizer New Hyperparameter Values Evaluate New Dataset/Model-Architecture Ablation Study Controller Synchronous or Asynchronous Trials Directed or Undirected Search User-Defined Search/Optimizers
  • 62. Maggy – Async, Parallel ML experiments using PySpark ● Experiment-Driven, interactive development of ML applications ● Parallel Experimentation ○ Hyperparameter Optimization ○ Ablation Studies ○ Leave-one-Feature-out ● Interactive Debugging ○ Experiment/executor logs shown both in Jupyter notebooks and logging
  • 63. Feature Ablation ● Uses the Feature Store to access the dataset metadata ● Generates Python callables that once called will create a dataset ○ Removes one-feature-at-a-time Sexname survivePClass Sexname survive
  • 64. Layer Ablation ● Uses a base model function ● Generates Python callables that once called with create a modified model ○ Uses the model configuration to find and remove layer(s) ○ Removes one-layer-at-a-time (or one-layer-group-at-a-time)
  • 65. User API: Initialize the Study and Add Features
  • 66. Driver Code to setup Ablation Experiment User API: Base Model and Ablation Exp. Setup
  • 67. User API: Wrap the Training Function
  • 68. Interactive Debugging: Print Executor Logs in Jupyter Notebooks
  • 69. Printing Worker Logs in Jupyter Notebooks
  • 70. Hopsworks on GCP ● Hopsworks available as an Image for GCP ○ Blog post coming soon with details ● Next Steps ○ Cloud Native Features ■ Elastic add/remove resources (compute/GPU) ■ API integration with Google Model serving
  • 71. That was Hopsworks Efficiency & Performance Security & GovernanceDevelopment & Operations Secure Multi-Tenancy Project-based restricted access Encryption At-Rest, In-Motion TLS/SSL everywhere AI-Asset Governance Models, experiments, data, GPUs Data/Model/Feature Lineage Discover/track dependencies Development Environment First-class Python Support Version Everything Code, Infrastructure, Data Model Serving on Kubernetes TF Serving, SkLearn End-to-End ML Pipelines Orchestrated by Airflow Feature Store Data warehouse for ML Distributed Deep Learning Faster with more GPUs HopsFS NVMe speed with Big Data Horizontally Scalable Ingestion, DataPrep, Training, Serving FS
  • 72. Hopsworks 1.0 coming soon Hopsworks-1.0 ● Beam 2.14.0 ● Flink 1.8.1 ● Spark 2.4.3 ● TensorFlow 1.14.0 ● TFX 0.13 ● TensorFlow Model Analysis 0.13.2 ● PyTorch 1.1 ● ROCm 2.6, Cuda 10.X
  • 73. Acknowledgements and References Slides and Diagrams from colleagues: ● Maggy: Moritz Meister and Sina Sheikholeslami ● Feature Store: Kim Hammar ● Beam/Flink on Hopsworks: Theofilos Kakantousis References ● HopsFS: Scaling hierarchical file system metadata …, USENIX FAST 2017. ● Size matters: Improving the performance of small files …, ACM Middleware 2018. ● ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, CCGrid, 2019. ● Hopsworks Demo, SysML 2019.
  • 74. Thank you! 470 Ramona St Palo Alto https://www.logicalclocks.com Register for a free account at www.hops.site Twitter @logicalclocks @hopsworks GitHub https://github.com/logicalclocks/hopsworks https://github.com/hopshadoop/hops