SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Distributed Deep
Learning with Docker
at Salesforce
Jeff Hajewski
Software Engineer,
Salesforce.
github.com/j-haj
jeff-hajewski-3a1b5a29
Caveats
● These my own views and opinions, not those of Salesforce
● This is how one team at Salesforce deploys deep learning
models
● When I use the term Docker I am referring to the
technology, not the company
● Some of these designs are simplified
● What is deep learning and why is it difficult?
● Deep learning at Salesforce
● Challenges
○ Designing for team specialization
○ Interacting with the model server
○ Testing
● Key takeaways
About this talk
The core task of deep learning is function approximation.
Neural networks can approximate any function.
Neural networks are expensive to evaluate.
● Linear regression: ~1,000 parameters
● Deep neural network: 100M - 1B parameters (100,000 - 1M x linear reg.)
Deep Learning Review
How should we design
distributed systems
for deep learning?
high latency tasks
We use deep learning models to provide our customers
useful information about their sales process.
They send us this data as a firehose of streaming data.
The faster we get this data to our customers, the more
useful and actionable it is for their sales teams.
Deep Learning at Salesforce
There are three steps to this process
1. Preprocessing - cleaning and formatting the data
2. Inference - running the data through the model
3. Postprocessing - interpreting the output from the model
Deep Learning at Salesforce
Deep Learning at Salesforce
Discusses
cat
preprocess
[0.2, 0.71, 0.89, 0.6]
[0.85, 0.15]
inference
postprocess“Hello! My
cat is
friendly.”
Deep Learning at Salesforce
Discusses
cat
preprocess
[0.2, 0.71, 0.89, 0.6]
[0.85, 0.15]
inference
postprocess“Hello! My
cat is
friendly.”
Data Science Team
Deep Learning at Salesforce
Discusses
cat
preprocess
[0.2, 0.71, 0.89, 0.6]
[0.85, 0.15]
inference
postprocess“Hello! My
cat is
friendly.”
Data Science Team Systems Team
Challenge 1:
designing
for team
specialization
Requirements
1. The data science team shouldn’t need to know
about the system. They just want to define a
sequence of computation.
2. The systems engineers shouldn’t need to know
anything about the computation. They just want to
scale the system.
Designing for team specialization
Challenges
1. Some functions takes longer to execute than others
(e.g., model inference)
2. The order of execution is important
Designing for team specialization
Solution: map functions to containers
Designing for team specialization
postprocess(inference(preprocess(x)))
preprocess inference postprocess
What about throughput?
preprocess inference postprocess
0110010
1001111
0110010
1000011
It’s a cat!
1,000
QPS
500
QPS
300
QPS
1,000
QPS
Max
throughput
inference
inferenceinferencepreprocess
What about throughput?
preprocess inference postprocess
0110010
1001111
0110010
1000011
It’s a cat!
1,000
QPS
2x
500
QPS
4x
300
QPS
1,000
QPS
Max
throughput
Docker enables us to easily scale out each individual stage
inferenceinferencepreprocess
What about throughput?
preprocess inference postprocess
0110010
1001111
0110010
1000011
It’s a cat!
1,000
QPS
2x
500
QPS
2x
300
QPS
1,000
QPS
Max
throughput
Kafka gives stage-wise checkpointing
Challenge 2:
interacting
with the
model servers
Model servers provide a way to query the model,
typically via gRPC or HTTP.
What is the best way to deploy and interact with these
model servers?
Serving deep learning models
Challenge:
1. Model servers are designed as a standalone process.
2. How should we best utilize multiple GPUs?
3. What about networking?
Interacting with the model server
We want to keep deployment simple!
Solution: Deploy model server images as part of a “pod” or
“group” with a coordinator service
Interacting with the model server
JVM
Manager
Model Server Model Server Model Server...
Pod
1. Who owns the model server?
2. How should we handle model versions? Where are they
stored locally?
3. What are the addresses of the model servers?
This solves additional challenges
Data science team
Docker shared volume
http://localhost via Docker private networking
Challenge 3:
testing
Challenge: how should we test these systems?
1. Deep learning models are probabilistic
2. Interservice interactions can be quite complex
Testing
Solution: Docker Compose
● Makes it easy to swap out the model server with a mock
service
● Deploying the entire system locally is easy
● Integrates well with Maven and Gradle
Testing
We haven’t spent a lot of time
discussing the details of Docker
That is precisely the point!
● Docker allows us to simplify many aspects of our design.
● Docker stays out of the way.
● Docker provides a simple alternative to a much more
complex solution.
Docker simplifies our lives

Contenu connexe

Tendances

Predicting Space Weather with Docker
Predicting Space Weather with DockerPredicting Space Weather with Docker
Predicting Space Weather with DockerDocker, Inc.
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDocker, Inc.
 
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...Docker, Inc.
 
Puppet plugin for vRealize Automation (vRA)
Puppet plugin for vRealize Automation (vRA)Puppet plugin for vRealize Automation (vRA)
Puppet plugin for vRealize Automation (vRA)Puppet
 
Simple tweaks to get the most out of your JVM
Simple tweaks to get the most out of your JVMSimple tweaks to get the most out of your JVM
Simple tweaks to get the most out of your JVMJamie Coleman
 
Webinar: Accelerate Your Inner Dev Loop for Kubernetes Services
Webinar: Accelerate Your Inner Dev Loop for Kubernetes Services Webinar: Accelerate Your Inner Dev Loop for Kubernetes Services
Webinar: Accelerate Your Inner Dev Loop for Kubernetes Services Ambassador Labs
 
DCSF 19 How Entergy is Mitigating Legacy Windows Operating System Vulnerabili...
DCSF 19 How Entergy is Mitigating Legacy Windows Operating System Vulnerabili...DCSF 19 How Entergy is Mitigating Legacy Windows Operating System Vulnerabili...
DCSF 19 How Entergy is Mitigating Legacy Windows Operating System Vulnerabili...Docker, Inc.
 
DCSF 19 Microservices API: Routing Across Any Infrastructure
DCSF 19 Microservices API: Routing Across Any InfrastructureDCSF 19 Microservices API: Routing Across Any Infrastructure
DCSF 19 Microservices API: Routing Across Any InfrastructureDocker, Inc.
 
DCSF 19 Developing Apps with Containers, Functions and Cloud Services
DCSF 19 Developing Apps with Containers, Functions and Cloud ServicesDCSF 19 Developing Apps with Containers, Functions and Cloud Services
DCSF 19 Developing Apps with Containers, Functions and Cloud ServicesDocker, Inc.
 
Serverless java
Serverless   javaServerless   java
Serverless javaVishwas N
 
DCSF 19 Modernizing Insurance with Docker Enterprise: The Physicians Mutual ...
DCSF 19 Modernizing Insurance with Docker Enterprise:  The Physicians Mutual ...DCSF 19 Modernizing Insurance with Docker Enterprise:  The Physicians Mutual ...
DCSF 19 Modernizing Insurance with Docker Enterprise: The Physicians Mutual ...Docker, Inc.
 
DCEU 18: From Monolith to Microservices
DCEU 18: From Monolith to MicroservicesDCEU 18: From Monolith to Microservices
DCEU 18: From Monolith to MicroservicesDocker, Inc.
 
Accessible hpc for everyone with docker and containers
Accessible hpc for everyone with docker and containersAccessible hpc for everyone with docker and containers
Accessible hpc for everyone with docker and containersDocker, Inc.
 
DCEU 18: 5 Patterns for Success in Application Transformation
DCEU 18: 5 Patterns for Success in Application TransformationDCEU 18: 5 Patterns for Success in Application Transformation
DCEU 18: 5 Patterns for Success in Application TransformationDocker, Inc.
 
Networking in Docker EE 2.0 with Kubernetes and Swarm
Networking in Docker EE 2.0 with Kubernetes and SwarmNetworking in Docker EE 2.0 with Kubernetes and Swarm
Networking in Docker EE 2.0 with Kubernetes and SwarmAbhinandan P.b
 
Application Deployment and Management at Scale with 1&1 by Matt Baldwin
Application Deployment and Management at Scale with 1&1 by Matt BaldwinApplication Deployment and Management at Scale with 1&1 by Matt Baldwin
Application Deployment and Management at Scale with 1&1 by Matt BaldwinDocker, Inc.
 
Container on azure
Container on azureContainer on azure
Container on azureVishwas N
 
Immutable Awesomeness by John Willis and Josh Corman
Immutable Awesomeness by John Willis and Josh CormanImmutable Awesomeness by John Willis and Josh Corman
Immutable Awesomeness by John Willis and Josh CormanDocker, Inc.
 

Tendances (20)

Predicting Space Weather with Docker
Predicting Space Weather with DockerPredicting Space Weather with Docker
Predicting Space Weather with Docker
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
 
Hands-on Helm
Hands-on Helm Hands-on Helm
Hands-on Helm
 
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...
 
Puppet plugin for vRealize Automation (vRA)
Puppet plugin for vRealize Automation (vRA)Puppet plugin for vRealize Automation (vRA)
Puppet plugin for vRealize Automation (vRA)
 
Simple tweaks to get the most out of your JVM
Simple tweaks to get the most out of your JVMSimple tweaks to get the most out of your JVM
Simple tweaks to get the most out of your JVM
 
Webinar: Accelerate Your Inner Dev Loop for Kubernetes Services
Webinar: Accelerate Your Inner Dev Loop for Kubernetes Services Webinar: Accelerate Your Inner Dev Loop for Kubernetes Services
Webinar: Accelerate Your Inner Dev Loop for Kubernetes Services
 
DCSF 19 How Entergy is Mitigating Legacy Windows Operating System Vulnerabili...
DCSF 19 How Entergy is Mitigating Legacy Windows Operating System Vulnerabili...DCSF 19 How Entergy is Mitigating Legacy Windows Operating System Vulnerabili...
DCSF 19 How Entergy is Mitigating Legacy Windows Operating System Vulnerabili...
 
DCSF 19 Microservices API: Routing Across Any Infrastructure
DCSF 19 Microservices API: Routing Across Any InfrastructureDCSF 19 Microservices API: Routing Across Any Infrastructure
DCSF 19 Microservices API: Routing Across Any Infrastructure
 
DCSF 19 Developing Apps with Containers, Functions and Cloud Services
DCSF 19 Developing Apps with Containers, Functions and Cloud ServicesDCSF 19 Developing Apps with Containers, Functions and Cloud Services
DCSF 19 Developing Apps with Containers, Functions and Cloud Services
 
Serverless java
Serverless   javaServerless   java
Serverless java
 
DCSF 19 Modernizing Insurance with Docker Enterprise: The Physicians Mutual ...
DCSF 19 Modernizing Insurance with Docker Enterprise:  The Physicians Mutual ...DCSF 19 Modernizing Insurance with Docker Enterprise:  The Physicians Mutual ...
DCSF 19 Modernizing Insurance with Docker Enterprise: The Physicians Mutual ...
 
DCEU 18: From Monolith to Microservices
DCEU 18: From Monolith to MicroservicesDCEU 18: From Monolith to Microservices
DCEU 18: From Monolith to Microservices
 
Accessible hpc for everyone with docker and containers
Accessible hpc for everyone with docker and containersAccessible hpc for everyone with docker and containers
Accessible hpc for everyone with docker and containers
 
DCEU 18: 5 Patterns for Success in Application Transformation
DCEU 18: 5 Patterns for Success in Application TransformationDCEU 18: 5 Patterns for Success in Application Transformation
DCEU 18: 5 Patterns for Success in Application Transformation
 
Networking in Docker EE 2.0 with Kubernetes and Swarm
Networking in Docker EE 2.0 with Kubernetes and SwarmNetworking in Docker EE 2.0 with Kubernetes and Swarm
Networking in Docker EE 2.0 with Kubernetes and Swarm
 
Application Deployment and Management at Scale with 1&1 by Matt Baldwin
Application Deployment and Management at Scale with 1&1 by Matt BaldwinApplication Deployment and Management at Scale with 1&1 by Matt Baldwin
Application Deployment and Management at Scale with 1&1 by Matt Baldwin
 
JEEconf 2017
JEEconf 2017JEEconf 2017
JEEconf 2017
 
Container on azure
Container on azureContainer on azure
Container on azure
 
Immutable Awesomeness by John Willis and Josh Corman
Immutable Awesomeness by John Willis and Josh CormanImmutable Awesomeness by John Willis and Josh Corman
Immutable Awesomeness by John Willis and Josh Corman
 

Similaire à Distributed Deep Learning with Docker at Salesforce

Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesTuri, Inc.
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsSanghamitra Deb
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformShivaji Dutta
 
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...Paris Salesforce Developer Group
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher ScientificEnabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher ScientificDatabricks
 
Why is dev ops for machine learning so different - dataxdays
Why is dev ops for machine learning so different  - dataxdaysWhy is dev ops for machine learning so different  - dataxdays
Why is dev ops for machine learning so different - dataxdaysRyan Dawson
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning ModelsJosh Patterson
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018Apache MXNet
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...Ambassador Labs
 
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedJoão Pedro Martins
 
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...PostgresOpen
 
Agile2015: Introduction to DevOps with Chocolate and Lego Game
Agile2015: Introduction to DevOps with Chocolate and Lego GameAgile2015: Introduction to DevOps with Chocolate and Lego Game
Agile2015: Introduction to DevOps with Chocolate and Lego GameDana Pylayeva
 
Testing with laravel
Testing with laravelTesting with laravel
Testing with laravelDerek Binkley
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx0567Padma
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Pythondidip
 

Similaire à Distributed Deep Learning with Docker at Salesforce (20)

Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep Features
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
DX@Scale: Optimizing Salesforce Development and Deployment for large scale pr...
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher ScientificEnabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
 
Why is dev ops for machine learning so different - dataxdays
Why is dev ops for machine learning so different  - dataxdaysWhy is dev ops for machine learning so different  - dataxdays
Why is dev ops for machine learning so different - dataxdays
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning Models
 
What DevOps Isn't
What DevOps Isn'tWhat DevOps Isn't
What DevOps Isn't
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
[KubeCon NA 2018] Telepresence Deep Dive Session - Rafael Schloming & Luke Sh...
 
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
 
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...
 
Agile2015: Introduction to DevOps with Chocolate and Lego Game
Agile2015: Introduction to DevOps with Chocolate and Lego GameAgile2015: Introduction to DevOps with Chocolate and Lego Game
Agile2015: Introduction to DevOps with Chocolate and Lego Game
 
Testing with laravel
Testing with laravelTesting with laravel
Testing with laravel
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
 
Os Solomon
Os SolomonOs Solomon
Os Solomon
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Python
 

Plus de Docker, Inc.

Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience Docker, Inc.
 
How to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker BuildHow to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker BuildDocker, Inc.
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSDocker, Inc.
 
Securing Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINXSecuring Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINXDocker, Inc.
 
How To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and ComposeHow To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and ComposeDocker, Inc.
 
The First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker HubThe First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker HubDocker, Inc.
 
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...Docker, Inc.
 
Become a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio CodeBecome a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio CodeDocker, Inc.
 
How to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container RegistryHow to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container RegistryDocker, Inc.
 
Kubernetes at Datadog Scale
Kubernetes at Datadog ScaleKubernetes at Datadog Scale
Kubernetes at Datadog ScaleDocker, Inc.
 
Labels, Labels, Labels
Labels, Labels, Labels Labels, Labels, Labels
Labels, Labels, Labels Docker, Inc.
 
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelUsing Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelDocker, Inc.
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSDocker, Inc.
 
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...Docker, Inc.
 
Developing with Docker for the Arm Architecture
Developing with Docker for the Arm ArchitectureDeveloping with Docker for the Arm Architecture
Developing with Docker for the Arm ArchitectureDocker, Inc.
 
Sharing is Caring: How to Begin Speaking at Conferences
Sharing is Caring: How to Begin Speaking at ConferencesSharing is Caring: How to Begin Speaking at Conferences
Sharing is Caring: How to Begin Speaking at ConferencesDocker, Inc.
 
Virtual Meetup Docker + Arm: Building Multi-arch Apps with Buildx
Virtual Meetup Docker + Arm: Building Multi-arch Apps with BuildxVirtual Meetup Docker + Arm: Building Multi-arch Apps with Buildx
Virtual Meetup Docker + Arm: Building Multi-arch Apps with BuildxDocker, Inc.
 
DCSF 19 eBPF Superpowers
DCSF 19 eBPF SuperpowersDCSF 19 eBPF Superpowers
DCSF 19 eBPF SuperpowersDocker, Inc.
 
DCSF 19 Zero Trust Networks Come to Enterprise Kubernetes
DCSF 19 Zero Trust Networks Come to Enterprise KubernetesDCSF 19 Zero Trust Networks Come to Enterprise Kubernetes
DCSF 19 Zero Trust Networks Come to Enterprise KubernetesDocker, Inc.
 
DCSF 19 Node.js Rocks in Docker for Dev and Ops
DCSF 19 Node.js Rocks in Docker for Dev and OpsDCSF 19 Node.js Rocks in Docker for Dev and Ops
DCSF 19 Node.js Rocks in Docker for Dev and OpsDocker, Inc.
 

Plus de Docker, Inc. (20)

Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience
 
How to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker BuildHow to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker Build
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWS
 
Securing Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINXSecuring Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINX
 
How To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and ComposeHow To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and Compose
 
The First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker HubThe First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker Hub
 
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
 
Become a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio CodeBecome a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio Code
 
How to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container RegistryHow to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container Registry
 
Kubernetes at Datadog Scale
Kubernetes at Datadog ScaleKubernetes at Datadog Scale
Kubernetes at Datadog Scale
 
Labels, Labels, Labels
Labels, Labels, Labels Labels, Labels, Labels
Labels, Labels, Labels
 
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelUsing Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWS
 
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
 
Developing with Docker for the Arm Architecture
Developing with Docker for the Arm ArchitectureDeveloping with Docker for the Arm Architecture
Developing with Docker for the Arm Architecture
 
Sharing is Caring: How to Begin Speaking at Conferences
Sharing is Caring: How to Begin Speaking at ConferencesSharing is Caring: How to Begin Speaking at Conferences
Sharing is Caring: How to Begin Speaking at Conferences
 
Virtual Meetup Docker + Arm: Building Multi-arch Apps with Buildx
Virtual Meetup Docker + Arm: Building Multi-arch Apps with BuildxVirtual Meetup Docker + Arm: Building Multi-arch Apps with Buildx
Virtual Meetup Docker + Arm: Building Multi-arch Apps with Buildx
 
DCSF 19 eBPF Superpowers
DCSF 19 eBPF SuperpowersDCSF 19 eBPF Superpowers
DCSF 19 eBPF Superpowers
 
DCSF 19 Zero Trust Networks Come to Enterprise Kubernetes
DCSF 19 Zero Trust Networks Come to Enterprise KubernetesDCSF 19 Zero Trust Networks Come to Enterprise Kubernetes
DCSF 19 Zero Trust Networks Come to Enterprise Kubernetes
 
DCSF 19 Node.js Rocks in Docker for Dev and Ops
DCSF 19 Node.js Rocks in Docker for Dev and OpsDCSF 19 Node.js Rocks in Docker for Dev and Ops
DCSF 19 Node.js Rocks in Docker for Dev and Ops
 

Dernier

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Dernier (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Distributed Deep Learning with Docker at Salesforce

  • 1. Distributed Deep Learning with Docker at Salesforce
  • 3. Caveats ● These my own views and opinions, not those of Salesforce ● This is how one team at Salesforce deploys deep learning models ● When I use the term Docker I am referring to the technology, not the company ● Some of these designs are simplified
  • 4. ● What is deep learning and why is it difficult? ● Deep learning at Salesforce ● Challenges ○ Designing for team specialization ○ Interacting with the model server ○ Testing ● Key takeaways About this talk
  • 5. The core task of deep learning is function approximation. Neural networks can approximate any function. Neural networks are expensive to evaluate. ● Linear regression: ~1,000 parameters ● Deep neural network: 100M - 1B parameters (100,000 - 1M x linear reg.) Deep Learning Review
  • 6. How should we design distributed systems for deep learning? high latency tasks
  • 7. We use deep learning models to provide our customers useful information about their sales process. They send us this data as a firehose of streaming data. The faster we get this data to our customers, the more useful and actionable it is for their sales teams. Deep Learning at Salesforce
  • 8. There are three steps to this process 1. Preprocessing - cleaning and formatting the data 2. Inference - running the data through the model 3. Postprocessing - interpreting the output from the model Deep Learning at Salesforce
  • 9. Deep Learning at Salesforce Discusses cat preprocess [0.2, 0.71, 0.89, 0.6] [0.85, 0.15] inference postprocess“Hello! My cat is friendly.”
  • 10. Deep Learning at Salesforce Discusses cat preprocess [0.2, 0.71, 0.89, 0.6] [0.85, 0.15] inference postprocess“Hello! My cat is friendly.” Data Science Team
  • 11. Deep Learning at Salesforce Discusses cat preprocess [0.2, 0.71, 0.89, 0.6] [0.85, 0.15] inference postprocess“Hello! My cat is friendly.” Data Science Team Systems Team
  • 13. Requirements 1. The data science team shouldn’t need to know about the system. They just want to define a sequence of computation. 2. The systems engineers shouldn’t need to know anything about the computation. They just want to scale the system. Designing for team specialization
  • 14. Challenges 1. Some functions takes longer to execute than others (e.g., model inference) 2. The order of execution is important Designing for team specialization
  • 15. Solution: map functions to containers Designing for team specialization postprocess(inference(preprocess(x))) preprocess inference postprocess
  • 16. What about throughput? preprocess inference postprocess 0110010 1001111 0110010 1000011 It’s a cat! 1,000 QPS 500 QPS 300 QPS 1,000 QPS Max throughput
  • 17. inference inferenceinferencepreprocess What about throughput? preprocess inference postprocess 0110010 1001111 0110010 1000011 It’s a cat! 1,000 QPS 2x 500 QPS 4x 300 QPS 1,000 QPS Max throughput Docker enables us to easily scale out each individual stage
  • 18. inferenceinferencepreprocess What about throughput? preprocess inference postprocess 0110010 1001111 0110010 1000011 It’s a cat! 1,000 QPS 2x 500 QPS 2x 300 QPS 1,000 QPS Max throughput Kafka gives stage-wise checkpointing
  • 20. Model servers provide a way to query the model, typically via gRPC or HTTP. What is the best way to deploy and interact with these model servers? Serving deep learning models
  • 21. Challenge: 1. Model servers are designed as a standalone process. 2. How should we best utilize multiple GPUs? 3. What about networking? Interacting with the model server We want to keep deployment simple!
  • 22. Solution: Deploy model server images as part of a “pod” or “group” with a coordinator service Interacting with the model server JVM Manager Model Server Model Server Model Server... Pod
  • 23. 1. Who owns the model server? 2. How should we handle model versions? Where are they stored locally? 3. What are the addresses of the model servers? This solves additional challenges Data science team Docker shared volume http://localhost via Docker private networking
  • 25. Challenge: how should we test these systems? 1. Deep learning models are probabilistic 2. Interservice interactions can be quite complex Testing
  • 26. Solution: Docker Compose ● Makes it easy to swap out the model server with a mock service ● Deploying the entire system locally is easy ● Integrates well with Maven and Gradle Testing
  • 27. We haven’t spent a lot of time discussing the details of Docker That is precisely the point!
  • 28. ● Docker allows us to simplify many aspects of our design. ● Docker stays out of the way. ● Docker provides a simple alternative to a much more complex solution. Docker simplifies our lives