Distributed Deep Learning
with Hadoop and TensorFlow
Image Classification - 2016
Human performance: 95%, AI performance: 97%
https://arxiv.org/pdf/1602.07261.pdf
The ability to understand the content of an image by using machine learning
AI beats humans in games - 2016
AlphaGo beats L. Sedol in 2016 (Go, 4:1)
Komodo beats H. Nakamura in 2016 (chess, 2:1)
Breast Cancer Diagnosis - 2017
Pathologist performance: 73%, AI performance: 92%
https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html
Doctors often use additional tests to find or diagnose breast cancer.
The pathologist ended up spending 30 hours on this task on 130 slides.
A close-up of a lymph node biopsy.
Google TPU
NVIDIA TITAN V: the power of 12 GB HBM2 memory and 640 Tensor Cores, delivering 110 TeraFLOPS of performance.
AI history → Perceptron
1943 W. McCulloch, W. Pitts: “Neuron” as a logical element
1958 F. Rosenblatt: “Perceptron” model, feed-forward neural networks
1969 M. Minsky, S. Papert: trigger the first AI winter
OR function vs. XOR function: a single-layer perceptron can represent OR, but not XOR
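The OR/XOR contrast on this slide can be reproduced with Rosenblatt's perceptron learning rule. This is a minimal Python sketch with illustrative choices: binary inputs, a step activation, and a learning rate of 1. The rule finds weights for OR. For XOR, no single line separates the two classes, so at most three of the four points can be classified correctly.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Rosenblatt's learning rule on 2-input samples [((x1, x2), target)].

    Returns the learned weights (w1, w2) and bias b."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0  # step activation
            error = target - pred                          # -1, 0 or +1
            w1 += lr * error * x1
            w2 += lr * error * x2
            b += lr * error
    return w1, w2, b


def accuracy(params, samples):
    """Fraction of samples the thresholded linear unit classifies correctly."""
    w1, w2, b = params
    hits = sum((1 if w1 * x1 + w2 * x2 + b > 0 else 0) == t for (x1, x2), t in samples)
    return hits / len(samples)


OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

On OR, the loop settles on weights that classify all four inputs. On XOR, the same loop keeps updating without ever reaching full accuracy. This is the limitation Minsky and Papert analysed.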
AI history → AI winter
1943 W. McCulloch, W. Pitts: neuron as a logical element
1958 F. Rosenblatt: Perceptron model, neural networks
1969 M. Minsky, S. Papert: trigger the first AI winter
1980 Boom of expert systems: Q&A using logical rules, Prolog
1987-1993 The second AI winter: desktop computers, expensive LISP machines
1993-2001 Moore’s law, Deep Blue chess-playing, Stanford DARPA challenge
Machine Learning Problem Types
Structured data
80% of the world’s data is unstructured
Fishing in the sea versus fishing in the lake
Business Intelligence helps find answers to questions you know.
Data Science helps you find the question itself.
Data Warehouse
+ Structured data & schema-on-write
+ SQL-ish queries on database tables
+ Extract, Transform, Load
+ Expensive for large data
Data Lake
+ Any kind of data & schema-on-read
+ Parallel processing on big data
+ Extract, Load, Transform-on-the-fly
+ Low cost on commodity hardware
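The schema-on-write vs. schema-on-read distinction above fits in a few lines of Python. This is an illustration under assumptions: records arrive as JSON lines, and `SCHEMA`, `write_with_schema` and `read_with_schema` are hypothetical names, not part of any warehouse or data lake product. The warehouse style validates and converts at load time and rejects bad rows. The lake style stores the raw lines and applies a projection only when a query runs.

```python
import json

# Hypothetical schema: field name -> type the value is converted to.
SCHEMA = {"trip_id": int, "city": str, "fare": float}


def write_with_schema(raw_lines):
    """Schema-on-write: validate and convert every line before storing it.

    Returns (table, rejected); lines that do not fit the schema are rejected."""
    table, rejected = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
            table.append({name: typ(record[name]) for name, typ in SCHEMA.items()})
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
    return table, rejected


def read_with_schema(raw_lines, fields):
    """Schema-on-read: keep raw lines as-is and project fields at query time.

    Fields missing from a record come back as None instead of failing the load."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Raw data as it might land in a lake: inconsistent types and extra fields.
SAMPLE = [
    '{"trip_id": 1, "city": "Berlin", "fare": "12.5"}',
    '{"trip_id": "x", "city": "Paris"}',
    '{"trip_id": 3, "city": "Rome", "fare": 8, "rating": 5}',
]
```

With `SAMPLE`, the warehouse path keeps two rows and rejects the malformed one. The lake path can still answer a later question about a `rating` field that the schema never anticipated.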
More Data + Bigger Models
Figure: accuracy vs. scale (data size, model size), neural networks vs. other approaches, 1990s
More Data + Bigger Models + More Computation
Figure: accuracy vs. scale (data size, model size), neural networks vs. other approaches, now, with more compute
https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
More Data + Bigger Models + More Computation
= Better Results in Machine Learning
Millions of “trip” events each day globally: routing and price optimization
400+ billion viewing-related events per day: movie recommendation
Five billion data points for the Price Tip feature: price optimization
How to start?
Single machine, ML specialist, small data
X X
Single machine, ML specialist, big data
X X
Train and evaluate machine learning models at scale
Single machine → Data center
How to run more experiments faster and in parallel?
How to share and reproduce research?
How to go from research to real products?
Distributed Machine Learning
Figure: model size vs. data size, from a single machine to the data center
Model parallelism: training very large models
Data parallelism: speeds up the training
Exploring several model architectures, hyper-parameter optimization, training several independent models
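The two axes of this diagram can be illustrated on a single dense layer. This is a minimal Python sketch that simulates the devices sequentially in one process. The function names and the row-wise weight split are choices made here, not TensorFlow APIs. Model parallelism splits the weights of one model across devices. Data parallelism replicates the whole model and splits the batch. Both must give the same result as the single-machine computation.

```python
def matvec(W, x):
    """One dense layer without bias: y = W x, where W is a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def model_parallel_forward(W, x, num_devices=2):
    """Model parallelism: each device owns a block of W's rows (output units)
    and computes only that part of y; the parts are then concatenated."""
    rows = -(-len(W) // num_devices)  # ceiling division
    parts = [matvec(W[i:i + rows], x) for i in range(0, len(W), rows)]
    return [y for part in parts for y in part]


def data_parallel_forward(W, batch, num_replicas=2):
    """Data parallelism: every replica holds the full W and processes its own
    slice of the batch; outputs are gathered in the original order."""
    size = -(-len(batch) // num_replicas)
    outputs = []
    for i in range(0, len(batch), size):
        outputs.extend(matvec(W, x) for x in batch[i:i + size])
    return outputs


W = [[1, 2], [3, 4], [5, 6], [7, 8]]  # toy 4x2 weight matrix
batch = [[1, 0], [0, 1], [1, 1]]      # toy batch of three inputs
```

Model parallelism pays off when `W` itself is too large for one device. Data parallelism pays off when the batch (or the dataset) is the bottleneck.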
Compute Workload for Training and Evaluation
Figure: workload between I/O intensive and compute intensive, single machine vs. data center
I/O Workload for Simulation and Testing
Figure: workload between I/O intensive and compute intensive, single machine vs. data center
Distributed Machine Learning
The new rising star
12/19/17
+ TensorFlow Standalone
+ TensorFlow On YARN
+ TensorFlow On multi-colored YARN
+ TensorFlow On Spark
+ TensorFrames
+ TensorFlow On Kubernetes
+ TensorFlow On Mesos
Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark
https://www.slideshare.net/jwiegelmann/distributed-tensorflow-on-hadoop-mesos-kubernetes-spark
Data Parallel vs. Model Parallel
http://books.nips.cc/papers/files/nips25/NIPS2012_0598.pdf
Between-Graph Replication vs. In-Graph Replication
Data Shards vs. Data Combined
https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
Synchronous vs. Asynchronous
https://arxiv.org/pdf/1603.04467.pdf
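The synchronous vs. asynchronous trade-off can be simulated without a cluster. The sketch below assumes a toy one-parameter linear regression and four hypothetical workers, all simulated sequentially; it is not TensorFlow code. In the synchronous version, the parameter server averages gradients that were all computed on the same parameter version. In the asynchronous version, it applies each gradient as soon as it arrives, even though that gradient was computed on parameters that are `staleness` updates old.

```python
def local_gradient(w, samples):
    """Gradient of the mean squared error of y ~ w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def synchronous_sgd(shards, lr, rounds):
    """All workers use the same parameter version; the server waits for every
    gradient, averages them and applies a single update per round."""
    w = 0.0
    for _ in range(rounds):
        grads = [local_gradient(w, s) for s in shards]
        w -= lr * sum(grads) / len(grads)
    return w


def asynchronous_sgd(shards, lr, rounds, staleness):
    """Workers push gradients one at a time without waiting for each other;
    each gradient was computed on parameters `staleness` updates old."""
    versions = [0.0]  # every parameter version the server has published
    for _ in range(rounds):
        for s in shards:
            stale_w = versions[max(0, len(versions) - 1 - staleness)]
            versions.append(versions[-1] - lr * local_gradient(stale_w, s))
    return versions[-1]


data = [(x, 3.0 * x) for x in range(1, 9)]  # toy data generated by w = 3
shards = [data[i::4] for i in range(4)]     # one data shard per worker

if __name__ == "__main__":
    print(synchronous_sgd(shards, lr=0.01, rounds=200))
    print(asynchronous_sgd(shards, lr=0.001, rounds=200, staleness=3))
```

With these settings, both runs recover w = 3. The asynchronous run uses a smaller step size on purpose. Stale gradients act like delayed feedback, so a step size that is safe for synchronous training can make asynchronous training oscillate or diverge.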
TensorFlow Standalone
https://www.tensorflow.org/
TensorFlow Standalone
Dedicated cluster
Short & long running jobs
Flexibility
Manual scheduling of workers
No shared resources
Hard to share data with other applications
No data locality
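Without a resource manager, every TensorFlow process in a standalone cluster must be told the full cluster layout and its own role. One common way is the `TF_CONFIG` environment variable. It holds a JSON document with a `"cluster"` map and a `"task"` entry, and `tf.estimator` and the `tf.distribute` cluster resolvers read it. The sketch below only generates those strings. The host names, ports and `train.py` script are hypothetical, and starting a process on each machine is still the manual step the slide refers to.

```python
import json

# Hypothetical cluster: one parameter server and two workers.
CLUSTER = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}


def tf_configs(cluster):
    """Return a TF_CONFIG JSON string for every (job, index) task in the cluster.

    Every task sees the same "cluster" map; only the "task" entry differs."""
    configs = {}
    for job, addresses in cluster.items():
        for index in range(len(addresses)):
            configs[(job, index)] = json.dumps(
                {"cluster": cluster, "task": {"type": job, "index": index}}
            )
    return configs


if __name__ == "__main__":
    # What an operator would export on each host before starting train.py.
    for (job, index), config in sorted(tf_configs(CLUSTER).items()):
        host = CLUSTER[job][index].split(":")[0]
        print(f"{host}: export TF_CONFIG='{config}'")
```

Adding a worker means editing `CLUSTER`, regenerating every string and restarting every process. A scheduler such as YARN, Kubernetes or Mesos automates exactly this step.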
TensorFlow On YARN (Intel) v3
https://github.com/Intel-bigdata/TensorFlowOnYARN
released March 12, 2017 / YARN-6043
TensorFlow On YARN (Intel)
Shared cluster and data
Optimised long running jobs
Scheduling
Data locality (not yet implemented)
Not easy to have rapid adoption from upstream
Fault tolerance not yet implemented
GPU still not seen as a “native” resource on YARN
No use of YARN elasticity
TensorFlow On multi-colored YARN (Hortonworks) v3
Not yet implemented!
https://hortonworks.com/blog/distributed-tensorflow-assembly-hadoop-yarn/
TensorFlow On multi-colored YARN (Hortonworks)
Shared cluster
GPUs shared by multiple tenants and applications
Centralised scheduling
YARN-3611 Docker support
YARN-4793 Native processes
Needs YARN wrapper of NVIDIA Docker (GPU driver)
Not implemented yet!
TensorFlow On Spark (Yahoo) v2
https://github.com/yahoo/TensorFlowOnSpark
released January 22, 2017
TensorFlow On Spark (Yahoo)
Shared cluster and data
Data locality through HDFS or other Spark sources
Ad-hoc training and evaluation
Slice and dice data with Spark distributed transformations
Scheduling not optimal
Necessary to “convert” existing TensorFlow applications, although it is a simple process
Might need to restart Spark cluster
No GPU resource management
TensorFrames (Databricks) v2
Scala binding to TF via JNI
https://github.com/databricks/tensorframes
released Feb 28, 2016
TensorFrames (Databricks)
Possible shared cluster
TensorFrames infers the shapes for small tensors (no analysis required)
Data locality via RDD
Experimental
Still no centralised scheduling: TF and Spark need to be deployed and scheduled separately
TF and Spark might not be collocated
Might need data transfer between some nodes
TensorFlow On Kubernetes
https://github.com/tensorflow/ecosystem
TensorFlow On Kubernetes
Shared cluster
Centralised scheduling by Kubernetes
Solved network orchestration, federation, etc.
Experimental support for managing NVIDIA GPUs (at this time, however, better than YARN)
Fault tolerance
Data locality
TensorFlow On Mesos
Marathon
https://github.com/douban/tfmesos
TensorFlow On Mesos
Shared cluster
GPU-based scheduling
Short and long running jobs
Memory footprint
Number of services relative to Kubernetes
Fault tolerance
Data locality
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Google, 2015
http://stevenwhang.com/tfx_paper.pdf
TFX: A TensorFlow-Based Production-Scale
Machine Learning Platform
Google, 2017
https://eng.uber.com/michelangelo/
Michelangelo: Uber’s Machine Learning Platform
http://searchbusinessanalytics.techtarget.com/feature/Machine-learning-platforms-comparison-Amazon-Azure-Google-IBM
Pricing for 890,000 real-time predictions without training
AWS:
Compute fees + prediction fees = $8.40 + $96.44
= $104.84 per month
Google:
Prediction: $0.10 per thousand predictions, plus $0.40 per hour
= $377 per month
Azure:
Packages: $0, $100.13, $1,000.06, $9,999.98
= $1,000 per month
Q3, 2017
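The AWS and Google totals above can be checked with a few lines of arithmetic. The sketch uses the slide's figures as-is. The only added assumption is the length of the month: a 30-day month of continuous serving (720 hours) reproduces the Google total.

```python
PREDICTIONS_PER_MONTH = 890_000
HOURS_PER_MONTH = 30 * 24  # assumption: one 30-day month of continuous serving

# AWS: compute fees + prediction fees, as listed on the slide.
aws = 8.40 + 96.44

# Google: $0.10 per 1,000 predictions plus $0.40 per serving hour.
google_predictions = PREDICTIONS_PER_MONTH / 1000 * 0.10
google_hours = HOURS_PER_MONTH * 0.40
google = google_predictions + google_hours

print(f"AWS:    ${aws:,.2f} per month")
print(f"Google: ${google:,.2f} per month")
```

The Google total splits into $89 of prediction fees and $288 of hourly charges, so at this volume the per-hour charge dominates.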
LESSONS LEARNED
High-level Development Process for Autonomous Vehicles
1 Collect sensor data
2 Model Engineering
3 Autonomous Driving
Figure: data logger control unit, big data, data center, trained model
Agenda
Sensors: Udacity Lincoln MKZ
Camera: 3x Blackfly GigE camera, 20 Hz
Lidar: Velodyne HDL-32E, 9.5 Hz
IMU: Xsens, 400 Hz
GPS: 2x fixed, 1 Hz
CAN bus: 1.1 kHz
Robot Operating System (ROS)
Data: 3 GB per minute
https://github.com/udacity/self-driving-car
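A back-of-the-envelope sketch shows why this data ends up in a Big Data store. The sensor rates and the 3 GB per minute figure come from the list above. The only added assumption is that every sensor reading is logged as one message.

```python
# Readings per second per sensor, taken from the Udacity sensor list.
SENSOR_RATES_HZ = {
    "camera": 3 * 20.0,  # 3x Blackfly GigE camera at 20 Hz
    "lidar": 9.5,        # Velodyne HDL-32E
    "imu": 400.0,        # Xsens
    "gps": 2 * 1.0,      # 2x fixed GPS at 1 Hz
    "can": 1100.0,       # CAN bus at 1.1 kHz
}
GB_PER_MINUTE = 3.0


def recording_budget(hours):
    """Return (messages per second, total messages, gigabytes) for a drive of
    `hours`, assuming one logged message per sensor reading."""
    rate = sum(SENSOR_RATES_HZ.values())
    seconds = hours * 3600
    return rate, rate * seconds, GB_PER_MINUTE * 60 * hours


if __name__ == "__main__":
    rate, messages, gigabytes = recording_budget(8)
    print(f"{rate} msg/s, {messages:,.0f} messages, {gigabytes:,.0f} GB per 8-hour day")
```

One hour of driving gives about 1,571.5 messages per second, roughly 5.7 million messages and 180 GB. An 8-hour test day comes to about 1.44 TB.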
Sensors Spec
Sensor     | Sensor blinding, sunlight, darkness | Rain, fog, snow | Non-metal objects | Wind / high velocity | Resolution | Range | Data
Ultrasonic | yes                                 | yes             | yes               | no                   | +          | +     | +
Lidar      | yes                                 | no              | yes               | yes                  | +++        | ++    | +
Radar      | yes                                 | yes             | no                | yes                  | ++         | +++   | +
Camera     | no                                  | no              | yes               | yes                  | +++        | +++   | +++
Machine Learning 101
Observations → State Estimation → Modeling & Prediction → Planning → Controls
Observations → f(x) → Controls
Machine Learning for Autonomous Driving
+ Sensor Fusion: clustering, segmentation, pattern recognition
+ Road: ego-motion, image processing and pattern recognition
+ Localization: simultaneous localization and mapping
+ Situation Understanding: detection and classification
+ Trajectory Planning: motion planning and control
+ Control Strategy: reinforcement and supervised learning
+ Driver Model: image processing and pattern recognition
Machine Learning Cycle
1 Data collection for training/test, feature engineering (I/O workload)
2 Model development and architecture, training and evaluation, model tuning (compute workload)
3 Re-simulation and testing (I/O workload)
Model deployment, versioning, scaling and monitoring
Flux – Open Machine Learning Stack
Figure: ML specialists; ML development & catalog & REST API; training & test data; feature engineering, training, evaluation, re-simulation, testing (CaffeOnSpark); compute + network + storage; deploy model
Catalog items: sample, model, prediction, batch, regression, cluster, dataset, correlation, centroid, anomaly, test scores
+ Mainly open source
+ No vendor lock-in
+ Scale-out architecture
+ Multi-user support
+ Resource management
+ Job scheduling
+ Speed-up training
+ Speed-up simulation
Feature Engineering
+ Hadoop InputFormat and Record Reader for Rosbag
+ Process Rosbag with Spark, YARN, MapReduce, Hadoop Streaming API, …
+ Spark RDDs are cached and optimized for analysis
Figure: Rosbag, Record Reader, RDD, DataFrame/DataSet, SQL and Spark APIs, NumPy, ROS msg; processing engine and advanced analytics on compute, network, storage
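The Record Reader for Rosbag mentioned above has to decode the bag's record layout before Spark or MapReduce can use the data. This is a minimal Python sketch of that decoding. It assumes the ROS bag format 2.0 layout: a `#ROSBAG V2.0` version line, then records of the form `<header_len><header><data_len><data>` with little-endian uint32 lengths, where the header is a sequence of length-prefixed `name=value` fields. It does not decompress chunk records or use the index. A real Hadoop RecordReader (in Java) would also need both of those to compute input splits. `make_record` is a hypothetical helper for building synthetic test bags.

```python
import struct

MAGIC = b"#ROSBAG V2.0\n"  # version line at the start of a bag 2.0 file


def parse_header(buf):
    """Decode a record header: repeated <4-byte length><name>=<value> fields."""
    fields, pos = {}, 0
    while pos < len(buf):
        (field_len,) = struct.unpack_from("<I", buf, pos)
        name, _, value = buf[pos + 4:pos + 4 + field_len].partition(b"=")
        fields[name.decode("ascii")] = value
        pos += 4 + field_len
    return fields


def iter_records(blob):
    """Yield (header_fields, data) for each top-level record in `blob`.

    Chunk records are yielded as-is; their contents are not decompressed."""
    pos = len(MAGIC) if blob.startswith(MAGIC) else 0
    while pos < len(blob):
        (header_len,) = struct.unpack_from("<I", blob, pos)
        header = parse_header(blob[pos + 4:pos + 4 + header_len])
        pos += 4 + header_len
        (data_len,) = struct.unpack_from("<I", blob, pos)
        yield header, blob[pos + 4:pos + 4 + data_len]
        pos += 4 + data_len


def make_record(fields, data):
    """Build one record in the same layout (for tests with synthetic bags)."""
    header = b"".join(
        struct.pack("<I", len(k) + 1 + len(v)) + k + b"=" + v for k, v in fields.items()
    )
    return struct.pack("<I", len(header)) + header + struct.pack("<I", len(data)) + data
```

The `op` header field tells a reader which kind of record it is looking at. A Spark job would typically map `iter_records` over byte ranges and keep only message records.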
Training & Evaluation
+ TensorFlow ROSRecordDataset
+ Protocol Buffers to serialize records
+ Saves time because no data conversion is needed
+ Saves storage because no data duplication is needed
Figure: Rosbag and ROS msg read through a ROS Dataset into the training engine (machine learning) on compute, network, storage
Re-Simulation & Testing
+ Use Spark for preprocessing, transformation, cleansing, aggregation and time-window selection before publishing to ROS topics
+ Use the re-simulation framework of choice to subscribe to the ROS topics
Figure: the engine publishes Rosbag data to ROS topics via ROS core; the re-simulation framework of choice subscribes; compute, network, storage
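In this pipeline, Spark does the heavy preprocessing and time-window selection, and a re-simulation framework subscribes to the ROS topics. The sketch below shows only the final step, in a single process: select a time window, then publish the messages to topic subscribers in timestamp order. `TopicBus` and `replay` are hypothetical stand-ins, not the rospy API.

```python
from collections import defaultdict


class TopicBus:
    """Hypothetical in-process stand-in for ROS core: topics map to callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, stamp, payload):
        for callback in self.subscribers[topic]:
            callback(stamp, payload)


def replay(messages, bus, start, end):
    """Publish every (stamp, topic, payload) with start <= stamp < end,
    in timestamp order, and return how many messages were published."""
    window = sorted((m for m in messages if start <= m[0] < end), key=lambda m: m[0])
    for stamp, topic, payload in window:
        bus.publish(topic, stamp, payload)
    return len(window)


if __name__ == "__main__":
    bus = TopicBus()
    bus.subscribe("/imu", lambda stamp, payload: print(stamp, payload))
    recorded = [(3.0, "/imu", "c"), (1.0, "/imu", "a"), (2.0, "/camera", "frame"), (5.0, "/imu", "late")]
    replay(recorded, bus, start=1.0, end=4.0)  # prints 1.0 a, then 3.0 c
```

Sorting by timestamp before publishing is what makes a re-simulation reproducible. Subscribers see the recorded sequence in the order it happened, no matter how the upstream job stored it.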
Time Travel
Figure: fold(left) and fold(right) along the time axis t; reduce/shuffle
HOW TO START?
Think Big Business Strategy
Data Strategy
Technology Strategy
Agile Delivery Model
Business Case Validation
Prototypes, MVPs
Data Exploration
Data Acquisition
Start Small
Value Proposition
Machine Learning
+ Classification, Regression, Clustering, Collaborative Filtering, Anomaly Detection
+ Supervised/Unsupervised/Reinforcement Learning, Deep Learning, CNN
+ Model Training, Evaluation, Testing, Simulation, Inference
Data Science
+ Big Data Strategy, Consulting, Data Lab, Data Science as a Service
+ Data Collection, Cleaning, Analyzing, Modeling, Validation, Visualization
+ Business Case Validation, Prototyping, MVPs, Dashboards
Data Operations
+ Architecture, DevOps, Cloud Building
+ App. Management Hadoop Ecosystem
+ Managed Infrastructure Services
+ Compute, Network, Storage, Firewall, Load Balancer, DDoS Protection
Data Engineering
+ Continuous Integration and Deployment
+ Data Pipelines (Acquisition, Ingestion, Analytics, Visualization)
+ Distributed Data Architectures
+ Data Processing Backend
+ Hadoop Ecosystem
+ Test Automation and Testing
“Culture eats strategy for breakfast,
technology for lunch, and products for dinner,
and soon thereafter everything else too.”
Peter Drucker
thank you

Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdf
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
Call Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North GoaCall Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North Goa
 
Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737
 
Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 Edition
 

Distributed Deep Learning with Hadoop and TensorFlow

  • 1. Distributed Deep Learning with Hadoop and TensorFlow
  • 3. Image Classification (2016): human performance 95%, AI performance 97%. The ability to understand the content of an image by using machine learning. https://arxiv.org/pdf/1602.07261.pdf
  • 4. AI beats humans in games (2016): AlphaGo beats L. Sedol at Go 4:1; Komodo beats H. Nakamura at chess 2:1.
  • 5. Breast Cancer Diagnosis (2017): pathologist performance 73%, AI performance 92%. Doctors often use additional tests to find or diagnose breast cancer. The pathologist ended up spending 30 hours on this task on 130 slides. Image: a close-up of a lymph node biopsy. https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html
  • 8. The power of 12 GB HBM2 memory and 640 Tensor Cores, delivering 110 TeraFLOPS of performance.
  • 10. AI history → Perceptron: 1943 W. McCulloch, W. Pitts, “neuron” as logical element; 1958 F. Rosenblatt, “Perceptron” model, feed-forward neural networks; 1969 M. Minsky, S. Papert trigger the first AI winter (a single-layer perceptron can compute the OR function but not the XOR function).
  • 11. AI history → AI winter: 1943 W. McCulloch, W. Pitts, neuron as logical element; 1958 F. Rosenblatt, Perceptron model, neural networks; 1969 M. Minsky, S. Papert trigger the first AI winter; 1980 boom of expert systems, Q&A using logical rules, Prolog; 1987-1993 the second AI winter, desktop computers, expensive LISP machines; 1993-2001 Moore’s law, Deep Blue chess-playing, Stanford DARPA challenge.
  • 13. Structured data: 80% of the world’s data is unstructured.
  • 14. Fishing in the sea versus fishing in the lake: Data Warehouse vs. Data Lake
      Data Warehouse: Business Intelligence helps find answers to questions you know; structured data & schema-on-write; SQL-ish queries on database tables; Extract, Transform, Load; expensive for large data.
      Data Lake: Data Science helps you find the question itself; any kind of data & schema-on-read; parallel processing on big data; Extract, Load, Transform-on-the-fly; low cost on commodity hardware.
  • 15. More Data + Bigger Models (figure, 1990s: accuracy vs. scale (data size, model size) for neural networks and other approaches) https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
  • 16. More Data + Bigger Models + More Computation (figure, now: accuracy vs. scale (data size, model size) for neural networks and other approaches, with more compute) https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
  • 17. More Data + Bigger Models + More Computation = Better Results in Machine Learning
  • 18. Routing and price optimization: millions of “trip” events each day globally. Movie recommendation: 400+ billion viewing-related events per day. Price optimization: five billion data points for the Price Tip feature.
  • 21. Single machine, ML specialist, small data (shown twice)
  • 22. Single machine, ML specialist, small data (both marked X)
  • 23. Single machine, ML specialist, big data (both marked X)
  • 24. Train and evaluate machine learning models at scale, from a single machine to a data center. How to run more experiments faster and in parallel? How to share and reproduce research? How to go from research to real products?
  • 25. Distributed Machine Learning (figure axes: data size, model size; from single machine to data center). Model parallelism: training very large models. Data parallelism: speeds up the training. Also: exploring several model architectures, hyper-parameter optimization, training several independent models.
  • 26. Compute Workload for Training and Evaluation (figure axes: I/O intensive vs. compute intensive; single machine vs. data center)
  • 27. I/O Workload for Simulation and Testing (figure axes: I/O intensive vs. compute intensive; single machine vs. data center)
  • 31. Distributed TensorFlow on Hadoop, Mesos, Kubernetes, Spark (12/19/17): TensorFlow Standalone, TensorFlow On YARN, TensorFlow On multi-colored YARN, TensorFlow On Spark, TensorFrames, TensorFlow On Kubernetes, TensorFlow On Mesos. https://www.slideshare.net/jwiegelmann/distributed-tensorflow-on-hadoop-mesos-kubernetes-spark
  • 32. Data Parallel vs. Model Parallel http://books.nips.cc/papers/files/nips25/NIPS2012_0598.pdf Between-Graph Replication In-Graph Replication
  • 33. Data Shards vs. Data Combined https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
  • 36. TensorFlow Standalone: dedicated cluster; short & long running jobs; flexibility; manual scheduling of workers; no shared resources; hard to share data with other applications; no data locality.
  • 37. TensorFlow On YARN (Intel) v3 https://github.com/Intel-bigdata/TensorFlowOnYARN released March 12, 2017 / YARN-6043
  • 38. TensorFlow On YARN (Intel): shared cluster and data; optimised long running jobs; scheduling; data locality (not yet implemented); not easy to have rapid adoption from upstream; fault tolerance not yet implemented; GPU still not seen as a “native” resource on YARN; no use of YARN elasticity.
  • 39. TensorFlow On multi-colored YARN (Hortonworks) v3 Not yet implemented! https://hortonworks.com/blog/distributed-tensorflow-assembly-hadoop-yarn/
  • 40. TensorFlow On multi-colored YARN (Hortonworks): shared cluster; GPUs shared by multiple tenants and applications; centralised scheduling; YARN-3611 Docker support; YARN-4793 native processes; needs YARN wrapper of NVIDIA Docker (GPU driver); not implemented yet!
  • 41. TensorFlow On Spark (Yahoo) v2 https://github.com/yahoo/TensorFlowOnSpark released January 22, 2017
  • 42. TensorFlow On Spark (Yahoo): shared cluster and data; data locality through HDFS or other Spark sources; ad-hoc training and evaluation; slice and dice data with Spark distributed transformations; scheduling not optimal; existing TensorFlow applications must be “converted”, although the process is simple; might need to restart the Spark cluster; no GPU resource management.
  • 43. TensorFrames (Databricks) v2 Scala binding to TF via JNI https://github.com/databricks/tensorframes released Feb 28, 2016
  • 44. TensorFrames (Databricks): possibly shared cluster; TensorFrames infers the shapes for small tensors (no analysis required); data locality via RDD; experimental; still no centralised scheduling, TF and Spark need to be deployed and scheduled separately; TF and Spark might not be collocated; might need data transfer between some nodes.
  • 46. TensorFlow On Kubernetes: shared cluster; centralised scheduling by Kubernetes; solved network orchestration, federation etc.; experimental support for managing NVIDIA GPUs (at this time, however, better than YARN); fault tolerance; data locality.
  • 48. TensorFlow On Mesos: shared cluster; GPU-based scheduling; short and long running jobs; memory footprint; number of services relative to Kubernetes; fault tolerance; data locality.
  • 49. Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Google, 2015
  • 50. Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Google, 2015
  • 51. http://stevenwhang.com/tfx_paper.pdf TFX: A TensorFlow-Based Production-Scale Machine Learning Platform Google, 2017
  • 54. Pricing for 890,000 real-time predictions without training (Q3 2017):
      AWS: compute fees + prediction fees = $8.40 + $96.44 = $104.84 per month
      Google: $0.10 per thousand predictions plus $0.40 per hour = $377 per month (890 × $0.10 = $89, plus 720 h × $0.40 = $288)
      Azure: packages at $0, $100.13, $1,000.06 and $9,999.98; about $1,000 per month
  • 56. High-level Development Process for Autonomous Vehicles: (1) collect sensor data, (2) model engineering, (3) autonomous driving. Components: data logger, big data, data center, trained model, control unit.
  • 57. Sensors (Udacity Lincoln MKZ): camera: 3x Blackfly GigE camera, 20 Hz; lidar: Velodyne HDL-32E, 9.5 Hz; IMU: Xsens, 400 Hz; GPS: 2x fixed, 1 Hz; CAN bus: 1.1 kHz; Robot Operating System; data: 3 GB per minute. https://github.com/udacity/self-driving-car
  • 58. Sensors Spec
      Sensor     | blinding, sunlight, darkness | rain, fog, snow | non-metal objects | wind / high velocity | resolution | range | data
      Ultrasonic | yes                          | yes             | yes               | no                   | +          | +     | +
      Lidar      | yes                          | no              | yes               | yes                  | +++        | ++    | +
      Radar      | yes                          | yes             | no                | yes                  | ++         | +++   | +
      Camera     | no                           | no              | yes               | yes                  | +++        | +++   | +++
  • 59. Machine Learning 101: Observations → State Estimation → Modeling & Prediction → Planning → Controls; f(x): Observations → Controls.
  • 60. Machine Learning for Autonomous Driving + Sensor Fusion clustering, segmentation, pattern recognition + Road ego-motion, image processing and pattern recognition + Localization simultaneous localization and mapping + Situation Understanding detection and classification + Trajectory Planning motion planning and control + Control Strategy reinforcement and supervised learning + Driver Model image processing and pattern recognition
  • 61. Machine Learning Cycle: (1) data collection for training/test, feature engineering (I/O workload); (2) model development and architecture, training and evaluation, model tuning (compute workload); (3) re-simulation and testing (I/O workload); model deployment, versioning, scaling and monitoring.
  • 62. Flux – Open Machine Learning Stack. Components: training & test data; compute + network + storage; ML development & catalog & REST API for ML specialists; feature engineering, training, evaluation, re-simulation, testing; deploy model. Diagram labels: CaffeOnSpark, Sample, Model, Prediction, Batch, Regression, Cluster, Dataset, Correlation, Centroid, Anomaly, Test Scores. ✓ Mainly open source ✓ No vendor lock-in ✓ Scale-out architecture ✓ Multi-user support ✓ Resource management ✓ Job scheduling ✓ Speed-up training ✓ Speed-up simulation
  • 63. Feature Engineering + Hadoop InputFormat and Record Reader for Rosbag + Process Rosbag with Spark, YARN, MapReduce, Hadoop Streaming API, … + Spark RDDs are cached and optimized for analysis. (Diagram: ROS bag → Record Reader → RDD → DataFrame, DataSet → SQL, Spark APIs, NumPy; ROS msg; processing engine and advanced analytics on compute, network, storage.)
  • 64. Training & Evaluation + TensorFlow ROSRecordDataset + Protocol Buffers to serialize records + Saves time because no data conversion is needed + Saves storage because no data duplication is needed. (Diagram: ROS bag → ROS dataset → training engine; machine learning on compute, network, storage; ROS msg.)
  • 65. Re-Simulation & Testing + Use Spark for preprocessing, transformation, cleansing, aggregation and time-window selection before publishing to ROS topics + Use the re-simulation framework of choice to subscribe to the ROS topics. (Diagram: ROS bag → engine → publish to ROS topic via ROS core → re-simulation framework of choice subscribes; compute, network, storage.)
  • 68. Think Big: business strategy, data strategy, technology strategy. Agile delivery model. Start Small: value proposition, business case validation, prototypes, MVPs, data exploration, data acquisition.
  • 69. Data Science and Machine Learning: + Classification, Regression, Clustering, Collaborative Filtering, Anomaly Detection + Supervised/Unsupervised/Reinforcement Learning, Deep Learning, CNN + Model Training, Evaluation, Testing, Simulation, Inference + Big Data Strategy, Consulting, Data Lab, Data Science as a Service + Data Collection, Cleaning, Analyzing, Modeling, Validation, Visualization + Business Case Validation, Prototyping, MVPs, Dashboards
  • 70. Data Engineering and Data Operations: + Architecture, DevOps, Cloud Building + App. Management Hadoop Ecosystem + Managed Infrastructure Services + Compute, Network, Storage, Firewall, Load Balancer, DDoS Protection + Continuous Integration and Deployment + Data Pipelines (Acquisition, Ingestion, Analytics, Visualization) + Distributed Data Architectures + Data Processing Backend + Hadoop Ecosystem + Test Automation and Testing
  • 71. “Culture eats strategy for breakfast, technology for lunch, and products for dinner, and soon thereafter everything else too.” Peter Drucker
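Slides 25 and 32 contrast data parallelism with model parallelism. The following is a minimal, in-process sketch of synchronous data parallelism (the idea behind between-graph replication), assuming a toy linear-regression model and plain Python rather than TensorFlow: the dataset is split into shards, every simulated worker computes a gradient on its own shard using the same replicated parameters, and the averaged gradient is applied once per step. The names `shard`, `local_gradient` and `train_data_parallel` and the example data are hypothetical; no real cluster, parameter server or TensorFlow API is involved.

```python
# Minimal in-process sketch of synchronous data-parallel training.
# Assumptions: plain Python, a toy linear model y = w*x + b, and simulated
# workers; no TensorFlow, cluster or parameter-server API is used.
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the dataset round-robin into one shard per worker."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w: float, b: float, batch: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient of y_hat = w*x + b over one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def train_data_parallel(
    data: List[Sample], num_workers: int = 4, lr: float = 0.01, steps: int = 5000
) -> Tuple[float, float]:
    """Each step, every simulated worker computes a gradient on its own shard
    using the same replicated parameters; the gradients are averaged (the job
    of a parameter server or all-reduce in a real cluster) and a single shared
    update is applied."""
    shards = shard(data, num_workers)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # On a real cluster these calls would run in parallel on separate workers.
        grads = [local_gradient(w, b, s) for s in shards]
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    # Hypothetical noise-free data drawn from y = 3x + 1.
    data = [(k / 10, 3 * (k / 10) + 1) for k in range(40)]
    w, b = train_data_parallel(data)
    print(f"w={w:.3f}, b={b:.3f}")
```

Run as a script, it should print values close to w=3 and b=1. With equal-sized shards (here 4 shards of 10 samples), the averaged shard gradient equals the full-batch gradient, so the simulated workers reproduce single-machine gradient descent while each one touches only a quarter of the data; in a real cluster, the averaging step is where network communication happens.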
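Slide 10 pairs the OR and XOR functions with the perceptron and the first AI winter. A minimal sketch, assuming the classic perceptron learning rule with a single threshold unit (the helper names `train_perceptron`, `predict` and `accuracy` are hypothetical), illustrates why: the rule finds weights for OR, which is linearly separable, but can never classify all four XOR cases correctly, because no single line separates them.

```python
# Perceptron learning rule on OR vs. XOR (illustrative sketch, plain Python).
from typing import List, Tuple

Example = Tuple[Tuple[int, int], int]  # ((x1, x2), target)

OR_DATA: List[Example] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA: List[Example] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def predict(w: List[float], b: float, x: Tuple[int, int]) -> int:
    """Threshold unit: output 1 if the weighted sum plus bias exceeds 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(
    data: List[Example], epochs: int = 50, lr: float = 1.0
) -> Tuple[List[float], float, bool]:
    """Perceptron rule: w += lr * (target - prediction) * x.
    Returns the weights, the bias and whether an error-free epoch was reached."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            delta = target - predict(w, b, x)
            if delta:
                errors += 1
                w[0] += lr * delta * x[0]
                w[1] += lr * delta * x[1]
                b += lr * delta
        if errors == 0:  # every example classified correctly
            return w, b, True
    return w, b, False


def accuracy(w: List[float], b: float, data: List[Example]) -> float:
    """Fraction of examples the threshold unit classifies correctly."""
    return sum(predict(w, b, x) == t for x, t in data) / len(data)


if __name__ == "__main__":
    for name, data in [("OR", OR_DATA), ("XOR", XOR_DATA)]:
        w, b, converged = train_perceptron(data)
        print(name, "converged:", converged, "accuracy:", accuracy(w, b, data))
```

OR converges after a few epochs, while XOR never reaches an error-free epoch however long the rule runs. Adding a hidden layer of such units (a multi-layer feed-forward network) removes this limitation.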