Training deep neural nets can take a long time and consume heavy resources. By leveraging existing distributed versions of TensorFlow and Hadoop, you can train neural nets quickly and efficiently.
3. Image Classification - 2016
Human Performance: 95%
AI Performance: 97%
https://arxiv.org/pdf/1602.07261.pdf
The ability to understand the content of an image by using machine learning
4. AI beats humans in games - 2016
Go: AlphaGo beats L. Sedol 4:1 in 2016
Chess: Komodo beats H. Nakamura 2:1 in 2016
5. Breast Cancer Diagnoses - 2017
Pathologist Performance: 73%
AI Performance: 92%
https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html
Doctors often use additional tests to find or diagnose breast cancer
The pathologist ended up spending 30 hours on this task on 130 slides
A closeup of a lymph node biopsy.
8. The power of 12 GB HBM2 memory and 640 Tensor Cores, delivering 110 TeraFLOPS of performance.
10. AI history → Perceptron
1943 W. McCulloch, W. Pitts, “Neuron” as logical element
1958 F. Rosenblatt, “Perceptron” model, neural networks
1969 M. Minsky, S. Papert, trigger first AI winter
OR function, XOR function, feed forward
11. AI history → AI winter
1943 W. McCulloch, W. Pitts, neuron as logical element
1958 F. Rosenblatt, Perceptron model, neural networks
1969 M. Minsky, S. Papert, trigger first AI winter
1980 Boom of expert systems, Q&A using logical rules, Prolog
1987-1993 The second AI winter, desktop computers, LISP machines expensive
1993-2001 Moore’s law, Deep Blue chess-playing, Stanford DARPA challenge
14. Fishing in the sea versus fishing in the lake
Data Warehouse:
Business Intelligence helps find answers to questions you know.
Structured data & schema-on-write
SQL-ish queries on database tables
Extract, Transform, Load
Expensive for large data
Data Lake:
Data Science helps you find the question itself.
Any kind of data & schema-on-read
Parallel processing on big data
Extract, Load, Transform-on-the-fly
Low cost on commodity hardware
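The schema-on-write versus schema-on-read contrast can be sketched in a few lines. This is a minimal illustration, not a real warehouse or lake: the schema, the record fields and all function names are assumptions made up for the example.

```python
import json

# Hypothetical warehouse schema: column name -> required type.
WAREHOUSE_SCHEMA = {"vehicle_id": str, "speed_kmh": float}


def write_to_warehouse(table, record):
    """Schema-on-write: validate before storing, reject anything that does not fit."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for column, expected_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[column], expected_type):
            raise ValueError(f"{column} must be {expected_type.__name__}")
    table.append(record)


def write_to_lake(lake, raw_line):
    """Schema-on-read, write side: store the raw line untouched."""
    lake.append(raw_line)


def read_from_lake(lake, fields):
    """Schema-on-read, read side: apply the fields a query needs, skip what does not parse."""
    for raw_line in lake:
        try:
            doc = json.loads(raw_line)
        except json.JSONDecodeError:
            continue
        if isinstance(doc, dict) and all(f in doc for f in fields):
            yield {f: doc[f] for f in fields}


warehouse, lake = [], []
write_to_warehouse(warehouse, {"vehicle_id": "mkz-1", "speed_kmh": 42.0})
write_to_lake(lake, '{"vehicle_id": "mkz-1", "speed_kmh": 42.0, "gear": "D"}')
write_to_lake(lake, "free-form log line that no schema anticipated")
speeds = list(read_from_lake(lake, ["vehicle_id", "speed_kmh"]))
```

The lake accepts both lines at write time and only decides at read time which ones are usable, while the warehouse refuses anything that does not match its schema up front.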
15. More Data + Bigger Models
Figure: Accuracy vs. Scale (data size, model size) in the 1990s, neural networks compared with other approaches
https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
16. More Data + Bigger Models + More Computation
Figure: Accuracy vs. Scale (data size, model size) now, with more compute, neural networks compared with other approaches
https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI
17. More Data + Bigger Models + More Computation
= Better Results in Machine Learning
18. Millions of “trip” events each day globally: routing and price optimization
400+ billion viewing-related events per day: movie recommendation
Five billion data points for Price Tip feature: price optimization
24. Train and evaluate machine learning models at scale
Single machine Data center
How to run more experiments faster and in parallel?
How to share and reproduce research?
How to go from research to real products?
25. Distributed Machine Learning
Axes: Data Size, Model Size
Single machine → Data center
Model parallelism: training very large models
Data parallelism: speeds up the training
Exploring several model architectures, hyper-parameter optimization, training several independent models
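Data parallelism can be sketched as follows: each worker holds a shard of the data, computes a gradient on it, and the gradients are averaged before one shared update. This is a toy, single-process illustration in plain Python, not TensorFlow code; the model (y = w * x), the data and the function names are assumptions.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, learning_rate):
    """One synchronous step: every worker computes a gradient, then they are averaged."""
    grads = [gradient(w, shard) for shard in shards]  # would run in parallel on a cluster
    return w - learning_rate * sum(grads) / len(grads)


# Toy data for y = 3x, split evenly across two hypothetical workers.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, learning_rate=0.01)
```

With equally sized shards the averaged gradient equals the gradient on the full data set, so the result matches single-machine training while each worker only touches its own shard.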
26. Compute Workload for Training and Evaluation
Axes: I/O intensive, Compute intensive
Single machine → Data center
27. I/O Workload for Simulation and Testing
Axes: I/O intensive, Compute intensive
Single machine → Data center
36. TensorFlow Standalone
Dedicated cluster
Short & long running jobs
Flexibility
Manual scheduling of workers
No shared resources
Hard to share data with other applications
No data locality
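What "manual scheduling of workers" means in practice can be sketched like this: someone has to decide which host runs which parameter server or worker, and hand every process its own view of the cluster. The host names, the port and the helper functions are assumptions for illustration; the JSON layout follows the shape of TensorFlow's TF_CONFIG environment variable, and no TensorFlow call is made here.

```python
import json


def build_cluster(ps_hosts, worker_hosts, port=2222):
    """Map each job name to its list of host:port addresses."""
    return {
        "ps": [f"{host}:{port}" for host in ps_hosts],
        "worker": [f"{host}:{port}" for host in worker_hosts],
    }


def task_configs(cluster):
    """Yield one TF_CONFIG-style JSON string per process that must be started by hand."""
    for job_name, addresses in cluster.items():
        for index, _ in enumerate(addresses):
            yield json.dumps({"cluster": cluster, "task": {"type": job_name, "index": index}})


# Hypothetical host names; in a standalone setup someone has to pick these manually.
cluster = build_cluster(ps_hosts=["node1"], worker_hosts=["node2", "node3"])
configs = list(task_configs(cluster))
```

Every entry in `configs` corresponds to one process to launch and keep alive on a specific machine, which is the bookkeeping a resource manager such as YARN, Mesos or Kubernetes would otherwise take over.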
37. TensorFlow On YARN (Intel) v3
https://github.com/Intel-bigdata/TensorFlowOnYARN
released March 12, 2017 / YARN-6043
38. TensorFlow On YARN (Intel)
Shared cluster and data
Optimised long running jobs
Scheduling
Data locality (not yet implemented)
Not easy to have rapid adoption from upstream
Fault tolerance not yet implemented
GPU still not seen as a “native” resource on YARN
No use of YARN elasticity
39. TensorFlow On multi-colored YARN (Hortonworks) v3
Not yet implemented!
https://hortonworks.com/blog/distributed-tensorflow-assembly-hadoop-yarn/
40. TensorFlow On multi-colored YARN (Hortonworks)
Shared cluster
GPUs shared by multiple tenants and applications
Centralised scheduling
YARN-3611 Docker support
YARN-4793 Native processes
Needs YARN wrapper of NVIDIA Docker (GPU driver)
Not implemented yet!
41. TensorFlow On Spark (Yahoo) v2
https://github.com/yahoo/TensorFlowOnSpark
released January 22, 2017
42. TensorFlow On Spark (Yahoo)
Shared cluster and data
Data locality through HDFS or other Spark sources
Ad-hoc training and evaluation
Slice and dice data with Spark distributed transformations
Scheduling not optimal
Necessary to “convert” existing TensorFlow application, although a simple process
Might need to restart Spark cluster
No GPU resource management
44. TensorFrames (Databricks)
Possible shared cluster
TensorFrames infers the shapes for small tensors (no analysis required)
Data locality via RDD
Experimental
Still no centralised scheduling, TF and Spark need to be deployed and scheduled separately
TF and Spark might not be co-located
Might need data transfer between some nodes
46. TensorFlow On Kubernetes
Shared cluster
Centralised scheduling by Kubernetes
Solved network orchestration, federation etc.
Experimental support for managing NVIDIA GPUs (at this time better than YARN however)
Fault tolerance
Data locality
48. TensorFlow On Mesos
Shared cluster
GPU-based scheduling
Short and long running jobs
Memory footprint
Number of services relative to Kubernetes
Fault tolerance
Data locality
49. Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Google, 2015
56. High-level Development Process for Autonomous Vehicles
1 Collect sensor data
2 Model Engineering
3 Autonomous Driving
Data Logger Control Unit
Big Data Trained Model
Data Center
Agenda
57. Sensors Udacity Lincoln MKZ
Camera: 3x Blackfly GigE Camera, 20 Hz
Lidar: Velodyne HDL-32E, 9.5 Hz
IMU: Xsens, 400 Hz
GPS: 2x fixed, 1 Hz
CAN bus: 1.1 kHz
Robot Operating System
Data: 3 GB per minute
https://github.com/udacity/self-driving-car
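The 3 GB per minute figure adds up quickly. A small worked calculation, where the recording hours, number of days and fleet size are assumptions chosen only to show the order of magnitude:

```python
GB_PER_MINUTE = 3  # figure from the slide


def recorded_gigabytes(hours_per_day, days, vehicles=1):
    """Raw data volume in GB for vehicles recording at a constant 3 GB per minute."""
    return GB_PER_MINUTE * 60 * hours_per_day * days * vehicles


per_hour = recorded_gigabytes(hours_per_day=1, days=1)                    # 180 GB
per_shift = recorded_gigabytes(hours_per_day=8, days=1)                   # 1440 GB
fleet_month = recorded_gigabytes(hours_per_day=8, days=20, vehicles=10)   # 288000 GB
```

One car recording for an eight-hour day already produces about 1.4 TB, and a hypothetical fleet of ten cars over twenty such days reaches 288 TB.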
58. Sensors Spec
Sensor      Blinding, sunlight, darkness  Rain, fog, snow  Non-metal objects  Wind / high velocity  Resolution  Range  Data
Ultrasonic  yes                           yes              yes                no                    +           +      +
Lidar       yes                           no               yes                yes                   +++         ++     +
Radar       yes                           yes              no                 yes                   ++          +++    +
Camera      no                            no               yes                yes                   +++         +++    +++
60. Machine Learning for Autonomous Driving
+ Sensor Fusion: clustering, segmentation, pattern recognition
+ Road ego-motion: image processing and pattern recognition
+ Localization: simultaneous localization and mapping
+ Situation Understanding: detection and classification
+ Trajectory Planning: motion planning and control
+ Control Strategy: reinforcement and supervised learning
+ Driver Model: image processing and pattern recognition
61. Machine Learning Cycle
Data collection for training/test
1 Feature engineering (I/O workload)
Model development and architecture
2 Training and evaluation (Compute workload)
Model tuning
3 Re-Simulation and Testing (I/O workload)
Model deployment, versioning
Scaling and monitoring
62. Flux – Open Machine Learning Stack
Training & Test data
Compute + Network + Storage
Deploy model
ML Development & Catalog & REST API
ML-Specialists
Feature Engineering
Training, Evaluation
Re-Simulation, Testing
CaffeOnSpark
Sample Model Prediction Batch Regression Cluster
Dataset Correlation Centroid Anomaly Test Scores
✓ Mainly open source
✓ No vendor lock-in
✓ Scale-out architecture
✓ Multi-user support
✓ Resource management
✓ Job scheduling
✓ Speed-up training
✓ Speed-up simulation
63. Feature Engineering
+ Hadoop InputFormat and RecordReader for Rosbag
+ Process Rosbag with Spark, YARN, MapReduce, Hadoop Streaming API, …
+ Spark RDDs are cached and optimized for analysis
Diagram: Rosbag → RecordReader → RDD (ROS messages) → DataFrame, DataSet, SQL, Spark APIs, NumPy → Advanced Analytics
Processing Engine on Computer, Network, Storage
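The role of a RecordReader can be sketched in plain Python: it turns a binary file into a stream of key/value records that a processing engine can then filter and transform. The record layout below (a 4-byte length prefix followed by the payload) is an invented stand-in and not the Rosbag format; the function names and the sample payloads are assumptions as well.

```python
import io
import struct


def write_records(stream, payloads):
    """Write each payload with a 4-byte little-endian length prefix."""
    for payload in payloads:
        stream.write(struct.pack("<I", len(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield (offset, payload) pairs, the way a RecordReader yields key/value pairs."""
    while True:
        offset = stream.tell()
        header = stream.read(4)
        if len(header) < 4:
            return  # end of file or truncated header
        (length,) = struct.unpack("<I", header)
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated record
        yield offset, payload


buffer = io.BytesIO()
write_records(buffer, [b"imu:0.1", b"gps:48.1,11.5", b"imu:0.2"])
buffer.seek(0)
records = list(read_records(buffer))

# Downstream analysis works on records, not on raw bytes of the file.
imu_values = [float(p.split(b":")[1].decode()) for _, p in records if p.startswith(b"imu:")]
```

Once the file is exposed as records, the filtering step at the end is the kind of work that would move into distributed transformations on an RDD.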
64. Training & Evaluation
+ TensorFlow ROSRecordDataset
+ Protocol Buffers to serialize records
+ Save time because data conversion is not needed
+ Save storage because data duplication is not needed
Diagram: Rosbag → ROS Dataset (ROS messages) → Training Engine → Machine Learning
Computer, Network, Storage
65. Re-Simulation & Testing
+ Use Spark for preprocessing, transformation, cleansing, aggregation, time window selection before publishing to ROS topics
+ Use Re-Simulation framework of choice to subscribe to the ROS topics
Diagram: Rosbag → Engine → publish → ROS topic (core) → subscribe → Re-Simulation with framework of choice
Computer, Network, Storage
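The flow of selecting a time window and then replaying it over topics can be sketched as below. The `TopicBus` class is a tiny in-memory stand-in for a publish/subscribe core and is not the ROS API; the message fields, topic names and values are assumptions for illustration.

```python
from collections import defaultdict


class TopicBus:
    """Tiny in-memory stand-in for a publish/subscribe core; not the ROS API."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


def select_window(messages, start, end):
    """Keep messages with start <= timestamp < end, ordered by timestamp."""
    return sorted((m for m in messages if start <= m["t"] < end), key=lambda m: m["t"])


# Hypothetical recorded messages: timestamp in seconds, topic name, payload.
recorded = [
    {"t": 12.0, "topic": "/speed", "value": 31.0},
    {"t": 10.5, "topic": "/speed", "value": 30.0},
    {"t": 11.0, "topic": "/steering", "value": -0.1},
    {"t": 15.0, "topic": "/speed", "value": 35.0},
]

bus = TopicBus()
replayed_speeds = []
bus.subscribe("/speed", lambda m: replayed_speeds.append(m["value"]))

for message in select_window(recorded, start=10.0, end=13.0):
    bus.publish(message["topic"], message)
```

The subscriber only sees the messages inside the chosen window, in time order, which is what lets a re-simulation framework stay unaware of how the data was stored or preprocessed.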
68. Think Big
Business Strategy
Data Strategy
Technology Strategy
Agile Delivery Model
Business Case Validation
Prototypes, MVPs
Data Exploration
Data Acquisition
Start Small
Value Proposition
69. Data Science
+ Big Data Strategy, Consulting, Data Lab, Data Science as a Service
+ Data Collection, Cleaning, Analyzing, Modeling, Validation, Visualization
+ Business Case Validation, Prototyping, MVPs, Dashboards
Machine Learning
+ Classification, Regression, Clustering, Collaborative Filtering, Anomaly Detection
+ Supervised, Unsupervised, Reinforcement Learning, Deep Learning, CNN
+ Model Training, Evaluation, Testing, Simulation, Inference
70. Data Engineering
+ Data Pipelines (Acquisition, Ingestion, Analytics, Visualization)
+ Distributed Data Architectures
+ Data Processing Backend
+ Hadoop Ecosystem
+ Test Automation and Testing
Data Operations
+ Architecture, DevOps, Cloud Building
+ App. Management Hadoop Ecosystem
+ Managed Infrastructure Services
+ Compute, Network, Storage, Firewall, Load Balancer, DDoS Protection
+ Continuous Integration and Deployment
71. “Culture eats strategy for breakfast,
technology for lunch, and products for dinner,
and soon thereafter everything else too.”
Peter Drucker