SlideShare a Scribd company logo
1 of 29
Deep Learning
Frameworks with
Spark and GPUs
Abstract
Spark is a powerful, scalable, real-time data analytics engine that is fast
becoming the de facto hub for data science and big data. However, in
parallel, GPU clusters is fast becoming the default way to quickly
develop and train deep learning models. As data science teams and
data savvy companies mature, they'll need to invest in both platforms if
they intend to leverage both big data and artificial intelligence for
competitive advantage. We'll discuss and show in action an
examination of TensorflowOnSpark, CaffeOnSpark, DeepLearning4J,
IBM's SystemML, and Intel's BigDL and distributed versions of various
deep learning frameworks, namely TensorFlow, Caffe, and Torch.
Agenda
1. The Current Deep Learning Landscape
2. Small Case Study
3. Practical Considerations
4. Future Work and Things on the Horizon
The Deep Learning Landscape
Frameworks - Tensorflow, Caffe2, PyTorch, MXNet, Caffe, Torch, Theano, etc
Boutique Frameworks - TensorflowOnSpark, CaffeOnSpark
Data Backends - File system, Amazon EFS, Spark, Hadoop, etc, etc.
Cloud Ecosystems - AWS, GCP, IBM Cloud, etc, etc
Why Spark
Ecosystem
● Many…
○ Data sources
○ Environments
○ Applications
Real-time data
● In-memory RDDs
(resilient distributed
data sets)
● HDFS integration
Data Scientist Workflow
● DataFrames
● SQL
● APIs
● Pipelining w/ job chains or Spark
Approaches to Deep Learning + Spark
Yahoo: CaffeOnSpark, TensorFlowOnSpark
Designed to run on existing Spark and
Hadoop clusters, and use existing
Spark libraries like SparkSQL or Spark’s
MLlib machine learning libraries
Should be noted that the Caffe and
TensorFlow versions used by these lag
the release version by about 4-6 weeks
Source: Yahoo
Skymind.ai: DeepLearning4J
Deeplearning4j is an open-source,
distributed deep-learning library written
for Java and Scala.
DL4J can import neural net models
from most major frameworks via
Keras, including TensorFlow, Caffe,
Torch and Theano. Keras is employed
as Deeplearning4j's Python API.
Source: deeplearning4j.org
IBM: SystemML
Apache SystemML runs on top of
Apache Spark, where it automatically
scales your data, line by line,
determining whether your code should
be run on the driver or an Apache Spark
cluster.
● Algorithm customizability via R-like and
Python-like languages.
● Multiple execution modes, including Spark
MLContext, Spark Batch, Hadoop Batch,
Standalone, and JMLC.
● Automatic optimization based on data and
cluster characteristics to ensure both
efficiency and scalability.
● Limited set of algorithms supported so far Source: Niketan Panesar
Intel: BigDL
Modeled after Torch, supports Scala
and Python programs
Scales out via Spark
Leverages IBM MKL (Math Kernel
Library)
Source: MSDN
Google: TensorFlow
Flexible, powerful deep learning
framework that supports CPU, GPU,
multi-GPU, and multi-server GPU with
Tensorflow Distributed
Keras support
Strong ecosystem (we’ll talk more
about this)
Source:
Google
Why We Chose TensorFlow for Our Study
TensorFlow’s Web Popularity
We decided to focus on TensorFlow as it represent the majority the deep
learning framework usage in the market right now.
Source: http://redmonk.com/fryan/2016/06/06/a-look-at-popular-machine-learning-frameworks/
TensorFlow’s Academic Popularity
It is also worth noting that TensorFlow
represents a large portion of what the
research community is currently interested in.
Source: https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106
A Quick Case Study
Model and Benchmark Information
Our model uses the following:
● Dataset: CIFAR10
● Architecture: Pictured to the right
● Optimizer: Adam
For benchmarking, we will consider:
● Images per second
● Time to 85% accuracy (averaged across
10 runs)
● All models run on p2 AWS instances
● Hardware is homogenous!!!
Model Configurations:
● Single server - CPU
● Single server - GPU
● Single server - multi-GPU
● Multi server distributed TensorFlow
● Multi server TensorFlow on Spark
Input 2 Convolution Module
3X3
Convolution
3X3
Convolution
2X2 (2) Max
Pooling
4 Convolution Module
3X3
Convolution
3X3
Convolution
2X2 (2) Max
Pooling
3X3
Convolution
3X3
Convolution
2 Convolution
Module (64 depth)
2 Convolution
Module (128 depth)
Fully Connected
Layer (1024 depth)
4 Convolution
Module (256 depth)
Fully Connected
Layer (1024 depth)
Output Layer (10
depth)
TensorFlow - Single Server CPU and GPU
● This is really well documented and the basis
for why most of the frameworks were created.
Using GPUs for deep learning creates high
returns quickly.
● Managing dependencies for GPU-enabled
deep learning frameworks can be tedious
(cuda drivers, cuda versions, cudnn versions,
framework versions). Bitfusion can help
alleviate a lot of these issues with our AMIs
and docker containers
● When going from CPU to GPU, it can be hugely
beneficial to explicitly put data tasks on CPU
and number crunching (gradient calculations)
on GPUs with tf.device().
Performance
CPU:
• Images per second ~ 130
• Time to 85% accuracy ~ 16570
GPU:
• Images per second ~ 1420
• Time to 85% accuracy ~ 1545 sec
Note: CPU can be sped up on more CPU focused machines.
TensorFlow - Single Server Multi-GPU
Implementation Details
● For this example we used 3 GPUs on
a single machine (p2.8xlarge)
Code Changes
● Need to write code to define device
placement with tf.device()
● Write code to calculate gradients on
each GPU and then calculates an
average gradient for the update
Performance
• Images per second ~ 3200
• Time to 85% accuracy ~ 735 sec
Distributed TensorFlow
Implementation Details
For this example we used 1 parameter server and 3 worker
servers.
We used HDFS, but other shared file systems or native file
systems could be used
Code Changes
Need to write code to define a parameter server and worker
server
Need to implement special session: tf.MonitoredSession()
Either a cluster manager needs to be set up or jobs need to be
kicked off individually on workers and parameter servers
Need to balance batch size and learning rate to optimize
Performance
• Images per second ~ 1630
• Time to 85% accuracy ~ 1460
TensorFlow on Spark
Implementation Details
For this example we used 1 parameter server and 3 worker
servers.
Requires data to be placed in HDFS
Code Changes
TensorFlow on Spark requires renaming a few TF variables
from a vanilla distributed TF implementation.
(tf.train.Server() becomes
TFNode.start_cluster_server(), etc.)
Need to understand multiple utils for reading/writing to
HDFS
Performance
• Images per second ~ 1970
• Time to 85% accuracy ~ 1210
Performance Summary
● It should be noted again that the CPU
performance is on a p2.xlarge
● For this example, local communication
through multi-GPU is much more
efficient than distributed
communication
● TensorFlow on Spark provided
speedups in the distributed setting
(most likely from RDMA)
● Tweaking of the batch size and learning
rate were required to optimize
throughput
When to Use What and Practical Considerations
Practical Considerations Continued
Number of Updates
● If the number of updates and the size of the updates are considerable, multiple
GPU configurations on a single server should be
Hardware Configurations
● Network speeds play a crucial role in distributed model settings
● GPUs connected with RDMA over Infiniband have shown significant speedup
over more basic gRPC over ethernet
● IB was 2X faster for 2 nodes, 10X for 4 nodes versus TCP (close in
performance to gRPC)
gRPC vs RDMA vs MPI
Continued
gRPC and MPI are possible transports for distributed TF, however:
● gRPC latency is 100-200us/msg due to userspace <-> kernel switches
● MPI latency is 1-3us/message due to OS bypass, but requires communications
to be serialized to a single thread. Otherwise, 100-200us latency
● TensorFlow typically spawns ~100 threads making MPI suboptimal
Our approach, powered by Bitfusion Core compute virtualization engine, is to use
native IB verbs + RDMA for the primary transport
● 1-3 us per message across all threads of execution
● Use multiple channels simultaneously where applicable: PCIe, multiple IB ports
Practical Considerations
Size of Data
● If the size of the input data is especially large (large images for example) the
entire model might not fit onto a single GPU
● If the number of records is prohibitively large a shared file system or database
might be required
● If the number of records is large convergence can be sped up using multiple
GPUs or distributed models
Size of Model
● If the size of the network is larger than your GPU’s memory, splitting the model
across multi-GPUs (model parallel)
Key Takeaways
● GPUs are almost always better
● Hardware/Network setup matters a lot in distributed settings!
● Saturating GPUs locally should be a priority before going to a distributed setting
● If your data is already built on Spark, TensorFlow on Spark provides an easy
way to integrate with your current data stack
● RDMA and infiniband is natively supported by TensorFlow on Spark which can
provide significant network communication speedups
Future Work
● Assess caffe2, and specifically the strategies outlined in Facebook’s new paper:
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, and the benefits
of their Gloo communication backend
● Look into the new Tensorflow MPI integration in tf.contrib and see how their
implementation looks compared to our results
● Show how Distributed TF and TFonSpark scale on specialized IB hardware and
compare results
● For up-to-date studies on scaling deep learning workloads, you can find out
more at bitfusion.io
Bitfusion Flex
End-to-end, elastic infrastructure for building deep learning and AI applications
● Can utilize RDMA and Infiniband when remote GPUs are attached
● Never have to transition beyond multi-GPU code environment (remote servers look like local GPUs and
can be utilized without code changes)
● Rich CLI & GUI, interactive Jupyter workspaces, batch scheduling, smart shared resourcing for max
efficiency, and much more
Questions?

More Related Content

What's hot

Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkRDataWorks Summit
 
Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkDatabricks
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemDaniel Rodriguez
 
Speeding Up Spark with Data Compression on Xeon+FPGA with David Ojika
Speeding Up Spark with Data Compression on Xeon+FPGA with David OjikaSpeeding Up Spark with Data Compression on Xeon+FPGA with David Ojika
Speeding Up Spark with Data Compression on Xeon+FPGA with David OjikaDatabricks
 
Don't Let the Spark Burn Your House: Perspectives on Securing Spark
Don't Let the Spark Burn Your House: Perspectives on Securing SparkDon't Let the Spark Burn Your House: Perspectives on Securing Spark
Don't Let the Spark Burn Your House: Perspectives on Securing SparkDataWorks Summit
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Spark Summit
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Spark Summit
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkDatabricks
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkDatabricks
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresSpark Summit
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareWhat's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareDatabricks
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryDatabricks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Uri Laserson
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Hubert Fan Chiang
 

What's hot (20)

Dask: Scaling Python
Dask: Scaling PythonDask: Scaling Python
Dask: Scaling Python
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkR
 
Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark Ecosystem
 
Speeding Up Spark with Data Compression on Xeon+FPGA with David Ojika
Speeding Up Spark with Data Compression on Xeon+FPGA with David OjikaSpeeding Up Spark with Data Compression on Xeon+FPGA with David Ojika
Speeding Up Spark with Data Compression on Xeon+FPGA with David Ojika
 
Don't Let the Spark Burn Your House: Perspectives on Securing Spark
Don't Let the Spark Burn Your House: Perspectives on Securing SparkDon't Let the Spark Burn Your House: Perspectives on Securing Spark
Don't Let the Spark Burn Your House: Perspectives on Securing Spark
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi Torres
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareWhat's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You Care
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent Memory
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 

Similar to Deep Learning with Spark and GPUs

Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDatabricks
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsAltoros
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...MLconf
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...E-Commerce Brasil
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceData Works MD
 
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache SparkNear Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache SparkAhsan Javed Awan
 
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...Spark Summit
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Tyrone Systems
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Databricks
 
Stream Processing
Stream ProcessingStream Processing
Stream Processingarnamoy10
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Ahsan Javed Awan
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringKeith Kraus
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learningGanesan Narayanasamy
 

Similar to Deep Learning with Spark and GPUs (20)

Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache SparkNear Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
 
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Stream Processing
Stream ProcessingStream Processing
Stream Processing
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Deep Learning with Spark and GPUs

  • 2. Abstract Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel, GPU clusters is fast becoming the default way to quickly develop and train deep learning models. As data science teams and data savvy companies mature, they'll need to invest in both platforms if they intend to leverage both big data and artificial intelligence for competitive advantage. We'll discuss and show in action an examination of TensorflowOnSpark, CaffeOnSpark, DeepLearning4J, IBM's SystemML, and Intel's BigDL and distributed versions of various deep learning frameworks, namely TensorFlow, Caffe, and Torch.
  • 3. Agenda 1. The Current Deep Learning Landscape 2. Small Case Study 3. Practical Considerations 4. Future Work and Things on the Horizon
  • 4. The Deep Learning Landscape Frameworks - Tensorflow, Caffe2, PyTorch, MXNet, Caffe, Torch, Theano, etc Boutique Frameworks - TensorflowOnSpark, CaffeOnSpark Data Backends - File system, Amazon EFS, Spark, Hadoop, etc, etc. Cloud Ecosystems - AWS, GCP, IBM Cloud, etc, etc
  • 5. Why Spark Ecosystem ● Many… ○ Data sources ○ Environments ○ Applications Real-time data ● In-memory RDDs (resilient distributed data sets) ● HDFS integration Data Scientist Workflow ● DataFrames ● SQL ● APIs ● Pipelining w/ job chains or Spark
  • 6. Approaches to Deep Learning + Spark
  • 7. Yahoo: CaffeOnSpark, TensorFlowOnSpark Designed to run on existing Spark and Hadoop clusters, and use existing Spark libraries like SparkSQL or Spark’s MLlib machine learning libraries Should be noted that the Caffe and TensorFlow versions used by these lag the release version by about 4-6 weeks Source: Yahoo
  • 8. Skymind.ai: DeepLearning4J Deeplearning4j is an open-source, distributed deep-learning library written for Java and Scala. DL4J can import neural net models from most major frameworks via Keras, including TensorFlow, Caffe, Torch and Theano. Keras is employed as Deeplearning4j's Python API. Source: deeplearning4j.org
  • 9. IBM: SystemML Apache SystemML runs on top of Apache Spark, where it automatically scales your data, line by line, determining whether your code should be run on the driver or an Apache Spark cluster. ● Algorithm customizability via R-like and Python-like languages. ● Multiple execution modes, including Spark MLContext, Spark Batch, Hadoop Batch, Standalone, and JMLC. ● Automatic optimization based on data and cluster characteristics to ensure both efficiency and scalability. ● Limited set of algorithms supported so far Source: Niketan Panesar
  • 10. Intel: BigDL Modeled after Torch, supports Scala and Python programs Scales out via Spark Leverages IBM MKL (Math Kernel Library) Source: MSDN
  • 11. Google: TensorFlow Flexible, powerful deep learning framework that supports CPU, GPU, multi-GPU, and multi-server GPU with Tensorflow Distributed Keras support Strong ecosystem (we’ll talk more about this) Source: Google
  • 12. Why We Chose TensorFlow for Our Study
  • 13. TensorFlow’s Web Popularity We decided to focus on TensorFlow as it represent the majority the deep learning framework usage in the market right now. Source: http://redmonk.com/fryan/2016/06/06/a-look-at-popular-machine-learning-frameworks/
  • 14. TensorFlow’s Academic Popularity It is also worth noting that TensorFlow represents a large portion of what the research community is currently interested in. Source: https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106
  • 15. A Quick Case Study
  • 16. Model and Benchmark Information Our model uses the following: ● Dataset: CIFAR10 ● Architecture: Pictured to the right ● Optimizer: Adam For benchmarking, we will consider: ● Images per second ● Time to 85% accuracy (averaged across 10 runs) ● All models run on p2 AWS instances ● Hardware is homogenous!!! Model Configurations: ● Single server - CPU ● Single server - GPU ● Single server - multi-GPU ● Multi server distributed TensorFlow ● Multi server TensorFlow on Spark Input 2 Convolution Module 3X3 Convolution 3X3 Convolution 2X2 (2) Max Pooling 4 Convolution Module 3X3 Convolution 3X3 Convolution 2X2 (2) Max Pooling 3X3 Convolution 3X3 Convolution 2 Convolution Module (64 depth) 2 Convolution Module (128 depth) Fully Connected Layer (1024 depth) 4 Convolution Module (256 depth) Fully Connected Layer (1024 depth) Output Layer (10 depth)
  • 17. TensorFlow - Single Server CPU and GPU ● This is really well documented and the basis for why most of the frameworks were created. Using GPUs for deep learning creates high returns quickly. ● Managing dependencies for GPU-enabled deep learning frameworks can be tedious (cuda drivers, cuda versions, cudnn versions, framework versions). Bitfusion can help alleviate a lot of these issues with our AMIs and docker containers ● When going from CPU to GPU, it can be hugely beneficial to explicitly put data tasks on CPU and number crunching (gradient calculations) on GPUs with tf.device(). Performance CPU: • Images per second ~ 130 • Time to 85% accuracy ~ 16570 GPU: • Images per second ~ 1420 • Time to 85% accuracy ~ 1545 sec Note: CPU can be sped up on more CPU focused machines.
  • 18. TensorFlow - Single Server Multi-GPU Implementation Details ● For this example we used 3 GPUs on a single machine (p2.8xlarge) Code Changes ● Need to write code to define device placement with tf.device() ● Write code to calculate gradients on each GPU and then calculates an average gradient for the update Performance • Images per second ~ 3200 • Time to 85% accuracy ~ 735 sec
  • 19. Distributed TensorFlow Implementation Details For this example we used 1 parameter server and 3 worker servers. We used HDFS, but other shared file systems or native file systems could be used Code Changes Need to write code to define a parameter server and worker server Need to implement special session: tf.MonitoredSession() Either a cluster manager needs to be set up or jobs need to be kicked off individually on workers and parameter servers Need to balance batch size and learning rate to optimize Performance • Images per second ~ 1630 • Time to 85% accuracy ~ 1460
  • 20. TensorFlow on Spark Implementation Details For this example we used 1 parameter server and 3 worker servers. Requires data to be placed in HDFS Code Changes TensorFlow on Spark requires renaming a few TF variables from a vanilla distributed TF implementation. (tf.train.Server() becomes TFNode.start_cluster_server(), etc.) Need to understand multiple utils for reading/writing to HDFS Performance • Images per second ~ 1970 • Time to 85% accuracy ~ 1210
  • 21. Performance Summary ● It should be noted again that the CPU performance is on a p2.xlarge ● For this example, local communication through multi-GPU is much more efficient than distributed communication ● TensorFlow on Spark provided speedups in the distributed setting (most likely from RDMA) ● Tweaking of the batch size and learning rate were required to optimize throughput
  • 22. When to Use What and Practical Considerations
  • 23. Practical Considerations Continued Number of Updates ● If the number of updates and the size of the updates are considerable, multiple GPU configurations on a single server should be Hardware Configurations ● Network speeds play a crucial role in distributed model settings ● GPUs connected with RDMA over Infiniband have shown significant speedup over more basic gRPC over ethernet ● IB was 2X faster for 2 nodes, 10X for 4 nodes versus TCP (close in performance to gRPC)
  • 24. gRPC vs RDMA vs MPI Continued gRPC and MPI are possible transports for distributed TF, however: ● gRPC latency is 100-200us/msg due to userspace <-> kernel switches ● MPI latency is 1-3us/message due to OS bypass, but requires communications to be serialized to a single thread. Otherwise, 100-200us latency ● TensorFlow typically spawns ~100 threads making MPI suboptimal Our approach, powered by Bitfusion Core compute virtualization engine, is to use native IB verbs + RDMA for the primary transport ● 1-3 us per message across all threads of execution ● Use multiple channels simultaneously where applicable: PCIe, multiple IB ports
  • 25. Practical Considerations Size of Data ● If the size of the input data is especially large (large images for example) the entire model might not fit onto a single GPU ● If the number of records is prohibitively large a shared file system or database might be required ● If the number of records is large convergence can be sped up using multiple GPUs or distributed models Size of Model ● If the size of the network is larger than your GPU’s memory, splitting the model across multi-GPUs (model parallel)
  • 26. Key Takeaways ● GPUs are almost always better ● Hardware/Network setup matters a lot in distributed settings! ● Saturating GPUs locally should be a priority before going to a distributed setting ● If your data is already built on Spark, TensorFlow on Spark provides an easy way to integrate with your current data stack ● RDMA and infiniband is natively supported by TensorFlow on Spark which can provide significant network communication speedups
  • 27. Future Work ● Assess caffe2, and specifically the strategies outlined in Facebook’s new paper: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, and the benefits of their Gloo communication backend ● Look into the new Tensorflow MPI integration in tf.contrib and see how their implementation looks compared to our results ● Show how Distributed TF and TFonSpark scale on specialized IB hardware and compare results ● For up-to-date studies on scaling deep learning workloads, you can find out more at bitfusion.io
  • 28. Bitfusion Flex End-to-end, elastic infrastructure for building deep learning and AI applications ● Can utilize RDMA and Infiniband when remote GPUs are attached ● Never have to transition beyond multi-GPU code environment (remote servers look like local GPUs and can be utilized without code changes) ● Rich CLI & GUI, interactive Jupyter workspaces, batch scheduling, smart shared resourcing for max efficiency, and much more