A Scalable Implementation of Deep Learning on Spark - Alexander Ulanov
Spark Summit
1.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
A Scalable Implementation of Deep Learning on Spark
Alexander Ulanov (1)
Joint work with Xiangrui Meng (2), Bert Greevenbosch (3)
With help from Guoqiang Li (4), Andrey Simanovsky (1)
(1) Hewlett-Packard Labs, (2) Databricks, (3) Huawei & Jules Energy, (4) Spark community
2.
Outline
• Artificial neural network basics
• Implementation of Multilayer Perceptron (MLP) in Spark
• Optimization & parallelization
• Experiments
• Future work
3.
Artificial neural network
Basics
• A statistical model that approximates a function of multiple inputs
• Consists of interconnected "neurons" which exchange messages
  – A "neuron" produces an output by applying a transformation function to its inputs
• A network with more than 3 layers of neurons is called "deep", an instance of deep learning
Layer types & learning
• A layer type is defined by a transformation function
  – Affine: $y_j = \sum_i w_{ij} x_i + b_j$; Sigmoid: $y_i = (1 + e^{-x_i})^{-1}$; Convolution, Softmax, etc.
• Multilayer perceptron (MLP): a network with several pairs of Affine & Sigmoid layers
• Model parameters: the weights that "neurons" use for transformations
• Parameters are iteratively estimated with the backpropagation algorithm
Multilayer perceptron
• Applications: speech recognition (phoneme classification), computer vision
[Figure: multilayer perceptron with input $x$, a hidden layer, and output $y$]
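The Affine and Sigmoid layer definitions above can be sketched as a forward pass in plain Python. This is only an illustration, not the Spark implementation: the function names and the toy weights are hypothetical, and the weight layout `weights[i][j]` (input `i`, output `j`) is an assumption chosen to match the index order in the Affine formula.

```python
import math


def affine(weights, bias, x):
    """Affine layer: y_j = sum_i weights[i][j] * x[i] + bias[j]."""
    return [
        sum(weights[i][j] * x[i] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def sigmoid(x):
    """Sigmoid layer applied element-wise: y_i = 1 / (1 + exp(-x_i))."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def mlp_forward(layers, x):
    """Run x through pairs of Affine and Sigmoid layers, as in an MLP."""
    for weights, bias in layers:
        x = sigmoid(affine(weights, bias, x))
    return x


# Hypothetical toy network: 2 inputs -> 2 hidden neurons -> 1 output.
# The weight values are made up for illustration only.
toy_layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1]),
    ([[1.0], [-1.0]], [0.0]),
]
output = mlp_forward(toy_layers, [1.0, 2.0])
```

Training would additionally need the backward pass (backpropagation), which this sketch leaves out.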
4.
Example of MLP in Spark
Handwritten digits recognition
• Dataset: MNIST [LeCun et al. 1998]
• 28x28 greyscale images of handwritten digits 0-9
• MLP with 784 inputs, 10 outputs and two hidden layers of 300 and 100 neurons
[Figure: network with 784 inputs, a 1st hidden layer of 300 neurons, a 2nd hidden layer of 100 neurons, and an output layer of 10 neurons]
Scala
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.sql.DataFrame

    // Load MNIST in LIBSVM format (assumes an existing sqlContext)
    val digits: DataFrame = sqlContext.read.format("libsvm").load("/data/mnist")
    // 784 inputs, hidden layers of 300 and 100 neurons, 10 outputs
    val mlp = new MultilayerPerceptronClassifier()
      .setLayers(Array(784, 300, 100, 10))
      .setBlockSize(128)
    val model = mlp.fit(digits)
Python
    from pyspark.ml.classification import MultilayerPerceptronClassifier

    # Load MNIST in LIBSVM format (assumes an existing sqlContext)
    digits = sqlContext.read.format("libsvm").load("/data/mnist")
    mlp = MultilayerPerceptronClassifier(layers=[784, 300, 100, 10], blockSize=128)
    model = mlp.fit(digits)
5.
Pipeline with PCA+MLP in Spark
Scala
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.ml.feature.PCA
    import org.apache.spark.sql.DataFrame

    val digits: DataFrame = sqlContext.read.format("libsvm").load("/data/mnist")
    // Reduce the 784 input features to 20 principal components
    val pca = new PCA()
      .setInputCol("features")
      .setK(20)
      .setOutputCol("features20")
    // The MLP reads the reduced features: 20 inputs, 50 hidden neurons, 10 outputs
    val mlp = new MultilayerPerceptronClassifier()
      .setFeaturesCol("features20")
      .setLayers(Array(20, 50, 10))
      .setBlockSize(128)
    val pipeline = new Pipeline()
      .setStages(Array(pca, mlp))
    val model = pipeline.fit(digits)
Python
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import MultilayerPerceptronClassifier
    from pyspark.ml.feature import PCA

    digits = sqlContext.read.format("libsvm").load("/data/mnist8m")
    # Reduce the 784 input features to 20 principal components
    pca = PCA(inputCol="features", k=20, outputCol="features20")
    mlp = MultilayerPerceptronClassifier(featuresCol="features20", layers=[20, 50, 10], blockSize=128)
    pipeline = Pipeline(stages=[pca, mlp])
    model = pipeline.fit(digits)
6.
MLP implementation in Spark
Requirements
• Conform to Spark APIs
• Extensible interface (deep learning API)
• Efficient and scalable (single node & cluster)
Why conform to Spark APIs?
• Spark can call any Java, Python or Scala library, not necessarily designed for Spark
  – Results in expensive data movement from Spark RDDs to the library
  – Prevents use in Spark ML Pipelines
Extensible interface
• Our implementation processes each layer as a black box, with backpropagation in general form
  – Allows new layers and features to be introduced later
• CNN, Autoencoder, RBM are currently under development by the community
7.
Efficiency
Batch processing
• A layer's affine transformation can be represented in vector form: $\mathbf{y} = W^T \mathbf{x} + \mathbf{b}$
  – $\mathbf{y}$: output from the layer, vector of size $n$
  – $W$: the matrix of layer weights, $m \times n$; $\mathbf{b}$: bias, vector of size $n$
  – $\mathbf{x}$: input to the layer, vector of size $m$
• Vector-matrix multiplications are not as efficient as matrix-matrix multiplications
  – Stack $s$ input vectors (into a batch) to perform a matrix multiplication: $Y = W^T X + B$
  – $X$ is $m \times s$, $Y$ is $n \times s$
  – $B$ is $n \times s$; each column contains a copy of $\mathbf{b}$
• We implemented batch processing in matrix form
  – Enabled the use of optimized native BLAS libraries
  – Memory is reused to limit GC overhead
[Figure: block diagrams of the vector form and the batched matrix form of the affine transformation]
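As a rough check of the shapes involved, the sketch below stacks $s$ input vectors as the columns of $X$ and compares the batched result with the per-vector one. It uses plain Python lists rather than BLAS, so it shows only the algebra, not the speedup; all function names and values are hypothetical.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Naive matrix product of a (r x q) and b (q x t)."""
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols] for row in a]


def affine_single(w, b, x):
    """y = W^T x + b for one input vector x of size m; w is m x n, b has size n."""
    return [
        sum(wv * xv for wv, xv in zip(w_col, x)) + bj
        for w_col, bj in zip(transpose(w), b)
    ]


def affine_batch(w, b, inputs):
    """Y = W^T X + B, where the s input vectors are stacked as columns of X."""
    x_mat = transpose(inputs)            # m x s
    y_mat = matmul(transpose(w), x_mat)  # n x s
    # Adding b[j] across row j is the same as adding B, whose columns are copies of b.
    return [[v + b[j] for v in row] for j, row in enumerate(y_mat)]


# Made-up example with m = 3 inputs, n = 2 outputs, s = 4 stacked vectors.
w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [0.5, -0.5]
inputs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, -1.0]]
y_batch = affine_batch(w, b, inputs)
y_columns = transpose(y_batch)  # column k is the output for inputs[k]
```

In a real implementation the single `matmul` call is what gets handed to a native `dgemm` routine, which is where the batching pays off.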
8.
Single node BLAS
BLAS in Spark
• BLAS: Basic Linear Algebra Subprograms
• Hardware-optimized native implementations in C & Fortran
  – CPU: MKL, OpenBLAS etc.
  – GPU: NVBLAS (F-BLAS interface to CUDA)
• Used in Spark through netlib-java
Experiments
• Huge benefit from native BLAS vs pure Java f2jblas
• GPU is faster (2x) only for large matrices
  – When compute time is larger than the copy to/from the GPU
• More details:
  – https://github.com/avulanov/scala-blas
  – "linalg: Matrix Computations in Apache Spark", Reza et al., 2015
Hardware: CPU: 2x Xeon X5650 @ 2.67GHz, 32GB RAM; GPU: Tesla M2050 3GB, 575MHz, 448 CUDA cores
[Figure: dgemm performance. x-axis: matrix sizes from (1x1)*(1x1) to (10000x10000)*(10000x10000); y-axis: seconds, log scale from 1.00E-04 to 1.00E+04; series: netlib-NVBLAS, netlib-MKL, netlib OpenBLAS, netlib-f2jblas]
9.
Scalability
Parallelization
• Each iteration $k$, each node $i$:
  – 1. Gets parameters $w^k$ from the master
  – 2. Computes a gradient $\nabla_i^k F(data_i)$
  – 3. Sends the gradient to the master
  – 4. Master computes $w^{k+1}$ based on the gradients
• Gradient type
  – Batch: process all data on each iteration
  – Stochastic: random point
  – Mini-batch: random batch
• How many workers to use?
  – Fewer workers: less compute
  – More workers: more communication
[Figure: four-step loop between the master and executors 1..N over partitions 1..P: (1) the master sends $w^k$ to the executors, (2) each executor computes $\nabla_i^k F(data_i)$ on its partitions, (3) the executors send the gradients to the master, (4) the master computes $w^{k+1}$ from the gradients, then go to step 1]
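The four-step loop above can be simulated on a single machine, with Python lists standing in for partitions. This is a sketch under several assumptions that are not from the slides: the model is least-squares regression, the master applies a plain gradient-descent step with a fixed step size, and all names and data are made up.

```python
def partition_gradient(w, partition):
    """Step 2: gradient of sum 0.5 * (w.x - y)^2 over one worker's partition."""
    grad = [0.0] * len(w)
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def train(partitions, n_features, step, iterations):
    """Batch gradient descent: every iteration uses all data in all partitions."""
    w = [0.0] * n_features
    n_points = sum(len(p) for p in partitions)
    for _ in range(iterations):
        # Steps 1-2: each worker receives the current w and computes its gradient.
        grads = [partition_gradient(w, p) for p in partitions]
        # Step 3: the gradients are sent to the master, which sums them.
        total = [sum(g[j] for g in grads) for j in range(n_features)]
        # Step 4: the master computes the next parameters
        # (assumed update rule: a plain averaged gradient step).
        w = [wj - step * tj / n_points for wj, tj in zip(w, total)]
    return w


# Made-up data following y = 2 * x0 + 1; the second feature is a constant bias input.
data = [([float(i), 1.0], 2.0 * i + 1.0) for i in range(-3, 4)]
partitions = [data[:4], data[4:]]  # two simulated workers
w_fit = train(partitions, n_features=2, step=0.1, iterations=500)
```

Because the partition gradients simply add up, the result does not depend on how the data is split; only the communication and compute costs change with the number of workers.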
10.
Communication and computation trade-off
Parallelization of batch gradient
• There are $d$ data points, $f$ features and $k$ classes
  – Assume we want to train logistic regression; it has $fk$ parameters
• Communication: $n$ workers send/receive $fk$ 64-bit parameters through a network with bandwidth $b$ and software overhead $c$. Use all-reduce:
  – $t_{cm} = 2\left(\frac{64fk}{b} + c\right)\log_2 n$
• Computation: each worker has $p$ FLOPS and processes $\frac{d}{n}$ of the data, which needs $fk$ operations per data point
  – $t_{cp} \sim \frac{d}{n} \cdot \frac{fk}{p}$
• What is the optimal number of workers?
  – $\min_n \left(t_{cm} + t_{cp}\right) \Rightarrow n = \max\left(\frac{dfk \ln 2}{p\left(\frac{128fk}{b} + 2c\right)}, 1\right)$
  – $n = \max\left(\frac{d \cdot w \cdot \ln 2}{p\left(\frac{128w}{b} + 2c\right)}, 1\right)$, if $w$ is the number of model parameters and floating point operations
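The closed form for $n$ follows from setting the derivative of the total time to zero. A short sketch of that step, treating $n$ as a continuous variable (an assumption made only for the derivation):

```latex
\begin{align*}
T(n) &= t_{cm} + t_{cp}
      = 2\left(\frac{64fk}{b} + c\right)\log_2 n + \frac{d}{n}\cdot\frac{fk}{p} \\
\frac{dT}{dn} &= \frac{2}{n \ln 2}\left(\frac{64fk}{b} + c\right) - \frac{dfk}{p\,n^2} = 0 \\
n &= \frac{dfk \ln 2}{p\left(\frac{128fk}{b} + 2c\right)}
\end{align*}
```

The derivative is negative below this value and positive above it, so the stationary point is a minimum. Taking the maximum with 1 reflects that at least one worker is always needed.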
11.
Analysis of the trade-off
Optimal number of workers for batch gradient
• Parallelism in a cluster
  – $n = \max\left(\frac{d \cdot w \cdot \ln 2}{p\left(\frac{128w}{b} + 2c\right)}, 1\right)$
• Analysis
  – More FLOPS $p$ means a lower degree of batch gradient parallelism in a cluster
  – More operations, i.e. more features and classes $w = fk$ (or a deep network), means a higher degree
  – A small overhead $c$ for sending/receiving a message means a higher degree
• Example: MNIST8M handwritten digit recognition dataset
  – 8.1M documents, 784 features, 10 classes, logistic regression
  – 32 GFLOPS double-precision CPU, 1 Gbit network, overhead ~ 0.1 s
  – $n = \max\left(\frac{8.1\text{M} \cdot 784 \cdot 10 \cdot 0.69}{32\text{G} \cdot \left(\frac{128 \cdot 784 \cdot 10}{1\text{G}} + 2 \cdot 0.1\right)}, 1\right) = 6$
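The MNIST8M arithmetic can be reproduced with a few lines of Python. The function and variable names below are hypothetical; rounding the ratio down to a whole number of workers is an assumption, chosen because it reproduces the value of 6 quoted above (the unrounded ratio is roughly 6.8).

```python
import math


def optimal_workers(d, w, p, b, c):
    """n = max(d * w * ln 2 / (p * (128 * w / b + 2 * c)), 1), rounded down.

    d: number of data points, w: number of model parameters (operations per point),
    p: worker FLOPS, b: network bandwidth in bits/s, c: software overhead in seconds.
    """
    ratio = d * w * math.log(2) / (p * (128.0 * w / b + 2.0 * c))
    return max(int(ratio), 1)


def iteration_time(n, d, w, p, b, c):
    """Modelled time per iteration: all-reduce communication plus computation."""
    t_cm = 2.0 * (64.0 * w / b + c) * math.log2(n)
    t_cp = (d / n) * w / p
    return t_cm + t_cp


# MNIST8M logistic regression: 8.1M points, 784 * 10 parameters,
# 32 GFLOPS workers, 1 Gbit/s network, 0.1 s overhead.
mnist8m = dict(d=8.1e6, w=784 * 10, p=32e9, b=1e9, c=0.1)
mnist8m_workers = optimal_workers(**mnist8m)
```

Comparing `iteration_time(1, **mnist8m)` with `iteration_time(mnist8m_workers, **mnist8m)` shows how much of the single-worker compute time the model expects to recover once the communication term is included.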
12.
Scalability testing
Setup
• MNIST character recognition, 60K samples
• 6-layer MLP (784,2500,2000,1500,1000,500,10)
• 12M parameters
• CPU: Xeon X5650 @ 2.67GHz
• GPU: Tesla M2050 3GB, 575MHz
• Caffe (Deep Learning from Berkeley): 1 node
• Spark: 1 master + 5 workers
Results per iteration
• Single node (both tools double precision)
  – 1.6x slower than Caffe CPU (Scala vs C++)
• Scalability
  – 5 nodes give 4.7x speedup, beats Caffe, close to GPU
Communication cost
• $n = \max\left(\frac{60\text{K} \cdot 12\text{M} \cdot 0.69}{64\text{G} \cdot \left(\frac{128 \cdot 12\text{M}}{950\text{M}} + 2 \cdot 0.1\right)}, 1\right) = 4$
[Figure: Spark MLP vs Caffe MLP. x-axis: workers (0 to 6); y-axis: seconds (0 to 100); series: MLP (total), MLP (compute), Caffe CPU, Caffe GPU]
13.
Conclusions & future work
Conclusions
• Scalable multilayer perceptron is available in Spark 1.5.0
• Extensible internal API for artificial neural networks
  – Further contributions are welcome!
• Native BLAS (and GPU) speeds up Spark
• Heuristics for parallelization of batch gradient
Work in progress [SPARK-5575]
• Autoencoder(s)
• Restricted Boltzmann Machines
• Drop-out
• Convolutional neural networks
Future work
• SGD & parameter server
14.
Thank you