Spark Meetup @ Netflix, 05/19/2015
Spark and GraphX in the Netflix Recommender System
Ehtsham Elahi and Yves Raimond
(@EhtshamElahi) (@moustaki)
Algorithms Engineering
Netflix
Machine Learning @ Netflix
Recommendations @ Netflix
● Goal: Help members find content that they’ll enjoy to maximize satisfaction and retention
● Core part of product
○ Every impression is a recommendation
Models & Algorithms
▪ Regression (Linear, logistic, elastic net)
▪ SVD and other Matrix Factorizations
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Deep Neural Networks
▪ Markov Models and Graph Algorithms
▪ Clustering
▪ Latent Dirichlet Allocation
▪ Gradient Boosted Decision Trees/Random Forests
▪ Gaussian Processes
▪ …
Main Challenge - Scale
● Algorithms @ Netflix Scale
○ > 62 M Members
○ > 50 Countries
○ > 1000 device types
○ > 100M Hours / day
● Can distributed Machine Learning algorithms help with Scale?
Spark and GraphX
● Spark - Distributed in-memory computational engine using Resilient Distributed Datasets (RDDs)
● GraphX - extends RDDs to Multigraphs and provides graph analytics
● Convenient and fast, all the way from prototyping (spark-notebook, iSpark, Zeppelin) to production
Two Machine Learning Problems
● Generate ranking of items with respect to a given item from an interaction graph
○ Graph Diffusion algorithms (e.g. Topic Sensitive Pagerank)
● Find Clusters of related items using co-occurrence data
○ Probabilistic Graphical Models (Latent Dirichlet Allocation)
Iterative Algorithms in GraphX
[Figure: example graph with vertices v1, v2, v3, v4, v6, v7; each vertex carries a Vertex Attribute and each edge an Edge Attribute]
● GraphX represents the graph as RDDs, e.g. VertexRDD, EdgeRDD
● GraphX provides APIs to propagate and update attributes
● Iterative Algorithm proceeds by creating updated graphs
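The "creating updated graphs" pattern can be sketched in plain Python. This is not GraphX code: the names `Graph`, `step` and `iterate`, and the averaging update rule, are assumptions made for illustration. Each iteration propagates attributes along edges and returns a new graph instead of mutating the previous one.

```python
from typing import Dict, List, NamedTuple, Tuple


class Graph(NamedTuple):
    """Graph treated as immutable: vertex attributes plus (src, dst, edge attribute) triples."""
    vertices: Dict[str, float]
    edges: List[Tuple[str, str, float]]


def step(graph: Graph) -> Graph:
    """One iteration: send messages along edges, then build an updated graph."""
    # Propagate: each edge sends (source attribute * edge attribute) to its destination.
    inbox: Dict[str, float] = {v: 0.0 for v in graph.vertices}
    for src, dst, weight in graph.edges:
        inbox[dst] += graph.vertices[src] * weight
    # Update (illustrative rule): average the old attribute and the received messages.
    new_vertices = {v: 0.5 * attr + 0.5 * inbox[v]
                    for v, attr in graph.vertices.items()}
    return Graph(new_vertices, graph.edges)


def iterate(graph: Graph, num_iterations: int) -> Graph:
    """Run the iterative algorithm; every pass yields a fresh Graph value."""
    for _ in range(num_iterations):
        graph = step(graph)  # the previous graph object is left untouched
    return graph


g0 = Graph({"v1": 1.0, "v2": 0.0, "v3": 0.0},
           [("v1", "v2", 1.0), ("v2", "v3", 1.0)])
g1 = iterate(g0, 1)
print(g1.vertices)
```

Because `g0` is never modified, earlier states remain available, for example to compare successive iterations.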
Graph Diffusion algorithms
Topic Sensitive Pagerank @ Netflix
● Popular graph diffusion algorithm
● Captures vertex importance with regard to a particular vertex
● e.g. for the topic “Seattle”
[Figure: graph around the node “Seattle”, with edges labelled "related to", "shot in", "featured in" and "cast"]
● Iteration 0: We start by activating a single node, “Seattle”
● Iteration 1: With some probability, we follow outbound edges, otherwise we go back to the origin
● Iteration 2: Vertex accumulates higher mass
● And again, until convergence
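The iterations above can be sketched as a small power iteration in plain Python. This is an illustration under stated assumptions, not the Netflix implementation: the function name, the follow probability of 0.85, the rule for vertices without outbound edges, and the toy vertices "A", "B", "C" are all hypothetical.

```python
from typing import Dict, List


def topic_sensitive_pagerank(out_edges: Dict[str, List[str]], origin: str,
                             follow_prob: float = 0.85, tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Activation probability of every vertex for a walk that restarts at `origin`."""
    # Iteration 0: all mass sits on the single activated vertex.
    mass = {v: 0.0 for v in out_edges}
    mass[origin] = 1.0
    for _ in range(max_iter):
        new_mass = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            if targets:
                # With probability follow_prob, follow an outbound edge (uniformly).
                share = follow_prob * mass[v] / len(targets)
                for t in targets:
                    new_mass[t] += share
                # Otherwise, go back to the origin.
                new_mass[origin] += (1.0 - follow_prob) * mass[v]
            else:
                # Assumption: a vertex without outbound edges always returns to the origin.
                new_mass[origin] += mass[v]
        delta = sum(abs(new_mass[v] - mass[v]) for v in out_edges)
        mass = new_mass
        if delta < tol:  # repeat until convergence
            break
    return mass


# Toy graph: "C" is reachable from the origin through both "A" and "B".
graph = {"Seattle": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
scores = topic_sensitive_pagerank(graph, "Seattle")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy graph "C" ends up with more mass than "A" or "B" because it receives mass along two paths from the origin.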
GraphX implementation
● Running one propagation for each possible starting
node would be slow
● Keep a vector of activation probabilities at each vertex
● Use GraphX to run all propagations in parallel
Topic Sensitive Pagerank in GraphX
[Figure: activation probabilities as vertex attributes. Each vertex holds a vector: activation probability starting from vertex 1, starting from vertex 2, starting from vertex 3, ...]
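A minimal sketch of keeping a vector of activation probabilities at each vertex, so that the propagations from every starting vertex advance together. It is plain Python rather than GraphX; the function name `propagate_all`, the follow probability, the fixed iteration count and the toy graph are assumptions for illustration.

```python
from typing import Dict, List


def propagate_all(out_edges: Dict[str, List[str]], follow_prob: float = 0.85,
                  num_iter: int = 50) -> Dict[str, List[float]]:
    """Run one propagation per starting vertex, all sharing the same pass over the graph."""
    vertices = list(out_edges)
    n = len(vertices)
    # Vertex attribute: entry i is the activation probability of this vertex
    # for the walk that started from vertices[i].
    attr = {v: [1.0 if vertices[i] == v else 0.0 for i in range(n)] for v in vertices}
    for _ in range(num_iter):
        new_attr = {v: [0.0] * n for v in vertices}
        for v, targets in out_edges.items():
            for i in range(n):
                m = attr[v][i]
                if m == 0.0:
                    continue
                origin = vertices[i]
                if targets:
                    # Follow outbound edges with probability follow_prob ...
                    for t in targets:
                        new_attr[t][i] += follow_prob * m / len(targets)
                    # ... otherwise return to the origin of walk i.
                    new_attr[origin][i] += (1.0 - follow_prob) * m
                else:
                    # Assumption: vertices without outbound edges return to the origin.
                    new_attr[origin][i] += m
        attr = new_attr
    return attr


toy = {"v1": ["v2", "v3"], "v2": ["v3"], "v3": ["v1"]}
activation = propagate_all(toy)
for vertex, vector in activation.items():
    print(vertex, [round(p, 3) for p in vector])
```

The vector length equals the number of starting vertices, which is why a large number of starting vertices increases the data moved per iteration.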
Example graph diffusion results
[Figures: example results for “Matrix”, “Zombies” and “Seattle”]
Distributed Clustering algorithms
LDA @ Netflix
● A popular clustering/latent factors model
● Discovers clusters/topics of related videos from Netflix
data
● e.g., a topic of Animal Documentaries
LDA - Graphical Model
[Figure: LDA graphical model, labelled with: per-topic word distributions; per-document topic distributions; topic label for document d and word w]
● Question: How to parallelize inference?
● Answer: Read conditional independencies in the model
Gibbs Sampler 1 (Semi Collapsed)
● Sample Topic Labels in a given document sequentially
● Sample Topic Labels in different documents in parallel
Gibbs Sampler 2 (UnCollapsed)
● Sample Topic Labels in a given document in parallel
● Sample Topic Labels in different documents in parallel
● Suitable for GraphX
Distributed Gibbs Sampler
[Figure: a distributed parameterized graph for LDA with 3 Topics, with word vertices w1, w2, w3 and document vertices d1, d2. Values below are listed in the order they appear in the figure.]
● Document vertices (d1, d2) carry the per-document topic distribution: 0.3 0.6 0.1 and 0.2 0.5 0.3
● Word vertices (w1, w2, w3) carry the per-topic word distributions: 0.3 0.4 0.1, 0.3 0.2 0.8 and 0.4 0.4 0.1
● Edge: if word appeared in the document
● (vertex, edge, vertex) = triplet
● Categorical distribution for the triplet using vertex attributes
● Categorical distributions for all triplets
● Sample Topics for all edges (in the figure: 1, 1, 2, 0)
● Neighborhood aggregation for topic histograms (in the figure: words 0 1 0, 0 1 1 and 1 0 0; documents 0 2 0 and 1 0 1)
● Realize samples from Dirichlet to update the graph (in the figure: words 0.1 0.4 0.3, 0.1 0.4 0.4 and 0.8 0.2 0.3; documents 0.1 0.8 0.1 and 0.45 0.1 0.45)
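One iteration of the steps above can be sketched in plain Python with the standard library only. This is a sequential illustration, not the GraphX implementation: the loops stand in for operations that would run in parallel over triplets and vertices. The edge list, the prior parameters `alpha` and `beta`, and all function names are assumptions; the starting vertex attributes follow the values in the figure.

```python
import random
from typing import Dict, List, Tuple

NUM_TOPICS = 3


def dirichlet(params: List[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_iteration(theta: Dict[str, List[float]], phi: Dict[str, List[float]],
                    edges: List[Tuple[str, str]], alpha: float, beta: float,
                    rng: random.Random):
    # 1. For each (document, edge, word) triplet, build a categorical
    #    distribution from the two vertex attributes and sample a topic.
    topics = []
    for doc, word in edges:
        weights = [theta[doc][k] * phi[word][k] for k in range(NUM_TOPICS)]
        topics.append(rng.choices(range(NUM_TOPICS), weights=weights)[0])
    # 2. Neighborhood aggregation: topic histograms at every vertex.
    doc_hist = {d: [0] * NUM_TOPICS for d in theta}
    word_hist = {w: [0] * NUM_TOPICS for w in phi}
    for (doc, word), k in zip(edges, topics):
        doc_hist[doc][k] += 1
        word_hist[word][k] += 1
    # 3. Realize Dirichlet samples to obtain the updated vertex attributes.
    new_theta = {d: dirichlet([alpha + c for c in doc_hist[d]], rng) for d in theta}
    words = list(phi)
    new_phi = {w: [0.0] * NUM_TOPICS for w in words}
    for k in range(NUM_TOPICS):
        # Each topic is a distribution over words, so normalise across word vertices.
        column = dirichlet([beta + word_hist[w][k] for w in words], rng)
        for w, p in zip(words, column):
            new_phi[w][k] = p
    return new_theta, new_phi, topics


theta0 = {"d1": [0.3, 0.6, 0.1], "d2": [0.2, 0.5, 0.3]}
phi0 = {"w1": [0.3, 0.4, 0.1], "w2": [0.3, 0.2, 0.8], "w3": [0.4, 0.4, 0.1]}
# Hypothetical edge list: which words appear in which documents.
edge_list = [("d1", "w1"), ("d1", "w2"), ("d2", "w2"), ("d2", "w3")]
theta1, phi1, sampled = gibbs_iteration(theta0, phi0, edge_list,
                                        alpha=1.0, beta=1.0, rng=random.Random(0))
print(sampled)
```

Step 1 touches only one triplet at a time and step 3 only one vertex (or one topic) at a time, which is what makes the uncollapsed sampler a natural fit for a graph-parallel engine.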
Example LDA Results
● Cluster of Bollywood Movies
● Cluster of Kids shows
● Cluster of Western movies
GraphX performance comparison
Algorithm Implementations
● Topic Sensitive Pagerank
○ Distributed GraphX implementation
○ Alternative Implementation: Broadcast graph adjacency matrix,
Scala/Breeze code, triggered by Spark
● LDA
○ Distributed GraphX implementation
○ Alternative Implementation: Single machine, Multi-threaded Java code
● All implementations are Netflix internal code
Performance Comparison
[Figure: performance comparison on the open source DBPedia dataset]
● Sublinear rise in time with GraphX vs. linear rise in the Alternative
● Doubling the size of the cluster: 2.0 speedup in the Alternative Impl vs. 1.2 in GraphX
● A large number of vertices propagated in parallel leads to large shuffle data, causing failures in GraphX for small clusters
[Figure: performance comparison on the Netflix dataset, Number of Topics = 100]
● GraphX setup: 8 x the resources of the Multi-Core setup
[Figure: Wikipedia dataset, 100 Topic LDA, Cluster: (16 x r3.2xl) (source: Databricks)]
● GraphX for very large datasets outperforms the multi-core unCollapsed Impl
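The cluster-doubling figures quoted above (2.0 vs. 1.2 speedup) can be read as parallel efficiency, i.e. speedup divided by the factor of added resources. The helper name below is an assumption; the arithmetic is a small worked example, not a new measurement.

```python
def parallel_efficiency(speedup: float, resource_factor: float) -> float:
    """Fraction of the added resources that shows up as speedup."""
    return speedup / resource_factor


# Doubling the cluster size means resource_factor = 2.0 in both cases.
alternative_efficiency = parallel_efficiency(2.0, 2.0)
graphx_efficiency = parallel_efficiency(1.2, 2.0)
print(alternative_efficiency, graphx_efficiency)
```

An efficiency of 1.0 means the speedup matched the added hardware; 0.6 means 60% of the doubling translated into speedup.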
Lessons Learned
What we learned so far...
● Where is the cross-over point for your iterative ML
algorithm?
○ GraphX brings performance benefits if you’re on the right side of that
point
○ GraphX lets you easily throw more hardware at a problem
● GraphX is very useful (and fast) for other graph processing tasks
○ Data pre-processing
○ Efficient joins
What we learned so far...
● Regularly save the state
○ With a 99.9% success rate per iteration, what’s the probability of successfully running 1,000 iterations?
○ ~37% (0.999^1000 ≈ 0.368)
● Multi-Core Machine learning (r3.8xl, 32 threads, 220 GB) is very efficient
○ if your data fits in the memory of a single machine!
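The success-rate question above is a one-line calculation, assuming each iteration fails independently. The function name and the checkpoint interval of 100 iterations are hypothetical, chosen only to show why saving state helps.

```python
def run_success_probability(per_iteration_success: float, iterations: int) -> float:
    """Probability that every one of `iterations` independent iterations succeeds."""
    return per_iteration_success ** iterations


# All 1,000 iterations in one go, with nothing saved in between.
p_full_run = run_success_probability(0.999, 1000)

# With state saved every 100 iterations (hypothetical interval), a failure
# only costs the current 100-iteration segment.
p_segment = run_success_probability(0.999, 100)

print(round(p_full_run, 3), round(p_segment, 3))
```

Under these assumptions the uninterrupted run succeeds a little over a third of the time, while each checkpointed segment succeeds about nine times out of ten.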
We’re hiring!
(come talk to us)
https://jobs.netflix.com/

Contenu connexe

Tendances

Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph BradleyApache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph BradleyDatabricks
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinDatabricks
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéJen Aman
 
What’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackWhat’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackTuri, Inc.
 
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...Databricks
 
Designing Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDesigning Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDatabricks
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoSri Ambati
 
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...Turi, Inc.
 
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™Databricks
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Spark Summit
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RDatabricks
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeFlink Forward
 
Large Scale Machine learning with Spark
Large Scale Machine learning with SparkLarge Scale Machine learning with Spark
Large Scale Machine learning with SparkMd. Mahedi Kaysar
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaSpark Summit
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsDatabricks
 
Distributed processing of large graphs in python
Distributed processing of large graphs in pythonDistributed processing of large graphs in python
Distributed processing of large graphs in pythonJose Quesada (hiring)
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLMLconf
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OSri Ambati
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Spark Summit
 

Tendances (20)

Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph BradleyApache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
 
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark MeetupDistributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
What’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackWhat’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics Stack
 
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
 
Designing Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDesigning Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache Spark
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko
 
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
 
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
Large Scale Machine learning with Spark
Large Scale Machine learning with SparkLarge Scale Machine learning with Spark
Large Scale Machine learning with Spark
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
 
Distributed processing of large graphs in python
Distributed processing of large graphs in pythonDistributed processing of large graphs in python
Distributed processing of large graphs in python
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
 

En vedette

Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingPetr Zapletal
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)Ankur Dave
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
 
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...Dawen Liang
 
(Some) pitfalls of distributed learning
(Some) pitfalls of distributed learning(Some) pitfalls of distributed learning
(Some) pitfalls of distributed learningYves Raimond
 
Balancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in RecommendationsBalancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in RecommendationsMohammad Hossein Taghavi
 

En vedette (7)

Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
Factorization Meets the Item Embedding: Regularizing Matrix Factorization wit...
 
(Some) pitfalls of distributed learning
(Some) pitfalls of distributed learning(Some) pitfalls of distributed learning
(Some) pitfalls of distributed learning
 
Balancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in RecommendationsBalancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in Recommendations
 

Similaire à Spark Meetup @ Netflix, 05/19/2015

MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentationehtshamelahi
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Databricks
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Anant Corporation
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable PythonTravis Oliphant
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesStratio
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is DistributedAlluxio, Inc.
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSAmazon Web Services
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2Fabio Fumarola
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learningPaco Nathan
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkDatabricks
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introductionHektor Jacynycz García
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysDemi Ben-Ari
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationRob Emanuele
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
 
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Sujit Pal
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 

Similaire à Spark Meetup @ Netflix, 05/19/2015 (20)

MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentation
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
Spark
SparkSpark
Spark
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph Datasources
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learning
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introduction
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - Panorays
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 

Plus de Yves Raimond

Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsYves Raimond
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsYves Raimond
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the WorldYves Raimond
 
Utilisation du Web Semantique pour les sites de la BBC
Utilisation du Web Semantique pour les sites de la BBCUtilisation du Web Semantique pour les sites de la BBC
Utilisation du Web Semantique pour les sites de la BBCYves Raimond
 
Linked Data on the BBC
Linked Data on the BBCLinked Data on the BBC
Linked Data on the BBCYves Raimond
 
Publishing and interlinking music-related data on the Web
Publishing and interlinking music-related data on the WebPublishing and interlinking music-related data on the Web
Publishing and interlinking music-related data on the WebYves Raimond
 
Linked data and applications
Linked data and applicationsLinked data and applications
Linked data and applicationsYves Raimond
 
Towards a musical Semantic Web
Towards a musical Semantic WebTowards a musical Semantic Web
Towards a musical Semantic WebYves Raimond
 

Plus de Yves Raimond (10)

Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
 
Paris ML meetup
Paris ML meetupParis ML meetup
Paris ML meetup
 
Utilisation du Web Semantique pour les sites de la BBC
Utilisation du Web Semantique pour les sites de la BBCUtilisation du Web Semantique pour les sites de la BBC
Utilisation du Web Semantique pour les sites de la BBC
 
Linked Data on the BBC
Linked Data on the BBCLinked Data on the BBC
Linked Data on the BBC
 
Publishing and interlinking music-related data on the Web
Publishing and interlinking music-related data on the WebPublishing and interlinking music-related data on the Web
Publishing and interlinking music-related data on the Web
 
Linked data and applications
Linked data and applicationsLinked data and applications
Linked data and applications
 
Web of data
Web of dataWeb of data
Web of data
 
Towards a musical Semantic Web
Towards a musical Semantic WebTowards a musical Semantic Web
Towards a musical Semantic Web
 

Spark Meetup @ Netflix, 05/19/2015