SlideShare a Scribd company logo
1 of 37
Download to read offline
A More Scalable Way of Making
Recommendations with MLlib
Xiangrui Meng
Spark Summit 2015
More interested in application than implementation?
iRIS: A Large-Scale Food and Recipe
Recommendation System Using Spark
Joohyun Kim (MyFitnessPal, Under Armour)
3:30 – 4:00 PM
Imperial Ballroom (Level 2)
2
About Databricks
• Founded by Apache spark creators
• Largest contributor to Spark project, committed to
keeping Spark 100% open source
• End-to-end hosted platform
https://www.databricks.com/product/databricks
3
Spark MLlib
Large-scale machine learning on Apache Spark
About MLlib
• Started in UC Berkeley AMPLab
• Shipped with Spark 0.8
• Currently (Spark 1.4)
• Contributions from 50+ organizations, 150+
individuals
• Good coverage of algorithms
5
MLlib’s Mission
MLlib’s mission is to make practical machine
learning easy and scalable.
• Easy to build machine learning applications
• Capable of learning from large-scale datasets
6
7
A Brief History of MLlib
Optim Algo API App
v0.8
GLMs
k-means
ALS
Java/Scalagradient decent
8
A Brief History of MLlib
Optim Algo API App
v0.9
Python
naive Bayes
implicit ALS
9
A Brief History of MLlib
Optim Algo API App
v1.0
decision tree
PCA
SVD
sparse data
L-BFGS
10
A Brief History of MLlib
v1.1
Optim Algo API App
tree reduce
torrent broadcast
statistics
NMF
streaming linear regression
Word2Vec
gap
11
A Brief History of MLlib
v1.2
Optim Algo API App
pipeline
random forest
gradient boosted trees
streaming k-means
12
A Brief History of MLlib
v1.3
Optim Algo API App
ALSv2
latent Dirichlet allocation (LDA)
multinomial logistic regression
Gaussian mixture model (GMM)
distributed block matrix
FP-growth / isotonic regression
power iteration clustering
pipeline in python
model import/export
SparkPackages
13
A Brief History of MLlib
Optim Algo API App
GLMs with elastic-net
online LDA
ALS.recommendAll feature transformers
estimators
Python pipeline API
v1.4
OWL-QN
Alternating Least Squares (ALS)
Collaborative filtering via matrix factorization
15
Collaborative Filtering
items
users
A: a rating matrix
Low-Rank Assumption
• What kind of movies do you like?
• sci-fi / crime / action
Perception of preferences usually takes place in a
low dimensional latent space.
So the rating matrix is approximately low-rank.
16
A ⇡ UV T
, U 2 Rm⇥k
, V 2 Rn⇥k
aij ⇡ uT
i vj
Objective Function
• minimize the reconstruction error
• only check observed ratings
17
minimize
1
2
kA UV T
k2
F
minimize
1
2
X
(i,j)2⌦
(aij uT
i vj)2
Alternating Least Squares (ALS)
• If we fix U, the objective becomes convex and
separable:
• Each sub-problem is a least squares problem, which
can be solved in parallel. So we take alternating
directions to minimize the objective:
• fix U, solve for V;
• fix V, solve for U. 18
minimize
1
2
X
j
0
@
X
i,(i,j)2⌦
(aij uT
i vj)2
1
A
Complexity
• To solve a least squares problem of size n-by-k, we need
O(n k2) time. So the total computation cost is O(nnz k2),
where nnz is the total number of ratings.
• We take the normal equation approach in ALS
• Solving each subproblem requires O(k2) storage. We call
LAPACK’s routine to solve this problem.
19
AT
Ax = AT
b
ALS Implementation in MLlib
How to scale to 100,000,000,000 ratings?
Communication Cost
The most important factor of implementing an
algorithm in parallel is the communication cost.
To make ALS scale to billions of ratings, millions of
users/items, we have to distribute ratings (A), user
factors (U), and item factors (V). How?
• all-to-all
• block-to-block
• …
21
Communication: All-to-All
• users: u1, u2, u3; items: v1, v2, v3, v4
• shuffle size: O(nnz k) (nnz: number of nonzeros, i.e., ratings)
• sending the same factor multiple times
22
Communication: Block-to-Block
• OutBlocks (P1, P2)
• for each item block, which user factors to send
• InBlocks (Q1, Q2)
• for each item, which user factors to use
23
Communication: Block-to-Block
• Shuffle size is significantly reduced.
• We cache two copies of ratings — InBlocks for users and
InBlocks for items.
24
DAG Visualization of an ALS Job
25
ratingBlocks
itemOutBlocks
userInBlocks itemInBlocks
userOutBlocks
itemFactors 0
userFactors 1 itemFactors 1
preparation iterations
Compressed Storage for InBlocks
Array of rating tuples
• huge storage overhead
• high garbage collection (GC) pressure
26
[(v1, u1, a11), (v2, u1, a12), (v1, u2, a21), (v2, u2, a22), (v2, u3, a32)]
Compressed Storage for InBlocks
Three primitive arrays
• low GC pressure
• constructing all sub-problems together
• O(nj k2) storage
27
([v1, v2, v1, v2, v2], [u1, u1, u2, u2, u3], [a11, a12, a21, a22, a32])
Compressed Storage for InBlocks
Primitive arrays with items ordered:
• solving sub-problems in sequence:
• O(k2) storage
• TimSort
28
([v1, v1, v2, v2, v2], [u1, u2, u1, u2, u3], [a11, a21, a12, a22, a32])
Compressed Storage for InBlocks
Compressed items:
• no duplicated items
• map lookup for user factors
29
([v1, v2], [0, 2, 5], [u1, u2, u1, u2, u3], [a11, a21, a12, a22, a32])
Compressed Storage for InBlocks
Store block IDs and local indices instead of user IDs. For example, u3
is the first vector sent from P2.
Encode (block ID, local index) into an integer
• use higher bits for block ID
• use lower bits for local index
• works for ~4 billions of unique users/items

01 | 00 0000 0000 0000
30
([v1, v2], [0, 2, 5], [0|0, 0|1, 0|0, 0|1, 1|0], [a11, a21, a12, a22, a32])
Avoid Garbage Collection
We use specialized code to replace the following:
• initial partitioning of ratings
ratings.map { r =>
((srcPart.getPartition(r.user), dstPart.getPartition(r.item)), r)
}.aggregateByKey(new RatingBlockBuilder)(
seqOp = (b, r) => b.add(r),
combOp = (b0, b1) => b0.merge(b1.build()))
.mapValues(_.build())
• map IDs to local indices
dstIds.toSet.toSeq.sorted.zipWithIndex.toMap
31
Amazon Reviews Dataset
• Amazon Reviews: ~6.6 million users, ~2.2 million items, and ~30 million
ratings
• Tested ALS on stacked copies on a 16-node m3.2xlarge cluster with
rank=10, iter=10:
32
Storage Comparison
33
1.2 1.3/1.4
userInBlock 941MB 277MB
userOutBlock 355MB 65MB
itemInBlock 1380MB 243MB
itemOutBlock 119MB 37MB
Spotify Dataset
• Spotify: 75+ million users and 30+ million songs
• Tested ALS on a subset with ~50 million users, ~5
million songs, and ~50 billion ratings.
• thanks to Chris Johnson and Anders Arpteg
• 32 r3.8xlarge nodes (~$10/hr with spot instances)
• It took 1 hour to finish 10 iterations with rank 10.
• 10 mins to prepare in/out blocks
• 5 mins per iteration
34
ALS Implementation in MLlib
• Save communication by duplicating data
• Efficient storage format
• Watch out for GC
• Native LAPACK calls
35
Future Directions
• Leverage on Project Tungsten to save some
specialized code that avoids GC.
• Solve issues with really popular items.
• Explore other recommendation algorithms, e.g.,
factorization machine.
36
Thank you.
• Spark: http://spark.apache.org
• Databricks: http://databricks.com

More Related Content

What's hot

Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRDatabricks
 
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)Spark Summit
 
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...Databricks
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerSpark Summit
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkCloudera, Inc.
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15MLconf
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkDB Tsai
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Ram Sriharsha
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkJen Aman
 
Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Ram Sriharsha
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Spark Summit
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Spark Summit
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkAndy Petrella
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15MLconf
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Jen Aman
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiDatabricks
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengDatabricks
 

What's hot (20)

Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkR
 
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
 
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache Spark
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
 

Similar to A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Databricks)

February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNNJunho Cho
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptxasdq4
 
SPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic librarySPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic libraryAdaCore
 
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...Databricks
 
Correctness and Performance of Apache Spark SQL
Correctness and Performance of Apache Spark SQLCorrectness and Performance of Apache Spark SQL
Correctness and Performance of Apache Spark SQLNicolas Poggi
 
Python image processing_Python image processing.pptx
Python image processing_Python image processing.pptxPython image processing_Python image processing.pptx
Python image processing_Python image processing.pptxshashikant484397
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 
Face detection ppt by Batyrbek
Face detection ppt by Batyrbek Face detection ppt by Batyrbek
Face detection ppt by Batyrbek Batyrbek Ryskhan
 
Spark Streaming Tips for Devs and Ops by Fran perez y federico fernández
Spark Streaming Tips for Devs and Ops by Fran perez y federico fernándezSpark Streaming Tips for Devs and Ops by Fran perez y federico fernández
Spark Streaming Tips for Devs and Ops by Fran perez y federico fernándezJ On The Beach
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupGwen (Chen) Shapira
 
IOEfficientParalleMatrixMultiplication_present
IOEfficientParalleMatrixMultiplication_presentIOEfficientParalleMatrixMultiplication_present
IOEfficientParalleMatrixMultiplication_presentShubham Joshi
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Christian Peel
 
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Databricks
 
Intro_OpenCV.ppt
Intro_OpenCV.pptIntro_OpenCV.ppt
Intro_OpenCV.pptRithikRaj25
 
Intel realtime analytics_spark
Intel realtime analytics_sparkIntel realtime analytics_spark
Intel realtime analytics_sparkGeetanjali G
 

Similar to A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Databricks) (20)

February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 
MATLAB & Image Processing
MATLAB & Image ProcessingMATLAB & Image Processing
MATLAB & Image Processing
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptx
 
SPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic librarySPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic library
 
Lp seminar
Lp seminarLp seminar
Lp seminar
 
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
 
Correctness and Performance of Apache Spark SQL
Correctness and Performance of Apache Spark SQLCorrectness and Performance of Apache Spark SQL
Correctness and Performance of Apache Spark SQL
 
Python image processing_Python image processing.pptx
Python image processing_Python image processing.pptxPython image processing_Python image processing.pptx
Python image processing_Python image processing.pptx
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
AA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptxAA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptx
 
Face detection ppt by Batyrbek
Face detection ppt by Batyrbek Face detection ppt by Batyrbek
Face detection ppt by Batyrbek
 
Spark Streaming Tips for Devs and Ops
Spark Streaming Tips for Devs and OpsSpark Streaming Tips for Devs and Ops
Spark Streaming Tips for Devs and Ops
 
Spark Streaming Tips for Devs and Ops by Fran perez y federico fernández
Spark Streaming Tips for Devs and Ops by Fran perez y federico fernándezSpark Streaming Tips for Devs and Ops by Fran perez y federico fernández
Spark Streaming Tips for Devs and Ops by Fran perez y federico fernández
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
 
IOEfficientParalleMatrixMultiplication_present
IOEfficientParalleMatrixMultiplication_presentIOEfficientParalleMatrixMultiplication_present
IOEfficientParalleMatrixMultiplication_present
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015
 
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
 
Intro_OpenCV.ppt
Intro_OpenCV.pptIntro_OpenCV.ppt
Intro_OpenCV.ppt
 
Intel realtime analytics_spark
Intel realtime analytics_sparkIntel realtime analytics_spark
Intel realtime analytics_spark
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 

Recently uploaded (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 

A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Databricks)

  • 1. A More Scalable Way of Making Recommendations with MLlib Xiangrui Meng Spark Summit 2015
  • 2. More interested in application than implementation? iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark Joohyun Kim (MyFitnessPal, Under Armour) 3:30 – 4:00 PM Imperial Ballroom (Level 2) 2
  • 3. About Databricks • Founded by Apache spark creators • Largest contributor to Spark project, committed to keeping Spark 100% open source • End-to-end hosted platform https://www.databricks.com/product/databricks 3
  • 4. Spark MLlib Large-scale machine learning on Apache Spark
  • 5. About MLlib • Started in UC Berkeley AMPLab • Shipped with Spark 0.8 • Currently (Spark 1.4) • Contributions from 50+ organizations, 150+ individuals • Good coverage of algorithms 5
  • 6. MLlib’s Mission MLlib’s mission is to make practical machine learning easy and scalable. • Easy to build machine learning applications • Capable of learning from large-scale datasets 6
  • 7. 7 A Brief History of MLlib Optim Algo API App v0.8 GLMs k-means ALS Java/Scalagradient decent
  • 8. 8 A Brief History of MLlib Optim Algo API App v0.9 Python naive Bayes implicit ALS
  • 9. 9 A Brief History of MLlib Optim Algo API App v1.0 decision tree PCA SVD sparse data L-BFGS
  • 10. 10 A Brief History of MLlib v1.1 Optim Algo API App tree reduce torrent broadcast statistics NMF streaming linear regression Word2Vec gap
  • 11. 11 A Brief History of MLlib v1.2 Optim Algo API App pipeline random forest gradient boosted trees streaming k-means
  • 12. 12 A Brief History of MLlib v1.3 Optim Algo API App ALSv2 latent Dirichlet allocation (LDA) multinomial logistic regression Gaussian mixture model (GMM) distributed block matrix FP-growth / isotonic regression power iteration clustering pipeline in python model import/export SparkPackages
  • 13. 13 A Brief History of MLlib Optim Algo API App GLMs with elastic-net online LDA ALS.recommendAll feature transformers estimators Python pipeline API v1.4 OWL-QN
  • 14. Alternating Least Squares (ALS) Collaborative filtering via matrix factorization
  • 16. Low-Rank Assumption • What kind of movies do you like? • sci-fi / crime / action Perception of preferences usually takes place in a low dimensional latent space. So the rating matrix is approximately low-rank. 16 A ⇡ UV T , U 2 Rm⇥k , V 2 Rn⇥k aij ⇡ uT i vj
  • 17. Objective Function • minimize the reconstruction error • only check observed ratings 17 minimize 1 2 kA UV T k2 F minimize 1 2 X (i,j)2⌦ (aij uT i vj)2
  • 18. Alternating Least Squares (ALS) • If we fix U, the objective becomes convex and separable: • Each sub-problem is a least squares problem, which can be solved in parallel. So we take alternating directions to minimize the objective: • fix U, solve for V; • fix V, solve for U. 18 minimize 1 2 X j 0 @ X i,(i,j)2⌦ (aij uT i vj)2 1 A
  • 19. Complexity • To solve a least squares problem of size n-by-k, we need O(n k2) time. So the total computation cost is O(nnz k2), where nnz is the total number of ratings. • We take the normal equation approach in ALS • Solving each subproblem requires O(k2) storage. We call LAPACK’s routine to solve this problem. 19 AT Ax = AT b
  • 20. ALS Implementation in MLlib How to scale to 100,000,000,000 ratings?
  • 21. Communication Cost The most important factor of implementing an algorithm in parallel is the communication cost. To make ALS scale to billions of ratings, millions of users/items, we have to distribute ratings (A), user factors (U), and item factors (V). How? • all-to-all • block-to-block • … 21
  • 22. Communication: All-to-All • users: u1, u2, u3; items: v1, v2, v3, v4 • shuffle size: O(nnz k) (nnz: number of nonzeros, i.e., ratings) • sending the same factor multiple times 22
  • 23. Communication: Block-to-Block • OutBlocks (P1, P2) • for each item block, which user factors to send • InBlocks (Q1, Q2) • for each item, which user factors to use 23
  • 24. Communication: Block-to-Block • Shuffle size is significantly reduced. • We cache two copies of ratings — InBlocks for users and InBlocks for items. 24
  • 25. DAG Visualization of an ALS Job 25 ratingBlocks itemOutBlocks userInBlocks itemInBlocks userOutBlocks itemFactors 0 userFactors 1 itemFactors 1 preparation iterations
  • 26. Compressed Storage for InBlocks Array of rating tuples • huge storage overhead • high garbage collection (GC) pressure 26 [(v1, u1, a11), (v2, u1, a12), (v1, u2, a21), (v2, u2, a22), (v2, u3, a32)]
  • 27. Compressed Storage for InBlocks Three primitive arrays • low GC pressure • constructing all sub-problems together • O(nj k2) storage 27 ([v1, v2, v1, v2, v2], [u1, u1, u2, u2, u3], [a11, a12, a21, a22, a32])
  • 28. Compressed Storage for InBlocks Primitive arrays with items ordered: • solving sub-problems in sequence: • O(k2) storage • TimSort 28 ([v1, v1, v2, v2, v2], [u1, u2, u1, u2, u3], [a11, a21, a12, a22, a32])
  • 29. Compressed Storage for InBlocks Compressed items: • no duplicated items • map lookup for user factors 29 ([v1, v2], [0, 2, 5], [u1, u2, u1, u2, u3], [a11, a21, a12, a22, a32])
  • 30. Compressed Storage for InBlocks Store block IDs and local indices instead of user IDs. For example, u3 is the first vector sent from P2. Encode (block ID, local index) into an integer • use higher bits for block ID • use lower bits for local index • works for ~4 billions of unique users/items
 01 | 00 0000 0000 0000 30 ([v1, v2], [0, 2, 5], [0|0, 0|1, 0|0, 0|1, 1|0], [a11, a21, a12, a22, a32])
  • 31. Avoid Garbage Collection We use specialized code to replace the following: • initial partitioning of ratings ratings.map { r => ((srcPart.getPartition(r.user), dstPart.getPartition(r.item)), r) }.aggregateByKey(new RatingBlockBuilder)( seqOp = (b, r) => b.add(r), combOp = (b0, b1) => b0.merge(b1.build())) .mapValues(_.build()) • map IDs to local indices dstIds.toSet.toSeq.sorted.zipWithIndex.toMap 31
  • 32. Amazon Reviews Dataset • Amazon Reviews: ~6.6 million users, ~2.2 million items, and ~30 million ratings • Tested ALS on stacked copies on a 16-node m3.2xlarge cluster with rank=10, iter=10: 32
  • 33. Storage Comparison 33 1.2 1.3/1.4 userInBlock 941MB 277MB userOutBlock 355MB 65MB itemInBlock 1380MB 243MB itemOutBlock 119MB 37MB
  • 34. Spotify Dataset • Spotify: 75+ million users and 30+ million songs • Tested ALS on a subset with ~50 million users, ~5 million songs, and ~50 billion ratings. • thanks to Chris Johnson and Anders Arpteg • 32 r3.8xlarge nodes (~$10/hr with spot instances) • It took 1 hour to finish 10 iterations with rank 10. • 10 mins to prepare in/out blocks • 5 mins per iteration 34
  • 35. ALS Implementation in MLlib • Save communication by duplicating data • Efficient storage format • Watch out for GC • Native LAPACK calls 35
  • 36. Future Directions • Leverage on Project Tungsten to save some specialized code that avoids GC. • Solve issues with really popular items. • Explore other recommendation algorithms, e.g., factorization machine. 36
  • 37. Thank you. • Spark: http://spark.apache.org • Databricks: http://databricks.com