In this talk, we will get into the architectural and functional details as to how we build scalable and robust data pipelines for music recommendations at Spotify. We will also discuss some of the challenges and an overview of work to address these challenges.
Scale your database traffic with Read & Write split using MySQL Router
Building Data Pipelines for Music Recommendations at Spotify
1. October 17, 2015
Data Pipelines for
Music Recommendations
@
Spotify
Vidhya Murali
@vid052
2. Vidhya Murali
Who Am I?
2
•Areas of Interest: Data & Machine Learning
•Data Engineer @Spotify
•Masters Student from the University of Wisconsin Madison
aka Happy Badger for life!
3. “Torture the data, and it will
confess!”
3
– Ronald Coase, Nobel Prize Laureate
4. Spotify’s Big Data
4
•Started in 2006, now available in 58 countries
• 70+ million active users, 20+ million paid subscribers
• 30+ million songs in our catalog, ~20K added every day
• 1.5 billion playlists so far and counting
• 1 TB of user data logged every day
• Hadoop cluster with 1500 nodes
• ~20,000 Hadoop jobs per day
5. Music Recommendations at Spotify
Features:
Discover
Discover Weekly
Moments
Radio
Related Artists
5
7. Approaches 7
•Manual curation by Experts
•Editorial Tagging
•Metadata (e.g. Label provided data, NLP over News,
Blogs)
•Audio Signals
•Collaborative Filtering Model
8. Approaches 7
•Manual curation by Experts
•Editorial Tagging
•Metadata (e.g. Label provided data, NLP over News,
Blogs)
•Audio Signals
•Collaborative Filtering Model
9. Collaborative Filtering Model 8
•Find patterns from user’s past behavior to generate
recommendations
•Domain independent
•Scalable
•Accuracy (Collaborative Model) >= Accuracy (Content
Based Model)
10. Definition of CF
9
Hey,
I like tracks P, Q, R, S!
Well,
I like tracks Q, R, S, T!
Then you should check out
track P!
Nice! Btw try track T!
Legacy Slide of Erik Bernhardsson
12. The YoLo Problem
10
•YoLo Problem: “You Only Listen Once” to judge recommendations
•Goal: Predict if users will listen to new music (new to user)
13. The YoLo Problem
10
•YoLo Problem: “You Only Listen Once” to judge recommendations
•Goal: Predict if users will listen to new music (new to user)
•Challenges
•Scale of catalog (30M songs + ~20K added every day)
•Repeated consumption of music is not very uncommon
•Music is niche
•Music consumption is heavily influenced by user’s lifestyle
14. The YoLo Problem
10
•YoLo Problem: “You Only Listen Once” to judge recommendations
•Goal: Predict if users will listen to new music (new to user)
•Challenges
•Scale of catalog (30M songs + ~20K added every day)
•Repeated consumption of music is not very uncommon
•Music is niche
•Music consumption is heavily influenced by user’s lifestyle
•Input: Feedback is implicit through streaming behavior, collection adds,
browse history, search history etc
16. User Plays to Track Recs 11
1. Weighted play counts from logs
17. User Plays to Track Recs 11
1. Weighted play counts from logs
2. Train Model using the input signals
18. User Plays to Track Recs 11
1. Weighted play counts from logs
2. Train Model using the input signals
3. Generate recs from
the trained model
19. User Plays to Track Recs 11
1. Weighted play counts from logs
2. Train Model using the input signals
3. Generate recs from
the trained model
4. Post process the
recommendations
20. 12
Step 1: ETL of Logs
•Extract and transform the anonymized logs to training data set
•Case: Logs -> (user, track, wt.count)
21. Step 2: Construct Big Matrix! 13
Tracks(n)
Users(m)
Vidhya
Burn by Ellie Goulding
22. Step 2: Construct Big Matrix! 13
Tracks(n)
Users(m)
Vidhya
Burn by Ellie Goulding
Order of 70M x 30M!
23. Latent Factor Models 14
Vidhya
Burn
.. . . . .
.. . . . .
.. . . . .
.. . . . .
.. . . . .
•Use a “small” representation for each user and items(tracks): f-dimensional
vectors
.. .
.. .
.. .
.. .
. .
...
...
...
...
..
m m
n
m n
24. Latent Factor Models 14
Vidhya
Burn
.. . . . .
.. . . . .
.. . . . .
.. . . . .
.. . . . .
•Use a “small” representation for each user and items(tracks): f-dimensional
vectors
.. .
.. .
.. .
.. .
. .
...
...
...
...
..
m m
n
m n
User Track Matrix:
(m x n)
25. Latent Factor Models 14
Vidhya
Burn
.. . . . .
.. . . . .
.. . . . .
.. . . . .
.. . . . .
•Use a “small” representation for each user and items(tracks): f-dimensional
vectors
.. .
.. .
.. .
.. .
. .
...
...
...
...
..
m m
n
m n
User Vector Matrix:
X: (m x f)
User Track Matrix:
(m x n)
26. Latent Factor Models 14
Vidhya
Burn
.. . . . .
.. . . . .
.. . . . .
.. . . . .
.. . . . .
•Use a “small” representation for each user and items(tracks): f-dimensional
vectors
.. .
.. .
.. .
.. .
. .
...
...
...
...
..
m m
n
m n
User Vector Matrix:
X: (m x f)
Track Vector Matrix:
Y: (n x f)
User Track Matrix:
(m x n)
27. Latent Factor Models 14
Vidhya
Burn
.. . . . .
.. . . . .
.. . . . .
.. . . . .
.. . . . .
•Use a “small” representation for each user and items(tracks): f-dimensional
vectors
.. .
.. .
.. .
.. .
. .
...
...
...
...
..
(here, f = 2)
m m
n
m n
User Vector Matrix:
X: (m x f)
Track Vector Matrix:
Y: (n x f)
User Track Matrix:
(m x n)
30. Matrix Factorization using Implicit Feedback
User Track Play
Count Matrix
User Track
Preference
Matrix
Binary Label:
1 => played
0 => not played
15
31. Matrix Factorization using Implicit Feedback
User Track Play
Count Matrix
User Track
Preference
Matrix
Binary Label:
1 => played
0 => not played
Weights
Matrix
Weights based on play count
and smoothing
15
33. Implicit Matrix Factorization 17
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by the inner product of 2 smaller matrices by
minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between
their latent factor vector in X and the track latent factor vectors in Y.
X YUsers
Tracks
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vectoryi
34. Alternating Least Squares 18
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
X YUsers
Tracks
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vector
Fix tracks
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by the inner product of 2 smaller matrices by
minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between
their latent factor vector in X and the track latent factor vectors in Y.
yi
35. 19
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
X YUsers
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vector
Fix tracks
Solve for users
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by the inner product of 2 smaller matrices by
minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between
their latent factor vector in X and the track latent factor vectors in Y.
Alternating Least Squares
yi
Tracks
36. 20
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
X YUsers
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vector
Fix users
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by the inner product of 2 smaller matrices by
minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between
their latent factor vector in X and the track latent factor vectors in Y.
Alternating Least Squares
yi
Tracks
37. 21
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
X YUsers
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vector
Fix users
Solve for tracks
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by the inner product of 2 smaller matrices by
minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between
their latent factor vector in X and the track latent factor vectors in Y.
Alternating Least Squares
yi
Tracks
38. 22
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
X YUsers
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vector
Fix users
Solve for tracks
Repeat until convergence…
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by the inner product of 2 smaller matrices by
minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between
their latent factor vector in X and the track latent factor vectors in Y.
Alternating Least Squares
yi
Tracks
39. 23
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
X YUsers
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vector
Fix users
Solve for tracks
Repeat until convergence…
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by the inner product of 2 smaller matrices by
minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between
their latent factor vector in X and the track latent factor vectors in Y.
Alternating Least Squares
yi
Tracks
41. Why Vectors? 25
•Vectors encode higher order dependencies
•Users and Items in the same vector space!
•Use vector similarity to compute:
•Item-Item similarities
•User-Item recommendations
•Linear complexity: order of number of latent factors
•Easy to scale up
46. 28
Annoy
•70 million users, at least 4 million tracks for candidates per user
•Brute Force Approach:
•O(70M x 4M x 10) ~= 0(3 peta-operations)!
• Approximate Nearest Neighbor Oh Yeah!
• Uses Local Sensitive Hashing
• Clone: https://github.com/spotify/annoy
49. Matrix Factorization with MapReduce
31
Reduce stepMap step
u % K = 0
i % L = 0
u % K = 0
i % L = 1
...
u % K = 0
i % L = L-1
u % K = 1
i % L = 0
u % K = 1
i % L = 1
... ...
... ... ... ...
u % K = K-1
i % L = 0
... ...
u % K = K-1
i % L = L-1
item vectors
item%L=0
item vectors
item%L=1
item vectors
i % L = L-1
user vectors
u % K = 0
user vectors
u % K = 1
user vectors
u % K = K-1
all log entries
u % K = 1
i % L = 1
u % K = 0
u % K = 1
u % K = K-1
•Split the matrix up into K x L blocks.
•Each mapper gets a different block, sums up intermediate terms, then key by
user (or item) to reduce final user (or item) vector.
50. Matrix Factorization with MapReduce
32
One map task
Distributed
cache:
All user vectors
where u % K = x
Distributed
cache:
All item vectors
where i % L = y
Mapper Emit contributions
Map input:
tuples (u, i, count)
where
u % K = x
and
i % L = y
Reducer New vector!
•Input to Mapper is a list of (user, item, count) tuples
– user modulo K is the same for all users in block
– item modulo L is the same for all items in the block
– Mapper aggregates intermediate contributions for each user (or item)
– Eg: K=4, Mapper #1 gets user 1, 5, 9, 13 etc
– Reducer keys by user (or item), aggregates intermediate mapper sums and solves closed form for final user
(or item) vector
54. Optimizing for the Yolo Problem
•OFFLINE TESTING:
•Experts’ Inputs
•Measure accuracy
•A/B TESTS: control vs a/b group. Some useful metrics we consider:
•DAU / WAU / MAU
•Retention
•Session Length
•Skip Rate
36
55. Challenge Accepted!
•Cold start problem for both users and new music/upcoming artists:
•Content based signals, real time recommendation
•Measuring recommendation quality:
•A/B test metrics
•Active forums for getting user feedback
•Scam Attacks:
•Rule based model to detect scammers
•Humans choices are not always predictable:
•Faith in humanity
37
56. What Next?
•Personalize user experience on Spotify for every moment:
•Right Now
•Recommend other media formats:
•Podcasts
•Video
•Power music recommendations on other platforms:
•Google Now
38