What really are recommendation engines nowadays?
This presentation introduces the foundations of recommendation algorithms and covers common approaches as well as some of the most advanced techniques. The focus is on efficiency rather than theoretical properties, but basics of matrix algebra and optimization-based machine learning are used throughout the presentation.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Squares (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Problems
4.2 Solutions
4.3 Tools
6. Recommendation Engine – Examples
Facebook – “People You May Know”
Netflix – “Other Movies You May Enjoy”
LinkedIn – “Jobs You May Be Interested In”
Amazon – “Customers who bought this item also bought …”
YouTube – “Recommended Videos”
Google – “Search results adjusted”
Pinterest – “Recommended Images”
7. Plan for Today
1. Collaborative Filtering
- User-User
- Item-Item
- User-Item
2. Content-Based
3. Hybrid Model
4. In Production
11. 1. Collaborative Filtering – Similarity Function
A real-valued function sim(a, b) that quantifies the similarity between two objects.
Common choices for sim(a, b):
- Euclidean: $1 - \|a - b\|_2 = 1 - \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$
- Manhattan: $1 - \|a - b\|_1 = 1 - \sum_{i=1}^{n} |a_i - b_i|$
- Minkowski: $1 - \|a - b\|_p = 1 - \left( \sum_{i=1}^{n} |a_i - b_i|^p \right)^{1/p}$
- Cosine: $\dfrac{a^\top b}{\|a\|_2 \, \|b\|_2} = \dfrac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}$
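As an illustration only, here is a minimal NumPy sketch of these four similarity functions; the example vectors are made up.

```python
import numpy as np

def euclidean_sim(a, b):
    """1 - ||a - b||_2"""
    return 1.0 - np.sqrt(np.sum((a - b) ** 2))

def manhattan_sim(a, b):
    """1 - ||a - b||_1"""
    return 1.0 - np.sum(np.abs(a - b))

def minkowski_sim(a, b, p=3):
    """1 - ||a - b||_p"""
    return 1.0 - np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def cosine_sim(a, b):
    """a.b / (||a||_2 ||b||_2)"""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([5.0, 3.0, 0.0, 1.0])
b = np.array([4.0, 0.0, 0.0, 1.0])
print(cosine_sim(a, b))  # ~0.86: a and b point in similar directions
```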
20. 1. Collaborative Filtering – User-User Benefits
- “People who bought this also bought that”
- Good when #items >> #users
21. 1. Collaborative Filtering – User-User Challenges
- Sparsity
- Doesn’t scale – the nearest-neighbor computation grows with the number of users and items
- Model too simplistic – recommendation accuracy may be poor
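To make the scaling issue concrete, here is a hedged NumPy sketch of user-user filtering on a toy ratings matrix; the matrix and all numbers are made up, and the similarity step touches every user.

```python
import numpy as np

R = np.array([[5, 3, 0, 1],    # rows are users, columns are items,
              [4, 0, 0, 1],    # 0 means "not rated"
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)

def recommend_user_user(R, u, k=2):
    # Cosine similarity between user u and every other user.
    norms = np.linalg.norm(R, axis=1)
    sims = R @ R[u] / (norms * norms[u] + 1e-9)
    sims[u] = -np.inf                      # exclude the user themselves
    neighbors = np.argsort(sims)[-k:]      # k nearest neighbors
    # Similarity-weighted average of the neighbors' ratings.
    scores = sims[neighbors] @ R[neighbors] / (sims[neighbors].sum() + 1e-9)
    scores[R[u] > 0] = -np.inf             # don't re-recommend rated items
    return np.argsort(scores)[::-1]

print(recommend_user_user(R, u=0))         # items ranked for user 0
```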
31. 1. Collaborative Filtering – Item-Item Benefits
- “If you like this you might also like that”
- Good when #users >> #items
- Very fast after the item-item table has been pre-computed
32. 1. Collaborative Filtering – Item-Item Challenges
- Bottleneck – similarity computation
- Space complexity – dense item-item similarity matrix
- Model too simplistic – recommendation accuracy may be poor
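A minimal sketch of pre-computing that item-item table with cosine similarity on the same kind of toy matrix: the offline step is the similarity bottleneck named above, while serving becomes a cheap lookup.

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)

def item_item_table(R, k=2):
    norms = np.linalg.norm(R, axis=0) + 1e-9
    S = (R.T @ R) / np.outer(norms, norms)   # dense item-item cosine matrix
    np.fill_diagonal(S, -np.inf)             # an item is not its own neighbor
    return np.argsort(S, axis=1)[:, -k:]     # offline bottleneck: O(#items^2)

table = item_item_table(R)
print(table[0])  # "if you like item 0 you might also like ..." at cost O(k)
```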
36. 1. Collaborative Filtering – User-Item
[Figure: bipartite graph of users and items; each user u carries a latent vector U_u, each item i a latent vector I_i, and the model outputs a predicted rating ŷu,i]
Each known rating is approximated by a dot product of latent vectors:
$$D(u,i) \approx U_u^\top I_i = \sum_z U_{u,z} \, I_{i,z}$$
and the latent vectors are fitted by least squares:
$$(U, I) = \operatorname{argmin} \sum_{u,i} \left( U_u^\top I_i - D(u,i) \right)^2$$
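Numerically, the model is just a dot product; a tiny sketch with made-up latent vectors:

```python
import numpy as np

U_u = np.array([0.8, 0.1, 1.2])   # latent profile of user u
I_i = np.array([1.0, 0.3, 0.9])   # latent profile of item i
rating = U_u @ I_i                # D(u, i) ≈ Σ_z U_{u,z} I_{i,z}
print(round(rating, 2))           # 1.91
```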
38. 1. Collaborative Filtering – Matrix Factorization
$$\mathcal{L}(U, I) = \sum_{u,i} \left( U_u^\top I_i - D(u,i) \right)^2 \approx \| U^\top I - D \|_F^2, \qquad (U, I) = \operatorname{argmin} \, \mathcal{L}(U, I)$$
Three ways to fit the factors:
- SGD – Stochastic Gradient Descent:
$$U^{b+1} \leftarrow U^b - \eta \, \frac{\partial \mathcal{L}(U^b, I^b)}{\partial U^b}, \qquad I^{b+1} \leftarrow I^b - \eta \, \frac{\partial \mathcal{L}(U^b, I^b)}{\partial I^b}$$
- SVD – Truncated Singular Value Decomposition:
$$D = V \Sigma W \approx V_{:k} \Sigma_{:k} W_{:k}, \qquad U^\top \leftarrow V_{:k} \Sigma_{:k}^{1/2}, \qquad I \leftarrow \Sigma_{:k}^{1/2} W_{:k}$$
- ALS – Alternating Least Squares (with one factor fixed, the other has a closed-form solution):
$$U^{b+1} \leftarrow D \, I^b \left( I^{b\top} I^b \right)^{-1}, \qquad I^{b+1} \leftarrow D^\top U^b \left( U^{b\top} U^b \right)^{-1}$$
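A hedged NumPy sketch of all three strategies on a small dense toy matrix; real implementations sum only over observed ratings and add regularization, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.random((6, 5))                   # toy "ratings": 6 users x 5 items
k = 3
U = rng.normal(scale=0.1, size=(k, 6))   # user factors, D ≈ U.T @ I
I = rng.normal(scale=0.1, size=(k, 5))   # item factors

# SGD: follow the gradient of L(U, I) = ||U.T @ I - D||_F^2.
eta = 0.01
for _ in range(1000):
    E = U.T @ I - D                      # residual matrix
    U -= eta * (I @ E.T)                 # dL/dU (up to a constant factor)
    I -= eta * (U @ E)                   # dL/dI
print("SGD loss:", np.linalg.norm(U.T @ I - D) ** 2)

# Truncated SVD: keep the k largest singular values.
V, s, W = np.linalg.svd(D, full_matrices=False)
U = (V[:, :k] * np.sqrt(s[:k])).T
I = np.sqrt(s[:k])[:, None] * W[:k]
print("SVD loss:", np.linalg.norm(U.T @ I - D) ** 2)

# ALS: with one factor fixed, the other is a least-squares solve.
U = rng.normal(scale=0.1, size=(6, k))   # rows are users in this convention
I = rng.normal(scale=0.1, size=(5, k))   # rows are items
for _ in range(20):
    U = D @ I @ np.linalg.inv(I.T @ I)       # fix I, solve for U
    I = D.T @ U @ np.linalg.inv(U.T @ U)     # fix U, solve for I
print("ALS loss:", np.linalg.norm(U @ I.T - D) ** 2)
```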
41. 1. Collaborative Filtering – User-Item Benefits
- Fast after U and I are pre-computed
- Can learn more about users with U
- Can learn more about items with I
42. 1. Collaborative Filtering – User-Item Challenges
- Sparsity
- Needs to re-learn everything every time a new user, new item, or new rating enters the system
- Only linear predictions
43. 1. Collaborative Filtering – Sparsity Example, the Netflix Prize
- 17,770 Movies
- 480,189 Users
- 100,480,507 Ratings
How dense is our matrix?
$$\frac{\text{Ratings}}{\text{Movies} \times \text{Users}} = \frac{100{,}480{,}507}{17{,}770 \times 480{,}189} \times 100 \approx 1.18\%$$
[Figure: the ratings matrix, movies × users, almost entirely empty]
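The same computation as a two-line sanity check:

```python
ratings, movies, users = 100_480_507, 17_770, 480_189
print(f"{ratings / (movies * users):.2%}")  # 1.18% -> 98.82% of cells are empty
```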
45. 1. Collaborative Filtering – Deep Learning
- Non-linear interactions
- Enables transfer learning across multiple datasets
- Enables the use of meta-data (keywords, tags)
- Enables the use of graph-based data (those who like movies with this actor also like movies with this other actor)
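As one possible illustration (not the slide's exact architecture), a minimal PyTorch sketch that replaces the dot product with a small network over concatenated user/item embeddings; all sizes and hyper-parameters are made up.

```python
import torch
import torch.nn as nn

class NeuralCF(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(            # non-linear interactions
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, u, i):
        x = torch.cat([self.user_emb(u), self.item_emb(i)], dim=-1)
        return self.mlp(x).squeeze(-1)       # predicted rating

model = NeuralCF(n_users=480_189, n_items=17_770)
u = torch.tensor([0, 1]); i = torch.tensor([42, 7])
print(model(u, i))                           # one prediction per (u, i) pair
```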
50. 2. Content Extraction
Based on “what does the user like about an item”:
- Meta-data extraction
- Clustering
- Similarity/distance between objects
[Figure: user–item graph with observed “buy” edges and predicted “likely buy” edges]
51. 2. Content Extraction – Item-Item Similarity
- Allows computing similarities between items
- Does not require a ratings dataset
- The previous item-item recommendation algorithm still works
- No item cold start
- User attributes mitigate user cold start
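A minimal sketch, assuming item descriptions are available as plain text: TF-IDF vectors plus cosine similarity give an item-item matrix with no ratings involved, so a brand-new item gets neighbors immediately. The three descriptions are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "documentary about a children's television program",
    "animated family adventure about a young elephant king",
    "documentary about the history of television",
]
X = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
S = cosine_similarity(X)      # item-item similarity matrix from content alone
print(S.round(2))             # items 0 and 2 come out most similar
```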
52. 2. Content Extraction – Deep
An item is more than its available meta-data. Encode information from:
- Images (CNN)
- Text Information (NLP)
- Audio (LSTM)
[Figure: genre prediction from plot synopses]
Input: “A documentary which examines the creation and co-production of the popular children’s television program in three developing countries: Bangladesh, Kosovo, and South Africa.” → Prediction: Documentary, History
Input: “In his spectacular film debut, young Babar, King of the Elephants, must save his homeland from certain destruction by Rataxes and his band of invading rhinos.” → Prediction: Comedy, Adventure, Family, Animation
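As an illustrative sketch only, a small PyTorch LSTM that maps a tokenized synopsis to multi-label genre logits; vocabulary size, dimensions, and the random batch are all assumptions.

```python
import torch
import torch.nn as nn

class GenreTagger(nn.Module):
    def __init__(self, vocab_size=20_000, dim=64, n_genres=20):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_genres)

    def forward(self, tokens):                 # tokens: (batch, seq_len) ids
        _, (h, _) = self.lstm(self.emb(tokens))
        return self.head(h[-1])                # one logit per genre

model = GenreTagger()
tokens = torch.randint(0, 20_000, (2, 50))     # two fake tokenized synopses
logits = model(tokens)
probs = torch.sigmoid(logits)                  # multi-label: sigmoid, not softmax
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 20))  # dummy targets
```

The last hidden state also doubles as a content embedding that a hybrid model can consume.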
54. 3. Hybrid Model
[Figure: hybrid network. One-hot user attributes (u: 1 1 … 0 0) and one-hot item attributes (i: 0 1 … 1 1) are embedded and pooled together with a text embedding of the item’s description (the documentary synopsis above); Layers 1–3 combine everything into the prediction ŷu,i. The content embeddings are pre-computed as input; the remaining layers are fully trained with SGD.]
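A hedged PyTorch sketch of this architecture; attribute sizes and layer widths are made up, and the content embedding is treated as a frozen input, matching the “pre-computed as input / fully trained in SGD” split in the figure.

```python
import torch
import torch.nn as nn

class HybridModel(nn.Module):
    def __init__(self, n_user_attrs=100, n_item_attrs=100, content_dim=64):
        super().__init__()
        self.net = nn.Sequential(                     # "Layers 1-3"
            nn.Linear(n_user_attrs + n_item_attrs + content_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_attrs, item_attrs, content_emb):
        # content_emb is pre-computed by a text encoder and used as input
        # only; the layers of self.net are the part fully trained with SGD.
        x = torch.cat([user_attrs, item_attrs, content_emb], dim=-1)
        return self.net(x).squeeze(-1)                # y_hat(u, i)

model = HybridModel()
y_hat = model(torch.rand(4, 100), torch.rand(4, 100), torch.rand(4, 64))
```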
56. 3. Hybrid Model
- The previous deep learning recommendation algorithm still works
- Improve recommendations for items with few ratings
- Mitigate item cold start
- Mitigate user cold start
- Improve transfer learning
58. 4. In Production – Current Problems
Data quality – thumbs up/down vs. 10 stars; implicit feedback; etc.
Sparsity – grows with the number of items and users
Cold start problem – user cold start; item cold start
Recommendation speed – O(#items) algorithms are not feasible
59. 4. In Production – Solutions
Data quality
Unbiased consumer app where the users enter their tastes
Sparsity
User interaction: Ask each user to rate the most informative items
Cold start problem
Hybrid models with deep content extraction to recommend new items without ratings
Recommendation speed
Use item-item rec-sys with pre-computed item similarities to compute a (large) set of candidates; run the feedforward neural network on the candidates only
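A sketch of that two-stage pattern; `neighbor_table` and `score` stand in for the pre-computed item-item table and the neural scorer from the earlier sketches.

```python
def recommend(user_history, neighbor_table, score, top_n=10):
    # Stage 1: candidate generation from the pre-computed item-item table,
    # at cost O(len(user_history) * k) instead of O(#items).
    candidates = {j for i in user_history for j in neighbor_table[i]}
    candidates -= set(user_history)              # never re-recommend
    # Stage 2: rerank only the candidates with the expensive model.
    return sorted(candidates, key=score, reverse=True)[:top_n]

# e.g. recommend(history, table, score=lambda j: model_score(u, j))
```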
60. 4. In Production – Tools
LightFM
🙂 open source: https://github.com/lyst/lightfm
🙂 hybrid: matrix factorization + context
😐 linear
Deep Learning?
😐 far fewer tools than for Computer Vision or NLP
😐 no pre-trained models available – you need large datasets and GPUs
😐 TensorFlow and PyTorch support for sparse data is limited
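For reference, the canonical LightFM quick-start looks roughly like this; the hyper-parameters are illustrative, not tuned.

```python
from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from lightfm.evaluation import precision_at_k

data = fetch_movielens(min_rating=4.0)           # sparse train/test matrices
model = LightFM(loss="warp", no_components=30)   # hybrid MF, WARP ranking loss
model.fit(data["train"], epochs=20, num_threads=2)
print(precision_at_k(model, data["test"], k=5).mean())
```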
63. 4. In Production – Cloud Based
Do you want to hear more?
Get in touch with us at Crossing Minds!
We are building an API to offer state-of-the-art recommendations in the cloud.