2017 10-10 (netflix ml platform meetup) learning item and user representations with sparse data in recommender systems (focused learning, factorized deep retrieval)
1) Learning user and item representations is challenging due to sparse data and shifting preferences in recommender systems.
2) The presentation outlines research at Google to address sparsity through two approaches: focused learning, which develops specialized models for subsets of data like genres or cold-start items, and factorized deep retrieval, which jointly embeds items and their features to predict preferences for fresh items.
3) These techniques have improved overall viewership and candidate nomination, demonstrating their effectiveness in production recommender systems.
Similar to 2017 10-10 (netflix ml platform meetup) learning item and user representations with sparse data in recommender systems (focused learning, factorized deep retrieval)
Nell’iperspazio con Rocket: il Framework Web di Rust!
1. Learning item and user representations with sparse data in recommender systems
Ed H. Chi
Google Inc.
Abstract:
Recommenders match users in a particular context with the best personalized items that they will engage with. The problem is that users have shifting item and topic preferences, and give sparse feedback over time (or no feedback at all). Contexts shift from interaction to interaction at various time scales (seconds to minutes to days). Learning about users and items is hard because of noisy and sparse labels, and the user/item set changes rapidly and is large and long-tailed. Given the enormity of the problem, it is a wonder that we learn anything at all about our items and users.
In this talk, I will outline some research at Google to tackle the sparsity problem. First, I will summarize some work on focused learning, which suggests that learning about subsets of the data requires tuning the parameters for estimating the missing unobserved entries. Second, we utilize joint feature factorization to impute possible user affinity to freshly uploaded items, and employ hashing-based techniques to perform extremely fast similarity scoring on a large item catalog, while controlling variance. This approach is currently serving a ~1TB model on production traffic using distributed TensorFlow Serving, demonstrating that our techniques work in practice. I will conclude with some remarks on possible future directions.
Bio:
Ed is a Research Scientist at Google, leading a team focused on recommendation systems, machine learning, and social interaction research. He has launched significant improvements of recommenders for YouTube, Google Play Store and Google+. With over 35 patents and over 100 research articles, he is known for research on Web and online social systems, and the effects of social signals on user behavior. Prior to Google, he was the Area Manager and Principal Scientist at Palo Alto Research Center's Augmented Social Cognition Group, where he led the group in understanding how social systems help groups of people to remember, think and reason. Ed completed his three degrees (B.S., M.S., and Ph.D.) in 6.5 years at the University of Minnesota, and has been doing research on software systems since 1993. He has been featured and quoted in the press, including the Economist, Time Magazine, LA Times, and the Associated Press, and has won awards for both teaching and research. In his spare time, Ed is an avid photographer and snowboarder.
2. Ed H. Chi
Research Scientist & Manager, MLX/SIR Research team
Google Research & Machine Intelligence
Learning Item and User Representations with Sparse Data in Recommender Systems
7. The Recommendation Problem
[Diagram: User, Context, and Item, annotated with signals such as watch history, preferences, demographics, past user behavior, source, client, last interaction, leanback or lean-forward, topic, and clickbait / brand safe?]
9. The Recommendation Problem
User: shifting user preferences; sparse feedback. Context: dynamic and changing contexts.
10. The Recommendation Problem
User: shifting user preferences; sparse feedback. Context: dynamic and changing contexts. Item: noisy & sparse labels; large, changing item set.
11. The Recommendation Problem
User: shifting user preferences; sparse feedback. Context: dynamic and changing contexts. Item: noisy & sparse labels; large, changing item set.
Oh, and do this with low latency, with a huge corpus of users and items!
17. We don't represent users/items equally!
[Histograms: per-user and per-movie prediction accuracy, frequency vs. error (MSE)]
Wanted: a model that predicts well for all users and all items.
18. Focused Learning Problem Definition
Given:
● A dataset: R
● A group of items (or users) to focus on: I
Find: a model that has high prediction accuracy on R_I, the portion of R belonging to the focus group I.
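The focus-group objective above is easy to make concrete. Below is a minimal NumPy sketch (not from the talk) of scoring a factorization only on R_I: `R` is a ratings matrix with NaN for unobserved entries, `U` and `V` are hypothetical user/item factor matrices, and `focus_items` picks out the column subset I.

```python
import numpy as np

def focused_mse(R, U, V, focus_items):
    """Mean squared error of the reconstruction U @ V.T, restricted to R_I (the focus group's columns)."""
    pred = U @ V.T                                  # full reconstruction
    R_I, pred_I = R[:, focus_items], pred[:, focus_items]
    observed = ~np.isnan(R_I)                       # score only the observed entries of R_I
    return float(np.mean((R_I[observed] - pred_I[observed]) ** 2))

# Toy usage: a globally trained model can look fine on average yet serve R_I poorly.
rng = np.random.default_rng(0)
R = rng.random((50, 40))
R[rng.random(R.shape) < 0.8] = np.nan               # ~80% of entries unobserved
U, V = rng.normal(size=(50, 4)), rng.normal(size=(40, 4))
print(focused_mse(R, U, V, focus_items=np.arange(5)))   # accuracy on a 5-item focus group
```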
19. Approach
1. Focus Selection: Where should the additional models focus?
2. Focused Learning: How can we learn a new model to improve prediction on a subset of the data?
20. Approach
1. Focus Selection: Where should the additional models focus?
2. Focused Learning: How can we learn a new model to improve prediction on a subset of the data?
[Diagram: two Users x Movies matrices; a "focus group" is either a subset of columns (movies) or a subset of rows (users)]
21. Approach
1. Focus Selection: Where should the additional models focus?
2. Focused Learning: How can we learn a new model to improve prediction on a subset of the data?
28. Summary
1. "Globally optimal" is not best for everybody. Myth of the average user!
2. Learn additional models focused on problematic regions. The long tail needs different exploration strategies!
[Histogram: per-user prediction accuracy, frequency vs. MSE]
Alex Beutel, Ed H. Chi, Zhiyuan Cheng, Hubert Pham, John Anderson. Beyond Globally Optimal: Focused Learning for Improved Recommendations. In WWW 2017.
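To make the focused-learning step concrete, here is a hedged sketch (mine, not the WWW 2017 implementation) of the idea that each focus group gets its own model with its own hyperparameters, in particular the weight placed on missing unobserved entries. `train_wmf` is a deliberately tiny gradient-descent trainer for weighted matrix factorization; the grid search at the end reuses `focused_mse` and `R` from the sketch above.

```python
import numpy as np

def train_wmf(R, k=8, unobserved_weight=0.1, reg=0.1, lr=0.02, steps=300, seed=0):
    """Tiny weighted matrix factorization by gradient descent (illustrative only).

    Observed entries get weight 1; missing entries are treated as zeros with a small
    weight `unobserved_weight`, which is the knob focused learning tunes per group.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U, V = 0.1 * rng.normal(size=(m, k)), 0.1 * rng.normal(size=(n, k))
    observed = ~np.isnan(R)
    target = np.where(observed, R, 0.0)               # implicit negatives as zeros
    W = np.where(observed, 1.0, unobserved_weight)    # per-entry weights
    for _ in range(steps):
        err = W * (U @ V.T - target)
        U, V = U - lr * (err @ V + reg * U), V - lr * (err.T @ U + reg * V)
    return U, V

# Focused learning: pick the unobserved-entry weight that best serves the focus group R_I,
# even if it is not the setting that is best globally.
focus_items = np.arange(5)                            # hypothetical focus group I
scores = {w0: focused_mse(R, *train_wmf(R, unobserved_weight=w0), focus_items)
          for w0 in (0.01, 0.1, 1.0)}
print(min(scores, key=scores.get), scores)            # focused hyperparameter choice
```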
32. Deep Retrieval: A bit of history
Deep retrieval: large-scale machine-learned item retrieval
[2013] Sibyl Deep Retrieval: Sibyl model (linear) + token indexing
[2017] TFX Factorized Deep Retrieval: WALS model (bilinear factorization) + ScaM
33. Serving Flow
[Diagram: a huge item corpus feeds the deep retrieval nominator and other candidate generators, which produce ~1,000 candidates for the offline-refinement ranker and the online re-ranker]
34. Serving Flow
[Diagram as before: huge item corpus, deep retrieval nominator and other candidate generators, ~1,000 candidates, offline-refinement ranker, online re-ranker]
Challenges
● index-friendly ML model
● generalizes well
● scores accurately
● avoids WTFs
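As a rough illustration of that funnel (not the production system), the sketch below uses brute-force dot products in place of the hashing-based index and a hypothetical `rerank_score` in place of the heavier ranker: the nominator cheaply pulls ~1,000 candidates out of the full corpus, and only those survivors get the expensive scoring.

```python
import numpy as np

rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(100_000, 64))    # stand-in for the huge item corpus
user_embedding = rng.normal(size=64)

def nominate(user_emb, item_embs, k=1000):
    """Cheap first stage: top-k items by dot product (an exact stand-in for the ANN/hashing index)."""
    scores = item_embs @ user_emb
    top = np.argpartition(-scores, k)[:k]           # unsorted top-k in O(n)
    return top[np.argsort(-scores[top])]            # sort only the survivors

def rerank_score(user_emb, item_emb):
    """Hypothetical heavier scorer, applied only to the ~1,000 nominated candidates."""
    return float(user_emb @ item_emb)               # placeholder for the real ranker

candidates = nominate(user_embedding, item_embeddings)            # ~1,000 out of 100,000
ranked = sorted(candidates, key=lambda i: -rerank_score(user_embedding, item_embeddings[i]))
print(ranked[:10])                                                # final top items
```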
36. WALS Factorization
● Loss function: weighted squared error for the factorization P ≈ U Vᵀ
● Weight on unobserved entries: 1. a prior on implicit negatives; 2. controls the degree of generalization
● Scalable training: linear convergence by AltMin. Distributed TF implementation (1B x 1B)
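The exact loss is not reproduced in this transcript; assuming the standard WALS form (observed entries of P fit with weight 1, unobserved entries treated as zeros with a small weight w0, plus L2 regularization), one alternating-minimization sweep looks roughly like the dense NumPy sketch below. Each row and column update is a closed-form least-squares solve, which is what gives AltMin its linear convergence.

```python
import numpy as np

def wals_sweep(P, U, V, w0=0.1, reg=0.1):
    """One alternating-minimization sweep for P ≈ U @ V.T (illustrative, dense version).

    Observed entries (non-NaN) get weight 1; unobserved entries are implicit negatives
    (target 0) with weight w0. Rows of U, then rows of V, are solved in closed form.
    """
    observed = ~np.isnan(P)
    target = np.where(observed, P, 0.0)
    W = np.where(observed, 1.0, w0)
    k = U.shape[1]
    for i in range(U.shape[0]):                                   # update user/row factors
        A = (V * W[i][:, None]).T @ V + reg * np.eye(k)
        b = (V * W[i][:, None]).T @ target[i]
        U[i] = np.linalg.solve(A, b)
    for j in range(V.shape[0]):                                   # update item/column factors
        A = (U * W[:, j][:, None]).T @ U + reg * np.eye(k)
        b = (U * W[:, j][:, None]).T @ target[:, j]
        V[j] = np.linalg.solve(A, b)
    return U, V
```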
37. Collective matrix factorization
Limitations of vanilla factorization:
1. Fixed vocabs
2. Does not make use of features.
[Diagram: matrix A, watch videos x impression videos]
38. Collective matrix factorization: learning feature representation
Limitations of vanilla factorization:
1. Fixed vocabs
2. Does not make use of features.
Solution: Co-embed features and items.
● Learns the representation of features and items simultaneously.
● Key to learning Item and Feature latent factors is to zero-initialize submatrix D.
[Diagram: block matrix with rows = watch videos and features of watch videos, columns = impression videos and features of impression videos; blocks A, B, Cᵀ, D]
39. Collective matrix factorization: learning feature representation
[Diagram: the same block matrix (A, B, Cᵀ, D); a watch video's feature row, e.g. topic: pop music, keyword: sugar, channel: maroon 5, each marked with a 1]
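Here is a hedged sketch of that co-embedding, following the block layout on the slides but with made-up shapes and a single shared feature vocabulary for simplicity: stack the engagement matrix A with the item-to-feature indicator blocks B and Cᵀ and a zero block D, then run the same weighted factorization (the `wals_sweep` sketch above) on the augmented matrix, so feature factors land in the same space as item factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs. A: watch-video x impression-video engagement (NaN = unobserved);
# F_watch / F_impr: binary item-to-feature indicators (topic, keyword, channel, ...).
A = np.where(rng.random((100, 120)) < 0.05, 1.0, np.nan)
F_watch = (rng.random((100, 30)) < 0.1).astype(float)     # block B
F_impr = (rng.random((120, 30)) < 0.1).astype(float)      # its transpose is block Cᵀ

# Augmented matrix: rows = [watch videos; features], cols = [impression videos; features].
# The feature-feature block D is kept at zero, as noted on the slide.
M = np.block([[A,         F_watch],
              [F_impr.T,  np.zeros((30, 30))]])

# Joint factorization: the first 100 rows of U are watch-video embeddings and the
# remaining 30 rows are feature embeddings, co-embedded in one space.
k = 16
U = 0.1 * rng.normal(size=(M.shape[0], k))
V = 0.1 * rng.normal(size=(M.shape[1], k))
for _ in range(5):
    U, V = wals_sweep(M, U, V, w0=0.05, reg=0.1)
item_emb, feat_emb = U[:100], U[100:]
```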
40. Model tuning for missing observations
[Diagram: the same block matrix (A, B, Cᵀ, D)]
Remove all data points from submatrix A.
41. Model tuning for missing observations
[Diagram: the same block matrix (A, B, Cᵀ, D)]
Reproject videos using only feature embeddings to predict their co-watch patterns.
Evaluate with the cosine distance between the projected and original embeddings.
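A rough version of that held-out evaluation, reusing `F_watch`, `item_emb`, and `feat_emb` from the co-embedding sketch above: rebuild each video's embedding from its feature embeddings alone (as a fresh upload with no rows in A would have to be), and check how close the projection lands to the embedding learned with engagement data. Averaging feature embeddings is a simplification of the actual projection step.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity (1 minus the cosine distance mentioned on the slide)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

sims = []
for i in range(F_watch.shape[0]):
    feats = np.flatnonzero(F_watch[i])
    if feats.size == 0:
        continue                                   # no features, nothing to project from
    projected = feat_emb[feats].mean(axis=0)       # feature-only embedding for video i
    sims.append(cosine(projected, item_emb[i]))    # compare against the fully trained embedding
print("mean cosine(projected, original):", np.mean(sims))
```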
43. WALS Factorization
● Full TensorFlow implementation
○ Custom-ops/kernels for alternating minimization.
● Single-Machine Version
○ tf.learn.Estimator API
○ Open-sourced in tf.contrib
● Distributed Version
○ (not yet available externally)
○ Specialized Synchronization Control with Exact Synchronized Row/Column Sweep Switching.
○ Fault tolerant.
○ Scalable. 400M x 400M x 200D trained in 1 ~ 3 days.
○ Moving to tf.Estimator Interface with TFX integration.
[Diagram: training pipeline, with the block matrix (A, B, Cᵀ, D) feeding TF WALS models]
44. C++ and TensorFlow Serving backends
● Distributed TensorFlow Serving using Remote-session-run-op
○ Support embedding lookup and multi-sharded nearest-neighbor lookups.
● Serving a 1.2TB model in TensorFlow Serving!
[Diagram: at serving time a user request goes through the master graph to a sharded embedding-lookup graph and a sharded ScaM graph; training input feeds the TF WALS models (A, B, Cᵀ, D)]
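The sharded lookup in that diagram can be pictured as a toy scatter/gather, sketched below with plain NumPy slices standing in for the sharded embedding and ScaM graphs (the real system serves a ~1.2TB model across workers): the master fans the query out, each shard returns its local top-k by dot product, and the master merges the partial lists.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the sharded item-embedding tables (4 shards of a small toy corpus).
shards = np.array_split(rng.normal(size=(10_000, 64)), 4)
shard_offsets = np.cumsum([0] + [len(s) for s in shards[:-1]])

def shard_topk(shard, offset, query, k):
    """What each ScaM-like shard would do: local dot-product top-k over its slice."""
    scores = shard @ query
    local = np.argpartition(-scores, k)[:k]
    return [(float(scores[i]), int(offset + i)) for i in local]   # (score, global item id)

def master_lookup(query, k=100):
    """What the master graph would do: fan out to every shard, then merge the partial top-k lists."""
    partial = [hit for shard, off in zip(shards, shard_offsets)
               for hit in shard_topk(shard, off, query, k)]
    return sorted(partial, reverse=True)[:k]                      # global top-k

print(master_lookup(rng.normal(size=64))[:5])
```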
45. The Lesson
Modeling long-tail items/users requires special techniques and infrastructure
46. The Lesson
Modeling long-tail items/users requires special techniques and infrastructure: Focused Learning & Factorized Deep Retrieval
47. Thank you! Questions?
Learning Item and User Representations with Sparse Data in Recommender Systems
Contact: edchi@google.com
Joint work with:
● Focused Learning: Alex Beutel, Zhiyuan Cheng, Hubert Pham, John Anderson
● Factorized DR: Xinyang Yi, Yifan Chen, Lichan Hong, Xiang Wu, Sukriti Ramesh, Noah Fiedel, & from YouTube: Lukasz Heldt, Nandini Seshadri