Artwork
Personalization
at Netflix
Justin Basilico
QCon SF 2018
2018-11-05
@JustinBasilico
Which artwork to show?
A good image is...
1. Representative
2. Informative
3. Engaging
4. Differential
5. Personal
Intuition: Preferences in cast members
Intuition: Preferences in genre
Choose artwork so that members
understand if they will likely enjoy a title
to maximize satisfaction and retention
Challenges in Artwork
Personalization
Everything is a Recommendation
Over 80% of what
people watch
comes from our
recommendations
Rankings
Rows
Attribution
Pick
only one
Was it the recommendation or artwork?
Or both?
Change Effects
Which one caused the play?
Is change confusing?
Day 1 Day 2
● Creatives select the images that are available
● But algorithms must still be robust
Adding meaning and avoiding clickbait
Scale
Over 20M RPS
for images at
peak
Traditional Recommendations
Collaborative Filtering:
Recommend items that
similar users have chosen
0 1 0 1 0
0 0 1 1 0
1 0 0 1 1
0 1 0 0 0
0 0 0 0 1
Users
Items
Members can only play
images we choose
Need
something
more
Bandit
Not that kind
of Bandit
Image from Wikimedia Commons
Multi-Armed Bandits (MAB)
● Multiple slot machines with
unknown reward distribution
● A gambler can play one arm at a time
● Which machine to play to maximize
reward?
Bandit Algorithms Setting
Each round:
● Learner chooses an action
● Environment provides a real-valued reward for action
● Learner updates to maximize the cumulative reward
Learner
(Policy)
Environment
Action
Reward
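The round-by-round protocol above can be sketched as a tiny simulation. The `Environment` and `RandomLearner` classes and the Bernoulli payout probabilities are illustrative assumptions, not Netflix's actual implementation:

```python
import random

class Environment:
    """Hypothetical environment: each action pays out a Bernoulli reward
    with a fixed but unknown probability."""
    def __init__(self, probs):
        self.probs = probs

    def reward(self, action):
        return 1 if random.random() < self.probs[action] else 0

class RandomLearner:
    """Simplest possible policy: choose an action uniformly at random."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def choose(self):
        return random.randrange(self.n_actions)

    def update(self, action, reward):
        pass  # a real learner would update its reward estimates here

env = Environment([0.50, 0.00, 0.33])
learner = RandomLearner(3)
total = 0
for _ in range(1000):
    a = learner.choose()    # learner chooses an action
    r = env.reward(a)       # environment provides a real-valued reward
    learner.update(a, r)    # learner updates to maximize cumulative reward
    total += r
```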
Artwork Optimization as Bandit
● Environment: Netflix homepage
● Learner: Artwork selector for a show
● Action: Display specific image for show
● Reward: Member has positive engagement
Artwork Selector
● What images should creatives provide?
○ Variety of image designs
○ Thematic and visual differences
● How many images?
○ Creating each image has a cost
○ Diminishing returns
Images as Actions
● What is a good outcome?
✓ Watching and enjoying the content
● What is a bad outcome?
✖ No engagement
✖ Abandoning or not enjoying the
content
Designing Rewards
Metric: Take Fraction
Example: Altered Carbon
Take Fraction: 1/3
Minimizing Regret
● What is the best that a bandit can do?
○ Always choose optimal action
● Regret: Difference in reward between the
optimal action and the chosen action
● To maximize reward, minimize the
cumulative regret
Bandit Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Actions | Historical rewards
Bandit Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Actions | Historical rewards
Choose
image
Bandit Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Actions | Historical rewards
Observed Take Fraction: 2/4, 0/2, 1/3
Overall: 3/9
Strategy
Show current best image
Try another image to learn
if it is actually better
Maximization vs. Exploration
Principles of Exploration
● Gather information to make the best overall decision
in the long-run
● Best long-term strategy may involve short-term
sacrifices
Common strategies
1. Naive Exploration
2. Optimism in the Face of Uncertainty
3. Probability Matching
Naive Exploration: 𝝐-greedy
● Idea: Add noise to the greedy policy
● Algorithm:
○ With probability 𝝐
■ Choose one action uniformly at random
○ Otherwise
■ Choose the action with the best reward so far
● Pros: Simple
● Cons: Regret is unbounded
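A minimal sketch of the 𝝐-greedy rule above, applied to the observed take fractions from the running example (the function name and data layout are hypothetical):

```python
import random

def epsilon_greedy(successes, trials, epsilon=0.1):
    """successes[i] / trials[i] is the observed take fraction of image i."""
    if random.random() < epsilon:
        # explore: choose one image uniformly at random
        return random.randrange(len(trials))
    # exploit: choose the image with the best observed reward so far
    rates = [s / t if t > 0 else 0.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=rates.__getitem__)

# Running example: take fractions 2/4, 0/2, 1/3
successes, trials = [2, 0, 1], [4, 2, 3]
choice = epsilon_greedy(successes, trials, epsilon=0.0)  # greedy -> image 0
```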
Epsilon-Greedy Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Observed Reward: 2/4 (greedy), 0/2, 1/3
Epsilon-Greedy Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Selection probabilities: 1 - 2𝝐/3 for the greedy arm, 𝝐/3 for each other arm
Epsilon-Greedy Example
1 0 1 0
0 0 0
0 1 0
Observed Reward: 2/4 (greedy), 0/3, 1/3
Optimism: Upper Confidence Bound (UCB)
● Idea: Prefer actions with uncertain values
● Approach:
○ Compute confidence interval of observed rewards
for each action
○ Choose action a with the highest 𝛂-percentile
○ Observe reward and update confidence interval
for a
● Pros: Theoretical regret minimization properties
● Cons: Needs to update quickly from observed rewards
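As one concrete instance of optimism in the face of uncertainty, here is the classic UCB1 rule (empirical mean plus an exploration bonus) rather than the percentile-based variant described above; names and data are illustrative:

```python
import math

def ucb1(successes, trials):
    """Pick the arm with the highest upper confidence bound
    (empirical mean + exploration bonus)."""
    total = sum(trials)
    best, best_score = None, float("-inf")
    for i, (s, t) in enumerate(zip(successes, trials)):
        if t == 0:
            return i  # untried arms are explored first
        bonus = math.sqrt(2 * math.log(total) / t)  # shrinks as t grows
        score = s / t + bonus
        if score > best_score:
            best, best_score = i, score
    return best

# Take fractions 2/4, 0/2, 1/3: the exploration bonus partially
# offsets the lower means of the less-tried arms
arm = ucb1([2, 0, 1], [4, 2, 3])
```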
Beta-Bernoulli Distribution
Image from Wikipedia
Beta Bernoulli
Pr(1) = p
Pr(0) = 1 - p
Prior
Bandit Example with Beta-Bernoulli
Prior: 𝛽(1, 1); Posterior = Prior + Observations
A: 2/4 → 𝛽(3, 3)
B: 0/2 → 𝛽(1, 3)
C: 1/3 → 𝛽(2, 3)
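The conjugate update behind this slide is just counting: starting from a 𝛽(1, 1) prior, each play increments 𝛼 and each non-play increments 𝛽 (sketch; the helper name is hypothetical):

```python
def beta_posterior(successes, failures, prior=(1, 1)):
    """Return (alpha, beta) of the posterior Beta after observing
    `successes` plays and `failures` non-plays."""
    a0, b0 = prior
    return a0 + successes, b0 + failures

# Matches the slide: observed take fractions 2/4, 0/2, 1/3
assert beta_posterior(2, 2) == (3, 3)  # A
assert beta_posterior(0, 2) == (1, 3)  # B
assert beta_posterior(1, 2) == (2, 3)  # C
```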
Bayesian UCB Example
1 0 1 1 ?
0 0 ?
0 1 0 ?
Reward 95% confidence intervals: [0.15, 0.85], [0.07, 0.81], [0.01, 0.71]
Bayesian UCB Example (after observing a 0)
1 0 1 1 0
0 0
0 1 0
Reward 95% confidence intervals: [0.12, 0.78], [0.07, 0.81], [0.01, 0.71]
Probabilistic: Thompson Sampling
● Idea: Select the actions by the probability they are the best
● Approach:
○ Keep a distribution over model parameters for each action
○ Sample estimated reward value for each action
○ Choose action a with maximum sampled value
○ Observe reward for action a and update its parameter distribution
● Pros: Randomness keeps exploring even without model updates
● Cons: Hard to compute probabilities of actions
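A minimal Beta-Bernoulli Thompson sampler using only the standard library (illustrative sketch; function names are assumptions):

```python
import random

def thompson_choose(posteriors):
    """posteriors: list of (alpha, beta) Beta parameters, one per image.
    Sample a reward estimate from each posterior, play the arg max."""
    samples = [random.betavariate(a, b) for a, b in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)

def thompson_update(posteriors, action, reward):
    """Bernoulli reward: increment alpha on a play, beta otherwise."""
    a, b = posteriors[action]
    posteriors[action] = (a + reward, b + (1 - reward))

# Posteriors from the running example: Beta(3,3), Beta(1,3), Beta(2,3)
posteriors = [(3, 3), (1, 3), (2, 3)]
choice = thompson_choose(posteriors)
thompson_update(posteriors, choice, reward=1)
```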
Thompson Sampling Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Posterior distributions: 𝛽(3, 3), 𝛽(1, 3), 𝛽(2, 3)
Thompson Sampling Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Sampled values: 0.38, 0.59, 0.18
Thompson Sampling Example
1 0 1 0
0 0
0 1 0 1
Updated distributions: 𝛽(3, 3), 𝛽(1, 3), 𝛽(3, 3)
Many Variants of Bandits
● Standard setting: Stochastic and stationary
● Drifting: Reward values change over time
● Adversarial: No assumptions on how rewards are generated
● Continuous action space
● Infinite set of actions
● Varying set of actions over time
● ...
What about personalization?
Contextual Bandits
● Let’s make this harder!
● Slot machines where payout depends on
context
● E.g. time of day, blinking light on slot
machine, ...
Contextual Bandit
Learner
(Policy)
Environment
Action
Reward
Context
Each round:
● Environment provides context (feature) vector
● Learner chooses an action for context
● Environment provides a real-valued reward for action in context
● Learner updates to maximize the cumulative reward
Supervised Learning vs. Contextual Bandits
Supervised Learning:
● Input: Features (x ∊ ℝᵈ)
● Output: Predicted label
● Feedback: Actual label (y)
Contextual Bandits:
● Input: Context (x ∊ ℝᵈ)
● Output: Action (a = 𝜋(x))
● Feedback: Reward (r ∊ ℝ)
Supervised Learning vs. Contextual Bandits
[Example with Chihuahua images from ImageNet: supervised learning observes the true label ("Dog") for every image; the bandit only observes a reward for the label it chose (1 if correct, 0 otherwise) and gets no feedback on unchosen actions]
Artwork Personalization
as Contextual Bandit
● Context: Member, device, page, etc.
Artwork Selector
Choose
Epsilon-Greedy Example
With probability 1-𝝐: show the personalized image
With probability 𝝐: show an image at random
● Learn a supervised regression model per image to predict reward
● Pick image with highest predicted reward
Member
(context)
Features
Image Pool
Model 1
Winner
Model 2
Model 3
Model 4
arg
max
Greedy Policy Example
● Linear model to calculate uncertainty in reward estimate
● Choose image with highest 𝛂-percentile predicted reward value
Member
(context)
Features
Image Pool
Model 1
Winner
Model 2
Model 3
Model 4
arg
max
LinUCB Example
Li et al., 2010
● Learn distribution over model parameters (e.g. Bayesian Regression)
● Sample a model, evaluate features, take arg max
Thompson Sampling Example
Member
(context)
Features
Image Pool
Sample 1
Winner
Sample 2
Sample 3
Sample 4
arg
max
Chapelle & Li, 2011
Model 1
Model 2
Model 3
Model 4
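A hedged sketch of a per-image LinUCB scorer in the spirit of Li et al. (2010): one ridge-regression reward model per candidate image, scored by predicted reward plus an uncertainty width. Class and function names are hypothetical, not the production selector:

```python
import numpy as np

class LinUCBArm:
    """One ridge-regression reward model per candidate image."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)       # I + sum of x x^T over observations
        self.b = np.zeros(dim)     # sum of reward * x
        self.alpha = alpha         # exploration strength

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                 # ridge estimate of weights
        width = np.sqrt(x @ A_inv @ x)         # uncertainty at this context
        return x @ theta + self.alpha * width  # optimistic (UCB) score

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def choose_image(arms, context):
    return int(np.argmax([arm.score(context) for arm in arms]))

arms = [LinUCBArm(dim=3) for _ in range(4)]  # one model per candidate image
x = np.array([1.0, 0.5, 0.0])                # member/context features
winner = choose_image(arms, x)
arms[winner].update(x, reward=1.0)           # observe a play, update winner
```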
Offline Metric: Replay
Offline Take Fraction: 2/3
Logged
Actions
Model
Assignments
Li et al., 2011
Replay
● Pros
○ Unbiased metric when using logged probabilities
○ Easy to compute
○ Rewards observed are real
● Cons
○ Requires a lot of data
○ High variance if there are few matches
■ Techniques like Doubly-Robust estimation (Dudik, Langford
& Li, 2011) can help
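The replay estimator in miniature: keep only the rounds where the new policy's choice matches the logged action (assuming uniform logging here), and average the logged rewards over those matches. Names and the toy log are made up:

```python
def replay(logged, new_policy):
    """logged: list of (context, action, reward) tuples."""
    matched = [r for ctx, a, r in logged if new_policy(ctx) == a]
    if not matched:
        return None  # no overlap with the log -> no estimate (high variance)
    return sum(matched) / len(matched)

def always_first(ctx):
    return 0  # toy policy: always show image 0

# Toy log: the new policy matches the logged action on 3 of 5 rounds,
# 2 of which ended in a play -> offline take fraction 2/3
log = [("u1", 0, 1), ("u2", 1, 0), ("u3", 0, 1), ("u4", 2, 0), ("u5", 0, 0)]
estimate = replay(log, always_first)
```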
Offline Replay Results
● Bandit finds good images
● Personalization is better
● Artwork variety matters
● Personalization wiggles
around best images
Lift in Replay of the various algorithms
compared to the Random baseline
Bandits in the Real
World
● Getting started
○ Need data to learn
○ Warm-starting via batch learning from existing data
● Closing the feedback loop
○ Only exposing bandit to its own output
● Algorithm performance depends on data volume
○ Need to be able to test bandits at large scale, head-to-head
A/B testing Bandit Algorithms
Starting the Loop
Explore User
Action
Context
Join
Logging
Reward
Data
Store
Model User
Update
Action
Context
Join
Logging
Reward
Data
Store
Incremental
Publish
Train Batch
Publish
Completing the Loop
● Need to serve an image for any title in the catalog
○ Calls from homepage, search, galleries, etc.
○ > 20M RPS at peak
● Existing UI code written assuming image lookup is fast
○ In memory map of video ID to URL
○ Want to insert Machine Learned model
○ Don’t want a big rewrite across all UI code
Scale Challenges
Live Compute vs. Online Precompute

Live Compute: synchronous computation to choose the image for a title in response to a member request
Pros:
● Access to the freshest data
● Knowledge of full context
● Compute only what is necessary
Cons:
● Strict Service Level Agreements
○ Must respond quickly in all cases
○ Requires high availability
● Restricted to simple algorithms

Online Precompute: asynchronous computation to choose the image for a title before the request, stored in a cache
Pros:
● Can handle large data
● Can run moderate complexity algorithms
● Can average computational cost across users
● Change from actions
Cons:
● Has some delay
● Done in event context
● Extra compute for users and items not served
See techblog for more details
System Architecture
Edge
Personalized
Image
Precompute
EV Cache
Precompute
logs
ETL
(aggregate
data)
Model
training
Bandit
model
UI
image
request
Play and
Impression
logs
Precompute & Image Lookup
● Precompute
○ Run bandit for each title on each profile to
choose personalized image
○ Store the title to image mapping in EVCache
● Image Lookup
○ Pull profile’s image mapping from EVCache
once per request
Logging & Reward
● Precompute Logging
○ Selected image
○ Exploration Probability
○ Candidate pool
○ Snapshot facts for feature generation
● Reward Logging
○ Image rendered in UI & if played
○ Precompute ID
Image via YouTube
Feature Generation & Training
● Join rewards with snapshotted facts
● Generate features using DeLorean
○ Feature encoders are shared online and offline
● Train the model using Spark
● Publish model to production
DeLorean image by JMortonPhoto.com & OtoGodfrey.com
Track the quality of the model
- Compare prediction to actual behavior
- Online equivalents of offline metrics
Reserve a fraction of data for a simple
policy (e.g. 𝝐-greedy) to sanity check
bandits
Monitoring and Resiliency
● Missing images greatly degrade the member experience
● Try to serve the best image possible
Graceful Degradation
Personalized
Selection
Unpersonalized
Fallback
Default Image
(when all else fails)
Does it work?
Online results
● A/B test: It works!
● Rolled out to our >130M member base
● Most beneficial for lesser known titles
● Competition between titles for attention leads to compression of offline metrics
More details in our blog post
Future Work
More dimensions to personalize
Rows
Trailer
Evidence
Synopsis
Image
Row Title
Metadata
Ranking
Automatic image selection
● Generating new artwork is costly and time consuming
● Can we predict performance from raw image?
Artwork selection orchestration
● Neighboring image selection influences result
Row A
(microphones)
Example: Stand-up comedy
Row B
(more variety)
Long-term Reward: Road to
Reinforcement Learning
● RL involves multiple actions and delayed reward
● Useful to maximize long-term member joy?
Thank you
@JustinBasilico
"ML in Production",Oleksandr BaganFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Dernier (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Artwork Personalization at Netflix

  • 1. Artwork Personalization at Netflix Justin Basilico QCon SF 2018 2018-11-05 @JustinBasilico
  • 2.
  • 4. A good image is... 1. Representative 2. Informative 3. Engaging 4. Differential
  • 5. A good image is... 1. Representative 2. Informative 3. Engaging 4. Differential Personal
  • 8. Choose artwork so that members understand if they will likely enjoy a title to maximize satisfaction and retention
  • 10. Everything is a Recommendation Over 80% of what people watch comes from our recommendations Rankings Rows
  • 11. Attribution Pick only one Was it the recommendation or artwork? Or both? ▶
  • 12. Change Effects Which one caused the play? Is change confusing? Day 1 Day 2 ▶
  • 13. ● Creatives select the images that are available ● But algorithms must be still robust Adding meaning and avoiding clickbait
  • 14. Scale Over 20M RPS for images at peak
  • 15. Traditional Recommendations Collaborative Filtering: Recommend items that similar users have chosen (figure: a 5×5 binary Users × Items play matrix). Members can only play images we choose
  • 20. Multi-Armed Bandits (MAB) ● Multiple slot machines with unknown reward distribution ● A gambler can play one arm at a time ● Which machine to play to maximize reward?
  • 21. Bandit Algorithms Setting Each round: ● Learner chooses an action ● Environment provides a real-valued reward for action ● Learner updates to maximize the cumulative reward Learner (Policy) Environment Action Reward
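The round-based protocol above can be sketched as a minimal learner/environment interface. This is a toy Bernoulli-reward simulation; all class and variable names are illustrative, not Netflix's actual code.

```python
import random

class BernoulliEnvironment:
    """Toy environment: each action has a hidden success probability."""
    def __init__(self, probs):
        self.probs = probs

    def reward(self, action):
        # Binary reward: 1 if the member "plays", 0 otherwise
        return 1 if random.random() < self.probs[action] else 0

class AveragingLearner:
    """Tracks per-action average reward and greedily picks the best so far."""
    def __init__(self, n_actions):
        self.pulls = [0] * n_actions
        self.total = [0.0] * n_actions

    def choose(self):
        # Untried actions get +inf so every arm is tried at least once
        means = [t / n if n else float("inf")
                 for t, n in zip(self.total, self.pulls)]
        return max(range(len(means)), key=means.__getitem__)

    def update(self, action, reward):
        self.pulls[action] += 1
        self.total[action] += reward

# Each round: learner chooses, environment rewards, learner updates
env = BernoulliEnvironment([0.2, 0.5, 0.3])
learner = AveragingLearner(3)
for _ in range(1000):
    a = learner.choose()
    learner.update(a, env.reward(a))
```

A purely greedy learner like this one never deliberately explores, which motivates the exploration strategies discussed later in the talk.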
  • 22. Artwork Optimization as Bandit ● Environment: Netflix homepage ● Learner: Artwork selector for a show ● Action: Display specific image for show ● Reward: Member has positive engagement Artwork Selector ▶
  • 23. Images as Actions ● What images should creatives provide? ○ Variety of image designs ○ Thematic and visual differences ● How many images? ○ Creating each image has a cost ○ Diminishing returns
  • 24. Designing Rewards ● What is a good outcome? ✓ Watching and enjoying the content ● What is a bad outcome? ✖ No engagement ✖ Abandoning or not enjoying the content
  • 25. Metric: Take Fraction ▶ Example: Altered Carbon Take Fraction: 1/3
  • 26. Minimizing Regret ● What is the best that a bandit can do? ○ Always choose optimal action ● Regret: Difference between optimal action and chosen action ● To maximize reward, minimize the cumulative regret
  • 27. Bandit Example (figure: three actions with historical rewards 1 0 1 0 ?, 0 0 ?, 0 1 0 ?)
  • 28. Bandit Example (figure: one of the three images is chosen for the next round)
  • 29. Bandit Example (figure: observed take fractions 2/4, 0/2, 1/3; overall 3/9)
  • 30. Strategy Maximization (show current best image) vs. Exploration (try another image to learn if it is actually better)
  • 31. Principles of Exploration ● Gather information to make the best overall decision in the long-run ● Best long-term strategy may involve short-term sacrifices
  • 32. Common strategies 1. Naive Exploration 2. Optimism in the Face of Uncertainty 3. Probability Matching
  • 33. Naive Exploration: 𝝐-greedy ● Idea: Add noise to the greedy policy ● Algorithm: ○ With probability 𝝐 ■ Choose one action uniformly at random ○ Otherwise ■ Choose the action with the best reward so far ● Pros: Simple ● Cons: Regret is unbounded
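The 𝝐-greedy policy above fits in a few lines. This is a minimal illustration plugging in the example's observed take fractions, not production code.

```python
import random

def epsilon_greedy(means, epsilon=0.1, rng=random):
    """With probability epsilon explore uniformly at random; otherwise
    exploit the action with the best observed mean reward."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=means.__getitem__)

# Observed take fractions from the example slides: 2/4, 0/2, 1/3
take_fractions = [2 / 4, 0 / 2, 1 / 3]
greedy_choice = epsilon_greedy(take_fractions, epsilon=0.0)
# epsilon=0.0 is pure exploitation: index 0 (take fraction 0.5) wins
```

With three arms this gives the selection probabilities shown on the example slides: 1 - 2𝝐/3 for the greedy arm and 𝝐/3 for each of the other two.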
  • 34. Epsilon-Greedy Example (figure: observed rewards 2/4 for the greedy action, 0/2, and 1/3)
  • 35. Epsilon-Greedy Example (figure: selection probabilities 1 - 2𝝐/3 for the greedy action and 𝝐/3 for each of the others)
  • 36. Epsilon-Greedy Example (figure: an exploratory action is chosen)
  • 37. Epsilon-Greedy Example (figure: the explored action observes reward 0, updating it to 0/3)
  • 38. Optimism: Upper Confidence Bound (UCB) ● Idea: Prefer actions with uncertain values ● Approach: ○ Compute confidence interval of observed rewards for each action ○ Choose action a with the highest 𝛂-percentile ○ Observe reward and update confidence interval for a ● Pros: Theoretical regret minimization properties ● Cons: Needs to update quickly from observed rewards
  • 39. Beta-Bernoulli Distribution ● Bernoulli likelihood: Pr(1) = p, Pr(0) = 1 - p ● Beta prior over p (image from Wikipedia)
  • 40. Bandit Example with Beta-Bernoulli ● Prior: 𝛽(1, 1) ● Observed take fractions for actions A, B, C: 2/4, 0/2, 1/3 ● Posteriors: 𝛽(3, 3), 𝛽(1, 3), 𝛽(2, 3)
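The conjugate Beta-Bernoulli update above is just count addition: a Beta(a, b) prior plus k plays in n impressions gives Beta(a + k, b + n - k). A minimal sketch with the example's take fractions plugged in:

```python
def beta_posterior(prior, successes, trials):
    """Conjugate update: Beta(a, b) + k successes in n trials
    -> Beta(a + k, b + n - k)."""
    a, b = prior
    return (a + successes, b + trials - successes)

prior = (1, 1)  # uniform Beta(1, 1) prior over the take fraction
posteriors = {
    "A": beta_posterior(prior, 2, 4),  # 2/4 -> Beta(3, 3)
    "B": beta_posterior(prior, 0, 2),  # 0/2 -> Beta(1, 3)
    "C": beta_posterior(prior, 1, 3),  # 1/3 -> Beta(2, 3)
}
```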
  • 41. Bayesian UCB Example (figure: 95% confidence intervals on reward per action: [0.15, 0.85], [0.07, 0.81], [0.01, 0.71])
  • 42. Bayesian UCB Example (figure: the action with the highest upper bound, 0.85, is chosen)
  • 43. Bayesian UCB Example (figure: after observing a reward of 0, that action's interval tightens to [0.12, 0.78])
  • 44. Bayesian UCB Example (figure: the second action now has the highest upper bound, 0.81, and is chosen next)
  • 45. Probabilistic: Thompson Sampling ● Idea: Select the actions by the probability they are the best ● Approach: ○ Keep a distribution over model parameters for each action ○ Sample estimated reward value for each action ○ Choose action a with maximum sampled value ○ Observe reward for action a and update its parameter distribution ● Pros: Randomness continues to explore without update ● Cons: Hard to compute probabilities of actions
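Using Beta posteriors like those in the earlier example, Thompson Sampling needs only the standard library's `random.betavariate`. A sketch of the idea, not Netflix's implementation:

```python
import random

def thompson_sample(posteriors, rng=random):
    """Sample one take-fraction estimate per image from its Beta posterior,
    then choose the image with the maximum sampled value."""
    samples = {arm: rng.betavariate(a, b) for arm, (a, b) in posteriors.items()}
    chosen = max(samples, key=samples.get)
    return chosen, samples

# Beta posteriors from the earlier example
posteriors = {"A": (3, 3), "B": (1, 3), "C": (2, 3)}
chosen, samples = thompson_sample(posteriors, rng=random.Random(42))
```

Because the sampling itself is random, the policy keeps exploring even between model updates, which is the "pro" called out on the slide.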
  • 46. Thompson Sampling Example (figure: per-action Beta posteriors 𝛽(3, 3), 𝛽(2, 3), 𝛽(1, 3))
  • 47. Thompson Sampling Example (figure: values 0.38, 0.59, 0.18 sampled from the posteriors)
  • 48. Thompson Sampling Example (figure: the action with the maximum sampled value, 0.59, is chosen)
  • 49. Thompson Sampling Example (figure: the chosen action observes reward 1 and its posterior updates from 𝛽(2, 3) to 𝛽(3, 3))
  • 50. Many Variants of Bandits ● Standard setting: Stochastic and stationary ● Drifting: Reward values change over time ● Adversarial: No assumptions on how rewards are generated ● Continuous action space ● Infinite set of actions ● Varying set of actions over time ● ...
  • 52. Contextual Bandits ● Let’s make this harder! ● Slot machines where payout depends on context ● E.g. time of day, blinking light on slot machine, ...
  • 53. Contextual Bandit Learner (Policy) Environment Action Reward Context Each round: ● Environment provides context (feature) vector ● Learner chooses an action for context ● Environment provides a real-valued reward for action in context ● Learner updates to maximize the cumulative reward
  • 54. Supervised Learning vs. Contextual Bandits ● Supervised Learning: input features (x ∈ ℝ^d); output predicted label; feedback actual label (y) ● Contextual Bandits: input context (x ∈ ℝ^d); output action (a = 𝜋(x)); feedback reward (r ∈ ℝ)
  • 55. Supervised Learning vs. Contextual Bandits Example (figure: classifying Chihuahua images; supervised learning sees the true label for every example, while the bandit only sees a reward for the label it chose) Chihuahua images from ImageNet
  • 56. Artwork Personalization as Contextual Bandit ● Context: Member, device, page, etc. Artwork Selector ▶
  • 57. Epsilon-Greedy Example: with probability 1 - 𝝐 choose the personalized image; with probability 𝝐 choose an image at random
  • 58. Greedy Policy Example ● Learn a supervised regression model per image to predict reward ● Pick the image with the highest predicted reward (figure: member context features scored by one model per image in the pool; arg max picks the winner)
  • 59. LinUCB Example (Li et al., 2010) ● Linear model to calculate uncertainty in the reward estimate ● Choose the image with the highest 𝛂-percentile predicted reward value (figure: member context features scored by one model per image in the pool; arg max picks the winner)
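A minimal sketch of the disjoint LinUCB formulation from Li et al., 2010: one ridge-regression model per candidate image, scored by predicted reward plus an upper-confidence bonus. The feature values and pool size below are made up for illustration.

```python
import numpy as np

class LinUCBArm:
    """Disjoint LinUCB arm: ridge regression on (context, reward) pairs
    plus an upper-confidence bonus on the prediction."""
    def __init__(self, d, alpha=1.0):
        self.alpha = alpha
        self.A = np.eye(d)    # accumulates x x^T (identity = ridge prior)
        self.b = np.zeros(d)  # accumulates reward * x

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b  # reward-model coefficients
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def choose_image(arms, context):
    """Pick the arm (image) with the highest upper-confidence score."""
    return int(np.argmax([arm.ucb(context) for arm in arms]))

arms = [LinUCBArm(d=3) for _ in range(4)]  # one model per candidate image
x = np.array([1.0, 0.0, 0.5])              # made-up member/context features
chosen = choose_image(arms, x)
arms[chosen].update(x, reward=1.0)
```

Observing a reward shrinks that arm's confidence bonus for similar contexts, so uncertain images keep getting tried until their estimates tighten.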
  • 60. Thompson Sampling Example (Chapelle &amp; Li, 2011) ● Learn a distribution over model parameters (e.g. Bayesian regression) ● Sample a model per image, evaluate the features, and take the arg max (figure: sampled models score the member context features; arg max picks the winner)
  • 61. Offline Metric: Replay (Li et al., 2011) (figure: logged actions compared against model assignments; offline take fraction 2/3)
  • 62. Replay ● Pros ○ Unbiased metric when using logged probabilities ○ Easy to compute ○ Rewards observed are real ● Cons ○ Requires a lot of data ○ High variance when there are few matches ■ Techniques like Doubly-Robust estimation (Dudik, Langford &amp; Li, 2011) can help
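The replay estimator above reduces to: keep only the logged rounds where the new policy agrees with the logged action, then average the observed rewards. A sketch with a toy log; field names and values are illustrative.

```python
def replay_take_fraction(logged, policy):
    """Replay (Li et al., 2011): average reward over logged rounds where
    the candidate policy would have chosen the same action."""
    matched = [r["reward"] for r in logged
               if policy(r["context"]) == r["action"]]
    if not matched:
        return None  # too few matches to estimate
    return sum(matched) / len(matched)

# Toy log of (context, shown image, reward) from uniform exploration
logged = [
    {"context": "member-1", "action": "A", "reward": 1},
    {"context": "member-2", "action": "B", "reward": 0},
    {"context": "member-3", "action": "A", "reward": 1},
]
estimate = replay_take_fraction(logged, lambda ctx: "A")
# Matches rounds 1 and 3, both plays -> estimate of 1.0
```

The "high variance with few matches" con is visible here: a policy that rarely agrees with the log averages over very few rewards, or none at all.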
  • 63. Offline Replay Results ● Bandit finds good images ● Personalization is better ● Artwork variety matters ● Personalization wiggles around the best images (figure: lift in Replay for the various algorithms compared to the Random baseline)
  • 64. Bandits in the Real World
  • 65. Bandit Algorithms ● Getting started ○ Need data to learn ○ Warm-starting via batch learning from existing data ● Closing the feedback loop ○ Only exposing the bandit to its own output ● Algorithm performance depends on data volume ○ Need to be able to test bandits at large scale, head-to-head, via A/B testing
  • 66. Starting the Loop / Completing the Loop (diagrams: explore with users, log action and context, join with the reward, store the data, batch-train and publish a model; then keep updating incrementally and republishing as new rewards arrive)
  • 67. Scale Challenges ● Need to serve an image for any title in the catalog ○ Calls from homepage, search, galleries, etc. ○ &gt; 20M RPS at peak ● Existing UI code was written assuming image lookup is fast ○ In-memory map of video ID to URL ○ Want to insert a machine-learned model ○ Don't want a big rewrite across all UI code
  • 68. Live Compute: synchronous computation to choose the image for a title in response to a member request. Online Precompute: asynchronous computation to choose the image for a title before the request, stored in a cache
  • 69. Live Compute ● Pros: ○ Access to most fresh data ○ Knowledge of full context ○ Compute only what is necessary ● Cons: ○ Strict Service Level Agreements ■ Must respond quickly in all cases ■ Requires high availability ○ Restricted to simple algorithms. Online Precompute ● Pros: ○ Can handle large data ○ Can run moderate complexity algorithms ○ Can average computational cost across users ○ Change from actions ● Cons: ○ Has some delay ○ Done in event context ○ Extra compute for users and items not served. See techblog for more details
  • 71. Precompute & Image Lookup ● Precompute ○ Run bandit for each title on each profile to choose personalized image ○ Store the title to image mapping in EVCache ● Image Lookup ○ Pull profile’s image mapping from EVCache once per request
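The precompute-then-lookup flow above can be sketched with a plain dict standing in for the distributed cache (EVCache in the talk). All IDs, keys, and the selector function here are illustrative, not Netflix's actual schema.

```python
# A plain dict stands in for the distributed cache (EVCache in the talk)
image_cache = {}

def precompute(profile_id, title_ids, select_image):
    """Run the per-title bandit ahead of time and store the title->image map."""
    image_cache[profile_id] = {t: select_image(profile_id, t) for t in title_ids}

def lookup(profile_id, title_id, fallback="default.jpg"):
    """Fast in-request lookup: one cache read, with a fallback on a miss."""
    return image_cache.get(profile_id, {}).get(title_id, fallback)

precompute("profile-1", ["title-1", "title-2"],
           select_image=lambda profile, title: f"{title}-img-a.jpg")
url = lookup("profile-1", "title-1")   # "title-1-img-a.jpg"
miss = lookup("profile-2", "title-1")  # "default.jpg"
```

Keeping the lookup to a single cache read is what lets the existing UI code keep treating image selection as a fast map lookup.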
  • 72. Logging & Reward ● Precompute Logging ○ Selected image ○ Exploration Probability ○ Candidate pool ○ Snapshot facts for feature generation ● Reward Logging ○ Image rendered in UI & if played ○ Precompute ID Image via YouTube
  • 73. Feature Generation & Training ● Join rewards with snapshotted facts ● Generate features using DeLorean ○ Feature encoders are shared online and offline ● Train the model using Spark ● Publish model to production DeLorean image by JMortonPhoto.com & OtoGodfrey.com
  • 74. Monitoring and Resiliency ● Track the quality of the model ○ Compare predictions to actual behavior ○ Online equivalents of offline metrics ● Reserve a fraction of data for a simple policy (e.g. 𝝐-greedy) to sanity-check the bandits
  • 75. Graceful Degradation ● Missing images greatly degrade the member experience ● Try to serve the best image possible: Personalized Selection → Unpersonalized Fallback → Default Image (when all else fails)
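The degradation chain above (personalized, then unpersonalized, then default) can be sketched as an ordered list of selector tiers. Function names and image strings are illustrative.

```python
def select_artwork(selectors, title_id, default_image="default.jpg"):
    """Try each tier in order of quality; always return some image."""
    for selector in selectors:
        try:
            image = selector(title_id)
            if image is not None:
                return image
        except Exception:
            continue  # degrade to the next tier instead of failing the request
    return default_image

def personalized(title_id):
    return None  # e.g. the precompute cache missed for this profile

def unpersonalized(title_id):
    return f"{title_id}-most-popular.jpg"

image = select_artwork([personalized, unpersonalized], "title-42")
# Falls through to the unpersonalized tier: "title-42-most-popular.jpg"
```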
  • 77. Online results ● A/B test: It works! ● Rolled out to our >130M member base ● Most beneficial for lesser known titles ● Competition between titles for attention leads to compression of offline metrics More details in our blog post
  • 79. More dimensions to personalize: Ranking, Rows, Trailer, Evidence, Synopsis, Image, Row Title, Metadata
  • 80. Automatic image selection ● Generating new artwork is costly and time consuming ● Can we predict performance from raw image?
  • 81. Artwork selection orchestration ● Neighboring image selections influence the result ● Example: Stand-up comedy, Row A (all microphones) vs. Row B (more variety)
  • 82. Long-term Reward: Road to Reinforcement Learning ● RL involves multiple actions and delayed reward ● Useful to maximize a member's long-term joy?