SlideShare une entreprise Scribd logo
1  sur  19
Recap: Designing a more Efficient Estimator for
Off-policy Evaluation in Bandits with Large Action
Spaces
Ajinkya More, Linas Baltrunas, Nikos Vlassis, Justin Basilico
REVEAL Workshop at RecSys 2019
Copenhagen, Denmark, Sept 20, 2019
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 1 / 19
Introduction
Contextual bandits are a common modeling tool in modern
personalization systems (e.g. artwork personalization at Netflix,
playlist ordering on Spotify, ad ranking in web search)
Evaluating bandit algorithms via online A/B testing is expensive
Off-policy evaluation is a preferred alternative
Goal: A sensitive, efficient estimator that aligns with online metrics
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 2 / 19
Challenge
Several metrics proposed in literature have limitations in personalization
settings:
Replay, IPS, SNIPS - High variance
Few matches with many arms
Requires very large amounts of data
Direct Method, Doubly Robust - Require a reward model
This model is used to evaluate other models
Introduces modeling bias
E.g. what features to use in the reward model?
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 3 / 19
Overview
This paper:
Introduce a new off-policy evaluation metric: Recap
Trades off bias for significantly lowering variance in off-policy evaluation
Model free - No need to define reward model
Study properties via simulated data
Investigate alignment with online metric
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 4 / 19
Notation
Let A = {1, 2, ..., K} be the set of actions.
In trial t ∈ {1, 2, ..., n}, the environment chooses a context xt and in
response the agent chooses an arm at ∈ A.
The environment then reveals a reward ra,t ∈ [0, 1].
Denote the probability of the logging policy taking action a for
context x as π(x, a) and the logged action at trial t as aπ.
We want to estimate the expected reward E[ra] of a deterministic
target policy τ given data from a potentially different logging policy.
We assume the target policy is mediated by a scorer or a ranker, that
assigns a real valued score s(x, a) to each arm and selects the arm
argmaxa∈As(x, a).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 5 / 19
IPS, SNIPS
Inverse Propensity Scoring (IPS):
ˆV τ
IPS =
Σn
t=0raπ
I(aτ =aπ)
π(xt ,aπ)
n
(1)
Self-Normalized Inverse Propensity Scoring (SNIPS):
ˆV τ
SNIPS =
Σn
t=0raπ
I(aτ =aπ)
π(xt ,aπ)
Σn
t=0
I(aτ =aπ)
π(xt ,aπ)
(2)
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 6 / 19
Recap
ˆV τ
Recap =
Σn
t=0raπ
RR
π(xt ,aπ)
Σn
t=0
RR
π(xt ,aπ)
(3)
where
RR =
1
Σa∈AI(s(x, a) ≥ s(x, aπ))
is simply the reciprocal rank of the logged action according to the scores
assigned by the target policy.
It assigns a“partial credit” when the logged action and target policy action
are different (which may be seen as “stochastifying” a deterministic
policy).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 7 / 19
Recap(m)
More generally, for m ∈ N (or even m ∈ R+), we define,
ˆV τ
Recap(m) =
Σn
t=0raπ
RRm
π(xt ,aπ)
Σn
t=0
RRm
π(xt ,aπ)
(4)
ˆV τ
Recap(1) = ˆV τ
Recap
As m → ∞, ˆV τ
Recap(m) → ˆV τ
SNIPS
Thus, m allows us to smoothly trade off bias for variance.
Though we find m = 1 to work well in practice.
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 8 / 19
Simulations
Set up
1 The rewards for arm a are drawn from a Bernoulli distribution with
parameter θa = 1/a
2 The arm scores s(x, a) are drawn from a normal distribution N(µ, σ)
where σ = 0.1 and µ is drawn from a uniform distribution with
support [0, 0.2].
3 In each trial, the logging policy samples the action from a
multinomial distribution over the arms.
4 The target policy is argmaxa∈As(x, a).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 9 / 19
Behavior in small action spaces
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 10 / 19
Behavior in large action spaces
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 11 / 19
Variance: Small number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 12 / 19
Variance: Medium number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 13 / 19
Variance: Large number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 14 / 19
Risk
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 15 / 19
Online–Offline correlation
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 16 / 19
Summary
Recap: An off-policy estimator for bandits based on MRR
Trades off a small increase in bias (compared with IPS) for
significantly reduced variance compared to both IPS and SNIPS
Recap uses all the available data making it more efficient with small
logged data or large numbers of arms
Doesn’t require a reward model as compared to Direct Method and
Doubly Robust
Signs of alignment with online metrics
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 17 / 19
Ongoing Work
Analyze some theoretical properties of the estimator
Bias/Variance/Risk of the estimator
Closed form expressions
Asymptotic properties
Is the estimator consistent?
Deviation bounds
Training models to optimize Recap
Connection to Learning to Rank approaches
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 18 / 19
Thank you!
Questions?
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 19 / 19

Contenu connexe

Tendances

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsJustin Basilico
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixJaya Kawale
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsJustin Basilico
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsYves Raimond
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixJustin Basilico
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender modelsParmeshwar Khurd
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemAnoop Deoras
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorialAlexandros Karatzoglou
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveJustin Basilico
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
Exploration and diversity in recommender systems
Exploration and diversity in recommender systemsExploration and diversity in recommender systems
Exploration and diversity in recommender systemsJaya Kawale
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the WorldYves Raimond
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated RecommendationsHarald Steck
 

Tendances (20)

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
User behavior analytics
User behavior analyticsUser behavior analytics
User behavior analytics
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Exploration and diversity in recommender systems
Exploration and diversity in recommender systemsExploration and diversity in recommender systems
Exploration and diversity in recommender systems
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 

Plus de Justin Basilico

Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized HomepageJustin Basilico
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix ScaleJustin Basilico
 

Plus de Justin Basilico (6)

Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 

Dernier

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Dernier (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Bandits with Large Action Spaces

  • 1. Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Bandits with Large Action Spaces Ajinkya More, Linas Baltrunas, Nikos Vlassis, Justin Basilico REVEAL Workshop at RecSys 2019 Copenhagen, Denmark, Sept 20, 2019 Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 1 / 19
  • 2. Introduction Contextual bandits are a common modeling tool in modern personalization systems (e.g. artwork personalization at Netflix, playlist ordering on Spotify, ad ranking in web search) Evaluating bandit algorithms via online A/B testing is expensive Off-policy evaluation is a preferred alternative Goal: A sensitive, efficient estimator that aligns with online metrics Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 2 / 19
  • 3. Challenge Several metrics proposed in literature have limitations in personalization settings: Replay, IPS, SNIPS - High variance Few matches with many arms Requires very large amounts of data Direct Method, Doubly Robust - Require a reward model This model is used to evaluate other models Introduces modeling bias E.g. what features to use in the reward model? Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 3 / 19
  • 4. Overview This paper: Introduce a new off-policy evaluation metric: Recap Trades off bias for significantly lowering variance in off-policy evaluation Model free - No need to define reward model Study properties via simulated data Investigate alignment with online metric Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 4 / 19
  • 5. Notation Let A = {1, 2, ..., K} be the set of actions. In trial t ∈ {1, 2, ..., n}, the environment chooses a context xt and in response the agent chooses an arm at ∈ A. The environment then reveals a reward ra,t ∈ [0, 1]. Denote the probability of the logging policy taking action a for context x as π(x, a) and the logged action at trial t as aπ. We want to estimate the expected reward E[ra] of a deterministic target policy τ given data from a potentially different logging policy. We assume the target policy is mediated by a scorer or a ranker, that assigns a real valued score s(x, a) to each arm and selects the arm argmaxa∈As(x, a). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 5 / 19
  • 6. IPS, SNIPS Inverse Propensity Scoring (IPS): ˆV τ IPS = Σn t=0raπ I(aτ =aπ) π(xt ,aπ) n (1) Self-Normalized Inverse Propensity Scoring (SNIPS): ˆV τ SNIPS = Σn t=0raπ I(aτ =aπ) π(xt ,aπ) Σn t=0 I(aτ =aπ) π(xt ,aπ) (2) Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 6 / 19
  • 7. Recap ˆV τ Recap = Σn t=0raπ RR π(xt ,aπ) Σn t=0 RR π(xt ,aπ) (3) where RR = 1 Σa∈AI(s(x, a) ≥ s(x, aπ)) is simply the reciprocal rank of the logged action according to the scores assigned by the target policy. It assigns a“partial credit” when the logged action and target policy action are different (which may be seen as “stochastifying” a deterministic policy). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 7 / 19
  • 8. Recap(m) More generally, for m ∈ N (or even m ∈ R+), we define, ˆV τ Recap(m) = Σn t=0raπ RRm π(xt ,aπ) Σn t=0 RRm π(xt ,aπ) (4) ˆV τ Recap(1) = ˆV τ Recap As m → ∞, ˆV τ Recap(m) → ˆV τ SNIPS Thus, m allows us to smoothly trade off bias for variance. Though we find m = 1 to work well in practice. Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 8 / 19
  • 9. Simulations Set up 1 The rewards for arm a are drawn from a Bernoulli distribution with parameter θa = 1/a 2 The arm scores s(x, a) are drawn from a normal distribution N(µ, σ) where σ = 0.1 and µ is drawn from a uniform distribution with support [0, 0.2]. 3 In each trial, the logging policy samples the action from a multinomial distribution over the arms. 4 The target policy is argmaxa∈As(x, a). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 9 / 19
  • 10. Behavior in small action spaces Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 10 / 19
  • 11. Behavior in large action spaces Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 11 / 19
  • 12. Variance: Small number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 12 / 19
  • 13. Variance: Medium number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 13 / 19
  • 14. Variance: Large number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 14 / 19
  • 15. Risk Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 15 / 19
  • 16. Online–Offline correlation Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 16 / 19
  • 17. Summary Recap: An off-policy estimator for bandits based on MRR Trades off a small increase in bias (compared with IPS) for significantly reduced variance compared to both IPS and SNIPS Recap uses all the available data making it more efficient with small logged data or large numbers of arms Doesn’t require a reward model as compared to Direct Method and Doubly Robust Signs of alignment with online metrics Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 17 / 19
  • 18. Ongoing Work Analyze some theoretical properties of the estimator Bias/Variance/Risk of the estimator Closed form expressions Asymptotic properties Is the estimator consistent? Deviation bounds Training models to optimize Recap Connection to Learning to Rank approaches Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 18 / 19
  • 19. Thank you! Questions? Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 19 / 19