E&E: Loving You Is Not Easy
Huang Jing (黄晶)
2015.01
Outline
• Problem description
• Solution methods
  E-Greedy
  UCB
  Thompson Sampling
• Problem variant: contextual E&E
• Applications
  When E&E meets LR
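Exploration & Exploitation
• What is E&E?
• Origin of the problem: slot machines (the multi-armed bandit)
  N slot machines
  Machine i pays out with probability p_i
  Goal: maximize the total payoff over the whole process
• Key point: p_1, p_2, ..., p_n are unknown and can only be estimated from the outcome of each pull, and the probing itself incurs losses
  Exploration: try arms that are not yet well known; this has a cost
  Exploitation: learn from the observed history to infer p_i
  Regret: (best expected payoff per trial) × (number of trials) − (payoff actually obtained)
[Figure: choosing the best arm (result) for all users; arms with payoff probabilities p_1, p_2, ..., p_n]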
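The setup and the regret definition above can be made concrete with a small simulation. This is a minimal sketch, not the author's code: the class name `BernoulliBandit`, the helper `regret`, and the payoff probabilities 0.1, 0.3 and 0.5 are illustrative assumptions.

```python
import random


class BernoulliBandit:
    """N slot machines; arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1 if self.rng.random() < self.probs[arm] else 0


def regret(probs, total_reward, n_trials):
    """Best expected payoff per trial * number of trials - payoff obtained."""
    return max(probs) * n_trials - total_reward


bandit = BernoulliBandit([0.1, 0.3, 0.5], seed=42)  # assumed probabilities
n_trials = 1000
# A deliberately naive policy: always pull arm 0, whose true p is only 0.1.
total = sum(bandit.pull(0) for _ in range(n_trials))
print("regret of always pulling arm 0:", regret(bandit.probs, total, n_trials))
```

Any policy can be scored the same way by replacing the fixed arm 0 with the policy's choice at each step.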
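Application
• The ad display problem
  One display slot and N ads (the slot machines)
  Ad i is clicked with probability p_i, yielding a payoff of 1 (fixed at 1 to simplify the problem)
  Goal: which display strategy maximizes the overall payoff?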
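Solution methods
• The brute-force ("money is no object") method
  Show every ad over and over until a reliable empirical click-through rate (CTR) can be measured for it. This is maximal exploration and minimal exploitation.
  Concretely (derivation in Appendix 1):
    • An ad with a CTR of 0.1 needs about 13,830 impressions before its empirical CTR is within 5% (relative error) of the true CTR
  Suppose ad A's empirical CTR has already converged to 10%, while ad B has been shown 100 times with 0 clicks. Do we really need to keep showing B until its CTR converges?
  A quick calculation: if B's true CTR were 0.1 or higher, the chance of seeing 0 clicks in 100 impressions would be at most (1-0.1)^100 ≈ 2.7×10^(-5), so B is very unlikely to be better than A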
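The zero-click argument can be checked numerically. In this sketch the helper names `prob_zero_clicks` and `impressions_to_rule_out`, and the 1% threshold, are assumptions made for illustration.

```python
import math


def prob_zero_clicks(ctr, impressions):
    """Probability of seeing no click at all if the true CTR were `ctr`."""
    return (1.0 - ctr) ** impressions


def impressions_to_rule_out(ctr, threshold):
    """Smallest number of click-free impressions that pushes the
    zero-click probability below `threshold` for a true CTR of `ctr`."""
    return math.ceil(math.log(threshold) / math.log(1.0 - ctr))


p = prob_zero_clicks(0.1, 100)
print(f"P(0 clicks in 100 impressions | CTR = 0.1) = {p:.2e}")
print("click-free impressions needed at a 1% threshold:",
      impressions_to_rule_out(0.1, 0.01))
```

Far fewer than 100 click-free impressions are already strong evidence against ad B, which is why exhaustive exploration is wasteful.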
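Solution methods
• E-greedy
  At each step, pick the ad with the highest empirical CTR with probability 1-e, and pick one of the other ads at random with probability e
  Exploration strength: e
  Exploitation strength: 1-e
  Property: because e is fixed, even after many trials the algorithm still abandons the current best ad with a fixed probability to explore the others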
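The selection rule just described can be sketched as follows. This is a minimal illustration under assumptions: the function name `epsilon_greedy`, the treatment of never-shown ads as having CTR 0, and the example probabilities are not from the slides.

```python
import random


def epsilon_greedy(probs, epsilon, n_steps, seed=0):
    """Simulate E-greedy on ads with true click probabilities `probs`."""
    rng = random.Random(seed)
    n_arms = len(probs)
    shows = [0] * n_arms
    clicks = [0] * n_arms
    total_reward = 0
    for _ in range(n_steps):
        # Empirical CTR; an ad that was never shown counts as 0.0 (assumption).
        ctr = [clicks[i] / shows[i] if shows[i] else 0.0 for i in range(n_arms)]
        best = max(range(n_arms), key=lambda i: ctr[i])
        if n_arms > 1 and rng.random() < epsilon:
            # Explore: pick one of the other ads uniformly at random.
            arm = rng.choice([i for i in range(n_arms) if i != best])
        else:
            # Exploit: show the ad with the highest empirical CTR.
            arm = best
        reward = 1 if rng.random() < probs[arm] else 0
        shows[arm] += 1
        clicks[arm] += reward
        total_reward += reward
    return shows, total_reward


shows, total = epsilon_greedy([0.1, 0.3, 0.5], epsilon=0.1, n_steps=2000, seed=1)
print("impressions per ad:", shows, "total clicks:", total)
```

With `epsilon=0.0` this sketch never leaves the first ad, which is the extreme case of an exploration probability that is too low.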
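[Figure: E-greedy curves for e = 0.2, e = 0.1 and e = 0.01]
Annotations on the curves:
  Exploration probability too low: hard to escape a bad start
  Exploration probability too high: even after convergence there is always a 0.2 probability of choosing a worse ad
Annotations on the regret curves:
  e = 0.2: the exploration probability is high, and the regret is very clearly linear
  e = 0.01: exploration is very inefficient at first and the best ad goes undetected for a long time; once N is large, the low exploration probability keeps the regret relatively small
  e = 0.1: looks best for N < 10000, but as N grows further its regret overtakes the red line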
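Solution methods
• E-greedy: properties observed in numerical experiments
  The larger e is, the smaller the regret in the start-up phase and the faster the convergence
  The larger e is, the steeper the regret slope once N exceeds a certain number of steps
  Summary: a large e is efficient at the start; a small e is efficient at the end
  Improvement: decrease e as N grows. A fixed e gives regret that is linear in N; a decaying e can make it sublinear, and with a suitable schedule (for example e proportional to 1/N) O(log N)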
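To see why a decaying e helps, it is enough to count how many steps are spent exploring. The sketch below assumes a schedule e_t = min(1, c/t); both the schedule and the constant `c` are illustrative choices, not taken from the slides.

```python
def decayed_epsilon(step, c=10.0):
    """Exploration probability at step `step` (1-based); c is a tuning constant."""
    return min(1.0, c / step)


def expected_explorations(n_steps, schedule):
    """Expected number of exploration steps: the sum of e_t over all steps."""
    return sum(schedule(t) for t in range(1, n_steps + 1))


fixed = expected_explorations(100_000, lambda t: 0.1)
decayed = expected_explorations(100_000, decayed_epsilon)
print(f"fixed e = 0.1:        {fixed:.0f} exploration steps")
print(f"decayed e = 10 / t:   {decayed:.0f} exploration steps")
```

The fixed schedule spends a constant fraction of all steps exploring, so its cost grows linearly with N, while the decayed schedule's exploration count grows only like c·ln N.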
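Solution methods
• UCB
  Idea: the empirical estimate is never exact, but its error range shrinks as the number of trials grows. If the error can be bounded sensibly, the risk can be controlled
  Core question: how do we estimate the error of the empirical estimate?
  Estimate an upper bound on p such that the probability of p exceeding the bound is less than e
  Upper bound on p: the larger n_i is, the narrower the range
  The upper-bound estimate should make the error shrink as n grows, which is presumably why it takes the form shown on the left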
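The formulas on this slide did not survive extraction. One standard way to obtain such a bound, assuming Hoeffding's inequality and writing δ for the failure probability that the slide calls e, is sketched below; the choice δ = t^{-4} is the one that yields the well-known UCB1 index and may differ from the author's.

```latex
\begin{align*}
P\left(p_i > \hat{p}_i + \epsilon\right) \le \exp\left(-2 n_i \epsilon^2\right) = \delta
\quad &\Longrightarrow \quad
\epsilon = \sqrt{\frac{\ln(1/\delta)}{2 n_i}}, \\
\delta = t^{-4}
\quad &\Longrightarrow \quad
\mathrm{UCB}_i(t) = \hat{p}_i + \sqrt{\frac{2 \ln t}{n_i}}.
\end{align*}
```

Here \hat{p}_i is the empirical CTR of ad i, n_i its number of impressions, and t the total number of trials so far. Letting δ shrink with t is what makes the allowed failure probability, and with it the estimation error, decrease as the experiment goes on.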
  Tighter estimates exist, but they are much the same in practice: the regret is essentially O(log N) in every case
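A minimal sketch of the upper-bound rule, assuming the UCB1 index (empirical CTR plus sqrt(2 ln t / n_i)). The function name `ucb1` and the example probabilities are illustrative, and the author's exact bound may differ.

```python
import math
import random


def ucb1(probs, n_steps, seed=0):
    """Simulate UCB1 on ads with true click probabilities `probs`."""
    rng = random.Random(seed)
    n_arms = len(probs)
    shows = [0] * n_arms
    clicks = [0] * n_arms
    for t in range(1, n_steps + 1):
        if t <= n_arms:
            # Show every ad once so that each n_i is positive.
            arm = t - 1
        else:
            # Pick the ad whose upper confidence bound is highest.
            arm = max(
                range(n_arms),
                key=lambda i: clicks[i] / shows[i]
                + math.sqrt(2.0 * math.log(t) / shows[i]),
            )
        reward = 1 if rng.random() < probs[arm] else 0
        shows[arm] += 1
        clicks[arm] += reward
    return shows


print("impressions per ad:", ucb1([0.5, 0.3, 0.1, 0.2], n_steps=5000, seed=1))
```

An ad that has been shown rarely keeps a wide bonus term, so it is retried from time to time without any explicit exploration probability.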
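Solution methods
• UCB: properties observed in numerical experiments
[Figure: comparison of three arms after a certain number of trials. Axes: number of trials, p value. Short black line: empirical estimate; long black line: upper bound; colored line: true value. Annotation: the arm with the fewest trials has the widest upper-bound range]
[Figure: number of trials per ad against the experiment step. True CTRs: green 0.5, red 0.3, blue 0.1, black 0.2]
Panels: N=100, N=5000, N=7500
The green ad dominates overwhelmingly, but its upper-bound interval is still fairly wide and it narrows more and more slowly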
Solution methods
• UCB: properties observed in numerical experiments
[Figures: probability of choosing the best ad; regret vs. 38 log N]
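Solution methods
• Thompson Sampling
  Idea: take the Bayesian view. Treat the parameter p as having a Beta prior, keep correcting that Beta distribution with the observations, and when exploring draw a random sample from it
  Why a Beta distribution? Each observation is a Bernoulli trial, and the Beta distribution is the conjugate prior of the Bernoulli, so the posterior always stays in the Beta family
  Derivation:
  The Bernoulli distribution in unified form: probability theta for x=1 and 1-theta for x=0
  alpha can be read as roughly the number of successes and beta as roughly the number of failures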
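The derivation on the slide was an image and is not preserved. The standard Beta-Bernoulli conjugacy argument, which matches the remarks above, runs as follows:

```latex
\begin{align*}
P(x \mid \theta) &= \theta^{x} (1-\theta)^{1-x}, \qquad x \in \{0, 1\}, \\
P(\theta) &= \mathrm{Beta}(\theta \mid \alpha, \beta)
  = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
    \theta^{\alpha-1} (1-\theta)^{\beta-1}, \\
P(\theta \mid x) &\propto P(x \mid \theta)\, P(\theta)
  \propto \theta^{\alpha + x - 1} (1-\theta)^{\beta + (1-x) - 1}
  \propto \mathrm{Beta}(\theta \mid \alpha + x,\ \beta + 1 - x).
\end{align*}
```

Each click (x=1) adds 1 to alpha and each non-click (x=0) adds 1 to beta, which is why the two parameters behave like success and failure counts.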
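Solution methods
• Thompson Sampling
  Properties of the Beta distribution
  1. At the same CTR ratio, the more trials there are, the narrower the Beta distribution
     Beta(10,30) vs Beta(2,6)
     Beta(20,20) vs Beta(4,4)
  2. The center of the Beta distribution is approximately the CTR: the mean is exactly alpha/(alpha+beta), and for alpha, beta > 1 the mode (alpha-1)/(alpha+beta-2) is close to it
     Center of Beta(10,30) = 10/(10+30)
     Center of Beta(20,20) = 20/(20+20)
  PS: for this problem, Beta distributions with parameters below 1 can mostly be ignored
  1. Since alpha and beta roughly represent the numbers of successful and failed trials, the initial parameters can be chosen from experience so that the prior is close to the expected CTR
  2. Once the ratio is fixed, the magnitude of alpha and beta determines how hard it is for later observations to move the distribution. Large values mean the current distribution rests on many trials, so a lot of data is needed before it changes substantially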
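The two properties can be checked with the closed-form moments of the Beta distribution. The helper name `beta_summary` is an assumption made for this sketch; the parameter pairs are the ones quoted on the slide.

```python
import math


def beta_summary(alpha, beta):
    """Mean, mode and standard deviation of Beta(alpha, beta), for alpha, beta > 1."""
    mean = alpha / (alpha + beta)
    mode = (alpha - 1) / (alpha + beta - 2)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, mode, math.sqrt(var)


for a, b in [(10, 30), (2, 6), (20, 20), (4, 4)]:
    mean, mode, std = beta_summary(a, b)
    print(f"Beta({a},{b}): mean={mean:.3f} mode={mode:.3f} std={std:.3f}")
```

Beta(10,30) and Beta(2,6) share the same mean, but the one built from more trials has the smaller standard deviation.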
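Solution methods
• Thompson Sampling
  Properties observed in numerical experiments: unstable!
  [Figure: three runs with identical parameters behave very differently]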
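The run-to-run variation can be reproduced with a short simulation. This is a minimal sketch, assuming a uniform Beta(1,1) prior for every ad; the function name `thompson_sampling`, the prior and the example probabilities are illustrative rather than the author's settings.

```python
import random


def thompson_sampling(probs, n_steps, seed=0, prior=(1.0, 1.0)):
    """Simulate Thompson sampling on ads with true click probabilities `probs`."""
    rng = random.Random(seed)
    n_arms = len(probs)
    alpha = [prior[0]] * n_arms  # prior plus observed clicks
    beta = [prior[1]] * n_arms   # prior plus observed non-clicks
    shows = [0] * n_arms
    for _ in range(n_steps):
        # Draw one CTR sample per ad from its posterior and show the highest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        shows[arm] += 1
    return shows


# Same parameters, different random seeds.
for seed in (1, 2, 3):
    print(seed, thompson_sampling([0.5, 0.3, 0.1, 0.2], n_steps=5000, seed=seed))
```

Only the seed changes between the three runs, so any difference in the impression counts comes from the sampling itself.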
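Method comparison: ideas
• How UCB and Thompson sampling differ
  Philosophy: frequentist vs. Bayesian
  Procedure: given the estimated distribution, UCB takes its maximum (the upper bound), while Thompson draws a sample from it
  Convergence theory: UCB has long-established regret bounds; Thompson sampling was long used without any, and its bounds were proved only recently
  [Figure: UCB picks B, the arm with the highest upper bound; under Thompson every arm has some probability of being picked]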
Method comparison: convergence
• E-greedy vs. UCB vs. Thompson
[Figures: probability of choosing the best ad, and regret, for E-greedy, UCB and Thompson]
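Variant of the problem: contextual
• Recap of the original problem: slot machines
  N slot machines
  Machine i pays out with probability p_i
  Goal: maximize the total payoff over the whole process
• Assumption so far: the payoff does not depend on the environment
• Contextual:
  The payoff depends on the environment; the environment plus the ad is represented by a set of features
  The effect of the features on a machine's payoff does not depend on time, only on the machine; that is, each machine gets its own linear model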
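Variant of the problem: contextual
• Contextual: the payoff depends on the environment and is no longer a constant. Hard!
  E-greedy -> epoch-greedy, regret O(T^(2/3))
  UCB -> frequentist methods
    • Usually need some stronger assumptions
    • For example, LinUCB under a linear assumption
  Thompson sampling -> Bayesian prior
    • Such as the Gittins index method
    • Requires a good offline estimate of the prior
    • Usually computationally expensive, with no approximation method available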
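Variant of the problem: contextual
• LinUCB assumptions:
  The payoff depends on the environment; the environment plus the ad is represented by a set of features
  The effect of the features on a machine's payoff does not depend on time, only on the machine; that is, each machine gets its own linear model
• Mathematical formulation:
• Solution:
  Different U_t only determine the values taken by X_{t,a}; they do not affect the parameters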
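The formulas for this slide were images and are not preserved. Assuming the slide follows the LinUCB formulation with disjoint linear models from the WWW 2010 paper listed in the references, the model and the selection rule are:

```latex
\begin{align*}
\mathbb{E}\left[r_{t,a} \mid x_{t,a}\right] &= x_{t,a}^{\top} \theta_a^{*}, \\
A_a = D_a^{\top} D_a + I_d, \qquad b_a &= D_a^{\top} c_a, \qquad
\hat{\theta}_a = A_a^{-1} b_a, \\
a_t &= \arg\max_{a} \left( x_{t,a}^{\top} \hat{\theta}_a
      + \alpha \sqrt{x_{t,a}^{\top} A_a^{-1} x_{t,a}} \right).
\end{align*}
```

Here x_{t,a} is the d-dimensional feature vector of user and ad a at time t, D_a stacks the feature vectors of the past impressions of ad a as rows, c_a holds the corresponding click outcomes, and α controls the width of the confidence term.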
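  In practice this is painful: each ad has to accumulate its own samples before its parameters can be computed
  If there is plenty of data, why not use LR directly? It can also learn generalizing features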
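The per-ad bookkeeping that makes this painful can be seen in a small sketch of one LinUCB arm. It assumes the disjoint-model formulation (ridge regression per ad plus a confidence width) and keeps the inverse of A up to date with the Sherman-Morrison formula, which is an implementation choice made here to avoid a matrix library. The names `LinUCBArm` and `choose_arm` are hypothetical.

```python
import math


class LinUCBArm:
    """One ad with its own linear model; nothing is shared between ads."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # A starts as the identity matrix, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x):
        theta = self._A_inv_times(self.b)   # theta = A^-1 b
        Ax = self._A_inv_times(x)           # A^-1 x
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * ai for xi, ai in zip(x, Ax)))
        return mean + self.alpha * width

    def update(self, x, reward):
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        Ax = self._A_inv_times(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, Ax))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, contexts):
    """contexts[a] is the feature vector x_{t,a} for ad a at this step."""
    return max(range(len(arms)), key=lambda a: arms[a].ucb(contexts[a]))


arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
arms[0].update([1.0, 0.0], reward=1)
print(choose_arm(arms, [[1.0, 0.0], [1.0, 0.0]]))
```

Every ad carries its own matrix and vector and learns only from its own impressions, which is the cost the remark above points at.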
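Variant of the problem: contextual
• LinUCB improvement: the arms are related to one another
  The assumptions, formulation and solution approach are the same as for LinUCB on the previous slide, with a shared term added
  Beta does not depend on the arm; it is shared by all arms
  Note that if theta degenerates to empty, this is just a single linear model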
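Assuming this slide refers to the hybrid model of the same WWW 2010 paper, the shared and per-arm parts combine as:

```latex
\[
\mathbb{E}\left[r_{t,a} \mid x_{t,a}\right]
  = z_{t,a}^{\top} \beta^{*} + x_{t,a}^{\top} \theta_a^{*}
\]
```

Here z_{t,a} are the features whose coefficients β are shared across all arms, and θ_a are the arm-specific coefficients. Dropping the θ_a term leaves z_{t,a}^{\top} \beta^{*}, a single linear model over all ads.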
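Practical issues
• The framework demands very timely feedback
  Each round's parameters depend on all clicks and impressions up to the previous round, so clicks must be sent back to the E&E system in real time
  In practice, updates are usually applied in batches at fixed intervals, which costs some efficiency
• Randomness makes the ranking unstable
  Every ranking is a fresh random draw, so a page refresh can reorder everything, which is a poor user experience
  The usual fix is to pin the ranking per cookie ID and refresh it only after some time, which again costs some efficiency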
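One way to pin the ranking per cookie is to seed the random draw from the cookie ID and a time bucket. This is a sketch under assumptions: the function names, the one-hour bucket and the ad IDs are hypothetical, and the posteriors are assumed to be frozen within a bucket (the batch update mentioned above).

```python
import hashlib
import random


def bucketed_rng(cookie_id, timestamp, bucket_seconds=3600):
    """Random generator that is identical for one cookie within one time bucket."""
    bucket = int(timestamp) // bucket_seconds
    digest = hashlib.sha256(f"{cookie_id}:{bucket}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def sample_ranking(posteriors, cookie_id, timestamp):
    """posteriors maps ad_id -> (alpha, beta); ads are ranked by a sampled CTR."""
    rng = bucketed_rng(cookie_id, timestamp)
    # Iterate in sorted order so the draws line up with the same ads every time.
    scores = {ad: rng.betavariate(a, b) for ad, (a, b) in sorted(posteriors.items())}
    return sorted(scores, key=scores.get, reverse=True)


posteriors = {"ad_1": (10, 30), "ad_2": (2, 6), "ad_3": (20, 20)}
first = sample_ranking(posteriors, "cookie-abc", timestamp=1000)
refresh = sample_ranking(posteriors, "cookie-abc", timestamp=2000)
print(first, refresh)
```

A refresh within the same hour reproduces the same ranking, while other cookies and later buckets still receive independent draws.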
When E&E meets LR
• Cold start
• E&E + LR vs. LR
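When E&E meets LR
• What practice shows:
  Organic search without LR: adding exploration works well
  Ad search without LR: adding exploration works well, but no better than using the empirical CTR directly (that is, only the exploitation part is doing the work)
  Ad search with LR: whatever we tried, it was hard to beat LR, and newly explored ads are absorbed by LR very quickly. LR absorbs them more efficiently than E&E
• Why? Analysis by scenario
  Scenario 1: cold start
  Scenario 2: LR is biased on certain cases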
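Scenario 1: cold start
• An illusion: that LR lacks data when the system has just started
  The bulk of the ad inventory changes little (unlike a news system)
  With little data early on, some overfitting is not a bad thing for an ad system: it keeps showing ads that pay well
  In our application, one day of data is enough to train an LR with a decent AUC, and that LR explores through its generalizing features
• Limitations of E&E itself:
  Premise: it relies on estimating a mean
    • Granularity too fine: samples converge slowly
    • Granularity too coarse: the mean is unstable
    • An ad system has too many environmental factors; even granularity as fine as the query-ad pair is not enough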
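Scenario 2: LR is biased and some new ads never surface
• Hoping to get lucky: it is hard to get the expected effect online
  Premise: a large number of good ads score poorly on the generalizing features
  Even if the premise holds:
    • Undirected exploration is unlikely to hit exactly those ads
    • If careful analysis does uncover these cases, it seems better to work on improving the generalizing features
• E&E learns far less efficiently than LR
  Explored samples are quickly learned by LR
  Reasons:
    • E&E learns per sample, with no connection between samples
    • LR learns on the features of the samples, so one sample updates many features at once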
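The last point can be illustrated with a single stochastic-gradient step of logistic regression on sparse features. This is a minimal sketch: the feature names, the learning rate and the helper names `predict` and `lr_sgd_step` are hypothetical.

```python
import math


def predict(weights, features):
    """Predicted click probability for a sparse feature dict {name: value}."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def lr_sgd_step(weights, features, clicked, lr=0.1):
    """One SGD step on the log loss; every feature in the sample is updated."""
    p = predict(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + lr * (clicked - p) * value
    return p


weights = {}
# One explored impression of ad 123 that received a click.
lr_sgd_step(weights, {"query=shoes": 1.0, "category=sports": 1.0, "ad=123": 1.0}, 1)

# A never-shown ad that shares the query and category features.
p_new = predict(weights, {"query=shoes": 1.0, "category=sports": 1.0, "ad=456": 1.0})
print(weights, p_new)
```

The single click moves three weights, and the unseen ad 456 already gets a higher prediction through the shared features, whereas a per-ad bandit estimate would have learned nothing about it.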
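Appendix 1
• Given an ad with click probability p, how many impressions N are needed for the empirical CTR to be within 5% of p (relative error)?
• The sampling distribution of the empirical CTR has mean p and standard deviation sqrt(p(1-p)/N)
• At 95% confidence (z = 1.96):
• (p + 1.96*sqrt(p(1-p)/N) - p)/p < 5%
• N > (1.96/0.05)^2*(1-p)/p ≈ 1536.64*(1-p)/p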
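The bound can be evaluated directly. In this sketch the function name `impressions_needed` and its defaults are assumptions; the formula is the one derived above, with z = 1.96 and a 5% relative error.

```python
import math


def impressions_needed(p, rel_error=0.05, z=1.96):
    """Smallest N with z * sqrt(p * (1 - p) / N) / p < rel_error."""
    return math.ceil(z * z * (1.0 - p) / (rel_error * rel_error * p))


for ctr in (0.5, 0.1, 0.01):
    print(f"CTR {ctr}: {impressions_needed(ctr)} impressions")
```

The required number of impressions grows roughly like 1/p, so ads with a low CTR are by far the most expensive to measure this way.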
Appendix 2: references
• Understanding conjugate distributions
  http://wenku.baidu.com/view/a542dbf2770bf78a6529546a.html?st=1
• Properties of the Beta distribution
  http://cos.name/2013/01/lda-math-beta-dirichlet/
• Introduction to Bandits: Algorithms and Theory
  Jean-Yves Audibert & Remi Munos, ICML 2011
• A Contextual-Bandit Approach to Personalized News Article Recommendation
  Lihong Li, WWW 2010
