From Gradient Descent to L-BFGS
黄晶
2015.01
Outline
• Problem statement
• Solution methods
  • Gradient descent
  • Newton's method
  • Quasi-Newton methods
  • BFGS, L-BFGS
  • OWL-QN
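Optimization
• What is an optimization problem?
• Scope: everything here is unconstrained optimization
• That is, x ranges over the whole space
• Theory: does an optimal x always exist?
  • If f is strictly convex and attains a minimum, that minimizer is unique and global (convexity is a demanding property that is rarely satisfied)
    • Luckily, the logistic loss function is convex
  • Otherwise there is no guarantee; in that case we settle for a local optimum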
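The idea behind solving optimization problems
• Basic idea of optimization algorithms: iterate
• At every step, guarantee that the objective decreases: f(x_{n+1}) < f(x_n)
• Questions:
  • How do we find the next iterate?
    • This is exactly where the methods differ
    • Usually split into two steps: choose a direction, then choose a step size
  • Does iterating this way always find the optimum?
    • Iterative methods can only guarantee a local optimum
    • For convex functions, a local optimum is the global optimum
    • (because every local minimum of a convex function is a global minimum)
Figure: here x is two-dimensional; the circles in the figure are contour lines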
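How the methods relate
• Gradient descent. Descent direction: greedy choice, the locally steepest descent
  • Stochastic gradient descent: addresses the high cost of computing over too many samples
  • Conjugate gradient: addresses slow convergence in long, narrow regions
• Newton's method. Descent direction: fit the objective locally at the current point with its second-order Taylor expansion; the direction points from the current point to the extremum of that quadratic
  • Quasi-Newton methods: address the high cost of computing second derivatives
    • BFGS: the quasi-Newton method that minimizes a Frobenius norm subject to two conditions
    • L-BFGS: an approximation that addresses the large storage requirement of BFGS
    • OWL-QN: addresses the non-differentiability of the L1 norm
• CDN: addresses excessive computation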
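Gradient descent
• Step 1: choose a direction
  • Idea: find the steepest direction
  • Picture the function as a mountain. Standing on a slope and looking around, in which direction does one small step take us down the fastest?
  • The steepest direction of f at x_n is the negative gradient at that point
• Step 2: choose a step size (learning rate)
  • "Fastest descent" only holds within a range: it is fastest at x_n only
  • The step size is usually tuned by hand
Figure: gradient direction. A function of one variable has only two directions, left and right. The gradient only tells us that descent is fastest "near" this point.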
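Iteration example
Function: f(x1, x2) = 100*x1^2 + x2^2
Initial point: (1, 1)
Negative gradient: (-200*x1, -2*x2)
If the step size alpha is too large, alpha*200*x1 exceeds x1 (at the starting point, exceeds 1), which means the update jumps past the minimum along x1
Progress along x2 is very slow, because x1 limits how large the step size can be
Figure panels:
• Step size = 0.00001: the iterate at step 100, with a zoomed-in view
• Step size = 0.01: overshooting, bouncing back and forth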
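The example above can be reproduced with a few lines of Python. This is a minimal sketch, not the original slide code: the function names `f`, `grad_f` and `gradient_descent` are illustrative, and the two step sizes are the ones shown in the figure panels.

```python
import numpy as np

def f(x):
    # Objective from the slide: a long, narrow bowl.
    return 100.0 * x[0] ** 2 + x[1] ** 2

def grad_f(x):
    # Gradient of f; the descent direction is its negative.
    return np.array([200.0 * x[0], 2.0 * x[1]])

def gradient_descent(x0, alpha, steps):
    # Fixed step size alpha, fixed number of steps (no stopping test).
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - alpha * grad_f(x)
    return x

# Small step: stable, but x2 has barely moved after 100 steps.
small = gradient_descent([1.0, 1.0], alpha=1e-5, steps=100)
# Large step: x1 is multiplied by (1 - 200*alpha) = -1 each step, so it oscillates.
large = gradient_descent([1.0, 1.0], alpha=1e-2, steps=100)
print(small, large)
```

With alpha = 0.01 the x1 coordinate flips sign on every step and never shrinks, which is the overshooting shown in the figure; with alpha = 0.00001 both coordinates shrink, but x2 only by a factor of 0.99998 per step.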
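Gradient descent: pseudocode & trick
• Code
• Trick: feature scaling
  • Without feature scaling, the contours easily become very flat ellipses like the one on the right, which makes the iteration inefficient
Figure: from step 100 to step 1000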
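One common form of feature scaling is standardization (zero mean, unit variance per feature). The sketch below is an assumption about what "feature scaling" means here, since the slide does not name a specific scheme; `standardize` and the synthetic data are illustrative only.

```python
import numpy as np

def standardize(X):
    # Scale every feature (column) to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant features unscaled
    return (X - mean) / std, mean, std

# Synthetic data: the first feature has 10x the spread of the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([10.0, 1.0])
Z, mean, std = standardize(X)

# The condition number of X^T X measures how flat the contours of a
# least-squares objective are; closer to 1 means rounder contours.
cond_before = np.linalg.cond(X.T @ X)
cond_after = np.linalg.cond(Z.T @ Z)
print(cond_before, cond_after)
```

The same `mean` and `std` would have to be applied to any new data before prediction.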
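Pros and cons of gradient descent
• Pros
  • Computational complexity O(N*step)
  • Space complexity O(N): one gradient component is computed per direction
• Cons
  • Very slow close to the extremum
  • In practice, is alpha easy to choose?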
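Conjugate gradient (to be added)
• Step 1: choose a direction
  • Idea: find the best direction in the plane spanned by the negative gradient and a correction direction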
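Newton's method
• Choosing the direction:
  • Idea: head toward the extremum of the quadratic given by the function's second-order Taylor expansion at the current point
Figure:
• Dotted line: y = x^3
• Curve: y = 3*(x-0.5)^2 + 0.25
• The curve is the second-order Taylor expansion of the function at the point (1,1); near (1,1) the two functions take approximately equal values
• Descent direction: shown in red (in the 2-D plot the only choices are left and right); here it agrees with the negative gradient direction
• Extremum of the quadratic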
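As a worked check of the figure, the second-order Taylor expansion of f(x) = x^3 at x = 1 can be written out. The symbols q (the quadratic model) and x_new (the Newton iterate) are introduced here only for illustration.

```latex
\begin{aligned}
f(x) &= x^3, \qquad f(1) = 1,\quad f'(1) = 3,\quad f''(1) = 6 \\
q(x) &= f(1) + f'(1)(x-1) + \tfrac{1}{2} f''(1)(x-1)^2 \\
     &= 1 + 3(x-1) + 3(x-1)^2 = 3x^2 - 3x + 1 = 3\left(x - \tfrac{1}{2}\right)^2 + \tfrac{1}{4} \\
x_{\text{new}} &= 1 - \frac{f'(1)}{f''(1)} = 1 - \frac{3}{6} = \tfrac{1}{2}
\end{aligned}
```

The Newton step from x = 1 therefore lands on the minimizer of the quadratic at x = 0.5. It moves left, the same side as the negative gradient.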
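Newton's method
• Choosing the step size: inexact one-dimensional (line) search
• Wolfe conditions
  • p_k is the descent direction
  • Sufficient decrease condition: $f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T p_k$
    • The decrease in f must be at least proportional to the step size $\alpha$ and the directional derivative $\nabla f(x_k)^T p_k$
    • Figure: the right-hand side is a straight line in alpha; acceptable points are those where the curve lies on or below that line
    • $\nabla f(x_k)^T p_k$ is the slope of the curve at alpha = 0
  • Curvature condition: $\nabla f(x_k + \alpha p_k)^T p_k \ge c_2 \nabla f(x_k)^T p_k$
    • The slope at the accepted point must be at least c_2 times the slope at the starting point
  • $0 < c_1 < c_2 < 1$; typically $c_1 = 10^{-4}$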
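The two Wolfe conditions can be checked directly for a trial step size. The sketch below is illustrative: `satisfies_wolfe` is a hypothetical helper, c1 = 1e-4 follows the slide, and c2 = 0.9 is an assumed value (a common choice for Newton-type methods) that the slide does not state.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    # Slope of phi(alpha) = f(x + alpha * p) at alpha = 0.
    slope0 = grad(x) @ p
    x_new = x + alpha * p
    # Sufficient decrease: f(x + a p) <= f(x) + c1 * a * grad(x)^T p
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0
    # Curvature: grad(x + a p)^T p >= c2 * grad(x)^T p
    curvature = grad(x_new) @ p >= c2 * slope0
    return bool(sufficient_decrease), bool(curvature)

# Example on f(x1, x2) = 100*x1^2 + x2^2 from (1, 1) along the negative gradient.
f = lambda x: 100.0 * x[0] ** 2 + x[1] ** 2
grad = lambda x: np.array([200.0 * x[0], 2.0 * x[1]])
x0 = np.array([1.0, 1.0])
p0 = -grad(x0)

too_long = satisfies_wolfe(f, grad, x0, p0, alpha=0.02)    # fails sufficient decrease
accepted = satisfies_wolfe(f, grad, x0, p0, alpha=0.005)   # passes both
too_short = satisfies_wolfe(f, grad, x0, p0, alpha=1e-7)   # fails curvature
```

The first condition rules out steps that are too long, the second rules out steps that are too short, so together they bracket a range of acceptable step sizes.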
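Newton's method
• Iterating on the step size
  • a1 = 0, a2 = infinity, a = (a1 + a2)/2
  • If a satisfies the Wolfe conditions, the step size is a
  • Otherwise bisect and recompute a (open question: which interval should be bisected?)
• Pseudocode
Note on the second-order Hessian matrix: for large-scale machine learning, a billion-by-billion matrix is not affordable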
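One common way to answer the "which interval" question is to keep a bracket [lo, hi]: shrink hi when sufficient decrease fails, raise lo when the curvature condition fails, and double the step while hi is still infinite. The sketch below follows that scheme. It is an assumption, not the slide's pseudocode; the starting trial step 1.0, c2 = 0.9, and the names `wolfe_bisection` and `newton` are illustrative.

```python
import numpy as np

def wolfe_bisection(f, grad, x, p, c1=1e-4, c2=0.9, max_iter=50):
    # Bracketing/bisection search for a step satisfying both Wolfe conditions.
    lo, hi = 0.0, np.inf
    alpha = 1.0  # assumed first trial step, natural for Newton directions
    f0 = f(x)
    slope0 = grad(x) @ p
    for _ in range(max_iter):
        x_new = x + alpha * p
        if f(x_new) > f0 + c1 * alpha * slope0:
            hi = alpha                      # step too long: shrink from above
            alpha = 0.5 * (lo + hi)
        elif grad(x_new) @ p < c2 * slope0:
            lo = alpha                      # step too short: grow from below
            alpha = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
        else:
            return alpha                    # both conditions hold
    return alpha                            # best effort after max_iter trials

def newton(f, grad, hess, x0, tol=1e-8, max_iter=50):
    # Newton's method: direction from the Hessian system, step from line search.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = -np.linalg.solve(hess(x), g)    # solve H p = -g, no explicit inverse
        x = x + wolfe_bisection(f, grad, x, p) * p
    return x

# Quadratic example from earlier slides; Newton reaches the minimum in one step.
f = lambda x: 100.0 * x[0] ** 2 + x[1] ** 2
grad = lambda x: np.array([200.0 * x[0], 2.0 * x[1]])
hess = lambda x: np.diag([200.0, 2.0])
x_min = newton(f, grad, hess, [1.0, 1.0])
```

Forming and solving with the full Hessian costs O(N^2) memory and up to O(N^3) time per step, which is the scaling problem the note above refers to.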
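Quasi-Newton methods
• Idea: just find a matrix that approximates the Hessian
• Question: what notion of closeness guarantees that iterating with the approximate matrix still reaches the optimum?
• Conditions:
  • Symmetry (H_ij = H_ji), as required of second partial derivatives
  • Secant condition: $H_n (x_n - x_{n-1}) = g_n - g_{n-1}$
  • g_n is the first derivative (gradient) at x_n
  • The secant condition roughly requires that the first-order Taylor expansion of the gradient function at x_n agrees with the Hessian matrix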
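BFGS
• BFGS is the approximate matrix that minimizes a Frobenius norm subject to the two conditions (symmetry and the secant condition)
• Resulting update:
  • s_n = x_n - x_{n-1}
  • y_n = g_n - g_{n-1}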
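The update formula itself did not survive on the slide. For reference, the standard textbook BFGS update of the inverse Hessian approximation, written in terms of s_n and y_n, can be sketched as follows. The function name `bfgs_inverse_update` is illustrative, and that the slide used this inverse form is an assumption.

```python
import numpy as np

def bfgs_inverse_update(H_inv, s, y):
    # Standard BFGS update of the inverse Hessian approximation:
    #   H_new = (I - rho s y^T) H (I - rho y s^T) + rho s s^T,  rho = 1 / (y^T s)
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H_inv @ V.T + rho * np.outer(s, s)

# Example: one update starting from the identity matrix.
s = np.array([0.5, -0.2, 0.1])
y = np.array([1.0, -0.3, 0.4])   # y^T s > 0 keeps the update positive definite
H_new = bfgs_inverse_update(np.eye(3), s, y)
```

By construction the updated matrix is symmetric and maps y_n back to s_n, which is the secant condition stated for the inverse.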
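BFGS -> L-BFGS
• Code
• Storing the full histories {s_k}, {y_k} still takes a lot of memory
• So the loop bounds are changed to
• for i = n, …, n-m+1 (only the most recent m pairs are used)
Nocedal 1980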
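The limited-memory idea is usually implemented with the two-loop recursion, which applies the BFGS inverse update implicitly using only the stored pairs. The sketch below is a standard textbook version, not the slide's code; `lbfgs_direction`, the initial scaling by s^T y / y^T y, and the memory size m = 5 are assumptions.

```python
import numpy as np
from collections import deque

def lbfgs_direction(grad, s_hist, y_hist):
    # Two-loop recursion: returns -H * grad, where H is the BFGS inverse
    # Hessian built from the stored (s, y) pairs, oldest first in the lists.
    q = np.array(grad, dtype=float)
    coeffs = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):   # newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        coeffs.append((a, rho))
    if s_hist:
        # Initial matrix H0 = gamma * I, scaled using the most recent pair.
        s, y = s_hist[-1], y_hist[-1]
        q *= (s @ y) / (y @ y)
    for (a, rho), s, y in zip(reversed(coeffs), s_hist, y_hist):  # oldest to newest
        b = rho * (y @ q)
        q += (a - b) * s
    return -q

# Keeping only the last m pairs: a deque with maxlen drops the oldest pair.
m = 5
s_hist, y_hist = deque(maxlen=m), deque(maxlen=m)
```

Memory is O(m*N) for the stored pairs instead of O(N^2) for a dense matrix, and each direction costs O(m*N) operations.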
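OWL-QN: idea
• Everything so far assumed that the first derivative of the function exists
• But the L1 norm is not differentiable at 0
• Luckily, the L1 norm is differentiable inside each orthant
• Compute the descent direction the same way, but choose the step size so that the iterate stays in the orthant the descent direction was computed for
Figure:
• In this region the function is first-order differentiable, so a descent direction can be obtained
• But when choosing the step size, the next point must stay in the same orthant
• Descent direction: only this segment is feasible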
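OWL-QN: details
• If some component of the current iterate is 0, which gradient do we use?
  • If x_i != 0, the left and right partial derivatives are equal, and the pseudo-gradient equals the gradient
  • If x_i = 0, the left and right partial derivatives differ
    • Note: C > 0
    • Hence the left partial derivative is always smaller than the right one
    • Left partial derivative > 0: f increases as x approaches the current point from the negative side
    • Right partial derivative < 0: f increases as x approaches the current point from the positive side
Example (axes X1, X2): f = |x1| + |x2|
• If x1 = 0: the right partial derivative is 1 and the left partial derivative is -1
• The pseudo-gradient at that point is 0
• So x1 stays unchanged and the iterate moves down along x2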
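OWL-QN: details
Example (axes X1, X2): f gets smaller toward the center
• If x1 = 0: the left and right partial derivatives are both less than 0
• The pseudo-gradient at that point is nonzero (by the rule above, the right partial derivative)
• So the iterate moves downhill along x1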
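The pseudo-gradient rule and the orthant restriction can be sketched as follows. The code assumes an objective of the form f(x) = loss(x) + C * ||x||_1, so that at x_i = 0 the left and right partial derivatives are loss_grad_i - C and loss_grad_i + C. The function names and this decomposition are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def pseudo_gradient(x, loss_grad, C):
    # Pseudo-gradient of loss(x) + C * ||x||_1.
    left = loss_grad - C     # left partial derivative where x_i == 0
    right = loss_grad + C    # right partial derivative where x_i == 0
    pg = np.zeros_like(x, dtype=float)
    nonzero = x != 0
    # Away from 0 the function is differentiable: ordinary gradient.
    pg[nonzero] = loss_grad[nonzero] + C * np.sign(x[nonzero])
    zero = ~nonzero
    use_left = zero & (left > 0)     # f decreases when moving in the negative direction
    use_right = zero & (right < 0)   # f decreases when moving in the positive direction
    pg[use_left] = left[use_left]
    pg[use_right] = right[use_right]
    return pg                        # remaining zero components stay at 0

def project_onto_orthant(x_new, orthant):
    # Zero out every component that left the chosen orthant (sign pattern).
    return np.where(np.sign(x_new) == orthant, x_new, 0.0)

# First example: f = |x1| + |x2| at a point with x1 = 0 (loss is zero, C = 1).
pg_flat = pseudo_gradient(np.array([0.0, 1.0]), np.zeros(2), C=1.0)
# Second example: both one-sided derivatives in x1 are negative.
pg_slope = pseudo_gradient(np.array([0.0, 1.0]), np.array([-3.0, 0.0]), C=1.0)
```

In the first case the x1 component of the pseudo-gradient is 0, so x1 stays put; in the second it is the right partial derivative, so the iterate moves in the positive x1 direction.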
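CDN
• Keep track of the current iteration number and the current iterate
• Cyclic coordinate descent generates a sequence of intermediate solutions
• Each intermediate solution is indexed by iteration and coordinate, meaning "in this iteration, it is now the turn of this coordinate to be updated"
• When handling one coordinate (from the first to the last):
  • Solve a one-variable subproblem
  • Using a second-order Taylor approximation
  • This is a quadratic function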
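CDN: computational cost
• The inner loop mainly traverses all instances to compute the two quantities the coordinate update needs
• The dominant cost is computing the inner product for each instance
• Traversing the instances cannot be avoided, but computing the inner products does not require traversing all features
• For every instance, cache the current value; after moving to the new point, only an incremental update of the cached value is needed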
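The caching idea can be illustrated with a dense NumPy sketch of one CDN pass for L1-regularized logistic regression. It assumes the objective ||w||_1 + C * sum_i log(1 + exp(-y_i * w^T x_i)) with labels in {-1, +1}, caches the inner products w^T x_i in `wx`, and uses a simple backtracking line search. The function name `cdn_epoch` and the parameters `sigma` and `beta` are illustrative, and the slides may cache a different quantity.

```python
import numpy as np

def cdn_epoch(X, y, w, wx, C=1.0, sigma=0.01, beta=0.5):
    # One outer iteration: update every coordinate once.
    # wx caches X @ w and is updated in place, as is w.
    for j in range(X.shape[1]):
        xj = X[:, j]
        tau = 1.0 / (1.0 + np.exp(-y * wx))
        g = C * np.sum((tau - 1.0) * y * xj)                 # first derivative of the loss
        h = C * np.sum(xj ** 2 * tau * (1.0 - tau)) + 1e-12  # second derivative
        # Minimizer of the 1-D model |w_j + d| + g*d + 0.5*h*d^2.
        if g + 1.0 <= h * w[j]:
            d = -(g + 1.0) / h
        elif g - 1.0 >= h * w[j]:
            d = -(g - 1.0) / h
        else:
            d = -w[j]
        if d == 0.0:
            continue
        # Backtracking line search on the true objective along coordinate j.
        old_loss = C * np.sum(np.logaddexp(0.0, -y * wx))
        bound = sigma * (g * d + abs(w[j] + d) - abs(w[j]))
        step = 1.0
        for _ in range(30):
            new_wx = wx + step * d * xj
            delta = (abs(w[j] + step * d) - abs(w[j])
                     + C * np.sum(np.logaddexp(0.0, -y * new_wx)) - old_loss)
            if delta <= step * bound:
                break
            step *= beta
        else:
            continue  # no acceptable step found: leave this coordinate unchanged
        w[j] += step * d
        wx += step * d * xj   # incremental update of the cached inner products
    return w, wx
```

Because only coordinate j changes, refreshing the cache costs one pass over column j rather than a full recomputation of X @ w; with sparse data only the instances where x_ij is nonzero would need to be touched.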
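OWL-QN vs CDN
• OWL-QN
  • Computing the gradient and the objective requires traversing all features of all instances
• CDN
  • It stores data for every instance, which takes a lot of extra memory. The inner loop only needs one pass over the instances, and the inner products are cheap to compute. The outer loop has to traverse all features.
• OWL-QN vs CDN: where does the speed come from? CDN needs fewer iterations: OWL-QN typically takes about 100 iterations, while CDN takes about 20. OWL-QN converges slowly at first and, once it finds a good direction, converges faster than CDN. CDN converges quickly at the start. Since we do not need a very precise result, CDN is faster than OWL-QN in total time.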
Some numbers
• The single-machine OWL-QN version handles 20 GB of instance data with 6 million features; about 90 iterations take 10 hours in total

From gradient descent to CDN

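Editor's notes

  1. Which loss functions in machine learning are convex?
  2. Is there an intuitive geometric argument that the negative gradient is the steepest direction?
  3. A long, narrow ellipse severely hurts iteration efficiency: some dimensions change quickly and others slowly. Here x1 reaches 0 quickly, while x2 gets there very slowly.
  4. The advantage of gradient descent is that it only needs first derivatives, with O(N) storage and O(N*step) computation. What is the convergence rate of gradient descent? For typical problems, how many iterations does gradient descent need? Stochastic gradient descent.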