Pricing American Options with Reinforcement Learning
Ashwin Rao
ICME, Stanford University
November 16, 2018
Ashwin Rao (Stanford) RL for American Options November 16, 2018 1 / 13
Overview
1 Review of Optimal Stopping and its MDP Formulation
2 Longstaff-Schwartz Algorithm
3 RL Algorithms for American Option Pricing: LSPI and FQI
4 Choice of feature functions and Model-based adaptation
Stopping Time
Stopping time τ is a “random time” (a random variable), interpreted as the
time at which a given stochastic process exhibits a certain behavior
A stopping time is often defined by a “stopping policy” that decides whether
to continue or stop a process based on the present position and past events
Formally, τ is a random variable such that the event {τ ≤ t} is in the σ-algebra F_t, for all t
Deciding whether τ ≤ t depends only on information up to time t
The hitting time of a Borel set A for a process X_t is the first time X_t
takes a value within the set A
A hitting time is an example of a stopping time. Formally,
T_{X,A} = min{t ∈ ℝ | X_t ∈ A}
e.g., the first time a process exceeds a certain fixed level
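As a small illustration, a hitting time can be simulated directly. The sketch below (a Gaussian random walk and an arbitrary level, both hypothetical choices) decides stop/continue using only the path observed so far, which is exactly the stopping-time property:

```python
import random

random.seed(0)

def hitting_time(level, n_steps):
    """First time t at which a Gaussian random walk X_t enters A = [level, inf).
    Returns None if A is not hit within n_steps. The decision at each t uses
    only the path up to time t, so this is a valid stopping time."""
    x = 0.0
    for t in range(1, n_steps + 1):
        x += random.gauss(0.0, 1.0)   # one step of the walk
        if x >= level:                # first entry into the set A
            return t
    return None

t = hitting_time(level=2.0, n_steps=1000)
```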
Optimal Stopping Problem
Optimal Stopping problem for a stochastic process X_t:

V(x) = max_τ E[G(X_τ) | X_0 = x]

where the maximum is taken over stopping times τ of X_t, V(⋅) is called the
Value function, and G is the Reward function.
Note that several stopping times may maximize E[G(X_τ)]; in that case we take
the optimal stopping time to be the smallest one achieving the maximum value.
Example of Optimal Stopping: Optimal Exercise of American Options
X_t is the stochastic process for the underlying security’s price
x is the underlying security’s current price
τ ranges over the exercise times corresponding to the various stopping policies
V(⋅) is the American option price as a function of the underlying’s current price
G(⋅) is the option payoff function
Optimal Stopping Problems as Markov Decision Processes
We formulate Stopping Time problems as Markov Decision Processes
State is a suitable function of the history of Stochastic Process Xt
Action is Boolean: Stop or Continue
Reward always 0, except upon Stopping (when it is = G(Xτ ))
State-transitions governed by Underlying Price Stochastic Process
Mainstream approaches to American Option Pricing
American Option Pricing is Optimal Stopping, and hence an MDP
So can be tackled with Dynamic Programming or RL algorithms
But let us first review the mainstream approaches
For some American options, the European price suffices, e.g., a vanilla call
on a non-dividend-paying stock is never optimally exercised early
When payoff is not path-dependent and state dimension is not large,
we can do backward induction on a binomial/trinomial tree/grid
Otherwise, the standard approach is Longstaff-Schwartz algorithm
Longstaff-Schwartz algorithm combines 3 ideas:
Valuation based on Monte-Carlo simulation
Function approximation of continuation value for in-the-money states
Backward-recursive determination of early exercise states
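The backward-induction approach mentioned above can be sketched on a Cox-Ross-Rubinstein binomial tree; a minimal illustration with hypothetical at-the-money put parameters, not code from the deck:

```python
import math

def american_put_binomial(S0, K, r, sigma, T, n):
    """Backward induction for an American put on a CRR binomial tree:
    at each node take the max of immediate exercise and discounted continuation."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # option values at expiry, indexed by number of up-moves j
    vals = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]
    for i in range(n - 1, -1, -1):        # step back through the tree
        for j in range(i + 1):
            cont = disc * (p * vals[j + 1] + (1.0 - p) * vals[j])
            exercise = max(K - S0 * u**j * d**(i - j), 0.0)
            vals[j] = max(cont, exercise)
    return vals[0]

price = american_put_binomial(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0, n=200)
```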
Ingredients of Longstaff-Schwartz Algorithm
m Monte-Carlo paths indexed i = 0, 1, ..., m − 1
n + 1 time steps indexed j = n, n − 1, ..., 1, 0 (we move back in time)
Instantaneous risk-free rate at time t_j, denoted r_{t_j}
Simulated prices of the underlying as an input 2-dim array SP[i, j]
At each time step, CF[i] is the PV of current + future cashflows for path i
s_{i,j} denotes the state at (i, j) = (time t_j, price history SP[i, 0:(j+1)])
Payoff(s_{i,j}) denotes the option payoff at (i, j)
φ_0(s_{i,j}), ..., φ_{r−1}(s_{i,j}) are the feature functions (of state s_{i,j})
w_0, ..., w_{r−1} are the regression weights
Regression function f(s_{i,j}) = w · φ(s_{i,j}) = Σ_{l=0}^{r−1} w_l · φ_l(s_{i,j})
f(⋅) is the estimate of continuation value for in-the-money states
The Longstaff-Schwartz Algorithm
Algorithm 2.1: LongstaffSchwartz(SP[0:m, 0:n+1])

  comment: s_{i,j} is shorthand for the state at (i, j) = (t_j, SP[i, 0:(j+1)])

  CF[0:m] ← [Payoff(s_{i,n}) for i in range(m)]
  for j ← n − 1 downto 1:
      CF[0:m] ← CF[0:m] * e^{−r_{t_j} (t_{j+1} − t_j)}
      X ← [φ(s_{i,j}) for i in range(m) if Payoff(s_{i,j}) > 0]
      Y ← [CF[i] for i in range(m) if Payoff(s_{i,j}) > 0]
      w ← (X^T · X)^{−1} · X^T · Y
      comment: the above regression gives the estimate of continuation value
      for i ← 0 to m − 1:
          CF[i] ← Payoff(s_{i,j}) if Payoff(s_{i,j}) > w · φ(s_{i,j})
  exercise ← Payoff(s_{0,0})
  continue ← e^{−r_{t_0} (t_1 − t_0)} · mean(CF[0:m])
  return max(exercise, continue)
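A compact NumPy sketch of this algorithm, assuming GBM-simulated paths, a simple quadratic-polynomial basis, and hypothetical at-the-money put parameters (none of these choices come from the deck, which uses Laguerre features):

```python
import numpy as np

def longstaff_schwartz(SP, payoff, r, dt):
    """Longstaff-Schwartz price estimate from simulated paths SP[i, j].
    payoff: vectorized payoff function of price; r: flat risk-free rate;
    dt: uniform time-step length. Regression uses only in-the-money paths."""
    m, n = SP.shape[0], SP.shape[1] - 1
    disc = np.exp(-r * dt)                # one-step discount factor
    CF = payoff(SP[:, n])                 # cashflows initialized at expiry
    for j in range(n - 1, 0, -1):
        CF = CF * disc                    # discount cashflows back one step
        itm = payoff(SP[:, j]) > 0        # in-the-money paths at time t_j
        if itm.sum() >= 3:
            X = np.vander(SP[itm, j], 3)  # features [S^2, S, 1]
            w = np.linalg.lstsq(X, CF[itm], rcond=None)[0]
            cont = X @ w                  # estimated continuation values
            ex = payoff(SP[itm, j])
            CF[itm] = np.where(ex > cont, ex, CF[itm])
    return max(payoff(SP[0, 0]), disc * CF.mean())

# hypothetical test case: at-the-money American put on GBM paths
rng = np.random.default_rng(0)
S0, K, r, sigma, T, m, n = 100.0, 100.0, 0.05, 0.2, 1.0, 20000, 50
dt = T / n
z = rng.standard_normal((m, n))
logret = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
SP = np.hstack([np.full((m, 1), S0), S0 * np.exp(np.cumsum(logret, axis=1))])
price = longstaff_schwartz(SP, lambda s: np.maximum(K - s, 0.0), r, dt)
```

With these parameters the estimate should land near the binomial-tree price for the same option.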
RL as an alternative to Longstaff-Schwartz
RL is straightforward if we clearly define the MDP
State is [Current Time, History of Underlying Security Prices]
Action is Boolean: Exercise (i.e., Stop) or Continue
Reward always 0, except upon Exercise (= Payoff)
State-transitions governed by Underlying Price’s Stochastic Process
Key is function approximation of state-conditioned continuation value
Continuation Value ⇒ Optimal Stopping ⇒ Option Price
We outline two RL Algorithms:
Least Squares Policy Iteration (LSPI)
Fitted Q-Iteration (FQI)
Both Algorithms are batch methods solving a linear system
More details in the paper by Li, Szepesvari, and Schuurmans
Least Squares Policy Iteration for Continuation Value
Algorithm 3.1: LSPI-AmericanPricing(SP[0:m, 0:n+1])

  comment: s_{i,j} is shorthand for the state at (i, j) = (t_j, SP[i, 0:(j+1)])
  comment: A is an r × r matrix; b and w are r-length vectors
  comment: each transition contributes
  comment:   A += φ(s_{i,j}) · (φ(s_{i,j}) − γ · I_{w·φ(s_{i,j+1}) ≥ Payoff(s_{i,j+1})} · φ(s_{i,j+1}))^T
  comment:   b += γ · I_{w·φ(s_{i,j+1}) < Payoff(s_{i,j+1})} · Payoff(s_{i,j+1}) · φ(s_{i,j})

  A ← 0, b ← 0, w ← 0
  for i ← 0 to m − 1:
      for j ← 0 to n − 1:
          Q ← Payoff(s_{i,j+1})
          P ← φ(s_{i,j+1}) if (j < n − 1 and Q ≤ w · φ(s_{i,j+1})) else 0
          R ← Q if Q > w · P else 0
          A ← A + φ(s_{i,j}) · (φ(s_{i,j}) − e^{−r_{t_j} (t_{j+1} − t_j)} · P)^T
          b ← b + e^{−r_{t_j} (t_{j+1} − t_j)} · R · φ(s_{i,j})
      if (i + 1) % BatchSize == 0: w ← A^{−1} · b, A ← 0, b ← 0
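A direct, unvectorized sketch of the LSPI pseudocode above. As a simplification it re-solves the linear system once per full pass over all paths (rather than every BatchSize paths), and the feature functions, payoff, and parameter values are hypothetical:

```python
import numpy as np

def lspi_american(SP, payoff, feats, r, dt, n_iters=5):
    """One LSPI iteration accumulates A and b over all (path, step) transitions
    under the current weights w, then solves the linear system for new w.
    Assumes a flat rate r and uniform time steps dt."""
    m, n = SP.shape[0], SP.shape[1] - 1
    gamma = np.exp(-r * dt)               # per-step discount factor
    d = len(feats(SP[0, 0], 0))
    w = np.zeros(d)
    for _ in range(n_iters):
        A = np.zeros((d, d))
        b = np.zeros(d)
        for i in range(m):
            for j in range(n):
                phi = feats(SP[i, j], j)
                Q = payoff(SP[i, j + 1])
                # continue if estimated continuation value at the next
                # state dominates immediate payoff there (and j < n - 1)
                if j < n - 1 and Q <= w @ feats(SP[i, j + 1], j + 1):
                    P, R = feats(SP[i, j + 1], j + 1), 0.0
                else:
                    P, R = np.zeros(d), Q
                A += np.outer(phi, phi - gamma * P)
                b += gamma * R * phi
        w = np.linalg.solve(A + 1e-8 * np.eye(d), b)  # small ridge for stability
    return w

# hypothetical demo: small GBM path set, simple polynomial + time features
rng = np.random.default_rng(0)
S0, K, r, sigma, T, m, n = 100.0, 100.0, 0.05, 0.2, 1.0, 500, 10
dt = T / n
z = rng.standard_normal((m, n))
logret = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
SP = np.hstack([np.full((m, 1), S0), S0 * np.exp(np.cumsum(logret, axis=1))])
feats = lambda s, j: np.array([1.0, s / K, (s / K) ** 2, j * dt])
payoff = lambda s: max(K - s, 0.0)
w = lspi_american(SP, payoff, feats, r, dt)
price = max(payoff(S0), w @ feats(S0, 0))  # continuation estimate at the root
```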
Fitted Q-Iteration for Continuation Value
Algorithm 3.2: FQI-AmericanPricing(SP[0:m, 0:n+1])

  comment: s_{i,j} is shorthand for the state at (i, j) = (t_j, SP[i, 0:(j+1)])
  comment: A is an r × r matrix; b and w are r-length vectors
  comment: each transition contributes
  comment:   A += φ(s_{i,j}) · φ(s_{i,j})^T
  comment:   b += γ · max(Payoff(s_{i,j+1}), w · φ(s_{i,j+1})) · φ(s_{i,j})

  A ← 0, b ← 0, w ← 0
  for i ← 0 to m − 1:
      for j ← 0 to n − 1:
          Q ← Payoff(s_{i,j+1})
          P ← φ(s_{i,j+1}) if j < n − 1 else 0
          A ← A + φ(s_{i,j}) · φ(s_{i,j})^T
          b ← b + e^{−r_{t_j} (t_{j+1} − t_j)} · max(Q, w · P) · φ(s_{i,j})
      if (i + 1) % BatchSize == 0: w ← A^{−1} · b, A ← 0, b ← 0
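The FQI pseudocode above can likewise be sketched directly; note that A does not depend on w, so it can be accumulated once. The full-batch solve (instead of every BatchSize paths) and all features, payoff, and parameters are illustrative assumptions:

```python
import numpy as np

def fqi_american(SP, payoff, feats, r, dt, n_iters=5):
    """Fitted Q-Iteration sketch: b uses max(immediate payoff, current
    continuation estimate) at the next state as the regression target.
    Assumes a flat rate r and uniform time steps dt."""
    m, n = SP.shape[0], SP.shape[1] - 1
    gamma = np.exp(-r * dt)               # per-step discount factor
    d = len(feats(SP[0, 0], 0))
    w = np.zeros(d)
    A = np.zeros((d, d))
    for i in range(m):                    # A is fixed across iterations
        for j in range(n):
            phi = feats(SP[i, j], j)
            A += np.outer(phi, phi)
    A += 1e-8 * np.eye(d)                 # small ridge for stability
    for _ in range(n_iters):
        b = np.zeros(d)
        for i in range(m):
            for j in range(n):
                phi = feats(SP[i, j], j)
                Q = payoff(SP[i, j + 1])
                cont = w @ feats(SP[i, j + 1], j + 1) if j < n - 1 else 0.0
                b += gamma * max(Q, cont) * phi
        w = np.linalg.solve(A, b)
    return w

# hypothetical demo: same setup as the LSPI sketch
rng = np.random.default_rng(0)
S0, K, r, sigma, T, m, n = 100.0, 100.0, 0.05, 0.2, 1.0, 500, 10
dt = T / n
z = rng.standard_normal((m, n))
logret = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
SP = np.hstack([np.full((m, 1), S0), S0 * np.exp(np.cumsum(logret, axis=1))])
feats = lambda s, j: np.array([1.0, s / K, (s / K) ** 2, j * dt])
w = fqi_american(SP, lambda s: max(K - s, 0.0), feats, r, dt)
```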
Feature functions
Li, Szepesvari, and Schuurmans recommend weighted Laguerre polynomials,
over S′ = S_t/K, where S_t is the underlying price and K is the strike:

φ_0(S_t) = 1
φ_1(S_t) = e^{−S′/2}
φ_2(S_t) = e^{−S′/2} · (1 − S′)
φ_3(S_t) = e^{−S′/2} · (1 − 2S′ + S′²/2)

They used these for Longstaff-Schwartz as well as for LSPI and FQI
For LSPI and FQI, we also need feature functions for time
They recommend:

φ^t_0(t) = sin(π(T − t)/(2T))
φ^t_1(t) = log(T − t)
φ^t_2(t) = (t/T)²

They claim LSPI and FQI perform better than Longstaff-Schwartz
with this choice of feature functions
We will code up these algorithms to validate this claim
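These feature functions translate directly into code; the small eps floor on log(T − t) is an added guard (not in the deck) against the divergence at t = T:

```python
import math

def laguerre_features(S, K):
    """Laguerre-weighted basis functions over S' = S/K, as listed above."""
    sp = S / K
    e = math.exp(-sp / 2.0)
    return [1.0,
            e,
            e * (1.0 - sp),
            e * (1.0 - 2.0 * sp + sp * sp / 2.0)]

def time_features(t, T, eps=1e-10):
    """Time features recommended above; log(T - t) diverges as t -> T,
    so the argument is floored at a small eps (an added safeguard)."""
    return [math.sin(math.pi * (T - t) / (2.0 * T)),
            math.log(max(T - t, eps)),
            (t / T) ** 2]

phi = laguerre_features(100.0, 100.0)  # at the money: S' = 1
ft = time_features(0.0, 1.0)           # at inception: t = 0
```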
Extending LSPI and FQI to model-based settings
Li, Szepesvari, Schuurmans have shown a model-free solution
But the same algorithms can be extended to model-based settings
Wherein we have explicit knowledge of the transition probabilities
We simply change the TD target to a probability-weighted target
This should help with speed of convergence
Discussion: What other RL techniques could be used here?