SlideShare une entreprise Scribd logo
1  sur  9
Télécharger pour lire hors ligne
Real-world Derivatives Hedging with
Deep Reinforcement Learning
Ashwin Rao
ICME, Stanford University
November 15, 2019
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 1 / 9
Classical Pricing and Hedging of Derivatives
Classical Pricing/Hedging Theory assumes frictionless markets
Frictionless := continuous trading, any volume, no transaction costs
Assumptions of arbitrage-free and completeness lead to (dynamic and
exact) replication of derivatives with basic market securities
Replication strategy gives us the pricing and hedging solutions
This is the foundation of the famous Black-Scholes formulas
However, the real-world has many frictions ⇒ Incomplete Market
... where Derivatives cannot be exactly replicated
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 2 / 9
Pricing and Hedging in an Incomplete Market
In an incomplete market, we have multiple risk-neutral measures
So, multiple derivative prices (each consistent with no-arbitrage)
The market/trader “chooses” a risk-neutral measure (hence, price)
Based on a specified preference for trading risk versus return
This preference is equivalent to specifying a Utility function
Maximizing “risk-adjusted return” of the derivative plus hedges
Reminiscent of the Portfolio Optimization problem
Likewise, we can set this up as a stochastic control (MDP) problem
Where the decision at each time step is: Trades in the hedges
So what’s the best way to solve this MDP?
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 3 / 9
Deep Reinforcement Learning (DRL)
Dynamic Programming not suitable in practice due to:
Curse of Dimensionality
Curse of Modeling
So we solve the MDP with Deep Reinforcement Learning (DRL)
The idea is to use real market data and real market frictions
Developing realistic simulations to derive the optimal policy
The optimal policy gives us the (practical) hedging strategy
The optimal value function gives us the price (valuation)
Formulation based on Deep Hedging paper by J.P.Morgan researchers
More details in the prior paper by some of the same authors
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 4 / 9
Problem Setup
We will simplify the problem setup a bit for ease of exposition
This model works for more complex, more frictionful markets too
Assume time is in discrete (finite) steps t = 0, 1, . . . , T
Assume we have a position (portfolio) D in m derivatives
Assume each of these m derivatives expires in time ≤ T
Portfolio-aggregated Contingent Cashflows at time t denoted Xt ∈ R
Assume we have n basic market securities as potential hedges
Hedge positions (units held) at time t denoted αt ∈ Rn
Cashflows per unit of hedges held at time t denoted Yt ∈ Rn
Prices per unit of hedges at time t denoted Pt ∈ Rn
PnL position at time t is denoted as βt ∈ R
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 5 / 9
States and Actions
Denote state space at time t as St, state at time t as st ∈ St
Among other things, st contains t, αt, Pt, βt, D
st will include any market information relevant to trading actions
For simplicity, we assume st is just the tuple (t, αt, Pt, βt, D)
Denote action space at time t as At, action at time t as at ∈ At
at represents units of hedges traded (positive for buy, negative for sell)
Trading restrictions (eg: no short-selling) define At as a function of st
State transitions Pt+1|Pt available from a simulator, whose internals
are estimated from real market data and realistic assumptions
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 6 / 9
Sequence of events at each time step t = 0, . . . , T
1 Observe state st = (t, αt, Pt, βt, D)
2 Realize cashflows (from holding positions) = Xt + αt · Yt
3 Perform action (trades) at to produce trading PnL = −at · Pt
4 Trading transaction costs, example = −γPt · |at| for some γ > 0
5 Update αt as: αt+1 = αt + at for t = 0, . . . , T
(force-liquidation at termination means aT = αT )
6 Update PnL βt as:
βt+1 = βt + Xt + αt · Yt − at · Pt − γPt · |at| for t = 0, . . . , T
7 Reward rt = 0 for all t = 0, . . . , T − 1 and rT = U(βT+1) for an
appropriate concave Utility function U (based on risk-aversion)
8 Simulator evolves hedge prices from Pt to Pt+1
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 7 / 9
Pricing and Hedging
Assume we now want to enter into an incremental position (portfolio)
D in m derivatives (denote the combined position as D ∪ D )
We want to determine the Price of the incremental position D , as
well as the hedging strategy for D
Denote the Optimal Value Function at time t as V ∗
t : St → R
Pricing of D is based on the principle that introducing the
incremental position of D together with a calibrated cashflow at
t = 0 should leave the Optimal Value (at t = 0) unchanged
Precisely, Price of D is the value x such that
V ∗
0 ((0, α0, P0, β0 − x, D ∪ D )) = V ∗
0 ((0, α0, P0, β0, D))
The hedging strategy at time t is given by the Optimal Policy
π∗
t : St → At
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 8 / 9
DRL Approach a Breakthrough for Practical Trading?
The industry practice/tradition has been to start with Complete
Market assumption, and then layer ad-hoc/unsatisfactory adjustments
There is some past work on pricing/hedging in incomplete markets
But it’s theoretical and not usable in real trading (eg: Superhedging)
My view: This DRL approach is a breakthrough for practical trading
Key advantages of this DRL approach:
Algorithm for pricing/hedging independent of market dynamics
Computational cost scales efficiently with size m of derivatives portfolio
Enables one to faithfully capture practical trading situations/constraints
Deep Neural Networks provide great function approximation for RL
Ashwin Rao (Stanford) Deep Hedging November 15, 2019 9 / 9

Contenu connexe

Plus de Ashwin Rao

Overview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsOverview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsAshwin Rao
 
Risk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility TheoryRisk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility TheoryAshwin Rao
 
Value Function Geometry and Gradient TD
Value Function Geometry and Gradient TDValue Function Geometry and Gradient TD
Value Function Geometry and Gradient TDAshwin Rao
 
Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in ...
Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in ...Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in ...
Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in ...Ashwin Rao
 
HJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio ProblemHJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio ProblemAshwin Rao
 
Policy Gradient Theorem
Policy Gradient TheoremPolicy Gradient Theorem
Policy Gradient TheoremAshwin Rao
 
A Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier MathematicsA Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier MathematicsAshwin Rao
 
Towards Improved Pricing and Hedging of Agency Mortgage-backed Securities
Towards Improved Pricing and Hedging of Agency Mortgage-backed SecuritiesTowards Improved Pricing and Hedging of Agency Mortgage-backed Securities
Towards Improved Pricing and Hedging of Agency Mortgage-backed SecuritiesAshwin Rao
 
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkAshwin Rao
 
Demystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance TradeoffDemystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance TradeoffAshwin Rao
 
Category Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) picturesCategory Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) picturesAshwin Rao
 
Risk Pooling sensitivity to Correlation
Risk Pooling sensitivity to CorrelationRisk Pooling sensitivity to Correlation
Risk Pooling sensitivity to CorrelationAshwin Rao
 
Abstract Algebra in 3 Hours
Abstract Algebra in 3 HoursAbstract Algebra in 3 Hours
Abstract Algebra in 3 HoursAshwin Rao
 
OmniChannelNewsvendor
OmniChannelNewsvendorOmniChannelNewsvendor
OmniChannelNewsvendorAshwin Rao
 
The Newsvendor meets the Options Trader
The Newsvendor meets the Options TraderThe Newsvendor meets the Options Trader
The Newsvendor meets the Options TraderAshwin Rao
 
The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||Ashwin Rao
 
Career Advice at University College of London, Mathematics Department.
Career Advice at University College of London, Mathematics Department.Career Advice at University College of London, Mathematics Department.
Career Advice at University College of London, Mathematics Department.Ashwin Rao
 
Careers outside Academia - USC Computer Science Masters and Ph.D. Students
Careers outside Academia - USC Computer Science Masters and Ph.D. StudentsCareers outside Academia - USC Computer Science Masters and Ph.D. Students
Careers outside Academia - USC Computer Science Masters and Ph.D. StudentsAshwin Rao
 
IIT Bombay - First Steps to a Successful Career
IIT Bombay - First Steps to a Successful CareerIIT Bombay - First Steps to a Successful Career
IIT Bombay - First Steps to a Successful CareerAshwin Rao
 
Stanford FinMath - Careers in Quant Finance
Stanford FinMath - Careers in Quant FinanceStanford FinMath - Careers in Quant Finance
Stanford FinMath - Careers in Quant FinanceAshwin Rao
 

Plus de Ashwin Rao (20)

Overview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsOverview of Stochastic Calculus Foundations
Overview of Stochastic Calculus Foundations
 
Risk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility TheoryRisk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility Theory
 
Value Function Geometry and Gradient TD
Value Function Geometry and Gradient TDValue Function Geometry and Gradient TD
Value Function Geometry and Gradient TD
 
Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in ...
Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in ...Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in ...
Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in ...
 
HJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio ProblemHJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio Problem
 
Policy Gradient Theorem
Policy Gradient TheoremPolicy Gradient Theorem
Policy Gradient Theorem
 
A Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier MathematicsA Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier Mathematics
 
Towards Improved Pricing and Hedging of Agency Mortgage-backed Securities
Towards Improved Pricing and Hedging of Agency Mortgage-backed SecuritiesTowards Improved Pricing and Hedging of Agency Mortgage-backed Securities
Towards Improved Pricing and Hedging of Agency Mortgage-backed Securities
 
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
 
Demystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance TradeoffDemystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance Tradeoff
 
Category Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) picturesCategory Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) pictures
 
Risk Pooling sensitivity to Correlation
Risk Pooling sensitivity to CorrelationRisk Pooling sensitivity to Correlation
Risk Pooling sensitivity to Correlation
 
Abstract Algebra in 3 Hours
Abstract Algebra in 3 HoursAbstract Algebra in 3 Hours
Abstract Algebra in 3 Hours
 
OmniChannelNewsvendor
OmniChannelNewsvendorOmniChannelNewsvendor
OmniChannelNewsvendor
 
The Newsvendor meets the Options Trader
The Newsvendor meets the Options TraderThe Newsvendor meets the Options Trader
The Newsvendor meets the Options Trader
 
The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||
 
Career Advice at University College of London, Mathematics Department.
Career Advice at University College of London, Mathematics Department.Career Advice at University College of London, Mathematics Department.
Career Advice at University College of London, Mathematics Department.
 
Careers outside Academia - USC Computer Science Masters and Ph.D. Students
Careers outside Academia - USC Computer Science Masters and Ph.D. StudentsCareers outside Academia - USC Computer Science Masters and Ph.D. Students
Careers outside Academia - USC Computer Science Masters and Ph.D. Students
 
IIT Bombay - First Steps to a Successful Career
IIT Bombay - First Steps to a Successful CareerIIT Bombay - First Steps to a Successful Career
IIT Bombay - First Steps to a Successful Career
 
Stanford FinMath - Careers in Quant Finance
Stanford FinMath - Careers in Quant FinanceStanford FinMath - Careers in Quant Finance
Stanford FinMath - Careers in Quant Finance
 

Dernier

Stock Market Brief Deck for 3/22/2024.pdf
Stock Market Brief Deck for 3/22/2024.pdfStock Market Brief Deck for 3/22/2024.pdf
Stock Market Brief Deck for 3/22/2024.pdfMichael Silva
 
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
20240314 Calibre March 2024 Investor Presentation (FINAL).pdfAdnet Communications
 
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdfLundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdfAdnet Communications
 
The CBR Covered Bond Investor Roundtable 2024
The CBR Covered Bond Investor Roundtable 2024The CBR Covered Bond Investor Roundtable 2024
The CBR Covered Bond Investor Roundtable 2024Neil Day
 
Introduction to Entrepreneurship and Characteristics of an Entrepreneur
Introduction to Entrepreneurship and Characteristics of an EntrepreneurIntroduction to Entrepreneurship and Characteristics of an Entrepreneur
Introduction to Entrepreneurship and Characteristics of an Entrepreneurabcisahunter
 
What Key Factors Should Risk Officers Consider When Using Generative AI
What Key Factors Should Risk Officers Consider When Using Generative AIWhat Key Factors Should Risk Officers Consider When Using Generative AI
What Key Factors Should Risk Officers Consider When Using Generative AI360factors
 
ACCOUNTING FOR BUSINESS.II BRANCH ACCOUNTS NOTES
ACCOUNTING FOR BUSINESS.II BRANCH ACCOUNTS NOTESACCOUNTING FOR BUSINESS.II BRANCH ACCOUNTS NOTES
ACCOUNTING FOR BUSINESS.II BRANCH ACCOUNTS NOTESKumarJayaraman3
 
Sarlat Advisory - Corporate Brochure - 2024
Sarlat Advisory - Corporate Brochure - 2024Sarlat Advisory - Corporate Brochure - 2024
Sarlat Advisory - Corporate Brochure - 2024Guillaume Ⓥ Sarlat
 
Monthly Market Risk Update: March 2024 [SlideShare]
Monthly Market Risk Update: March 2024 [SlideShare]Monthly Market Risk Update: March 2024 [SlideShare]
Monthly Market Risk Update: March 2024 [SlideShare]Commonwealth
 
ACADEMIC BANK OF CREDIT: A WORLDWIDE VIEWPOINT
ACADEMIC BANK OF CREDIT: A WORLDWIDE VIEWPOINTACADEMIC BANK OF CREDIT: A WORLDWIDE VIEWPOINT
ACADEMIC BANK OF CREDIT: A WORLDWIDE VIEWPOINTindexPub
 
India Economic Survey Complete for the year of 2022 to 2023
India Economic Survey Complete for the year of 2022 to 2023India Economic Survey Complete for the year of 2022 to 2023
India Economic Survey Complete for the year of 2022 to 2023SkillCircle
 
Contracts with Interdependent Preferences
Contracts with Interdependent PreferencesContracts with Interdependent Preferences
Contracts with Interdependent PreferencesGRAPE
 
LIC PRIVATISATION its a bane or boon.pptx
LIC PRIVATISATION its a bane or boon.pptxLIC PRIVATISATION its a bane or boon.pptx
LIC PRIVATISATION its a bane or boon.pptxsonamyadav7097
 
Stock Market Brief Deck for March 26.pdf
Stock Market Brief Deck for March 26.pdfStock Market Brief Deck for March 26.pdf
Stock Market Brief Deck for March 26.pdfMichael Silva
 

Dernier (20)

Stock Market Brief Deck for 3/22/2024.pdf
Stock Market Brief Deck for 3/22/2024.pdfStock Market Brief Deck for 3/22/2024.pdf
Stock Market Brief Deck for 3/22/2024.pdf
 
Mobile Money Taxes: Knowledge, Perceptions and Politics: The Case of Ghana
Mobile Money Taxes: Knowledge, Perceptions and Politics: The Case of GhanaMobile Money Taxes: Knowledge, Perceptions and Politics: The Case of Ghana
Mobile Money Taxes: Knowledge, Perceptions and Politics: The Case of Ghana
 
Monthly Economic Monitoring of Ukraine No.230, March 2024
Monthly Economic Monitoring of Ukraine No.230, March 2024Monthly Economic Monitoring of Ukraine No.230, March 2024
Monthly Economic Monitoring of Ukraine No.230, March 2024
 
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
 
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdfLundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
 
The CBR Covered Bond Investor Roundtable 2024
The CBR Covered Bond Investor Roundtable 2024The CBR Covered Bond Investor Roundtable 2024
The CBR Covered Bond Investor Roundtable 2024
 
Effects & Policies Of Bank Consolidation
Effects & Policies Of Bank ConsolidationEffects & Policies Of Bank Consolidation
Effects & Policies Of Bank Consolidation
 
Commercial Bank Economic Capsule - March 2024
Commercial Bank Economic Capsule - March 2024Commercial Bank Economic Capsule - March 2024
Commercial Bank Economic Capsule - March 2024
 
Introduction to Entrepreneurship and Characteristics of an Entrepreneur
Introduction to Entrepreneurship and Characteristics of an EntrepreneurIntroduction to Entrepreneurship and Characteristics of an Entrepreneur
Introduction to Entrepreneurship and Characteristics of an Entrepreneur
 
What Key Factors Should Risk Officers Consider When Using Generative AI
What Key Factors Should Risk Officers Consider When Using Generative AIWhat Key Factors Should Risk Officers Consider When Using Generative AI
What Key Factors Should Risk Officers Consider When Using Generative AI
 
ACCOUNTING FOR BUSINESS.II BRANCH ACCOUNTS NOTES
ACCOUNTING FOR BUSINESS.II BRANCH ACCOUNTS NOTESACCOUNTING FOR BUSINESS.II BRANCH ACCOUNTS NOTES
ACCOUNTING FOR BUSINESS.II BRANCH ACCOUNTS NOTES
 
Sarlat Advisory - Corporate Brochure - 2024
Sarlat Advisory - Corporate Brochure - 2024Sarlat Advisory - Corporate Brochure - 2024
Sarlat Advisory - Corporate Brochure - 2024
 
Monthly Market Risk Update: March 2024 [SlideShare]
Monthly Market Risk Update: March 2024 [SlideShare]Monthly Market Risk Update: March 2024 [SlideShare]
Monthly Market Risk Update: March 2024 [SlideShare]
 
ACADEMIC BANK OF CREDIT: A WORLDWIDE VIEWPOINT
ACADEMIC BANK OF CREDIT: A WORLDWIDE VIEWPOINTACADEMIC BANK OF CREDIT: A WORLDWIDE VIEWPOINT
ACADEMIC BANK OF CREDIT: A WORLDWIDE VIEWPOINT
 
India Economic Survey Complete for the year of 2022 to 2023
India Economic Survey Complete for the year of 2022 to 2023India Economic Survey Complete for the year of 2022 to 2023
India Economic Survey Complete for the year of 2022 to 2023
 
Contracts with Interdependent Preferences
Contracts with Interdependent PreferencesContracts with Interdependent Preferences
Contracts with Interdependent Preferences
 
LIC PRIVATISATION its a bane or boon.pptx
LIC PRIVATISATION its a bane or boon.pptxLIC PRIVATISATION its a bane or boon.pptx
LIC PRIVATISATION its a bane or boon.pptx
 
Stock Market Brief Deck for March 26.pdf
Stock Market Brief Deck for March 26.pdfStock Market Brief Deck for March 26.pdf
Stock Market Brief Deck for March 26.pdf
 
E-levy and Merchant Payment Exemption in Ghana
E-levy and Merchant Payment Exemption in GhanaE-levy and Merchant Payment Exemption in Ghana
E-levy and Merchant Payment Exemption in Ghana
 
Digital Financial Services Taxation in Africa
Digital Financial Services Taxation in AfricaDigital Financial Services Taxation in Africa
Digital Financial Services Taxation in Africa
 

Real-World Derivatives Hedging with Deep Reinforcement Learning

  • 1. Real-world Derivatives Hedging with Deep Reinforcement Learning Ashwin Rao ICME, Stanford University November 15, 2019 Ashwin Rao (Stanford) Deep Hedging November 15, 2019 1 / 9
  • 2. Classical Pricing and Hedging of Derivatives Classical Pricing/Hedging Theory assumes frictionless markets Frictionless := continuous trading, any volume, no transaction costs Assumptions of arbitrage-free and completeness lead to (dynamic and exact) replication of derivatives with basic market securities Replication strategy gives us the pricing and hedging solutions This is the foundation of the famous Black-Scholes formulas However, the real-world has many frictions ⇒ Incomplete Market ... where Derivatives cannot be exactly replicated Ashwin Rao (Stanford) Deep Hedging November 15, 2019 2 / 9
  • 3. Pricing and Hedging in an Incomplete Market In an incomplete market, we have multiple risk-neutral measures So, multiple derivative prices (each consistent with no-arbitrage) The market/trader “chooses” a risk-neutral measure (hence, price) Based on a specified preference for trading risk versus return This preference is equivalent to specifying a Utility function Maximizing “risk-adjusted return” of the derivative plus hedges Reminiscent of the Portfolio Optimization problem Likewise, we can set this up as a stochastic control (MDP) problem Where the decision at each time step is: Trades in the hedges So what’s the best way to solve this MDP? Ashwin Rao (Stanford) Deep Hedging November 15, 2019 3 / 9
  • 4. Deep Reinforcement Learning (DRL) Dynamic Programming not suitable in practice due to: Curse of Dimensionality Curse of Modeling So we solve the MDP with Deep Reinforcement Learning (DRL) The idea is to use real market data and real market frictions Developing realistic simulations to derive the optimal policy The optimal policy gives us the (practical) hedging strategy The optimal value function gives us the price (valuation) Formulation based on Deep Hedging paper by J.P.Morgan researchers More details in the prior paper by some of the same authors Ashwin Rao (Stanford) Deep Hedging November 15, 2019 4 / 9
  • 5. Problem Setup We will simplify the problem setup a bit for ease of exposition This model works for more complex, more frictionful markets too Assume time is in discrete (finite) steps t = 0, 1, . . . , T Assume we have a position (portfolio) D in m derivatives Assume each of these m derivatives expires in time ≤ T Portfolio-aggregated Contingent Cashflows at time t denoted Xt ∈ R Assume we have n basic market securities as potential hedges Hedge positions (units held) at time t denoted αt ∈ Rn Cashflows per unit of hedges held at time t denoted Yt ∈ Rn Prices per unit of hedges at time t denoted Pt ∈ Rn PnL position at time t is denoted as βt ∈ R Ashwin Rao (Stanford) Deep Hedging November 15, 2019 5 / 9
  • 6. States and Actions Denote state space at time t as St, state at time t as st ∈ St Among other things, st contains t, αt, Pt, βt, D st will include any market information relevant to trading actions For simplicity, we assume st is just the tuple (t, αt, Pt, βt, D) Denote action space at time t as At, action at time t as at ∈ At at represents units of hedges traded (positive for buy, negative for sell) Trading restrictions (eg: no short-selling) define At as a function of st State transitions Pt+1|Pt available from a simulator, whose internals are estimated from real market data and realistic assumptions Ashwin Rao (Stanford) Deep Hedging November 15, 2019 6 / 9
  • 7. Sequence of events at each time step t = 0, . . . , T 1 Observe state st = (t, αt, Pt, βt, D) 2 Realize cashflows (from holding positions) = Xt + αt · Yt 3 Perform action (trades) at to produce trading PnL = −at · Pt 4 Trading transaction costs, example = −γPt · |at| for some γ > 0 5 Update αt as: αt+1 = αt + at for t = 0, . . . , T (force-liquidation at termination means aT = αT ) 6 Update PnL βt as: βt+1 = βt + Xt + αt · Yt − at · Pt − γPt · |at| for t = 0, . . . , T 7 Reward rt = 0 for all t = 0, . . . , T − 1 and rT = U(βT+1) for an appropriate concave Utility function U (based on risk-aversion) 8 Simulator evolves hedge prices from Pt to Pt+1 Ashwin Rao (Stanford) Deep Hedging November 15, 2019 7 / 9
  • 8. Pricing and Hedging Assume we now want to enter into an incremental position (portfolio) D in m derivatives (denote the combined position as D ∪ D ) We want to determine the Price of the incremental position D , as well as the hedging strategy for D Denote the Optimal Value Function at time t as V ∗ t : St → R Pricing of D is based on the principle that introducing the incremental position of D together with a calibrated cashflow at t = 0 should leave the Optimal Value (at t = 0) unchanged Precisely, Price of D is the value x such that V ∗ 0 ((0, α0, P0, β0 − x, D ∪ D )) = V ∗ 0 ((0, α0, P0, β0, D)) The hedging strategy at time t is given by the Optimal Policy π∗ t : St → At Ashwin Rao (Stanford) Deep Hedging November 15, 2019 8 / 9
  • 9. DRL Approach a Breakthrough for Practical Trading? The industry practice/tradition has been to start with Complete Market assumption, and then layer ad-hoc/unsatisfactory adjustments There is some past work on pricing/hedging in incomplete markets But it’s theoretical and not usable in real trading (eg: Superhedging) My view: This DRL approach is a breakthrough for practical trading Key advantages of this DRL approach: Algorithm for pricing/hedging independent of market dynamics Computational cost scales efficiently with size m of derivatives portfolio Enables one to faithfully capture practical trading situations/constraints Deep Neural Networks provide great function approximation for RL Ashwin Rao (Stanford) Deep Hedging November 15, 2019 9 / 9