SlideShare une entreprise Scribd logo
1  sur  80
Reinforcement Learning
Overview
Introduction to Reinforcement
Learning
Chapter 1 – Reinforcement Learning: An Introduction
Imitation Learning Lecture Slides from CMU Deep
Reinforcement Learning Course
What is Reinforcement Learning?
Exploration versus Exploitation
Reinforcement Learning Systems
Policy
Reward Signal
Value Function (1)
Value Function (2)
Model-free versus Model-based
On-policy versus Off-policy
Credit Assignment Problem
Reward Design
What is Deep Reinforcement Learning?
Finite Markov Decision Processes
Chapter 3 – Reinforcement Learning: An Introduction
Markov Decision Process (MDP)
Time Discounting
Agent-Environment Interaction (1)
Agent-Environment Interaction (2)
Action Selection
MDP Dynamics
State Transition Probabilities
Expected Rewards
State-Value Function (1)
State-Value Function (2)
Action-Value Function
Bellman Equation (1)
Bellman Equation (2)
Optimality
Temporal-Difference Learning
Chapter 6 – Reinforcement Learning: An Introduction
Playing Atari with Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning
David Silver’s Tutorial on Deep Reinforcement Learning
What is TD learning?
Value-based Reinforcement Learning
Update Rule for TD(0)
Update Rule Intuition
Tabular TD(0) Algorithm
SARSA – On-policy TD Control
SARSA Update Rule
SARSA Algorithm
Q-learning – Off-policy TD Control
One-step Q-learning Algorithm
Epsilon-greedy Policy
Deep Q-Networks (DQN)
Q-Networks
Experience Replay
State representation
Q-Network Training
Loss Function Gradient Derivation
DQN Algorithm
Comments
Policy Gradient Methods
Chapter 13 – Reinforcement Learning: An Introduction
Policy Gradient Lecture Slides from David Silver’s
Reinforcement Learning Course
David Silver’s Tutorial on Deep Reinforcement Learning
What are Policy Gradient Methods?
Policy-based Reinforcement Learning
Notation
Policy Approximation
Types of Policy Gradient Method
Finite Difference Policy Gradient
REINFORCE: Monte Carlo Policy Gradient
REINFORCE Properties
REINFORCE Algorithm
Actor-Critic Methods
One-step Actor-Critic Update Rules
One-step Actor-Critic Algorithm
Asynchronous Reinforcement
Learning
Asynchronous Methods for Deep Reinforcement Learning
What is Asynchronous Reinforcement Learning?
Parallelism (1)
Parallelism (2)
No Experience Replay
Asynchronous Algorithms
Asynchronous one-step Q-learning
Exploration
Asynchronous one-step Q-learning Algorithm
Asynchronous one-step SARSA
n-step Q-learning
n-step Returns
Asynchronous n-step Q-learning Algorithm
A3C
Advantage Definition
A3C Algorithm
Summary

Contenu connexe

Similaire à Reinforcement Learning and deep reinforcement learning

Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratchJie-Han Chen
 
reinforcement learning in artificial intelligence
reinforcement learning in artificial intelligencereinforcement learning in artificial intelligence
reinforcement learning in artificial intelligencepanditadesh123
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
acai01-updated.ppt
acai01-updated.pptacai01-updated.ppt
acai01-updated.pptbutest
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityHung Le
 
Machine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMachine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMatthias Zimmermann
 
Real-world Reinforcement Learning
Real-world Reinforcement LearningReal-world Reinforcement Learning
Real-world Reinforcement LearningMax Pagels
 
Real-world Reinforcement Learning
Real-world Reinforcement LearningReal-world Reinforcement Learning
Real-world Reinforcement LearningMax Pagels
 
An AHP-based Framework for Quality and Security Evaluation
An AHP-based Framework for Quality and Security EvaluationAn AHP-based Framework for Quality and Security Evaluation
An AHP-based Framework for Quality and Security EvaluationPorfirio Tramontana
 

Similaire à Reinforcement Learning and deep reinforcement learning (11)

Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratch
 
reinforcement learning in artificial intelligence
reinforcement learning in artificial intelligencereinforcement learning in artificial intelligence
reinforcement learning in artificial intelligence
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
acai01-updated.ppt
acai01-updated.pptacai01-updated.ppt
acai01-updated.ppt
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
Machine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMachine Learning: A gentle Introduction
Machine Learning: A gentle Introduction
 
Real-world Reinforcement Learning
Real-world Reinforcement LearningReal-world Reinforcement Learning
Real-world Reinforcement Learning
 
Similarity learning
  Similarity learning  Similarity learning
Similarity learning
 
Real-world Reinforcement Learning
Real-world Reinforcement LearningReal-world Reinforcement Learning
Real-world Reinforcement Learning
 
An AHP-based Framework for Quality and Security Evaluation
An AHP-based Framework for Quality and Security EvaluationAn AHP-based Framework for Quality and Security Evaluation
An AHP-based Framework for Quality and Security Evaluation
 

Dernier

litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdfAlexander Litvinenko
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...archanaece3
 
What is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsWhat is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsVIEW
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docxrahulmanepalli02
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxMustafa Ahmed
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxMustafa Ahmed
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...josephjonse
 
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...AshwaniAnuragi1
 
Databricks Generative AI Fundamentals .pdf
Databricks Generative AI Fundamentals  .pdfDatabricks Generative AI Fundamentals  .pdf
Databricks Generative AI Fundamentals .pdfVinayVadlagattu
 
Working Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfWorking Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfSkNahidulIslamShrabo
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxkalpana413121
 
Dr Mrs A A Miraje C Programming PPT.pptx
Dr Mrs A A Miraje C Programming PPT.pptxDr Mrs A A Miraje C Programming PPT.pptx
Dr Mrs A A Miraje C Programming PPT.pptxProfAAMiraje
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationEmaan Sharma
 
Degrees of freedom for the robots 1.pptx
Degrees of freedom for the robots 1.pptxDegrees of freedom for the robots 1.pptx
Degrees of freedom for the robots 1.pptxMostafa Mahmoud
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...ssuserdfc773
 
Fundamentals of Structure in C Programming
Fundamentals of Structure in C ProgrammingFundamentals of Structure in C Programming
Fundamentals of Structure in C ProgrammingChandrakantDivate1
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)ChandrakantDivate1
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfEr.Sonali Nasikkar
 

Dernier (20)

litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...
 
What is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsWhat is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, Functions
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptx
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptx
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
 
Databricks Generative AI Fundamentals .pdf
Databricks Generative AI Fundamentals  .pdfDatabricks Generative AI Fundamentals  .pdf
Databricks Generative AI Fundamentals .pdf
 
Working Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfWorking Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdf
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Dr Mrs A A Miraje C Programming PPT.pptx
Dr Mrs A A Miraje C Programming PPT.pptxDr Mrs A A Miraje C Programming PPT.pptx
Dr Mrs A A Miraje C Programming PPT.pptx
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & Modernization
 
Degrees of freedom for the robots 1.pptx
Degrees of freedom for the robots 1.pptxDegrees of freedom for the robots 1.pptx
Degrees of freedom for the robots 1.pptx
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 
Fundamentals of Structure in C Programming
Fundamentals of Structure in C ProgrammingFundamentals of Structure in C Programming
Fundamentals of Structure in C Programming
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
 

Reinforcement Learning and deep reinforcement learning