Reinforcement Learning and Deep Reinforcement Learning

Sources: Reinforcement Learning: An Introduction; Imitation Learning Lecture Slides from CMU Deep Reinforcement Learning Course.

We want a reinforcement learning agent to earn lots of reward. The agent must prefer past actions that have been found to be effective at producing reward, so it must exploit what it already knows to obtain reward. The agent must also select untested actions to discover reward-producing actions, so it must explore actions to make better action selections in the future. This is the trade-off between exploration and exploitation.

Reinforcement learning systems have 4 main elements: a policy, a reward signal, a value function, and an optional model of the environment.

[…] Networks); Policy Gradient Methods (Finite Difference Policy Gradient, REINFORCE, Actor-Critic); Asynchronous Reinforcement Learning.

The reward signal defines the goal. On each time step, the environment sends a single number called the reward to the reinforcement learning agent. The agent’s objective is to maximise the total reward that it receives over the long run. The reward signal is used to alter the policy.

Use the values to make and evaluate decisions. Action choices are made based on value judgements: prefer actions that bring about states of highest value instead of highest reward. Rewards are given directly by the environment, whereas values must continually be re-estimated from the sequence of observations that an agent makes over its lifetime.

A model of the environment allows inferences to be made about how the environment will behave. Example: given a state and an action to be taken while in that state, the model could predict the next state and the next reward. Models are used for planning, which means deciding on a course of action by considering possible future situations before they are experienced.

Model-based methods use models and planning. Think of this as modelling the dynamics p(s’ | s, a). Model-free methods learn exclusively from trial-and-error (i.e. no modelling of the environment). This presentation focuses on model-free methods.
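The exploration/exploitation trade-off and the idea of re-estimating values from experience can be illustrated with a small model-free sketch. The example below is not taken from the slides: it assumes a hypothetical multi-armed bandit with Gaussian rewards, and the function name `epsilon_greedy_bandit` and its parameters (`true_means`, `epsilon`, `seed`) are invented for illustration. With probability `epsilon` the agent explores a random action; otherwise it exploits the action with the highest current value estimate.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Hypothetical bandit: arm a pays a reward drawn from N(true_means[a], 1)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    values = [0.0] * n_actions   # running estimate of each action's value
    counts = [0] * n_actions     # how many times each action was taken
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an action regardless of its current estimate.
            action = rng.randrange(n_actions)
        else:
            # Exploit: take the action with the highest estimated value.
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment returns a single number: the reward.
        reward = rng.gauss(true_means[action], 1.0)
        total_reward += reward

        # Re-estimate the value as the sample mean of observed rewards.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts, total_reward


if __name__ == "__main__":
    values, counts, total = epsilon_greedy_bandit([0.2, 0.5, 1.0])
    print("estimated values:", [round(v, 2) for v in values])
    print("action counts:", counts)
```

No model of the environment is learned here; the agent only keeps value estimates built from trial and error. Under these assumed settings the agent should end up taking the third action most often, while the small `epsilon` keeps it sampling the other two.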

Introduction to Reinforcement Learning
Chapter 1 – Reinforcement Learning: An Introduction
Imitation Learning Lecture Slides from CMU Deep Reinforcement Learning Course

Finite Markov Decision Processes
Chapter 3 – Reinforcement Learning: An Introduction

Temporal-Difference Learning
Chapter 6 – Reinforcement Learning: An Introduction
Playing Atari with Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning
David Silver’s Tutorial on Deep Reinforcement Learning

Policy Gradient Methods
Chapter 13 – Reinforcement Learning: An Introduction
Policy Gradient Lecture Slides from David Silver’s Reinforcement Learning Course
David Silver’s Tutorial on Deep Reinforcement Learning

Asynchronous Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning

What is Asynchronous Reinforcement Learning?

Asynchronous one-step Q-learning Algorithm

Asynchronous n-step Q-learning Algorithm

1. Reinforcement Learning

2. Overview

3. Introduction to Reinforcement Learning (Chapter 1 – Reinforcement Learning: An Introduction; Imitation Learning Lecture Slides from CMU Deep Reinforcement Learning Course)

4. What is Reinforcement Learning?

5. Exploration versus Exploitation

6. Reinforcement Learning Systems

7. Policy

8. Reward Signal

9. Value Function (1)

10. Value Function (2)

11. Model-free versus Model-based

12. On-policy versus Off-policy

13. Credit Assignment Problem

14. Reward Design

15. What is Deep Reinforcement Learning?

16. Finite Markov Decision Processes (Chapter 3 – Reinforcement Learning: An Introduction)

17. Markov Decision Process (MDP)

18. Time Discounting

19. Agent-Environment Interaction (1)

20. Agent-Environment Interaction (2)

21. Action Selection

22. MDP Dynamics

23. State Transition Probabilities

24. Expected Rewards

25. State-Value Function (1)

26. State-Value Function (2)

27. Action-Value Function

28. Bellman Equation (1)

29. Bellman Equation (2)

30. Optimality

31. Temporal-Difference Learning (Chapter 6 – Reinforcement Learning: An Introduction; Playing Atari with Deep Reinforcement Learning; Asynchronous Methods for Deep Reinforcement Learning; David Silver’s Tutorial on Deep Reinforcement Learning)

32. What is TD learning?

33. Value-based Reinforcement Learning

34. Update Rule for TD(0)

35. Update Rule Intuition

36. Tabular TD(0) Algorithm

37. SARSA – On-policy TD Control

38. SARSA Update Rule

39. SARSA Algorithm

40. Q-learning – Off-policy TD Control

41. One-step Q-learning Algorithm

42. Epsilon-greedy Policy

43. Deep Q-Networks (DQN)

44. Q-Networks

45. Experience Replay

46. State Representation

47. Q-Network Training

48. Loss Function Gradient Derivation

49. DQN Algorithm

50. Comments

51. Policy Gradient Methods (Chapter 13 – Reinforcement Learning: An Introduction; Policy Gradient Lecture Slides from David Silver’s Reinforcement Learning Course; David Silver’s Tutorial on Deep Reinforcement Learning)

52. What are Policy Gradient Methods?

53. Policy-based Reinforcement Learning

54. Notation

55. Policy Approximation

56. Types of Policy Gradient Method

57. Finite Difference Policy Gradient

58. REINFORCE: Monte Carlo Policy Gradient

59. REINFORCE Properties

60. REINFORCE Algorithm

61. Actor-Critic Methods

62. One-step Actor-Critic Update Rules

63. One-step Actor-Critic Algorithm