Paper introduction
Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback
アザラシ
Dec. 11, 2021
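Self-introduction
• Handle name: アザラシ
• Favorite boards: M5Stack, Raspberry Pi
• Hobbies: building robots, mountain walking, mathematics
• Interests: reinforcement learning, quadruped walking robots, differential geometry, category theory
• Specialty: homogeneous systems @ nonlinear control theory
• Habitat: Kamogawa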
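What is reinforcement learning?
• A method that uses exploration data to acquire better behavior according to a reward function
• In other words, reward design is extremely important.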
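Contributions of this paper
• (Prior work)
Reward design in reinforcement learning carries a substantial engineering cost.
Human-in-the-loop RL removes the need for hand-crafted reward design by collecting interactive human feedback during training.
• (Problem addressed by this paper)
As task complexity grows, an unrealistic number of interactive human feedback queries is needed before a suitable policy is obtained.
With a relatively small amount of feedback, the proposed method not only learns human preferences but also performs skill extraction, which preference learning alone leaves out.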
Method overview
[Figure: overview diagram of the method; the RL agent is SAC]
Algorithm process (1/2)
Step 1: Collect offline dataset $\mathcal{B}$ (expert demos + random policy)
Step 2: Provide “good/bad” labels for 10% of $\mathcal{B}$, giving the labeled set $\mathcal{D}$
Step 3: Train preference classifier $P_\psi$ on $\mathcal{D}$
Step 4: Train decoder $p_{\phi_1}$ and encoder $q_{\phi_2}$ with $P_\psi$
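Steps 1 and 2 can be illustrated with a minimal sketch, assuming the offline dataset is a plain Python list of trajectories. The function names `sample_for_labeling` and `label_subset`, the toy dataset `B`, and the `toy_labeler` stand-in for the human annotator are all hypothetical and are not taken from the paper.

```python
import random

def sample_for_labeling(num_trajectories, fraction=0.1, seed=0):
    """Pick the indices of the trajectories that will be shown to the labeler."""
    rng = random.Random(seed)
    k = max(1, round(num_trajectories * fraction))
    return sorted(rng.sample(range(num_trajectories), k))

def label_subset(dataset, indices, labeler):
    """Build the labeled set D as (trajectory, y) pairs with y in {0, 1}."""
    return [(dataset[i], labeler(dataset[i])) for i in indices]

# Toy offline dataset B: each "trajectory" is just a list of scalar actions.
B = [[random.Random(i).uniform(-1, 1) for _ in range(5)] for i in range(100)]

def toy_labeler(trajectory):
    # Stand-in for a human "good/bad" judgment (assumption for illustration only).
    return 1 if sum(trajectory) > 0 else 0

indices = sample_for_labeling(len(B), fraction=0.1)
D = label_subset(B, indices, toy_labeler)
print(len(D), "of", len(B), "trajectories labeled")
```

Only the sampled 10% ever reaches the labeler, which is what keeps the human effort small in this setup.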
Algorithm process (2/2)
Step 1: Execute yellow loop (if iteration % K == 0)
Step 2: Execute red loop
Step 3: Update the SAC agent: actor $\pi_{\theta_1}$ and critics $Q_{\theta_2}$, $\bar{Q}_{\theta_2}$
[Note] Use the learned decoder $p_{\phi_1}$
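The control flow of these three steps can be sketched as follows. This is only a scheduling skeleton: what happens inside the yellow and red loops is defined by the slide's figure and is not modeled here, so `yellow_loop`, `red_loop`, and `update_sac` are hypothetical callbacks supplied by the caller.

```python
def run_training(num_iterations, K, yellow_loop, red_loop, update_sac):
    """Call the three steps in the order listed on the slide."""
    for iteration in range(num_iterations):
        if iteration % K == 0:
            yellow_loop(iteration)   # Step 1: only on every K-th iteration
        red_loop(iteration)          # Step 2: on every iteration
        update_sac(iteration)        # Step 3: actor and critic update

# Record the call order with placeholder callbacks.
calls = []
run_training(
    num_iterations=4,
    K=2,
    yellow_loop=lambda i: calls.append(("yellow", i)),
    red_loop=lambda i: calls.append(("red", i)),
    update_sac=lambda i: calls.append(("sac", i)),
)
print(calls)
```

With `K=2` and four iterations, the yellow loop runs at iterations 0 and 2, while the red loop and the SAC update run every time.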
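Training $P_\psi$
• Preference classifier $P_\psi(y \mid \tau)$, where $y \in \{0, 1\}$ and $\tau$ is a trajectory (state-action sequence)
• Update $\psi$ by maximizing the objective (log-likelihood, i.e. the negative cross-entropy):
$\mathbb{E}_{(y,\tau) \sim \mathcal{D}} \left[ y \cdot \log P_\psi(\tau) + (1 - y) \cdot \log\left(1 - P_\psi(\tau)\right) \right]$
• [Note] Trained on a labeled ($y \in \{0, 1\}$) subset of the offline dataset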
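The objective above can be sketched in a few lines of NumPy. As a simplifying assumption, each trajectory is summarized by a fixed-length feature vector and a linear-logistic model stands in for $P_\psi$; the paper's actual network is not reproduced here, and the names `log_likelihood` and `train_classifier` and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(w, features, labels, eps=1e-12):
    """Mean of y*log P + (1-y)*log(1-P), the quantity to maximize."""
    p = sigmoid(features @ w)
    return np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def train_classifier(features, labels, lr=0.5, steps=200):
    """Gradient ascent on the mean log-likelihood of a linear-logistic model."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        p = sigmoid(features @ w)
        # Gradient of the mean log-likelihood with respect to w.
        grad = features.T @ (labels - p) / len(labels)
        w += lr * grad
    return w

# Synthetic labeled set: 200 trajectory feature vectors with noisy 0/1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200) > 0).astype(float)

w0 = np.zeros(3)
w = train_classifier(X, y)
print(log_likelihood(w0, X, y), "->", log_likelihood(w, X, y))
```

Because the objective is maximized, the update adds the gradient; minimizing the cross-entropy with the opposite sign gives the same result.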
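Training the encoder and decoder
• Skill encoder $q_{\phi_2}(z \mid \tau)$ and skill decoder $p_{\phi_1}(a_t, \dots, a_{t+H-1} \mid s_t, z)$, where $z \in \mathcal{Z}$ is a skill, $s \in \mathcal{S}$ is a state, $a \in \mathcal{A}$ is an action, and $\tau$ is a trajectory (state-action sequence).
• Update $p_{\phi_1}$ and $q_{\phi_2}$ by maximizing the objective (ELBO of a $\beta$-VAE with a Gaussian prior, weighted by $P_\psi$):
$\mathbb{E}_{\tau \sim \mathcal{D},\, z \sim q_{\phi_2}(z \mid \tau)} \left[ P_\psi(\tau)\, \mathcal{L}_{\text{reconstruction}} + \beta \cdot \mathcal{L}_{\text{regularization}} \right]$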
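A minimal numerical sketch of this weighted objective follows. It makes several assumptions that the slide does not state: the reconstruction term is taken to be a Gaussian log-likelihood of the actions, the regularization term is taken to be the negative KL divergence to a standard normal prior, and the preference weight is applied to the reconstruction term only, reading the formula literally as written. All function names are hypothetical.

```python
import numpy as np

def gaussian_log_prob(x, mean, std):
    """Log-density of a diagonal Gaussian, summed over the last axis."""
    var = std ** 2
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def kl_to_standard_normal(mu, log_std):
    """KL( N(mu, diag(std^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(2 * log_std) + mu ** 2 - 1.0 - 2 * log_std, axis=-1)

def weighted_elbo(pref, recon_log_prob, kl, beta):
    """Batch mean of P(tau) * L_reconstruction + beta * L_regularization.

    L_regularization is taken to be -KL, so that the whole expression is
    something to maximize (assumption).
    """
    return np.mean(pref * recon_log_prob + beta * (-kl))

# Two trajectories: one judged good (weight 1), one judged bad (weight 0).
pref = np.array([1.0, 0.0])
recon = np.array([-2.0, -10.0])            # per-trajectory action log-likelihood
kl = kl_to_standard_normal(np.zeros((2, 3)), np.zeros((2, 3)))
objective = weighted_elbo(pref, recon, kl, beta=0.1)
print(objective)
```

A trajectory with a preference weight near zero contributes almost nothing to the reconstruction term, so the decoder is fitted mainly to behavior the classifier considers good.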
Training $R_\eta$
• Update $\eta$ by minimizing the loss function (binary cross-entropy), where the preference probability is the distribution given by the Bradley-Terry model.
• [Note] The operator $A \succ B$ means that A is preferred over B.
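The loss formula itself is not reproduced here, so the sketch below uses the form that the Bradley-Terry model commonly takes in preference-based RL, as an assumption: the probability that trajectory 1 is preferred over trajectory 0 is a softmax over the summed predicted rewards of the two trajectories. The function names and the convention that `y = 1` means "trajectory 1 is preferred" are hypothetical.

```python
import numpy as np

def bradley_terry_prob(rewards_1, rewards_0):
    """P[tau_1 > tau_0] = exp(sum r_1) / (exp(sum r_0) + exp(sum r_1)).

    Written as a sigmoid of the return difference, which is algebraically
    the same expression.
    """
    diff = np.sum(rewards_1) - np.sum(rewards_0)
    return 1.0 / (1.0 + np.exp(-diff))

def preference_bce(rewards_1, rewards_0, y, eps=1e-12):
    """Binary cross-entropy between the label y and the Bradley-Terry probability."""
    p = bradley_terry_prob(rewards_1, rewards_0)
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Per-step rewards predicted by the reward model for two trajectories.
r_good = np.array([0.5, 0.8, 0.9])
r_bad = np.array([0.1, 0.0, 0.2])

loss_agree = preference_bce(r_good, r_bad, y=1)     # label matches the model
loss_disagree = preference_bce(r_good, r_bad, y=0)  # label contradicts the model
loss_tie = preference_bce(r_good, r_good, y=1)      # model cannot tell them apart
print(loss_agree, loss_tie, loss_disagree)
```

Minimizing this loss pushes the reward model to assign a higher summed reward to whichever trajectory the human preferred.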
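Experiments (1/4)
• Check whether the following tasks can be carried out
• Compare against PEBBLE (PMLR 2021) as the baseline
Experiments (2/4)
• Skill extraction is performed as follows.
Experiments (3/4)
• SkiP succeeds even on two or more consecutive complex tasks
Experiments (4/4)
• SkiP does not succeed without human feedback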
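Discussion
• I would not want to do the 👍 / 👎 labeling work myself
• Looking forward to future developments in the field of human interaction 😃
• My impression is that the number of trials is still large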
