9. Learning the encoder and decoder
• skill-encoder q_φ(z | τ) and skill-decoder p_θ(a_t, …, a_{t+H−1} | s_t, z),
  where z ∈ 𝒵 is a skill, s ∈ 𝒮 is a state, a ∈ 𝒜 is an action, and τ is a trajectory (a state-action sequence).
• Update p_θ and q_φ by maximizing the loss function (the ELBO of a β-VAE with a Gaussian prior, weighted by P_η):
  𝔼_{τ∼𝒟, z∼q_φ(z|τ)} [ P_η(τ) ( ℒ_reconstruction + β ⋅ ℒ_regularization ) ]
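The weighted objective above can be sketched in plain Python for a toy 1-D Gaussian case. This is a minimal illustration, not the paper's implementation: it assumes a diagonal-Gaussian decoder with fixed output scale, a standard-normal prior, and treats the trajectory weight P_η(τ) as a given scalar `p_eta`; all function names are hypothetical.

```python
import math

def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under N(mu, sigma^2): the reconstruction term."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

def kl_gaussian(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ): regularization toward the Gaussian prior."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))

def weighted_beta_vae_loss(actions, recon_mu, recon_sigma,
                           z_mu, z_sigma, p_eta, beta):
    """Per-trajectory objective: P_eta(tau) * (L_reconstruction + beta * L_regularization).

    actions    : the action sequence a_t, ..., a_{t+H-1} from trajectory tau
    recon_mu   : decoder means p_theta(a | s_t, z) for each action
    z_mu,z_sigma: encoder posterior q_phi(z | tau) parameters
    p_eta      : scalar trajectory weight P_eta(tau)
    """
    recon = sum(gaussian_nll(a, m, recon_sigma)
                for a, m in zip(actions, recon_mu))   # L_reconstruction
    kl = kl_gaussian(z_mu, z_sigma)                   # L_regularization
    return p_eta * (recon + beta * kl)
```

In practice the expectation over τ ∼ 𝒟 and z ∼ q_φ(z|τ) is approximated by mini-batch sampling with the reparameterization trick; this sketch only shows how the weight P_η(τ) scales the per-trajectory β-VAE term.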
10. Learning ℛ_η
• Update η by minimizing the loss function (binary cross-entropy):
  ℒ(η) = −𝔼_{(τ¹, τ²)} [ log P_η(τ¹ ≻ τ²) ],
  where P_η(τ¹ ≻ τ²) = exp ℛ_η(τ¹) / ( exp ℛ_η(τ¹) + exp ℛ_η(τ²) ) is the preference distribution of the Bradley-Terry model.
• [Note] The operator A ≻ B means that A is preferred over B.
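The Bradley-Terry cross-entropy objective above can be sketched as follows. This is an illustrative assumption-laden sketch: it takes scalar trajectory scores r = ℛ_η(τ) as given (no network), and `pairs` is a hypothetical list of labelled preferences in which the first element is the preferred trajectory's score.

```python
import math

def bt_preference_prob(r1, r2):
    """Bradley-Terry probability P(tau1 > tau2) given scalar scores
    r1 = R_eta(tau1) and r2 = R_eta(tau2).
    exp(r1) / (exp(r1) + exp(r2)) rewritten as a sigmoid of the difference."""
    return 1.0 / (1.0 + math.exp(-(r1 - r2)))

def bce_preference_loss(pairs):
    """Binary cross-entropy over labelled preference pairs.

    pairs: iterable of (r_preferred, r_other); the label is always
    'first trajectory preferred', so the loss is -log P(preferred > other).
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for r_pref, r_other in pairs:
        total -= math.log(bt_preference_prob(r_pref, r_other) + eps)
    return total / len(pairs)
```

Minimizing this loss in η pushes ℛ_η to assign higher scores to preferred trajectories: when the preferred score dominates, P_η(τ¹ ≻ τ²) → 1 and the loss → 0.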