Humanoid robot iCub learns the skill of archery. After being instructed how to hold the bow and release the arrow, the robot learns by itself to aim and shoot arrows at the target. It learns to hit the center of the target in only 8 trials.
Learning the skill of archery by a humanoid robot iCub
1. Learning the skill of archery by a humanoid robot iCub
Petar Kormushev, Sylvain Calinon, Ryo Saegusa, Giorgio Metta
Italian Institute of Technology (IIT), Advanced Robotics dept., RBCS dept. http://www.iit.it
Humanoids 2010, Nashville, TN, USA, December 6-8, 2010
2. Motivation
How can a robot learn complex motor skills?
Why the archery task?
- bi-manual coordination
- integration of image processing, motor control, and learning in one coherent task
- using tools (bow and arrow) to affect an external object (the target)
- an appropriate task for testing different learning algorithms, because the reward is inherently defined by the goal of the task
Petar Kormushev, Italian Institute of Technology (IIT) 2/20
3. The archery task
Different societies, different embodiments:
- Zashiki karakuri, 18th-19th century (mechanical automatons)
- Kyudo (Japanese archery)
Differences in the learned skill
4. iCub archery skill
iCub is an open-source humanoid robot with dimensions comparable to a 3.5-year-old child: 104 cm tall, with 53 DOF.
- Static grasp of the bow
- Aiming skill
7. Prior knowledge about the colors of the target and the arrow
8. Proposed approach
For learning bi-manual aiming:
- PoWER: EM-based reinforcement learning
- ARCHER: chained vector regression algorithm
For hand position/orientation control:
- IK motion controller for the two arms
For image recognition of the target and arrow:
- color-based detection based on a GMM
9. Learning algorithm #1: PoWER
Policy learning by Weighting Exploration with the Returns (PoWER)
Reasons to select PoWER:
- state-of-the-art EM-based RL algorithm
- no learning rate needed (unlike policy-gradient methods)
- efficient use of past experience via importance sampling
- a single rollout is enough to update the policy
Jens Kober and Jan Peters, NIPS 2009
10. PoWER - implementation
Policy parameters: relative position of the two hands (3D vector from the right hand to the left hand)
Policy update rule:
- importance sampling uses the best σ rollouts so far
- relative exploration
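The update step described on this slide can be sketched as follows. This is an illustrative assumption, not the authors' implementation: a 3-D hand-offset parameter vector is updated by a reward-weighted average of the exploration noise, keeping only the best-scoring rollouts (the importance-sampling step).

```python
import numpy as np

def power_update(theta, rollouts, n_best=3):
    """One PoWER-style policy update for a 3-D hand-offset parameter vector.

    rollouts: list of (epsilon, reward) pairs, where epsilon is the
    exploration noise that was added to theta and reward is the rollout's
    scalar return. Importance sampling: only the n_best highest-reward
    rollouts contribute to the update.
    """
    best = sorted(rollouts, key=lambda r: r[1], reverse=True)[:n_best]
    num = sum(r * eps for eps, r in best)     # reward-weighted noise
    den = sum(r for _, r in best) + 1e-12     # normalizer (avoid div by 0)
    return theta + num / den
```

With a single dominant rollout, the update simply moves the parameters by that rollout's exploration noise.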
11. PoWER - reward function
Return of an arrow-shooting rollout:
- estimated target center position
- estimated arrow tip position
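The return depends only on the estimated target center and arrow tip positions. A minimal sketch, assuming (the exact formula is on the original slide image) an exponential of the negative distance, which keeps returns non-negative as the PoWER weighting requires:

```python
import numpy as np

def shot_return(target_center, arrow_tip):
    """Scalar return of one shooting rollout: largest (1.0) when the arrow
    lands exactly on the estimated target center, decaying toward 0 with
    distance. Both arguments are 2-D image points."""
    d = np.linalg.norm(np.asarray(arrow_tip) - np.asarray(target_center))
    return float(np.exp(-d))
```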
12. Learning algorithm #2: ARCHER
Augmented Reward CHainEd Regression
- multi-dimensional reward vector
- iteratively converging process
- uses regression to estimate new parameters
ARCHER can be viewed as a linear vector regression with a shrinking support region.
13. Learning algorithm #2: ARCHER
- rollouts: input parameters and observed results
- target reward
- matrix form
- least-norm approximation of the weights
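One iteration of this regression step can be sketched as below, under stated assumptions: 3-D input parameters, a 2-D observed result per rollout (the arrow's landing position), and a pseudoinverse for the least-norm weights; the new parameter proposal is the weighted combination of past rollout parameters. This is a sketch of the idea, not the paper's exact formulation.

```python
import numpy as np

def archer_update(thetas, results, target):
    """One ARCHER-style iteration: linear vector regression toward the target.

    thetas:  (N, 3) parameter vectors tried so far
    results: (N, 2) observed 2-D outcomes (arrow positions) for each rollout
    target:  (2,)   desired outcome (the target center)

    Solves R^T w = target for the least-norm weights w (pseudoinverse),
    then proposes new parameters as the weighted sum w^T Theta.
    """
    R = np.asarray(results)                        # (N, 2)
    w = np.linalg.pinv(R.T) @ np.asarray(target)   # least-norm weights, (N,)
    return w @ np.asarray(thetas)                  # (3,)
```

If the outcome really is a linear function of the parameters, one such step lands exactly on the target; with the task's smooth (but nonlinear) solution space, repeated iterations over a shrinking support region converge instead.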
14. Learning algorithm #2: ARCHER
ARCHER is suitable for problems for which:
- a-priori knowledge about the desired goal reward is available
- the reward can be decomposed into separate components
- the task has a smooth solution space
It makes use of a multi-dimensional reward, unlike standard RL, which uses only a scalar reward.
15. Simulation experiment
Convergence criterion: distance to the center < 5 cm
- PoWER: 19 rollouts to converge
- ARCHER: 5 rollouts to converge
19. GMM for color-based detection
Estimated reward vector
20. Robot motion controller
- minimum-jerk IK Cartesian controller (Pattacini et al., IROS 2010)
- hand orientation control
- posture and grasping configuration
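The time-scaling behind a minimum-jerk point-to-point motion is the classic quintic polynomial; a sketch of only that profile (the full Cartesian controller is the one by Pattacini et al., not reproduced here):

```python
import numpy as np

def minimum_jerk(x0, xf, t, T):
    """Minimum-jerk point-to-point profile: position at time t for a move
    of duration T from x0 to xf. The quintic 10*tau^3 - 15*tau^4 + 6*tau^5
    gives zero velocity and acceleration at both endpoints."""
    tau = np.clip(t / T, 0.0, 1.0)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (np.asarray(xf) - np.asarray(x0)) * s
```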
32. Future work: use imitation learning to teach the robot the whole movement of grasping and pulling the arrow
33. Thank you for your kind attention! More information: http://kormushev.com/
Editor's notes
The problem of detecting where the target is, and what the relative position of the arrow is with respect to the center of the target, is solved by image processing. We use color-based detection of the target and the tip of the arrow based on a Gaussian Mixture Model (GMM). The color detection is done in YUV color space, where Y is the luminance and UV is the chrominance. Only the U and V components are used, to ensure robustness to changes in luminosity.

In a calibration phase, prior to conducting an archery experiment, the user explicitly defines on a camera image the position and size of the target and the position of the arrow's tip. Then, the user manually selects N_T pixels lying inside the target in the image, and N_A pixels from the arrow's tip in the image. The selected points produce two datasets: c_T ∈ R^{2×N_T} and c_A ∈ R^{2×N_A}, respectively.

From the two datasets c_T and c_A, a Gaussian Mixture Model (GMM) is used to learn a compact model of the color characteristics in UV space of the relevant objects. Each GMM is described by the set of parameters {π_k, μ_k, Σ_k}_{k=1}^{K}, representing respectively the prior probabilities, centers, and covariance matrices of the model (full covariances are considered here). The prior probabilities π_k satisfy π_k ∈ [0, 1] and Σ_{k=1}^{K} π_k = 1. A Bayesian Information Criterion (BIC) [13] is used to select the appropriate numbers of Gaussians K_T and K_A to represent effectively the features to track.

After each reproduction attempt, a camera snapshot is taken to re-estimate the position of the arrow and the target. From the image c_I ∈ R^{2×N_x N_y} of N_x × N_y pixels in UV color space, the center m of each object on the image is estimated through the weighted sum
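The weighted sum mentioned at the end can be sketched like this: each pixel's image coordinates are weighted by the likelihood of its UV color under the learned color model. For brevity the sketch uses a single Gaussian component rather than the full GMM of the notes; the mixture case simply sums the weighted component likelihoods per pixel.

```python
import numpy as np

def object_center(uv_image, mean, cov):
    """Estimate an object's image center as the likelihood-weighted mean
    of pixel coordinates, where each pixel's weight is the (unnormalized)
    Gaussian likelihood of its UV color under one component (mean, cov)
    of the color model.

    uv_image: (H, W, 2) array of U,V values. Returns (row, col).
    """
    H, W, _ = uv_image.shape
    diff = uv_image.reshape(-1, 2) - mean                  # (H*W, 2)
    inv = np.linalg.inv(cov)
    # Mahalanobis distance -> unnormalized Gaussian likelihood per pixel
    w = np.exp(-0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff))
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)    # (H*W, 2)
    return (w[:, None] * coords).sum(axis=0) / w.sum()
```

A pixel whose color matches the model dominates the weighted mean, so the estimate snaps to the detected object.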