Fast Online Model Learning for Controlling Complex Real-World Robots

Loviken, Pontus

Date

2019

Author

Loviken, Pontus

Subject

Model learning

Reinforcement learning

Online learning

Goal babbling

Inverse models

Micro data learning

Developmental Robotics

Real-world robots

Sensorimotor control

Abstract

How can real robots with many degrees of freedom - without previous knowledge of themselves or their environment - act and use the resulting observations to efficiently develop the ability to generate a wide set of useful behaviours?

This thesis presents a novel framework that enables physical robots with many degrees of freedom to rapidly learn models for control from scratch. This can be done in previously inaccessible problem domains characterised by a lack of direct mappings from motor actions to outcomes, as well as by state and action spaces too large for the full forward dynamics to be learned and used explicitly. The proposed framework copes with these issues by using a set of local Goal Babbling models, each of which maps every outcome in a low-dimensional task space to a specific action, together with a sparse higher-level Reinforcement Learning model that learns to navigate between the contexts from which each Goal Babbling model can be used. The two types of models can then be learned online and in parallel, using only the data a robot can collect by interacting with its environment.

To show the potential of the approach we present two possible implementations of the framework, over two separate robot platforms: a simulated planar arm with up to 1,000 degrees of freedom, and a real humanoid robot with 25 degrees of freedom. The results show that learning is rapid and essentially unaffected by the number of degrees of freedom of the robot, allowing for the generation of complex behaviours and skills after a relatively short training time. The planar arm is able to strategically plan series of motions in order to move its end-effector between any two parts of a crowded environment, within 10,000 iterations. The humanoid robot is able to freely transition between states such as lying on the back, belly, and sides, and occasionally also sitting up, within only 1,000 iterations. This corresponds to 30 to 60 minutes of real-world interactions.

The main contribution of this thesis is a framework for solving a control learning problem that was previously largely unexplored and had no obvious solutions, but that has strong analogies to, for example, early learning of body orientation control in infants. The thesis examines two quite different implementations of the proposed framework and shows success in both cases, on two different control learning problems.
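
As a rough illustration of the Goal Babbling idea mentioned in the abstract, the sketch below learns an inverse model for a simulated planar arm by sampling goals in the two-dimensional task space rather than in joint space. It is not the thesis's implementation: it covers a single Goal Babbling model only and omits the higher-level Reinforcement Learning model entirely. The nearest-neighbour lookup, the Gaussian exploration noise, and all names (`forward`, `NearestNeighbourInverseModel`, `goal_babble`) and parameter values are assumptions chosen for brevity.

```python
import math
import random


def forward(angles, link):
    """Planar-arm forward kinematics: relative joint angles -> end-effector (x, y)."""
    x = y = total = 0.0
    for a in angles:
        total += a
        x += link * math.cos(total)
        y += link * math.sin(total)
    return (x, y)


class NearestNeighbourInverseModel:
    """Hypothetical stand-in for an inverse model: stores (outcome, action)
    pairs and answers a goal with the action whose outcome was closest."""

    def __init__(self):
        self.samples = []

    def add(self, outcome, action):
        self.samples.append((outcome, action))

    def query(self, goal):
        return min(self.samples, key=lambda s: math.dist(s[0], goal))[1]


def goal_babble(n_joints=20, iterations=2000, noise=0.05, seed=0):
    """Learn online: pick a goal, try the best known action plus noise, store the result."""
    rng = random.Random(seed)
    link = 1.0 / n_joints              # total arm length is 1 (assumed)
    model = NearestNeighbourInverseModel()
    home = [0.0] * n_joints            # arm stretched out along the x-axis
    model.add(forward(home, link), home)
    for _ in range(iterations):
        # Sample a goal uniformly inside the unit disc (the task space).
        r = math.sqrt(rng.random())
        phi = rng.uniform(-math.pi, math.pi)
        goal = (r * math.cos(phi), r * math.sin(phi))
        # Perturb the best known action and record what actually happened.
        base = model.query(goal)
        action = [a + rng.gauss(0.0, noise) for a in base]
        model.add(forward(action, link), action)
    return model
```

Because goals are drawn in the task space, the cost of choosing what to try next does not depend directly on the number of joints, which is the property the abstract highlights. Calling `goal_babble(n_joints=100)` and then `model.query((0.2, 0.5))` returns the stored joint configuration whose observed end-effector position was nearest to that goal.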

Publisher

University of Plymouth

Commissioning body

School of Engineering, Computing and Mathematics

Except where otherwise noted, this item's license is described as CC0 1.0 Universal