
dc.contributor.supervisor  Hemion, Nikolas
dc.contributor.author  Loviken, Pontus
dc.contributor.other  Faculty of Science and Engineering  en_US

How can real robots with many degrees of freedom, without prior knowledge of themselves or their environment, act and use the resulting observations to efficiently develop the ability to generate a wide range of useful behaviours?

This thesis presents a novel framework that enables physical robots with many degrees of freedom to rapidly learn models for control from scratch. This can be done in previously inaccessible problem domains characterised by a lack of direct mappings from motor actions to outcomes, as well as by state and action spaces too large for the full forward dynamics to be learned and used explicitly. The proposed framework copes with these issues by using a set of local Goal Babbling models, each of which maps every outcome in a low-dimensional task space to a specific action, together with a sparse higher-level Reinforcement Learning model that learns to navigate between the contexts in which each Goal Babbling model can be used. The two types of models can then be learned online and in parallel, using only the data a robot can collect by interacting with its environment.
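As an illustration, the core of a single local Goal Babbling model can be sketched as a loop that samples goals in task space, reaches for each goal via the nearest known action plus exploration noise, and stores the resulting outcome-action pair. The following is a minimal sketch in Python with NumPy; the simulated planar arm, the nearest-neighbour inverse model, and all parameter values are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_joints = 10  # illustrative; the thesis scales to 1,000 joints


def forward(angles):
    # Forward kinematics of a planar arm with unit-length links:
    # end-effector position from the cumulative joint angles.
    cum = np.cumsum(angles)
    return np.array([np.cos(cum).sum(), np.sin(cum).sum()])


# Inverse model: stored (outcome, action) pairs mapping task-space
# points to the joint configuration that produced them.
memory = []
action = rng.uniform(-0.1, 0.1, n_joints)
memory.append((forward(action), action))

for _ in range(2000):
    # Sample a random goal in the low-dimensional task space.
    goal = rng.uniform(-n_joints, n_joints, 2)
    # The nearest stored outcome provides the base action ...
    outcomes = np.array([o for o, _ in memory])
    nearest = np.argmin(np.linalg.norm(outcomes - goal, axis=1))
    base_action = memory[nearest][1]
    # ... which is perturbed with exploration noise ("babbling").
    action = base_action + rng.normal(0.0, 0.05, n_joints)
    memory.append((forward(action), action))


def reach(goal):
    # After training, the memory acts as a local inverse model:
    # return the stored action whose outcome is closest to the goal.
    outcomes = np.array([o for o, _ in memory])
    nearest = np.argmin(np.linalg.norm(outcomes - goal, axis=1))
    return memory[nearest][1]


target = np.array([3.0, 2.0])
err = np.linalg.norm(forward(reach(target)) - target)
```

In the full framework, several such local models would cover different contexts, with the sparse higher-level Reinforcement Learning model learning to transition between them; both levels are trained online from the same stream of interaction data.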

To show the potential of the approach we present two possible implementations of the framework, on two separate robot platforms: a simulated planar arm with up to 1,000 degrees of freedom, and a real humanoid robot with 25 degrees of freedom. The results show that learning is rapid and essentially unaffected by the number of degrees of freedom of the robot, allowing for the generation of complex behaviours and skills after a relatively short training time. The planar arm is able to strategically plan series of motions in order to move its end-effector between any two parts of a crowded environment within 10,000 iterations. The humanoid robot is able to freely transition between states such as lying on the back, belly, and sides, and occasionally also sitting up, within only 1,000 iterations. This corresponds to 30 to 60 minutes of real-world interaction.

The main contribution of this thesis is a framework for solving a control learning problem that was previously largely unexplored and had no obvious solution, but that has strong analogies to, for example, infants' early learning of body-orientation control. The thesis examined two quite different implementations of the proposed framework and demonstrated success in both cases, on two different control learning problems.

dc.publisher  University of Plymouth
dc.rights  CC0 1.0 Universal
dc.subject  model learning  en_US
dc.subject  Reinforcement learning  en_US
dc.subject  Online learning  en_US
dc.subject  Goal babbling  en_US
dc.subject  inverse models  en_US
dc.subject  Micro data learning  en_US
dc.subject  Developmental robotics  en_US
dc.subject  real-world robots  en_US
dc.subject  sensorimotor control  en_US
dc.title  Fast Online Model Learning for Controlling Complex Real-World Robots  en_US
dc.rights.embargoperiod  No embargo  en_US
rioxxterms.funder  Horizon 2020  en_US

