Abstract
Developing artificial agents able to autonomously discover new goals, to select them and learn the related skills is an important challenge for robotics. This becomes even crucial if we want robots to interact with real environments where they have to face many unpredictable problems and where it is not clear which skills will be the more suitable to solve them. The ability to learn and store multiple skills in order to use them when required is one of the main characteristics of biological agents: forming ample repertoires of actions is important to widen the possibility for an agent to better adapt to different environments and to improve its chance of survival and reproduction. Moreover, humans and other mammals explore the environment and learn new skills not only on the basis of reward-related stimuli but also on the basis of novel or unexpected neutral stimuli. The mechanisms related to this kind of learning processes have been studied under the heading of “Intrinsic Motivations” (IMs), and in the last decades the concept of IMs have been used in developmental and autonomous robotics to foster an artificial curiosity that can improve the autonomy and versatility of artificial agents. In the research presented in this thesis I focus on the development of open-ended learning robots able to autonomously discover interesting events in the environment and autonomously learn the skills necessary to reproduce those events. In particular, this research focuses on the role that IMs can play in fostering those processes and in improving the autonomy and versatility of artificial agents. Taking inspiration from recent and past research in this field, I tackle some of the interesting open challenges related to IMs and to the implementation of intrinsically motivated robots. I first focus on the neurophysiology underlying IM learning signals, and in particular on the relations between IMs and phasic dopamine (DA). With the support of a first computational model, I propose a new hypothesis that addresses the dispute over the nature and the functions of phasic DA activations: reconciling two contrasting theories in the literature and taking xi into account the different experimental data, I suggest that phasic DA can be considered as a reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). The results obtained with my computational model support the presented hypothesis, showing how such a learning signal can serve two important functions: driving both the discovery and acquisition of novel actions and the maximisation of rewards. Moreover, those results provide a first example of the power of IMs to guide artificial agents in the cumulative learning of complex behaviours that would not be learnt simply providing a direct reward for the final tasks. In a second work, I move to investigate the issues related to the implementation of IMs signal in robots. Since the literature still lacks a specific analysis of which is the best IM signal to drive skill acquisition, I compare in a robotic setup different typologies of IMs, as well as the different mechanisms used to implement them. The results provide two important contributions: 1) they show how IM signals based on the competence of the system are able to generate a better guidance for skill acquisition with respect to the signals based on the knowledge of the agent; 2) they identify a proper mechanism to generate a competence-based IM signal, showing that the stronger the link between the IM signal and the competence of the system, the better the performance. Following the aim of widening the autonomy and the versatility of artificial agents, in a third work I focus on the improvement of the control architecture of the robot. I build a new 3-level architecture that allows the system to select the goals to pursue, to search for the best way to achieve them, and acquire the related skills. I implement this architecture in a simulated iCub robot and test it in a 3D experimental scenario where the agent has to learn, on the basis of IMs, a reaching task where it is not clear which arm of the robot is the most suitable to reach the different targets. The performance of the system is compared to the one of my previous 2-level architecture system, where tasks and computational resources are associated at design time. The better performance of the system endowed with the new 3-level architecture highlights the importance of developing robots with different levels of autonomy, and in particular both the high-level of goal selection and the low-level of motor control. Finally, I focus on a crucial issue for autonomous robotics: the development of a system that is able not only to select its own goals, but also to discover them through the interaction with the environment. In the last work I present GRAIL, a Goal-discovering Robotic Architecture for Intrisically-motivated Learning. Building on the insights provided by my previous research, GRAIL is a 4-level hierarchical architecture that for the first time assembles in unique system different features necessary for the development of truly autonomous robots. GRAIL is able to autonomously 1) discover new goals, 2) create and store representations of the events associated to those goals, 3) select the goal to pursue, 4) select the computational resources to learn to achieve the desired goal, and 5) self-generate its own learning signals on the basis of the achievement of the selected goals. I implement GRAIL in a simulated iCub and test it in three different 3D experimental setup, comparing its performance to my previous systems, showing its capacity to generate new goals in unknown scenarios, and testing its ability to cope with stochastic environments. The experiments highlight on the one hand the importance of an appropriate hierarchical architecture for supporting the development of autonomous robots, and on the other hand how IMs (together with goals) can play a crucial role in the autonomous learning of multiple skills.
Keywords
Open-ended learning, Intrinsic motivations, Reinforcement learning, Artificial Neural Networks, Autonomous robotics
Document Type
Thesis
Publication Date
2016
Recommended Citation
Santucci, V. (2016) Autonomous learning of multiple skills through intrinsic motivations: A study with computational embodied models. Thesis. University of Plymouth. Retrieved from https://pearl.plymouth.ac.uk/secam-theses/93