Learning What To Say And What To Do: A Model For Grounding Language And Actions

Antunes, Alexandre

dc.contributor.supervisor	Cangelosi, Angelo
dc.contributor.author	Antunes, Alexandre
dc.contributor.other	School of Engineering, Computing and Mathematics	en_US
dc.date.accessioned	2020-07-27T10:58:45Z
dc.date.available	2020-07-27T10:58:45Z
dc.date.issued	2020
dc.identifier	10565827	en_US
dc.identifier.uri	http://hdl.handle.net/10026.1/16104
dc.description.abstract	Automation is becoming increasingly important in today's society, with robots performing many repetitive tasks in industry and even entering our households in the form of vacuum cleaners and lawn mowers. On everyday tasks outside the controlled environments of industry, however, robots tend to perform poorly. In particular, in situations where robots have to interact with humans, a problem arises: how can a robot understand what the human means? While much work has been done in the past on visual perception and classification of objects, understanding what action a verb translates into remains an unexplored area. Solving this challenge would enable robots to execute commands given in natural language, and also to verbalise what actions they are performing when prompted. This work studies how a robot can learn the meaning behind the sentences humans use, how that meaning translates into its perception and the real world, and how to translate its actions into sentences humans understand. To achieve this, we propose a novel Bidirectional machine learning model, along with a data collection module that can be used by non-technical users. The main idea behind this model is the ability to generalise to novel concepts, composing new sentences and actions from what it has previously learned. Humans show this ability to generalise from a young age, and it is a desirable feature for this model. By combining humans' natural teaching instincts with this generalisation ability, we hope to obtain a model that allows people everywhere to teach the robot to perform the desired actions. We validate the model on a number of tasks, using iCub and Pepper robots that physically interact with objects in order to complete a natural language command.
We test different actions, including motor actions and emotional displays, using both transitive and intransitive verbs in the natural language commands. The main contribution of this thesis is the development of a Bidirectional Learning Algorithm, applied to a Multiple Timescale Recurrent Neural Network, enabling these models to link action and language in a bidirectional way. A second contribution is the extension of Multiple Timescale architectures to Long Short-Term Memory models, increasing the capabilities of these models. Finally, the third contribution takes the form of data collection modules, with the development of an easy-to-use module based on physical interaction and speech to provide the iCub and Pepper robots with the data to be learned.	en_US
dc.language.iso	en
dc.publisher	University of Plymouth
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/	*
dc.subject	Machine Learning	en_US
dc.subject	Developmental Robotics	en_US
dc.subject.classification	PhD	en_US
dc.title	Learning What To Say And What To Do: A Model For Grounding Language And Actions	en_US
dc.type	Thesis
plymouth.version	publishable	en_US
dc.identifier.doi	http://dx.doi.org/10.24382/952
dc.rights.embargoperiod	No embargo	en_US
dc.type.qualification	Doctorate	en_US
rioxxterms.funder	Horizon 2020	en_US
rioxxterms.identifier.project	APRIL	en_US
rioxxterms.version	NA
plymouth.orcid_id	0000-0002-9429-7882	en_US