Representations for Policy Learning of Embodied Agents

Authors

Oksana Hagen

Abstract

This thesis investigates how embodied autonomous agents can obtain useful representations of their environment and then apply those representations to learn goal-directed behaviours. It explores the potential of using sensory-motor data obtained directly from the agent’s interactions with its environment, without additional inputs or task-dependent cues and with a minimal amount of heuristics. The methodology focuses on unsupervised and self-supervised learning techniques that extract meaningful patterns from sensory data. The thesis proposes several representation learning architectures, whose final versions combine sensory encoding, sequence modelling and a predictive training objective, and tests them in a simulated mobile platform robot scenario. First, the training data is obtained as a stream of sensory-motor information from the agent’s random interaction with the environment. Then, this data is used to train a representation learning model. Finally, policy learning performance in spatial learning tasks is used to estimate how informative the representations are. The results demonstrate that it is possible to obtain unsupervised, task-agnostic representations that can be used for policy learning in embodied agents. The resulting policies perform on par with or better than benchmarks in some of the test environments, improving the performance and the robustness of the learning. The findings particularly highlight the importance of incorporating memory (via sequence modelling) and action information into the representation learning process, which improves task performance compared with using sensory information alone. More generally, the findings show that even models relying on minimal heuristics or task-specific cues in representation learning can yield meaningful and useful representations.
The results contribute to the field of developmental robotics by providing evidence that an embodied agent can obtain useful representations of its environment through autonomous interaction with it.

Awarding Institution(s)

University of Plymouth

Supervisor

Pablo Borja, Swen Gaudl, Tony Belpaeme

Keywords

Embodied agents, Machine learning, Autonomous agents, Policy learning, Developmental robotics, Goal-directed control

Document Type

Thesis

Publication Date

2025

Embargo Period

2027-01-12

Deposit Date

January 2026

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

This document is currently not available here.

This item is under embargo until 12 January 2027
