Automatic discovery of concepts for unknown environments


This thesis explores how an agent can autonomously learn about its environment just by interacting with it. This is not an easy task, because traditional machine learning algorithms depend heavily on user intervention to define the data to use and the experimental conditions under which learning takes place. Designing an agent that autonomously drives its own learning process poses several challenges. The agent must decide how to explore the environment. It must decide how to gather and represent the information obtained from it, which includes what to learn, when to learn, and how to organize the new knowledge. It must also decide how to evaluate the acquired knowledge. To answer these questions, this thesis proposes ADC, an algorithm that combines different machine learning techniques in novel ways. In particular, it proposes a novel exploration strategy based on an asymmetric Wundt curve and biased actions. This strategy guides the agent through the environment and through the learning process. During exploration, ADC incrementally builds a graph-based representation of the environment using some initial background knowledge.
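
The abstract names the ingredients of the exploration strategy but not their formulas. The following is a minimal sketch under stated assumptions. A Wundt curve is often modeled as a reward sigmoid minus a punishment sigmoid over novelty, so that interest peaks at moderate novelty. Giving the two sigmoids different slopes makes the curve asymmetric. Action choice is then biased toward actions whose predicted outcome is most interesting, here through a softmax. The functions `wundt_interest` and `choose_action`, all parameter values, and the softmax bias are illustrative assumptions. They are not ADC's actual definitions.

```python
import math
import random
from typing import Optional


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def wundt_interest(novelty: float,
                   r_max: float = 1.0, r_slope: float = 10.0, r_min: float = 0.3,
                   p_max: float = 1.2, p_slope: float = 4.0, p_min: float = 0.6) -> float:
    """Interest of a stimulus with the given novelty (illustrative, hypothetical parameters).

    Interest = reward sigmoid - punishment sigmoid. The reward curve rises early and
    steeply; the punishment curve rises later and more gently. As a result, interest
    peaks at moderate novelty, and the rise and the fall have different shapes
    (an asymmetric Wundt curve).
    """
    reward = r_max * sigmoid(r_slope * (novelty - r_min))
    punishment = p_max * sigmoid(p_slope * (novelty - p_min))
    return reward - punishment


def choose_action(predicted_novelty: dict[str, float], temperature: float = 0.1,
                  rng: Optional[random.Random] = None) -> str:
    """Biased (softmax) choice favoring actions whose outcome is predicted to be most interesting."""
    rng = rng if rng is not None else random.Random()
    actions = list(predicted_novelty)
    weights = [math.exp(wundt_interest(predicted_novelty[a]) / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

With these parameters, interest is slightly negative for fully familiar stimuli (novelty 0.0). It is highest around novelty 0.5 and close to zero for very novel ones (novelty 1.0). Consequently, `choose_action` mostly, but not always, picks moderately novel actions.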

Frequent sub-graphs are automatically identified as instances of potentially useful concepts, and relational concepts are induced from them. These concepts are organized in a lattice and incorporated into ADC's background knowledge, so that they can be used to learn new concepts. ADC also learns how to perform new tasks through reinforcement learning with intrinsic motivation. In this setting, relational concepts define the states in which actions are learned. The learned behavior policies are stored for solving future tasks. ADC was tested in simulated environments covering floors, polygons, furniture, and the mobility and stability of objects. The concepts it learned were validated by independent users (people other than the author of this thesis), with encouraging results. The learned concepts include basic structures (e.g., room), polygons (e.g., pentagon, triangle), furniture (e.g., table, chair), movable objects, and examples of simple stable structures.
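
The abstract does not say which frequent sub-graph miner ADC uses. The sketch below is a deliberately simplified illustration. It counts only single-edge patterns (node label, relation, node label) across a set of observation graphs and keeps those whose support reaches a threshold. Full miners such as gSpan grow larger connected patterns from such one-edge seeds. The graph encoding, the relation names, and `frequent_edge_patterns` are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: a graph is (node_labels, edges), where node_labels maps a
# node id to its label and each edge is (source_id, relation, target_id).
Graph = tuple[dict[int, str], list[tuple[int, str, int]]]


def frequent_edge_patterns(graphs: list[Graph], min_support: int) -> dict[tuple[str, str, str], int]:
    """Return single-edge patterns that occur in at least `min_support` graphs.

    Support counts graphs, not occurrences: a pattern seen twice in one graph counts once.
    """
    support: Counter = Counter()
    for labels, edges in graphs:
        patterns = {(labels[src], rel, labels[dst]) for src, rel, dst in edges}
        support.update(patterns)
    return {p: c for p, c in support.items() if c >= min_support}


# Three toy observations; only the wall-perpendicular-wall pattern is frequent.
g1 = ({0: "wall", 1: "wall", 2: "door"}, [(0, "perpendicular", 1), (1, "contains", 2)])
g2 = ({0: "wall", 1: "wall"}, [(0, "perpendicular", 1)])
g3 = ({0: "wall", 1: "table"}, [(1, "next_to", 0)])
print(frequent_edge_patterns([g1, g2, g3], min_support=2))
# {('wall', 'perpendicular', 'wall'): 2}
```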

Doctoral thesis

Concept learning; Reinforcement learning; Predicate invention; Inductive logic programming; Physical-Mathematical Sciences and Earth Sciences; Mathematics; Computer Science

Instrucción de tareas a un robot con retroalimentación en línea proporcionada por voz (Task instruction for a robot with online feedback provided by voice)


Robots are increasingly present in our daily lives and therefore need to work in environments shared with humans. In service robotics, they need to adapt to changing environments, interact naturally with non-expert users, and work under time constraints. Several methods have been proposed to program robots for these situations, among them reinforcement learning and learning by demonstration. These methods have been widely used with good results, but they still have problems that need to be solved. Reinforcement learning requires long training times. Methods that work in continuous spaces need a lot of experience, which makes training even slower, and they sometimes fail to converge. Reward shaping has been used to accelerate reinforcement learning. However, it requires a priori domain knowledge, and it is static because the shaping cannot be adjusted during the learning process. Learning by demonstration has its own limitations. Its success depends on the knowledge and abilities of the user who provides the examples, and the demonstrations do not cover the whole space of possibilities in the task domain. To address these problems, this thesis presents a reinforcement learning algorithm based on Sarsa(λ). The algorithm starts from an initial task demonstration given by voice. In addition to the traditional reinforcements, it receives online feedback through spoken commands and qualifiers. Speech provides a natural means of instruction that is accessible to non-expert users, and its inclusion acts as a reward shaping method in the learning algorithm. Unlike the most widely used reward shaping approaches, voice feedback can vary over time. It therefore works as a dynamic reward shaping method that requires neither prior knowledge nor hand-designed shaping functions. In addition, the thesis proposes a new, simple representation for working online with continuous spaces.
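
The abstract does not describe the proposed continuous-space representation. The sketch below only illustrates what building a state representation online can mean. It uses a generic nearest-prototype discretization, which is a swapped-in technique and not the thesis's method. Each observation maps to the closest stored prototype. A new prototype, and therefore a new discrete state, is created whenever no existing prototype lies within `radius`. `OnlinePrototypes` and `radius` are hypothetical names.

```python
import math


class OnlinePrototypes:
    """Hypothetical online discretizer that maps continuous observations to discrete state ids.

    Each stored prototype is one discrete state. An observation is assigned to its
    nearest prototype if that prototype lies within `radius`; otherwise the observation
    becomes a new prototype. As a result, the state set grows only where the agent
    actually goes.
    """

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.prototypes: list[tuple[float, ...]] = []

    def state_of(self, observation) -> int:
        obs = tuple(float(x) for x in observation)
        if self.prototypes:
            distances = [math.dist(obs, p) for p in self.prototypes]
            nearest = min(range(len(distances)), key=distances.__getitem__)
            if distances[nearest] <= self.radius:
                return nearest
        self.prototypes.append(obs)
        return len(self.prototypes) - 1


rep = OnlinePrototypes(radius=0.5)
print(rep.state_of((0.0, 0.0)))  # 0: the first observation creates state 0
print(rep.state_of((0.2, 0.1)))  # 0: within 0.5 of the first prototype
print(rep.state_of((2.0, 0.0)))  # 1: too far away, so a new state is created
```

The returned ids can index an ordinary tabular Q-function. This lets a tabular method such as Sarsa(λ) run on continuous observations without fixing a grid in advance.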

Experiments with navigation tasks and one manipulation task show that the proposed algorithm works with continuous spaces and online feedback. They also show that it can significantly reduce learning time compared to traditional reinforcement learning algorithms while obtaining very similar policies.
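
The core loop can be sketched as tabular Sarsa(λ) with accumulating eligibility traces. Each spoken qualifier adds a value to the environment reward, acting as a shaping term that the user can change at any time. When present, a spoken command overrides the agent's action choice. This is a minimal sketch under assumptions. The qualifier vocabulary and values in `QUALIFIER_VALUE`, the `get_feedback` and `get_command` callbacks, the `env_step` interface, and the command-as-override rule are all hypothetical. The abstract does not give the thesis's actual feedback mapping or explain how it uses the initial demonstration.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Optional

# Hypothetical mapping from spoken qualifiers to shaping values (not taken from the thesis).
QUALIFIER_VALUE = {"excellent": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

State = Hashable
Action = Hashable


def sarsa_lambda_episode(
    env_step: Callable[[State, Action], tuple[State, float, bool]],
    start: State,
    actions: list[Action],
    Q: defaultdict,
    get_feedback: Callable[[State, Action], Optional[str]] = lambda s, a: None,
    get_command: Callable[[State], Optional[Action]] = lambda s: None,
    alpha: float = 0.1,
    gamma: float = 0.95,
    lam: float = 0.9,
    epsilon: float = 0.1,
    max_steps: int = 200,
    rng: Optional[random.Random] = None,
) -> None:
    """Run one episode of tabular Sarsa(lambda), updating Q[(state, action)] in place."""
    if rng is None:
        rng = random.Random()

    def policy(s: State) -> Action:
        command = get_command(s)           # a spoken command overrides the learned policy
        if command is not None:
            return command
        if rng.random() < epsilon:         # epsilon-greedy exploration
            return rng.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    traces: defaultdict = defaultdict(float)  # eligibility traces e(s, a)
    s = start
    a = policy(s)
    for _ in range(max_steps):
        s_next, reward, done = env_step(s, a)
        # Dynamic shaping: add the value of whatever qualifier the user spoke (0 if silent).
        reward += QUALIFIER_VALUE.get(get_feedback(s, a), 0.0)
        a_next = None if done else policy(s_next)
        target = reward if done else reward + gamma * Q[(s_next, a_next)]
        delta = target - Q[(s, a)]
        traces[(s, a)] += 1.0              # accumulating trace for the visited pair
        for key, e in list(traces.items()):
            Q[key] += alpha * delta * e    # credit every recently visited pair
            traces[key] = e * gamma * lam  # decay its trace
        if done:
            break
        s, a = s_next, a_next
```

The shaping term comes from a live callback rather than a fixed function. As a result, changing what the user says changes the reward signal during learning, which is the "dynamic" property the abstract contrasts with static reward shaping.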

Robots are increasingly immersed in our daily lives. Consequently, they need to be able to carry out common tasks satisfactorily in environments shared with humans. Specifically, service robots must adapt to the changing environments in which they operate, interact naturally with non-expert humans (people with no knowledge of robotics), and work under time constraints. Several methods have emerged to address these needs, among them reinforcement learning and learning by demonstration. These methods have been widely used and have given good results, but they have some problems that must be solved. Reinforcement learning involves long training times. Methods that work with continuous spaces usually require a lot of experience, so they take a long time to train and may even fail to converge. Reward shaping has been used in reinforcement learning algorithms to accelerate learning. However, it requires a priori knowledge, and its functions are static because they cannot be adjusted during the learning process. Learning by demonstration has different limitations. Its success depends on the skills of the user who provides the task examples to the robot, and it does not cover the whole space of possibilities within the task domain. To address these problems, this thesis presents a reinforcement learning algorithm based on Sarsa(λ). The algorithm includes an initial task demonstration provided by voice. In addition to the usual reinforcements of traditional algorithms, it includes online feedback through commands and qualifiers spoken by a user. Voice provides a natural means of instruction that is accessible to non-expert users, and its inclusion works as a reward shaping technique on the learning algorithm. Unlike the most widely used reward shaping approaches, however, voice feedback can vary over time. It therefore works as a dynamic technique that requires no prior knowledge or designs. The thesis also proposes a simple and novel representation for continuous spaces that can be built online.

Master's thesis