A bioinspired hierarchical reinforcement learning architecture for modeling learning of multiple skills with continuous states and actions
Abstract
Organisms, and especially primates, are able to learn several skills while avoiding catastrophic interference and enhancing generalisation. This paper proposes a novel reinforcement learning (RL) architecture with a number of features that make it suitable for investigating these phenomena. The model instantiates a mixture-of-experts architecture within a neural-network actor-critic system trained with the TD(λ) RL algorithm. The "responsibility signals" provided by the gating network are used both to weight the outputs of the multiple "expert" controllers and to modulate their learning. The model is tested on a simulated dynamic 2D robotic arm that autonomously learns to reach a target in up to three different conditions. The results show that the model is able to train the same or different experts to solve the task(s) in the various conditions, depending on the similarity of the sensorimotor mappings they require.
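To make the described architecture concrete, the sketch below shows one way a gating network's responsibility signals can both blend the outputs of several "expert" actors and gate how strongly each expert learns, within a TD(λ) actor-critic. This is not the paper's implementation: the linear function approximators, the softmax gating, the Gaussian exploration noise, and all sizes and learning rates (`STATE_DIM`, `ALPHA_ACTOR`, etc.) are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact model) of a mixture-of-experts
# actor-critic with TD(lambda). All dimensions and hyperparameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, N_EXPERTS = 4, 2, 3
ALPHA_ACTOR, ALPHA_CRITIC, ALPHA_GATE = 0.01, 0.05, 0.01
GAMMA, LAMBDA = 0.99, 0.9

# Linear function approximators: one critic, one gating network,
# and one "expert" actor per module, all reading the same state features.
W_critic = np.zeros(STATE_DIM)
W_gate = np.zeros((N_EXPERTS, STATE_DIM))
W_experts = np.zeros((N_EXPERTS, ACTION_DIM, STATE_DIM))

e_critic = np.zeros_like(W_critic)  # eligibility trace for the critic


def responsibilities(state):
    """Gating network: a softmax over experts yields the responsibility signals."""
    logits = W_gate @ state
    z = np.exp(logits - logits.max())
    return z / z.sum()


def act(state, noise_std=0.1):
    """Responsibility-weighted blend of the experts' outputs, plus exploration noise."""
    resp = responsibilities(state)
    expert_actions = W_experts @ state            # shape (N_EXPERTS, ACTION_DIM)
    mean_action = resp @ expert_actions
    action = mean_action + rng.normal(0.0, noise_std, ACTION_DIM)
    return action, resp, expert_actions


def learn(state, resp, expert_actions, action, reward, next_state, done):
    """TD(lambda) critic update; expert updates are modulated by responsibility."""
    global e_critic
    v = W_critic @ state
    v_next = 0.0 if done else W_critic @ next_state
    td_error = reward + GAMMA * v_next - v

    # Critic: accumulate the eligibility trace, then move weights along it.
    e_critic = GAMMA * LAMBDA * e_critic + state
    W_critic += ALPHA_CRITIC * td_error * e_critic

    # Experts: each module's update is weighted by its responsibility signal,
    # so only the experts "in charge" of this state adapt.
    blended = resp @ expert_actions
    for k in range(N_EXPERTS):
        grad = np.outer(action - blended, state)  # push the blend toward the taken action
        W_experts[k] += ALPHA_ACTOR * td_error * resp[k] * grad

    # Gating network: crudely reinforce the responsibility of the active experts
    # when the TD error is positive (a simplification of the full softmax gradient).
    W_gate += ALPHA_GATE * td_error * np.outer(resp * (1.0 - resp), state)
```

In this sketch the same responsibility vector `resp` plays both roles described in the abstract: it weights the experts' action outputs at execution time and scales each expert's weight update at learning time, so experts specialise on the conditions they are responsible for.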