REINFORCEMENT LEARNING and POMDPs, POLICY GRADIENTS, EVOLUTIONARY REINFORCEMENT LEARNING, UNIVERSAL REINFORCEMENT LEARNERS

REINFORCEMENT LEARNING IN PARTIALLY OBSERVABLE WORLDS

Realistic environments are not fully observable: the learner's current input alone does not reveal the complete environmental state. In such POMDPs, general learning agents need internal state to memorize important earlier events. The essential question is: how can they learn to identify and store those events that are relevant for optimal action selection later on? To address this issue, Schmidhuber has studied reinforcement learners with (a) recurrent neural network value function approximators (1990-), (b) recurrent network world models (1990-), (c) actions that address and set internal storage cells, trained by the success-story algorithm (1994-), and (d) direct search in a space of event-memorizing algorithms by policy gradients, artificial evolution, the OOPS, or other methods. A minimal sketch of the recurrent-policy flavor of (d) follows below.
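To illustrate the core problem, here is a minimal sketch (not taken from any of the papers below) of a recurrent policy trained by plain REINFORCE on a toy deep-memory POMDP: a binary cue is observable only at the first time step, and reward arrives only if the final action matches the memorized cue. The task, network sizes, and hyperparameters are illustrative assumptions, and PyTorch is assumed to be available.

```python
# Sketch only: recurrent policy gradient on a toy deep-memory POMDP.
# A binary cue is visible at t=0; reward comes only if the final action
# matches the cue, so the GRU must learn to store the cue internally.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim=2, hidden=16, n_actions=2):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):                    # obs_seq: (1, T, obs_dim)
        h, _ = self.rnn(obs_seq)
        return torch.distributions.Categorical(logits=self.head(h[:, -1]))

def run_episode(policy, delay=5):
    cue = torch.randint(0, 2, (1,)).item()         # the event to be memorized
    obs = torch.zeros(1, delay + 1, 2)
    obs[0, 0, cue] = 1.0                           # cue observable only at t=0
    dist = policy(obs)
    action = dist.sample()
    reward = 1.0 if action.item() == cue else 0.0  # delayed reward at the end
    return dist.log_prob(action), reward

policy = RecurrentPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
baseline = 0.0
for step in range(2000):
    logp, r = run_episode(policy)
    baseline = 0.95 * baseline + 0.05 * r          # moving-average reward baseline
    loss = -(logp * (r - baseline)).sum()          # REINFORCE gradient estimate
    opt.zero_grad(); loss.backward(); opt.step()
```

Backpropagating the policy gradient through the recurrent network across the whole observation history is the essential move here; recurrent policy gradient methods in this spirit appear in references 33, 46, and 49 below.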

78. J. Schmidhuber. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. Report arXiv:1511.09249 [cs.AI], 2015.

77. J. Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks, Volume 61, January 2015, Pages 85-117 (DOI: 10.1016/j.neunet.2014.09.003), published online in 2014. (Section 6 is on Deep Reinforcement Learning.) Draft (88 pages, 888 references): Preprint IDSIA-03-14 / arXiv:1404.7828 [cs.NE]. HTML overview page.

76. V. R. Kompella, M. Stollenga, M. Luciw, J. Schmidhuber. Continual curiosity-driven skill acquisition from high-dimensional video inputs for humanoid robots. Artificial Intelligence, 2015, DOI: 10.1016/j.artint.2015.02.001.

75. J. Koutnik, G. Cuccu, J. Schmidhuber, F. Gomez. Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Amsterdam, July 2013. PDF.

74. R. K. Srivastava, F. Gomez, J. Schmidhuber. Generalized Compressed Network Search. In C. Coello Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, M. Pavone, eds., 12th Int. Conf. on Parallel Problem Solving from Nature - PPSN XII, Taormina, 2012. PDF.

73. J. Schmidhuber. POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem. Frontiers in Cognitive Science, 2013. ArXiv preprint (2011): arXiv:1112.5309 [cs.AI].

72. L. Pape, C. M. Oddo, M. Controzzi, C. Cipriani, A. Foerster, M. C. Carrozza, J. Schmidhuber. Learning tactile skills through curious exploration. Frontiers in Neurorobotics 6:6, 2012, DOI: 10.3389/fnbot.2012.00006.

71. Yi Sun, F. Gomez, J. Schmidhuber. On the Size of the Online Kernel Sparsification Dictionary. Proc. International Conference on Machine Learning ICML 2012, Edinburgh. PDF.

70. L. Gisslen, M. Ring, M. Luciw, J. Schmidhuber. Modular Value Iteration Through Regional Decomposition. In Proc. Fifth Conference on Artificial General Intelligence (AGI-12), Oxford, UK, 2012. PDF.

69. V. R. Kompella, M. Luciw, M. Stollenga, L. Pape, J. Schmidhuber. Autonomous Learning of Abstractions using Curiosity-Driven Modular Incremental Slow Feature Analysis. Proc. IEEE Conference on Development and Learning / EpiRob 2012 (ICDL-EpiRob'12), San Diego, 2012.

68. R. K. Srivastava, B. Steunebrink, J. Schmidhuber. Continually Adding Self-Invented Problems to the Repertoire: First Experiments with PowerPlay. Proc. IEEE Conference on Development and Learning / EpiRob 2012 (ICDL-EpiRob'12), San Diego, 2012. PDF.

67. M. Luciw, J. Schmidhuber. Low Complexity Proto-Value Function Updating with Incremental Slow Feature Analysis. Proc. International Conference on Artificial Neural Networks (ICANN 2012), Lausanne, 2012. PDF.

66. H. Ngo, M. Luciw, A. Foerster, J. Schmidhuber. Learning Skills from Play: Artificial Curiosity on a Katana Robot Arm. Proc. IJCNN 2012. PDF. Video.

65. M. Ring, T. Schaul, J. Schmidhuber. The Two-Dimensional Organization of Behavior. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

64. Yi Sun, F. Gomez, M. Ring, J. Schmidhuber. Incremental Basis Construction from Temporal Difference Error. Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011. PDF.

63. V. Graziano, J. Koutnik, J. Schmidhuber. Unsupervised Modeling of Partially Observable Environments. 22nd European Conference on Machine Learning ECML, Athens, 2011. PDF.

62. T. Schaul, Yi Sun, D. Wierstra, F. Gomez, J. Schmidhuber. Curiosity-Driven Optimization. IEEE Congress on Evolutionary Computation (CEC-2011), 2011. PDF.

61. G. Cuccu, M. Luciw, J. Schmidhuber, F. Gomez. Intrinsically Motivated Evolutionary Search for Vision-Based Reinforcement Learning. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

60. H. Ngo, M. Ring, J. Schmidhuber. Curiosity Drive based on Compression Progress for Learning Environment Regularities. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011.

59. M. Luciw, V. Graziano, M. Ring, J. Schmidhuber. Artificial Curiosity with Planning for Autonomous Visual and Perceptual Development. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

58. Yi Sun, F. Gomez, J. Schmidhuber. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

57. T. Glasmachers, J. Schmidhuber. Optimal Direct Policy Search. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

56. L. Gisslen, M. Luciw, V. Graziano, J. Schmidhuber. Sequential Constant Size Compressors and Reinforcement Learning. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF. Kurzweil Prize for Best AGI Paper 2011.

55. B. Steunebrink, J. Schmidhuber. A Family of Gödel Machine Implementations. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

54. J. Schmidhuber. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230-247, 2010. IEEE link. PDF of draft.

53. T. Schaul and J. Schmidhuber. Metalearning. Scholarpedia, 5(6):4650, 2010.

52. F. Sehnke, C. Osendorfer, T. Rückstiess, A. Graves, J. Peters, J. Schmidhuber. Parameter-exploring policy gradients. Neural Networks 23(2), 2010. PDF.

51. T. Schaul, J. Bayer, D. Wierstra, S. Yi, M. Felder, F. Sehnke, T. Rückstiess, J. Schmidhuber. PyBrain. Journal of Machine Learning Research (JMLR), 11:743-746, 2010. PDF. (See PyBrain video.)

50. T. Rückstiess, F. Sehnke, T. Schaul, D. Wierstra, S. Yi, J. Schmidhuber. Exploring Parameter Space in Reinforcement Learning. Paladyn Journal of Behavioral Robotics, 2010. PDF.

49. D. Wierstra, A. Förster, J. Peters, J. Schmidhuber. Recurrent Policy Gradients. Logic Journal of IGPL, 18:620-634, 2010 (doi:10.1093/jigpal/jzp049; advance access published 2009). PDF.

48. J. Schmidhuber. Ultimate Cognition à la Gödel. Cognitive Computation 1(2):177-193, 2009. PDF. (Springer.)

47. J. Schmidhuber. Simple Algorithmic Theory of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes. Journal of SICE, 48(1):21-32, 2009. PDF. Extended version (2008, revised 2009): arXiv:0812.4360; PDF (Dec 2008); PDF (April 2009).

46. D. Wierstra, A. Foerster, J. Peters, J. Schmidhuber. Recurrent Policy Gradients. Journal of Algorithms, 2009, in press. PDF.

45. S. Yi, D. Wierstra, T. Schaul, J. Schmidhuber. Stochastic Search using the Natural Gradient. Proceedings of the 26th International Conference on Machine Learning (ICML-09), Montreal, 2009. PDF.

44. J. Togelius, T. Schaul, D. Wierstra, C. Igel, F. Gomez, J. Schmidhuber. Ontogenetic and Phylogenetic Reinforcement Learning. Künstliche Intelligenz, 2009, in press. PDF.

43. F. J. Gomez, J. Togelius, J. Schmidhuber. Measuring and Optimizing Behavioral Complexity for Evolutionary Reinforcement Learning. Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN-09), Cyprus, 2009. PDF.

42. F. Gomez, J. Schmidhuber, R. Miikkulainen. Accelerated Neural Evolution through Cooperatively Coevolved Synapses. Journal of Machine Learning Research (JMLR), 9:937-965, 2008. PDF.

41. J. Schmidhuber. Driven by Compression Progress. In Knowledge-Based Intelligent Information and Engineering Systems KES-2008, Lecture Notes in Computer Science LNCS 5177, p. 11, Springer, 2008. (Abstract of invited keynote talk.) PDF.

40. T. Rückstiess, M. Felder, J. Schmidhuber. State-Dependent Exploration for Policy Gradient Methods. 19th European Conference on Machine Learning ECML, 2008. PDF.

39. T. Schaul and J. Schmidhuber. A Scalable Neural Network Architecture for Board Games. Proceedings of the 2008 IEEE Symposium on Computational Intelligence in Games CIG-2008, Perth, Australia, 2008, in press. PDF.

38. F. Sehnke, C. Osendorfer, T. Rückstiess, A. Graves, J. Peters, and J. Schmidhuber. Policy gradients with parameter-based exploration for control. In V. Kurkova, R. Neruda, J. Koutnik, editors, Proceedings of the International Conference on Artificial Neural Networks (ICANN 2008), Prague, LNCS 5163, pages 387-396. Springer-Verlag Berlin Heidelberg, 2008. PDF.

37. D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber. Episodic Reinforcement Learning by Logistic Reward-Weighted Regression. In V. Kurkova, R. Neruda, J. Koutnik, editors, Proceedings of the International Conference on Artificial Neural Networks (ICANN 2008), Prague. Springer-Verlag Berlin Heidelberg, 2008. PDF.

36. D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber. Fitness Expectation Maximization. Proceedings of Parallel Problem Solving from Nature PPSN-2008, Dortmund, 2008. PDF.

35. D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber. Natural Evolution Strategies. Proceedings of IEEE Congress on Evolutionary Computation CEC-2008, Hong Kong, 2008. PDF.

34. D. Wierstra, J. Schmidhuber. Policy Gradient Critics. 18th European Conference on Machine Learning ECML, Warsaw, 2007. PDF.

33. D. Wierstra, A. Foerster, J. Peters, J. Schmidhuber. Solving Deep Memory POMDPs with Recurrent Policy Gradients. Intl. Conf. on Artificial Neural Networks ICANN'07, 2007. PDF.

32. J. Schmidhuber. Developmental Robotics, Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts. Connection Science, 18(2):173-187, June 2006. PDF.

31. F. Gomez, J. Schmidhuber, and R. Miikkulainen (2006). Efficient Non-Linear Control through Neuroevolution. Proceedings of the European Conference on Machine Learning (ECML-06, Berlin). PDF. A new, general method that outperforms many others on difficult control tasks.

30. J. Schmidhuber. Completely Self-Referential Optimal Reinforcement Learners. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3697, pp. 223-233, Springer-Verlag Berlin Heidelberg, 2005 (plenary talk). PDF.

29. F. J. Gomez and J. Schmidhuber. Evolving modular fast-weight networks for control. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3697, pp. 383-389, Springer-Verlag Berlin Heidelberg, 2005. Featuring a 3-wheeled reinforcement learning robot (with distance sensors) that learns without a teacher to indefinitely balance two poles connected by a joint in a confined 3D environment. PDF.

28. B. Bakker and J. Schmidhuber. Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization (PDF). In F. Groen, N. Amato, A. Bonarini, E. Yoshida, and B. Kröse (Eds.), Proceedings of the 8th Conference on Intelligent Autonomous Systems, IAS-8, Amsterdam, The Netherlands, p. 438-445, 2004.

27. J. Schmidhuber. Optimal Ordered Problem Solver. Machine Learning, 54, 211-254, 2004. PDF. HTML. HTML overview.

26. B. Bakker, V. Zhumatiy, G. Gruener, and J. Schmidhuber. A Robot that Reinforcement-Learns to Identify and Memorize Important Previous Observations (PDF). In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2003.

25. J. Schmidhuber. Bias-Optimal Incremental Problem Solving. In S. Becker, S. Thrun, K. Obermayer, eds., Advances in Neural Information Processing Systems 15, NIPS'15, MIT Press, Cambridge MA, p. 1571-1578, 2003. PDF. HTML. (Compact version of Optimal Ordered Problem Solver.)

24. B. Bakker, F. Linaker, J. Schmidhuber. Reinforcement Learning in Partially Observable Mobile Robot Domains Using Unsupervised Event Extraction. In Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2002), Lausanne, 2002. PDF.

23. B. Bakker. Reinforcement Learning with Long Short-Term Memory. Advances in Neural Information Processing Systems 14 (NIPS 2001), 2002. (On J. Schmidhuber's CSEM grant 2002.)

22. J. Schmidhuber. Sequential decision making based on direct search. In R. Sun and C. L. Giles, eds., Sequence Learning: Paradigms, Algorithms, and Applications. Lecture Notes on AI 1828, p. 203-240, Springer, 2001. PDF. HTML.

21. I. Kwee, M. Hutter, J. Schmidhuber. Market-Based Reinforcement Learning in Partially Observable Worlds. In G. Dorffner, H. Bischof, K. Hornik, eds., Proceedings of Int. Conf. on Artificial Neural Networks ICANN'01, Vienna, LNCS 2130, pages 865-873, Springer, 2001.

21. M. Wiering and J. Schmidhuber. HQ-Learning. Adaptive Behavior 6(2):219-246, 1997 (122 K). PDF. HTML.

20. R. Salustowicz, M. Wiering, and J. Schmidhuber. Learning team strategies: soccer case studies. Machine Learning, 1999 (127 K).

19. J. Schmidhuber, J. Zhao, and M. Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning 28:105-130, 1997. PDF. Flawed HTML.

18. J. Schmidhuber, J. Zhao, N. Schraudolph. Reinforcement learning with self-modifying policies. In S. Thrun and L. Pratt, eds., Learning to learn, Kluwer, pages 293-309, 1997. Postscript; PDF; HTML.

17. R. Salustowicz and J. Schmidhuber. Probabilistic incremental program evolution. Evolutionary Computation, 5(2):123-141, 1997.

16. M. Wiering and J. Schmidhuber. Solving POMDPs using Levin search and EIRA. In L. Saitta, ed., Machine Learning: Proceedings of the 13th International Conference, pages 534-542, Morgan Kaufmann Publishers, San Francisco, CA, 1996. PDF. HTML.

15. M. Wiering and J. Schmidhuber. HQ-Learning: Discovering Markovian subgoals for non-Markovian reinforcement learning. Technical Report IDSIA-95-96, IDSIA, October 1996.

14. J. Schmidhuber, J. Zhao, and M. Wiering. Simple principles of metalearning. Technical Report IDSIA-69-96, IDSIA, June 1996.

13. J. Schmidhuber. On learning how to learn learning strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München, November 1994.

12. J. Schmidhuber. Reinforcement learning in Markovian and non-Markovian environments. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, NIPS'3, pages 500-506. San Mateo, CA: Morgan Kaufmann, 1991. PDF. HTML.

11. J. Schmidhuber and R. Huber. Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 2(1 & 2):135-141, 1991 (50 K; figures omitted!). PDF. HTML.

10. J. Schmidhuber and R. Huber. Using sequential adaptive neuro-control for efficient learning of rotation and translation invariance. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors,Artificial Neural Networks, pages 315-320. Elsevier Science Publishers B.V., North-Holland, 1991.

9. J. Schmidhuber. Learning algorithms for networks with internal and external feedback. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton, editors, Proc. of the 1990 Connectionist Models Summer School, pages 52-61. San Mateo, CA: Morgan Kaufmann, 1990.

8. J. Schmidhuber. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments. In Proc. IEEE/INNS International Joint Conference on Neural Networks, San Diego, volume 2, pages 253-258, 1990.

7. J. Schmidhuber. Reinforcement learning with interacting continually running fully recurrent networks. In Proc. INNC International Neural Network Conference, Paris, volume 2, pages 817-820, 1990.

6. J. Schmidhuber. Temporal-difference-driven learning in recurrent networks. In R. Eckmiller, G. Hartmann, and G. Hauske, editors, Parallel Processing in Neural Systems and Computers, pages 209-212. North-Holland, 1990.

5. J. Schmidhuber. Reinforcement-Lernen und adaptive Steuerung (Reinforcement learning and adaptive control). Nachrichten Neuronale Netze, 2:1-3, 1990.

4. J. Schmidhuber. Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, Institut für Informatik, Technische Universität München, February 1990 (revised in November). PDF.

3. J. Schmidhuber. Networks adjusting networks. In J. Kindermann and A. Linden, editors, Proceedings of 'Distributed Adaptive Neural Information Processing', St. Augustin, 24-25 May 1989, pages 197-208. Oldenbourg, 1990. Extended version: TR FKI-125-90 (revised), Institut für Informatik, TUM.

2. J. Schmidhuber. Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem (Dynamic neural nets and the fundamental spatio-temporal credit assignment problem). Dissertation, Institut für Informatik, Technische Universität München, 1990 (341 K). PDF. HTML.

1. J. Schmidhuber. A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4):403-412, 1989. (The Neural Bucket Brigade; figures omitted!) PDF. HTML.


REINFORCEMENT LEARNING IN FULLY OBSERVABLE WORLDS

Most mainstream reinforcement learning assumes that the learner's current input tells it everything about the environmental state (the assumption of full observability). This is often unrealistic, but it makes learning much easier. Important work on this dynamic programming-related type of RL has been done by Samuel, Barto, Sutton, Anderson, Watkins, Dayan, Kaelbling, Moore, Dietterich, Singh, Kearns, and many others. A toy example of this setting is sketched below. Our contributions include:
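For concreteness, here is a minimal sketch of tabular Q-learning in Watkins' style on a hypothetical fully observable toy MDP (a one-dimensional corridor with a goal at the right end); the environment and all parameters are illustrative assumptions, not taken from the papers below.

```python
# Sketch only: tabular Q-learning in a fully observable toy MDP.
# The state (position in a corridor) is fully visible at every step,
# so no internal memory is needed, unlike in the POMDP setting above.
import random

N = 8                                  # states 0..N-1; goal is state N-1
Q = [[0.0, 0.0] for _ in range(N)]     # Q[s][a]; actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.95, 0.1     # step size, discount, exploration rate

for episode in range(500):
    s = 0
    while s != N - 1:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == N - 1 else 0.0
        # Q-learning update: bootstrap from the greedy value of the next state
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print("greedy policy:", "".join("LR"[Q[s][1] >= Q[s][0]] for s in range(N)))
```

Because the state is Markovian and fully observed, a simple table over states suffices; the references below extend this setting with faster eligibility traces, model-based exploration, and curiosity rewards.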

5. B. Bakker, V. Zhumatiy, G. Gruener, J. Schmidhuber. Quasi-Online Reinforcement Learning for Robots. Proceedings of the International Conference on Robotics and Automation (ICRA-06), Orlando, Florida, 2006. PDF. A vision-based reinforcement learning robot that learns to build a simple model of the world and of itself; to figure out how to achieve rewards in the real world, it performs numerous 'mental' experiments using the adaptive world model.

4. M. Wiering and J. Schmidhuber. Fast online Q(lambda). Machine Learning, 1998 (80 K).

3. M. Wiering and J. Schmidhuber. Efficient model-based exploration. In R. Pfeifer, B. Blumberg, J. Meyer, S. W. Wilson, eds., From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, p. 223-228, MIT Press, 1998.

2. J. Storck, S. Hochreiter, and J. Schmidhuber. Reinforcement-driven information acquisition in non-deterministic environments. In Proc. ICANN'95, vol. 2, pages 159-164. EC2 & CIE, Paris, 1995. PDF. HTML.

1. J. Schmidhuber. Curious model-building control systems. In Proc. International Joint Conference on Neural Networks, Singapore, volume 2, pages 1458-1463. IEEE, 1991. PDF. HTML.