Building machines that learn and think for themselves | Behavioral and Brain Sciences | Cambridge Core

Abstract

We agree with Lake and colleagues on their list of “key ingredients” for building human-like intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here, we survey several important examples of the progress that has been made toward building autonomous agents with human-like abilities, and highlight some outstanding challenges.

References

Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B. & de Freitas, N. (2016) Learning to learn by gradient descent by gradient descent. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3981–89. Neural Information Processing Systems Foundation.

Battaglia, P., Pascanu, R., Lai, M. & Rezende, D. J. (2016) Interaction networks for learning about objects, relations and physics. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 4502–10. Neural Information Processing Systems Foundation.

Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D. & Munos, R. (2016) Unifying count-based exploration and intrinsic motivation. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 1471–79. Neural Information Processing Systems Foundation.

Blundell, C., Uria, B., Pritzel, A., Li, Y., Ruderman, A., Leibo, J. Z., Rae, J., Wierstra, D. & Hassabis, D. (2016) Model-free episodic control. arXiv preprint 1606.04460. Available at: https://arxiv.org/abs/1606.04460.

Botvinick, M. M. & Cohen, J. D. (2014) The computational and neural basis of cognitive control: Charted territory and new frontiers. Cognitive Science 38:1249–85.

Botvinick, M., Weinstein, A., Solway, A. & Barto, A. (2015) Reinforcement learning, efficient coding, and the statistics of natural tasks. Current Opinion in Behavioral Sciences 5:71–77.

Denil, M., Agrawal, P., Kulkarni, T. D., Erez, T., Battaglia, P. & de Freitas, N. (2016) Learning to perform physics experiments via deep reinforcement learning. arXiv preprint 1611.01843. Available at: https://arxiv.org/abs/1611.01843.

Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I. & Abbeel, P. (2016) RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint 1611.02779. Available at: https://arxiv.org/abs/1611.02779.

Eslami, S. M., Heess, N., Weber, T., Tassa, Y., Kavukcuoglu, K. & Hinton, G. E. (2016) Attend, infer, repeat: Fast scene understanding with generative models. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3225–33. Neural Information Processing Systems Foundation.

Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A. P., Hermann, K. M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K. & Hassabis, D. (2016) Hybrid computing using a neural network with dynamic external memory. Nature 538(7626):471–76.

Hamrick, J. B., Ballard, A. J., Pascanu, R., Vinyals, O., Heess, N. & Battaglia, P. W. (2017) Metacontrol for adaptive imagination-based optimization. In: Proceedings of the 5th International Conference on Learning Representations (ICLR).

Hochreiter, S., Younger, A. S. & Conwell, P. R. (2001) Learning to learn using gradient descent. In: International Conference on Artificial Neural Networks, ICANN 2001, ed. Dorffner, G., Bischof, H. & Hornik, K., pp. 87–94. Springer.

Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012) ImageNet classification with deep convolutional neural networks. Presented at the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, December 3–6, 2012. In: Advances in Neural Information Processing Systems 25 (NIPS 2012), ed. Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q., pp. 1097–105. Neural Information Processing Systems Foundation.

Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. (2015a) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–38.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D. & Hassabis, D. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–33.

Ranzato, M., Szlam, A., Bruna, J., Mathieu, M., Collobert, R. & Chopra, S. (2016) Video (language) modeling: A baseline for generative models of natural videos. arXiv preprint 1412.6604. Available at: https://arxiv.org/abs/1412.6604.

Raposo, D., Santoro, A., Barrett, D. G. T., Pascanu, R., Lillicrap, T. & Battaglia, P. (2017) Discovering objects and their relations from entangled scene representations. Presented at the Workshop Track at the International Conference on Learning Representations, Toulon, France, April 24–26, 2017. arXiv preprint 1702.05068. Available at: https://openreview.net/pdf?id=Bk2TqVcxe.

Reed, S. & de Freitas, N. (2016) Neural programmer-interpreters. Presented at the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–5, 2016. arXiv preprint 1511.06279. Available at: https://arxiv.org/abs/1511.06279.

Rezende, D. J., Mohamed, S., Danihelka, I., Gregor, K. & Wierstra, D. (2016) One-shot generalization in deep generative models. Presented at the International Conference on Machine Learning, New York, NY, June 20–22, 2016. Proceedings of Machine Learning Research 48:1521–29.

Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D. & Lillicrap, T. (2016) Meta-learning with memory-augmented neural networks. Presented at the 33rd International Conference on Machine Learning, New York, NY, June 19–24, 2016. Proceedings of Machine Learning Research 48:1842–50.

Schaul, T., Quan, J., Antonoglou, I. & Silver, D. (2016) Prioritized experience replay. Presented at International Conference on Learning Representations (ICLR), San Diego, CA, May 7–9, 2015. arXiv preprint 1511.05952. Available at: https://arxiv.org/abs/1511.05952.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. V. D., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7585):484–89.

Silver, D., van Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., Barreto, A. & Degris, T. (2017) The predictron: End-to-end learning and planning. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, ed. Balcan, M. F. & Weinberger, K. Q.

van den Oord, A., Kalchbrenner, N. & Kavukcuoglu, K. (2016) Pixel recurrent neural networks. Presented at the 33rd International Conference on Machine Learning, New York, NY. Proceedings of Machine Learning Research 48:1747–56.

Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K. & Wierstra, D. (2016) Matching networks for one shot learning. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3630–38. Neural Information Processing Systems Foundation.

Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D. & Botvinick, M. (2017) Learning to reinforcement learn. Presented at the 39th Annual Meeting of the Cognitive Science Society, London, July 26–29, 2017. arXiv preprint 1611.05763. Available at: https://arxiv.org/abs/1611.05763.