Curriculum learning | Proceedings of the 26th Annual International Conference on Machine Learning

Published: 14 June 2009

Abstract

Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
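
As a concrete (if schematic) illustration of the strategy the abstract describes, the sketch below trains on a gradually growing easy-to-hard subset of the data. It is a minimal sketch, not the authors' exact procedure: the difficulty_score ranking, the linear pacing schedule, and the model.train_step interface are all assumptions introduced for the example.

```python
import random

def curriculum_train(model, dataset, difficulty_score, epochs=10):
    """Train on a gradually growing, easy-to-hard subset of the data.

    `difficulty_score` maps an example to a scalar (lower = easier)
    and is a hypothetical stand-in for any domain-specific ranking,
    e.g. sentence length for language models or shape complexity
    for vision tasks.
    """
    # Rank the training set once, easiest examples first.
    ordered = sorted(dataset, key=difficulty_score)
    n = len(ordered)
    for epoch in range(epochs):
        # Linearly enlarge the accessible pool until it covers the
        # full training distribution -- one simple pacing scheme
        # among many possible curricula.
        frac = (epoch + 1) / epochs
        pool = ordered[: max(1, int(frac * n))]
        random.shuffle(pool)  # still randomize order within a stage
        for example in pool:
            model.train_step(example)  # assumed model interface
    return model
```

In the continuation-method view, the early easy-only stages play the role of a smoothed training criterion, and each enlargement of the pool gradually deforms it back into the full non-convex objective.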

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages

Copyright © 2009 by the author(s)/owner(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Conference

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Affiliations

Yoshua Bengio

U. Montreal, Montreal, Canada

Jérôme Louradour

U. Montreal, Montreal, Canada and A2iA SA, Paris, France

Ronan Collobert

NEC Laboratories America, Princeton, NJ

Jason Weston

NEC Laboratories America, Princeton, NJ