Curriculum learning | Proceedings of the 26th Annual International Conference on Machine Learning

Published: 14 June 2009

Abstract

Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
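
As a concrete (if schematic) illustration of the strategy the abstract describes, the sketch below trains on a gradually growing easy-to-hard subset of the data. It is a minimal sketch, not the authors' exact procedure: the difficulty_score ranking, the linear pacing schedule, and the model.train_step interface are all assumptions introduced for the example.

```python
import random

def curriculum_train(model, dataset, difficulty_score, epochs=10):
    """Train on a gradually growing, easy-to-hard subset of the data.

    `difficulty_score` maps an example to a scalar (lower = easier)
    and is a hypothetical stand-in for any domain-specific ranking,
    e.g. sentence length for language models or shape complexity
    for vision tasks.
    """
    # Rank the training set once, easiest examples first.
    ordered = sorted(dataset, key=difficulty_score)
    n = len(ordered)
    for epoch in range(epochs):
        # Linearly enlarge the accessible pool until it covers the
        # full training distribution -- one simple pacing scheme
        # among many possible curricula.
        frac = (epoch + 1) / epochs
        pool = ordered[: max(1, int(frac * n))]
        random.shuffle(pool)  # still randomize order within a stage
        for example in pool:
            model.train_step(example)  # assumed model interface
    return model
```

In the continuation-method view, the early easy-only stages play the role of a smoothed training criterion, and each enlargement of the pool gradually deforms it back into the full non-convex objective.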

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages

Copyright © 2009 by the author(s)/owner(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Conference

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Affiliations

Yoshua Bengio

U. Montreal, Montreal, Canada

Jérôme Louradour

U. Montreal, Montreal, Canada and A2iA SA, Paris, France

Ronan Collobert

NEC Laboratories America, Princeton, NJ

Jason Weston

NEC Laboratories America, Princeton, NJ