Curriculum learning | Proceedings of the 26th Annual International Conference on Machine Learning
Published: 14 June 2009 Publication History
Abstract
Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
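The training strategy the abstract describes can be sketched in a few lines: score each example by some difficulty measure, order the training set from easy to hard, and enlarge the set the learner sees stage by stage. The sketch below is illustrative only, not the paper's implementation; the data generator, the perceptron learner, and the distance-to-boundary difficulty heuristic are all assumptions made for the example.

```python
import random

def difficulty(example):
    # Assumed difficulty heuristic: examples closer to the true
    # decision boundary (x = 0) are treated as harder.
    x, _ = example
    return -abs(x)

def make_data(n, seed=0):
    # Toy 1-D binary classification data with boundary at x = 0.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x > 0 else -1
        data.append((x, y))
    return data

def train_with_curriculum(data, stages=4, epochs_per_stage=5, lr=0.1):
    # Order examples from easy to hard, then enlarge the training
    # subset stage by stage -- the core curriculum-learning idea.
    ordered = sorted(data, key=difficulty)
    w, b = 0.0, 0.0
    for stage in range(1, stages + 1):
        subset = ordered[: stage * len(ordered) // stages]
        for _ in range(epochs_per_stage):
            for x, y in subset:
                if y * (w * x + b) <= 0:  # perceptron update on mistakes
                    w += lr * y * x
                    b += lr * y
    return w, b
```

With a uniform-random curriculum (shuffled order, full set from the start) the same learner sees the hardest near-boundary examples immediately; the abstract's hypothesis is that the easy-to-hard schedule both speeds convergence and, for non-convex objectives, steers optimization toward better minima.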
Published In
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
Copyright © 2009 by the author(s)/owner(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article
Conference
Acceptance Rates
Overall Acceptance Rate: 140 of 548 submissions (26%)
Affiliations
Yoshua Bengio
U. Montreal, Montreal, Canada
Jérôme Louradour
U. Montreal, Montreal, Canada and A2iA SA, Paris, France
Ronan Collobert
NEC Laboratories America, Princeton, NJ
Jason Weston
NEC Laboratories America, Princeton, NJ