Training products of experts by minimizing contrastive divergence
Geoffrey E Hinton. Neural Comput. 2002 Aug;14(8):1771-1800. doi: 10.1162/089976602760128018. PMID: 12180402
Abstract
It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
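The combination rule described above can be made explicit. For a data vector $\mathbf{d}$ and experts $p_m$ with parameters $\theta_m$, a product of experts defines

$$p(\mathbf{d} \mid \theta_1, \ldots, \theta_n) = \frac{\prod_m p_m(\mathbf{d} \mid \theta_m)}{\sum_{\mathbf{c}} \prod_m p_m(\mathbf{c} \mid \theta_m)},$$

where the sum runs over all possible data vectors $\mathbf{c}$; this denominator is the renormalization term whose derivatives are hard even to approximate.

One family of experts treated in the paper is the restricted Boltzmann machine (RBM), in which each binary hidden unit acts as an expert. As a minimal sketch of contrastive divergence learning, assuming a binary RBM trained with CD-1 (a single full Gibbs step in place of a long Markov chain), the code below shows one parameter update; the names `cd1_step`, `W`, `b`, and `c` are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One CD-1 update for a binary RBM (updates W, b, c in place).

    v0 : (batch, n_visible) binary data
    W  : (n_visible, n_hidden) weights
    b  : (n_visible,) visible biases
    c  : (n_hidden,) hidden biases
    """
    # Positive phase: infer hidden units given the data. Conditional
    # independence of the experts given the data makes this one matrix op.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one full Gibbs step (the CD-1 reconstruction).
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    # Approximate gradient: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)

# Toy usage: 6-bit vectors drawn from two repeated patterns.
n_visible, n_hidden = 6, 4
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 16, dtype=float)
for epoch in range(500):
    cd1_step(data, W, b, c)
```

Using the hidden probabilities `ph0` and `ph1` (rather than sampled states) in the gradient statistics is a common variance-reduction choice; it does not change the expected update.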
Similar articles
- Latent-space variational bayes. Sung J, Ghahramani Z, Bang SY. IEEE Trans Pattern Anal Mach Intell. 2008 Dec;30(12):2236-42. doi: 10.1109/TPAMI.2008.157. PMID: 18988955
- Site-specific updating and aggregation of Bayesian belief network models for multiple experts. Stiber NA, Small MJ, Pantazidou M. Risk Anal. 2004 Dec;24(6):1529-38. doi: 10.1111/j.0272-4332.2004.00547.x. PMID: 15660609
- Large-margin predictive latent subspace learning for multiview data analysis. Chen N, Zhu J, Sun F, Xing EP. IEEE Trans Pattern Anal Mach Intell. 2012 Dec;34(12):2365-78. doi: 10.1109/TPAMI.2012.64. PMID: 22392706
- Integration of stochastic models by minimizing alpha-divergence. Amari S. Neural Comput. 2007 Oct;19(10):2780-96. doi: 10.1162/neco.2007.19.10.2780. PMID: 17716012
- Predicting transfer performance: a comparison of competing function learning models. McDaniel MA, Dimperio E, Griego JA, Busemeyer JR. J Exp Psychol Learn Mem Cogn. 2009 Jan;35(1):173-95. doi: 10.1037/a0013982. PMID: 19210089
Cited by
- Generative Modeling of RNA Sequence Families with Restricted Boltzmann Machines. Fernandez-de-Cossio-Diaz J. Methods Mol Biol. 2025;2847:163-175. doi: 10.1007/978-1-0716-4079-1_11. PMID: 39312143
- Can a Hebbian-like learning rule be avoiding the curse of dimensionality in sparse distributed data? Osório M, Sa-Couto L, Wichert A. Biol Cybern. 2024 Sep 9. doi: 10.1007/s00422-024-00995-y. Online ahead of print. PMID: 39249119
- Top-Down Priors Disambiguate Target and Distractor Features in Simulated Covert Visual Search. Theiss JD, Silver MA. Neural Comput. 2024 Sep 17;36(10):2201-2224. doi: 10.1162/neco_a_01700. PMID: 39141806. Free PMC article.
- Training an Ising machine with equilibrium propagation. Laydevant J, Marković D, Grollier J. Nat Commun. 2024 Apr 30;15(1):3671. doi: 10.1038/s41467-024-46879-4. PMID: 38693108. Free PMC article.
- An analytical approach for unsupervised learning rate estimation using rectified linear units. Chen C, Golovko V, Kroshchanka A, Mikhno E, Chodyka M, Lichograj P. Front Neurosci. 2024 Apr 8;18:1362510. doi: 10.3389/fnins.2024.1362510. eCollection 2024. PMID: 38650619. Free PMC article.