Choice of Basis for Laplace Approximation
Abstract
Maximum a posteriori optimization of parameters and the Laplace approximation for the marginal likelihood are both basis-dependent methods. This note compares two choices of basis for models parameterized by probabilities, showing that it is possible to improve on the traditional choice, the probability simplex, by transforming to the 'softmax' basis.
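To illustrate the abstract's claim numerically (this sketch is not code from the paper), consider the two-outcome case, where the simplex basis is the probability p itself and the softmax basis reduces to the logit u = ln(p/(1-p)). Laplace-approximating the Beta integral Z = ∫ p^(a-1)(1-p)^(b-1) dp in each basis and comparing with the exact value shows the softmax basis giving the better estimate; the change of variables contributes a Jacobian dp/du = p(1-p), which shifts the exponents and moves the mode:

```python
import math

def laplace_beta(a, b):
    """Laplace approximations to Z = integral of p^(a-1) (1-p)^(b-1) dp
    in two bases, for the two-outcome case (requires a > 1 and b > 1)."""
    # Exact value: the Beta function B(a, b).
    exact = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

    # Simplex basis: expand ln of the integrand about its mode
    # p* = (a-1)/(a+b-2); curvature is -(a-1)/p^2 - (b-1)/(1-p)^2.
    p = (a - 1) / (a + b - 2)
    curv = (a - 1) / p**2 + (b - 1) / (1 - p)**2
    simplex = p**(a - 1) * (1 - p)**(b - 1) * math.sqrt(2 * math.pi / curv)

    # Softmax (logit) basis u = ln(p/(1-p)): the Jacobian dp/du = p(1-p)
    # raises each exponent by one, so the mode moves to p* = a/(a+b)
    # and the curvature becomes (a+b) p (1-p).
    p = a / (a + b)
    curv = (a + b) * p * (1 - p)
    softmax = p**a * (1 - p)**b * math.sqrt(2 * math.pi / curv)
    return exact, simplex, softmax

exact, simplex, softmax = laplace_beta(5.0, 3.0)
print(exact, simplex, softmax)  # the softmax-basis estimate is the closer one
```

Running this with a = 5, b = 3 gives a softmax-basis estimate within a few percent of the exact Beta integral, while the simplex-basis estimate is markedly worse; the gap widens further for skewed counts, in line with the note's conclusion.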
Author information
Authors and Affiliations
- Cavendish Laboratory, Cambridge, CB3 0HE, United Kingdom
David J.C. MacKay
Cite this article
MacKay, D.J.C. Choice of Basis for Laplace Approximation. Machine Learning 33, 77–86 (1998). https://doi.org/10.1023/A:1007558615313
- Issue date: October 1998