Choice of Basis for Laplace Approximation
Abstract
Maximum a posteriori optimization of parameters and the Laplace approximation for the marginal likelihood are both basis-dependent methods. This note compares two choices of basis for models parameterized by probabilities, showing that it is possible to improve on the traditional choice, the probability simplex, by transforming to the 'softmax' basis.
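To illustrate the abstract's claim numerically (this sketch is not code from the paper), consider the two-outcome case, where the simplex basis is the probability p itself and the softmax basis reduces to the logit u = ln(p/(1-p)). Laplace-approximating the Beta integral Z = ∫ p^(a-1)(1-p)^(b-1) dp in each basis and comparing with the exact value shows the softmax basis giving the better estimate; the change of variables contributes a Jacobian dp/du = p(1-p), which shifts the exponents and moves the mode:

```python
import math

def laplace_beta(a, b):
    """Laplace approximations to Z = integral of p^(a-1) (1-p)^(b-1) dp
    in two bases, for the two-outcome case (requires a > 1 and b > 1)."""
    # Exact value: the Beta function B(a, b).
    exact = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

    # Simplex basis: expand ln of the integrand about its mode
    # p* = (a-1)/(a+b-2); curvature is -(a-1)/p^2 - (b-1)/(1-p)^2.
    p = (a - 1) / (a + b - 2)
    curv = (a - 1) / p**2 + (b - 1) / (1 - p)**2
    simplex = p**(a - 1) * (1 - p)**(b - 1) * math.sqrt(2 * math.pi / curv)

    # Softmax (logit) basis u = ln(p/(1-p)): the Jacobian dp/du = p(1-p)
    # raises each exponent by one, so the mode moves to p* = a/(a+b)
    # and the curvature becomes (a+b) p (1-p).
    p = a / (a + b)
    curv = (a + b) * p * (1 - p)
    softmax = p**a * (1 - p)**b * math.sqrt(2 * math.pi / curv)
    return exact, simplex, softmax

exact, simplex, softmax = laplace_beta(5.0, 3.0)
print(exact, simplex, softmax)  # the softmax-basis estimate is the closer one
```

Running this with a = 5, b = 3 gives a softmax-basis estimate within a few percent of the exact Beta integral, while the simplex-basis estimate is markedly worse; the gap widens further for skewed counts, in line with the note's conclusion.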
Author information
Authors and Affiliations
- Cavendish Laboratory, Cambridge, CB3 0HE, United Kingdom
David J.C. MacKay
Cite this article
MacKay, D.J.C. Choice of Basis for Laplace Approximation. Machine Learning 33, 77–86 (1998). https://doi.org/10.1023/A:1007558615313
- Issue date: October 1998