Dice, entropy, and likelihood
1985, Proceedings of the IEEE
https://doi.org/10.1109/PROC.1985.13369
Abstract
We show that a famous die experiment used by E. T. Jaynes as intuitive justification of the need for maximum entropy (ME) estimation admits, in fact, of solutions by classical, Bayesian estimation. The Bayesian answers are the maximum probable (m.a.p.) and posterior mean solutions to the problem. These depart radically from the ME solution, and are also much more probable answers.
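The die experiment in question is, in Jaynes' usual telling, a die whose long-run average comes out at 4.5 instead of the fair value 3.5. For orientation, the sketch below (my own, not the paper's code; NumPy and SciPy assumed) computes the maximum-entropy side of that problem; the paper's point is that the Bayesian m.a.p. and posterior-mean answers to the same problem differ sharply from this ME distribution.

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)
target_mean = 4.5                 # Jaynes' "strange die" average

def me_mean(lam):
    # Mean of the exponential-family solution p_i ∝ exp(-lam * i).
    w = np.exp(-lam * faces)
    return (w @ faces) / w.sum()

# The constrained mean is monotone in the multiplier, so bracketing suffices.
lam = brentq(lambda l: me_mean(l) - target_mean, -10.0, 10.0)
p_me = np.exp(-lam * faces)
p_me /= p_me.sum()
print("ME distribution:", np.round(p_me, 4))
# ≈ [0.0544 0.0788 0.1142 0.1654 0.2398 0.3475]
```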
Related papers
Guesswork is not a substitute for Entropy
2005
Shannon entropy is often considered as a measure of uncertainty. It is commonly believed that entropy is a good measure of how many guesses it will take to correctly guess a single value generated by a source. This belief is not well founded. We summarise some work in this area, explore how this belief may have arisen via the asymptotic equipartition property and outline a hands-on calculation for guesswork asymptotics.
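To make the distinction concrete, here is a small illustration of my own (not from the paper; NumPy assumed) in which the Shannon entropy of a source is well under one bit, yet an optimal guesser still needs thousands of guesses on average.

```python
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def expected_guesses(p):
    # An optimal guesser tries values in decreasing order of probability.
    q = np.sort(p)[::-1]
    return float((np.arange(1, q.size + 1) * q).sum())

# One very likely value plus a million rare ones: entropy is tiny,
# but the expected number of guesses is in the thousands.
M = 1_000_000
p = np.concatenate(([0.99], np.full(M, 0.01 / M)))
print(f"H          = {entropy_bits(p):.3f} bits")
print(f"E[guesses] = {expected_guesses(p):.1f}")
```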
Relative Entropy and Statistics
Computing Research Repository - CORR, 2008
Formalising the confrontation of opinions (models) to observations (data) is the task of Inferential Statistics. Information Theory provides us with a basic functional, the relative entropy (or Kullback-Leibler divergence), an asymmetrical measure of dissimilarity between the empirical and the theoretical distributions. The formal properties of the relative entropy turn out to be able to capture every aspect of Inferential Statistics, as illustrated here, for simplicity, on dice (i.e., i.i.d. processes with finitely many outcomes): refutability (strict or probabilistic): the asymmetry data / models; small deviations: rejecting a single hypothesis; competition between hypotheses and model selection; maximum likelihood: model inference and its limits; maximum entropy: reconstructing partially observed data; EM-algorithm; flow data and gravity modelling; determining the order of a Markov chain.
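As a concrete instance of the dice setting, the sketch below (my own illustration with hypothetical counts; NumPy assumed) computes the relative entropy between an empirical face distribution and a fair-die model, and notes its link to the log-likelihood-ratio (G) statistic that underlies the refutability discussion.

```python
import numpy as np

def relative_entropy(p_emp, q_model):
    """Kullback-Leibler divergence D(p_emp || q_model), in nats."""
    p, q = np.asarray(p_emp, float), np.asarray(q_model, float)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / q[mask])).sum())

# Hypothetical frequencies from 600 rolls, compared against a fair-die model.
counts = np.array([80, 90, 100, 100, 110, 120])
p_emp = counts / counts.sum()
q_fair = np.full(6, 1 / 6)

D = relative_entropy(p_emp, q_fair)
print(f"D(empirical || fair) = {D:.4f} nats")
# For i.i.d. data, 2*N*D is the log-likelihood-ratio (G) statistic used to
# test the model, which is one way the 'refutability' theme becomes concrete.
print(f"G statistic ≈ {2 * counts.sum() * D:.2f}")
```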
Bayesian Inference and Maximum Entropy Methods in Science and Engineering
Springer proceedings in mathematics & statistics, 2018
This book series features volumes composed of selected contributions from workshops and conferences in all areas of current research in mathematics and statistics, including operations research and optimization. In addition to an overall evaluation of the interest, scientific quality, and timeliness of each proposal at the hands of the publisher, individual contributions are all refereed to the high quality standards of leading journals in the field. Thus, this series provides the research community with well-edited, authoritative reports on developments in the most exciting areas of mathematical and statistical research today.
Information vs. Entropy vs. Probability
Information, entropy, probability: these three terms are closely interconnected in the prevalent understanding of statistical mechanics, both when this field is taught to students at an introductory level and in advanced research into the field's foundations. This paper examines the interconnection between these three notions in light of recent research in the foundations of statistical mechanics. It disentangles these concepts and highlights their differences, at the same time explaining why they came to be so closely linked in the literature. In the literature the term 'information' is often linked to entropy and probability in discussions of Maxwell's Demon and its attempted exorcism by the Landauer-Bennett thesis, and in analyses of the spin echo experiments. The direction taken in the present paper is a different one. Here I discuss the mechanical underpinning of the notions of probability and entropy, and this constructive approach shows that information plays no fundamental role in these concepts, although it can be conveniently used in a sense that I will specify.
Information theory and statistical mechanics. II
Physical review, 1957
Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum-entropy estimate. It is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information. If one considers statistical mechanics as a form of statistical inference rather than as a physical theory, it is found that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle. In the resulting "subjective statistical mechanics," the usual rules are thus justified independently of any physical argument, and in particular independently of experimental verification; whether
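A minimal sketch of that claim, with illustrative energy levels of my own choosing (NumPy and SciPy assumed): maximizing entropy subject to a prescribed mean energy yields the Boltzmann form, and the normalizing constant that appears is precisely the partition function.

```python
import numpy as np
from scipy.optimize import brentq

E = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical energy levels
target = 1.2                          # prescribed mean energy <E>

def mean_energy(beta):
    w = np.exp(-beta * E)
    return (w @ E) / w.sum()

# Solve for the multiplier beta that matches the constraint.
beta = brentq(lambda b: mean_energy(b) - target, -50.0, 50.0)
Z = np.exp(-beta * E).sum()           # the partition function
p = np.exp(-beta * E) / Z             # maximum-entropy (Boltzmann) distribution
print("beta =", round(beta, 4), " p =", np.round(p, 4), " <E> =", round(p @ E, 4))
```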
Many Faces of Entropy or Bayesian Statistical Mechanics
Some 80–90 years ago, George A. Linhart, unlike A. Einstein, P. Debye, M. Planck and W. Nernst, managed to derive a very simple, but ultimately general mathematical formula for heat capacity versus temperature from fundamental thermodynamic principles, using what we would nowadays dub a “Bayesian approach to probability”. Moreover, he successfully applied his result to fit the experimental data for diverse substances in their solid state over a rather broad temperature range. Nevertheless, Linhart’s work was undeservedly forgotten, although it represents a valid and fresh standpoint on thermodynamics and statistical physics, which may have a significant implication for academic and applied science.
Yet Another Analysis of Dice Problems
AIP Conference Proceedings, 2003
During the MaxEnt 2002 workshop in Moscow, Idaho, Tony Vignaux again asked a few simple questions about using Maximum Entropy or Bayesian approaches for the famous dice problems, which have been analyzed many times at this workshop and elsewhere. Here is another analysis of these problems. I hope that this paper will answer a few of the questions of Tony and other participants of the workshop about the situations where we can use Maximum Entropy or Bayesian approaches, or even the cases where we can actually use both of them.
The Statistical Foundations of Entropy
Entropy
During the last few decades, the notion of entropy has become omnipresent in many scientific disciplines, ranging from traditional applications in statistical physics and chemistry, information theory, and statistical estimation to more recent applications in biology, astrophysics, geology, financial markets, or social networks [...]
Information Theory and Statistical Mechanics
Physical Review, 1957
Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum-entropy estimate. It is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information. If one considers statistical mechanics as a form of statistical inference rather than as a physical theory, it is found that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle. In the resulting "subjective statistical mechanics," the usual rules are thus justified independently of any physical argument, and in particular independently of experimental verification; whether
A critique of Jaynes' maximum entropy principle
Advances in Applied Mathematics, 1981
Friedman and Shimony exhibited an anomaly in Jaynes' maximum entropy prescription: that if a certain unknown parameter is assumed to be characterized a priori by a normalizable probability measure, then the prior and posterior probabilities computed by means of the prescription are consistent with probability theory only if this measure assigns probability 1 to a single value of the parameter and probability 0 to the entire range of other values. We strengthen this result by deriving the same conclusion using only the assumption that the probability measure is σ-finite. We also show that when the hypothesis and evidence to which the prescription is applied are expressed in certain rather simple languages, then the maximum entropy prescription yields probability evaluations in agreement with one of Carnap's λ-continuum of inductive methods, namely λ = ∞. We conclude that the maximum entropy prescription is correct only under special circumstances, which are essentially those in which it is appropriate to use λ = ∞.
Related papers
Entropy, Information, and the Updating of Probabilities
Entropy
This paper is a review of a particular approach to the method of maximum entropy as a general framework for inference. The discussion emphasizes pragmatic elements in the derivation. An epistemic notion of information is defined in terms of its relation to the Bayesian beliefs of ideally rational agents. The method of updating from a prior to posterior probability distribution is designed through an eliminative induction process. The logarithmic relative entropy is singled out as a unique tool for updating (a) that is of universal applicability, (b) that recognizes the value of prior information, and (c) that recognizes the privileged role played by the notion of independence in science. The resulting framework—the ME method—can handle arbitrary priors and arbitrary constraints. It includes the MaxEnt and Bayes’ rules as special cases and, therefore, unifies entropic and Bayesian methods into a single general inference scheme. The ME method goes beyond the mere selection of a single...
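A toy sketch of the updating step described here (my own, with an invented prior and constraint; NumPy and SciPy assumed): minimizing the relative entropy to an arbitrary prior subject to an expectation constraint yields an exponentially tilted prior, which reduces to ordinary MaxEnt when the prior is uniform.

```python
import numpy as np
from scipy.optimize import brentq

x = np.arange(1, 7)                                   # outcomes of a die
Q = np.array([0.25, 0.20, 0.15, 0.15, 0.15, 0.10])    # hypothetical prior
target = 4.5                                          # constraint: E_P[x] = 4.5

def tilted_mean(lam):
    w = Q * np.exp(lam * x)
    return (w @ x) / w.sum()

# The minimizer of D(P || Q) under the constraint is P_i ∝ Q_i * exp(lam * x_i).
lam = brentq(lambda l: tilted_mean(l) - target, -20.0, 20.0)
P = Q * np.exp(lam * x)
P /= P.sum()
print("updated P =", np.round(P, 4), " mean =", round(P @ x, 4))
```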
Topics in Bayesian statistics and maximum entropy
1998
Notions of Bayesian decision theory and maximum entropy methods are reviewed with particular emphasis on probabilistic inference and Bayesian modeling. The axiomatic approach is considered as the best justification of Bayesian analysis and maximum entropy principle applied in natural sciences. Particular emphasis is put on solving the inverse problem in digital image restoration and Bayesian modeling of neural networks. Further topics addressed briefly include language modeling, neutron scattering, multiuser detection and channel equalization in digital communications, genetic information, and Bayesian court decision-making.
Objective Bayesianism and the Maximum Entropy Principle
Entropy, 2013
Objective Bayesian epistemology invokes three norms: the strengths of our beliefs should be probabilities; they should be calibrated to our evidence of physical probabilities; and they should otherwise equivocate sufficiently between the basic propositions that we can express. The three norms are sometimes explicated by appealing to the maximum entropy principle, which says that a belief function should be a probability function, from all those that are calibrated to evidence, that has maximum entropy. However, the three norms of objective Bayesianism are usually justified in different ways. In this paper, we show that the three norms can all be subsumed under a single justification in terms of minimising worst-case expected loss. This, in turn, is equivalent to maximising a generalised notion of entropy. We suggest that requiring language invariance, in addition to minimising worst-case expected loss, motivates maximisation of standard entropy as opposed to maximisation of other instances of generalised entropy. Our argument also provides a qualified justification for updating degrees of belief by Bayesian conditionalisation. However, conditional probabilities play a less central part in the objective Bayesian account than they do under the subjective view of Bayesianism, leading to a reduced role for Bayes' Theorem.
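The claimed equivalence can be checked numerically on a toy example of my own construction (NumPy assumed): over beliefs on three outcomes, with evidence fixing the probability of the first outcome at 0.5, a brute-force search for the minimax expected log loss lands essentially on the maximum-entropy member of the calibrated set.

```python
import numpy as np
from itertools import product

# Calibrated set E: all distributions (0.5, a, 0.5 - a) on three outcomes.
grid = np.linspace(0.01, 0.49, 49)
E_set = [np.array([0.5, a, 0.5 - a]) for a in grid]

def worst_case_log_loss(q):
    return max(float(-(p * np.log(q)).sum()) for p in E_set)

best_q, best_loss = None, np.inf
for q1, q2 in product(np.linspace(0.02, 0.96, 48), repeat=2):
    q3 = 1.0 - q1 - q2
    if q3 <= 0.01:
        continue
    loss = worst_case_log_loss(np.array([q1, q2, q3]))
    if loss < best_loss:
        best_q, best_loss = np.array([q1, q2, q3]), loss

print("minimax q ≈", np.round(best_q, 2))
# Close to the MaxEnt answer (0.5, 0.25, 0.25) within the grid resolution.
```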
2008
Notions of Bayesian decision theory and maximum entropy methods are reviewed with particular emphasis on probabilistic inference and Bayesian modeling. The axiomatic approach is considered as the best justification of Bayesian analysis and maximum entropy principle applied in natural sciences. Solving the inverse problem in digital image restoration and Bayesian modeling of neural networks are discussed in detail. Further topics addressed briefly include language modeling, neutron scattering, multiuser detection and channel equalization in digital communications, genetic information, and Bayesian court decision-making.
Entropy and Inference, Revisited
Advances in Neural Information Processing Systems 14, 2002
We study properties of popular near-uniform (Dirichlet) priors for learning undersampled probability distributions on discrete nonmetric spaces and show that they lead to disastrous results. However, an Occam-style phase space argument expands the priors into their infinite mixture and resolves most of the observed problems. This leads to a surprisingly good estimator of entropies of discrete distributions.
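The pathology at issue can be seen directly from the standard closed form for the posterior mean of the entropy of a Dirichlet distribution (a sketch of my own, assuming NumPy and SciPy; sample size and number of bins are illustrative): in the undersampled regime the estimate is driven almost entirely by the concentration parameter rather than by the data.

```python
import numpy as np
from scipy.special import digamma

def dirichlet_posterior_mean_entropy(counts, alpha):
    # Posterior mean of Shannon entropy (nats) under a symmetric
    # Dirichlet(alpha) prior: E[H] = psi(B + 1) - sum_i (b_i / B) psi(b_i + 1)
    # with b_i = n_i + alpha and B = sum_i b_i.
    b = np.asarray(counts, float) + alpha
    B = b.sum()
    return float(digamma(B + 1.0) - (b / B) @ digamma(b + 1.0))

def plugin_entropy(counts):
    p = np.asarray(counts, float) / np.sum(counts)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Undersampled regime: 20 samples from a 1000-bin uniform distribution,
# whose true entropy is ln(1000) ≈ 6.91 nats.
rng = np.random.default_rng(0)
counts = np.bincount(rng.integers(0, 1000, size=20), minlength=1000)
print("plug-in    :", round(plugin_entropy(counts), 3))
for a in (0.001, 0.02, 1.0):
    print(f"alpha={a:<5}:", round(dirichlet_posterior_mean_entropy(counts, a), 3))
# The answer swings by several nats with alpha, which is the prior
# dependence the paper's Dirichlet-mixture construction is meant to remove.
```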
Lectures on Probability, Entropy, and Statistical Physics
Corr, 2008
These lectures deal with the problem of inductive inference, that is, the problem of reasoning under conditions of incomplete information. Is there a general method for handling uncertainty? Or, at least, are there rules that could in principle be followed by an ideally rational mind when discussing scientific matters? What makes one statement more plausible than another? How much more plausible? And then, when new information is acquired how do we change our minds? Or, to put it differently, are there rules for learning? Are there rules for processing information that are objective and consistent? Are they unique? And, come to think of it, what, after all, is information? It is clear that data contains or conveys information, but what does this precisely mean? Can information be conveyed in other ways? Is information physical? Can we measure amounts of information? Do we need to? Our goal is to develop the main tools for inductive inference--probability and entropy--from a thoroughly Bayesian point of view and to illustrate their use in physics with examples borrowed from the foundations of classical statistical physics.
On The Relationship between Bayesian and Maximum Entropy Inference
AIP Conference Proceedings, 2004
We investigate Bayesian and Maximum Entropy methods for doing inference under uncertainty. This investigation is primarily through concrete examples that have been previously investigated in the literature. We find that it is possible to do Bayesian and MaxEnt inference using the same information, despite claims to the contrary, and that they lead to different results. We find that these differences are due to the Bayesian inference not assuming anything beyond the given prior probabilities and the data, whereas MaxEnt implicitly makes strong independence assumptions, and assumes that the given constraints are the only ones operating. We also show that maximum likelihood and maximum a posteriori estimators give different and misleading estimates in our examples compared to posterior mean estimates. We generalize the classic method of maximum entropy inference to allow for uncertainty in the constraint values. This generalized MaxEnt (GME) makes MaxEnt inference applicable to a much wider range of problems, and makes direct comparison between Bayesian and MaxEnt inference possible. Also, we show that MaxEnt is a generalized principle of independence, and this property is what makes it the preferred inference method in many cases.
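On the point about point estimates, a tiny illustration of my own (not the paper's example; NumPy assumed): for sparse multinomial counts under a flat Dirichlet prior, the ML and MAP estimates assign zero probability to every unseen face, whereas the posterior mean does not.

```python
import numpy as np

counts = np.array([2, 2, 1, 0, 0, 0])            # five rolls of a die
N, K, alpha = counts.sum(), counts.size, 1.0     # flat Dirichlet(1) prior

p_ml = counts / N                                # maximum likelihood
p_map = counts / N                               # MAP coincides with ML for a flat prior
p_mean = (counts + alpha) / (N + K * alpha)      # posterior (Laplace) mean

print("ML / MAP      :", np.round(p_ml, 3))      # faces 4-6 get probability 0
print("posterior mean:", np.round(p_mean, 3))    # every face keeps some mass
```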
Entropy concentration and the empirical coding game
Statistica Neerlandica, 2008
We give a characterization of Maximum Entropy/Minimum Relative Entropy inference by providing two 'strong entropy concentration' theorems. These theorems unify and generalize Jaynes' 'concentration phenomenon' and Van Campenhout and Cover's 'conditional limit theorem'. The theorems characterize exactly in what sense a prior distribution Q conditioned on a given constraint and the distribution P minimizing D(P || Q) over all P satisfying the constraint are 'close' to each other. We then apply our theorems to establish the relationship between entropy concentration and a game-theoretic characterization of Maximum Entropy Inference due to Topsøe and others.
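The concentration being characterized can be eyeballed by simulation (a rough sketch of my own; NumPy and SciPy assumed; sequence length, tolerance, and sample size are arbitrary): averaging the empirical face frequencies of fair-die sequences whose sample mean happens to land near 4.5 gives something close to the maximum-entropy distribution for that mean, which is the finite-N shadow of the concentration and conditional-limit theorems.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n_rolls, trials = 30, 300_000
rolls = rng.integers(1, 7, size=(trials, n_rolls))
keep = np.abs(rolls.mean(axis=1) - 4.5) < 0.05      # condition on the sample mean
freqs = np.stack([np.bincount(r, minlength=7)[1:] / n_rolls for r in rolls[keep]])

# Maximum-entropy distribution with mean 4.5, for comparison.
faces = np.arange(1, 7)
lam = brentq(lambda l: (np.exp(-l * faces) @ faces) / np.exp(-l * faces).sum() - 4.5,
             -10.0, 10.0)
p_me = np.exp(-lam * faces)
p_me /= p_me.sum()

print("sequences kept       :", int(keep.sum()))
print("mean empirical freqs :", np.round(freqs.mean(axis=0), 3))
print("MaxEnt distribution  :", np.round(p_me, 3))
```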
Maximum entropy and Bayesian data analysis: Entropic prior distributions
Physical Review E, 2004
The problem of assigning probability distributions which objectively reflect the prior information available about experiments is one of the major stumbling blocks in the use of Bayesian methods of data analysis. In this paper the method of Maximum (relative) Entropy (ME) is used to translate the information contained in the known form of the likelihood into a prior distribution for Bayesian inference. The argument is inspired and guided by intuition gained from the successful use of ME methods in statistical mechanics. For experiments that cannot be repeated the resulting "entropic prior" is formally identical with the Einstein fluctuation formula. For repeatable experiments, however, the expected value of the entropy of the likelihood turns out to be relevant information that must be included in the analysis. The important case of a Gaussian likelihood is treated in detail.