Dice, entropy, and likelihood

Entropy, Information, and the Updating of Probabilities

Entropy

This paper is a review of a particular approach to the method of maximum entropy as a general framework for inference. The discussion emphasizes pragmatic elements in the derivation. An epistemic notion of information is defined in terms of its relation to the Bayesian beliefs of ideally rational agents. The method of updating from a prior to a posterior probability distribution is designed through an eliminative induction process. The logarithmic relative entropy is singled out as the unique tool for updating (a) that is of universal applicability, (b) that recognizes the value of prior information, and (c) that recognizes the privileged role played by the notion of independence in science. The resulting framework, the ME method, can handle arbitrary priors and arbitrary constraints. It includes the MaxEnt and Bayes' rules as special cases and, therefore, unifies entropic and Bayesian methods into a single general inference scheme. The ME method goes beyond the mere selection of a single...
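For orientation only (a standard textbook form, not quoted from this abstract): in the discrete case, the logarithmic relative entropy of a candidate posterior p with respect to the prior q, and the constrained maximization that defines the ME update, can be sketched as follows.

```latex
% Logarithmic relative entropy of a candidate posterior p relative to the prior q
S[p \mid q] \;=\; -\sum_i p_i \log \frac{p_i}{q_i}
% The ME update selects the p that maximizes S[p|q] subject to normalization and
% to whatever constraints encode the new information, e.g. an expectation value:
\text{maximize } S[p \mid q]
\quad \text{subject to} \quad
\sum_i p_i = 1, \qquad \sum_i p_i\, a_i = A
```

With a uniform prior q this reduces, up to an additive constant, to the Shannon entropy, which is one way to see MaxEnt appearing as a special case.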

Topics in Bayesian statistics and maximum entropy

1998

Notions of Bayesian decision theory and maximum entropy methods are reviewed with particular emphasis on probabilistic inference and Bayesian modeling. The axiomatic approach is considered the best justification of Bayesian analysis and the maximum entropy principle as applied in the natural sciences. Particular emphasis is put on solving the inverse problem in digital image restoration and on Bayesian modeling of neural networks. Further topics addressed briefly include language modeling, neutron scattering, multiuser detection and channel equalization in digital communications, genetic information, and Bayesian court decision-making.

Objective Bayesianism and the Maximum Entropy Principle

Entropy, 2013

Objective Bayesian epistemology invokes three norms: the strengths of our beliefs should be probabilities; they should be calibrated to our evidence of physical probabilities; and they should otherwise equivocate sufficiently between the basic propositions that we can express. The three norms are sometimes explicated by appealing to the maximum entropy principle, which says that a belief function should be a probability function, from all those that are calibrated to evidence, that has maximum entropy. However, the three norms of objective Bayesianism are usually justified in different ways. In this paper, we show that the three norms can all be subsumed under a single justification in terms of minimising worst-case expected loss. This, in turn, is equivalent to maximising a generalised notion of entropy. We suggest that requiring language invariance, in addition to minimising worst-case expected loss, motivates maximisation of standard entropy as opposed to maximisation of other instances of generalised entropy. Our argument also provides a qualified justification for updating degrees of belief by Bayesian conditionalisation. However, conditional probabilities play a less central part in the objective Bayesian account than they do under the subjective view of Bayesianism, leading to a reduced role for Bayes' Theorem.
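A minimal numerical sketch (mine, not from the paper; the three-outcome space and the single calibration constraint P(w1) = 0.5 are assumptions for illustration) of the equivocation norm: among all probability functions calibrated to the evidence, the maximum-entropy choice splits the remaining probability mass evenly.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in bits; terms with p_i = 0 contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Evidence (assumed for illustration): P(w1) = 0.5 on a three-outcome space.
# Scan how the remaining 0.5 of probability mass is split between w2 and w3.
best = max(
    ((0.5, 0.5 * t, 0.5 * (1 - t)) for t in [i / 1000 for i in range(1001)]),
    key=shannon_entropy,
)
print(best)  # -> (0.5, 0.25, 0.25): the equivocal split maximizes entropy
```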

Bayesian Maximum Entropy

2008

Notions of Bayesian decision theory and maximum entropy methods are reviewed with particular emphasis on probabilistic inference and Bayesian modeling. The axiomatic approach is considered the best justification of Bayesian analysis and the maximum entropy principle as applied in the natural sciences. Solving the inverse problem in digital image restoration and Bayesian modeling of neural networks are discussed in detail. Further topics addressed briefly include language modeling, neutron scattering, multiuser detection and channel equalization in digital communications, genetic information, and Bayesian court decision-making.

Entropy and Inference, Revisited

Advances in Neural Information Processing Systems 14, 2002

We study properties of popular near-uniform (Dirichlet) priors for learning undersampled probability distributions on discrete nonmetric spaces and show that they lead to disastrous results. However, an Occam-style phase space argument expands the priors into their infinite mixture and resolves most of the observed problems. This leads to a surprisingly good estimator of entropies of discrete distributions.
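A small simulation (mine, not from the paper; the Zipf-like source, sample size, and pseudocount are assumptions) of the failure mode the paper addresses: in the undersampled regime the maximum-likelihood estimate of entropy is biased low, while a fixed Dirichlet pseudocount drags the estimate toward log2(K), i.e. toward the prior rather than the data.

```python
import math
import random
from collections import Counter

random.seed(0)

K = 1000                                   # alphabet size (discrete, nonmetric)
N = 100                                    # severely undersampled: N << K
w = [1.0 / (i + 1) for i in range(K)]      # Zipf-like source (assumed for illustration)
Z = sum(w)
p_true = [x / Z for x in w]
H_true = -sum(p * math.log2(p) for p in p_true)

def entropy_estimate(counts, n, k, alpha=0.0):
    """Plug-in entropy (bits) of the observed frequencies, optionally smoothed
    with a fixed Dirichlet pseudocount alpha (alpha=0 is the ML estimate)."""
    probs = [(c + alpha) / (n + alpha * k) for c in counts]
    return -sum(q * math.log2(q) for q in probs if q > 0)

sample = random.choices(range(K), weights=p_true, k=N)
tally = Counter(sample)
counts = [tally.get(i, 0) for i in range(K)]

print(f"true entropy       : {H_true:.2f} bits")
print(f"maximum likelihood : {entropy_estimate(counts, N, K):.2f} bits")             # biased low
print(f"Dirichlet, alpha=1 : {entropy_estimate(counts, N, K, alpha=1.0):.2f} bits")  # pulled toward log2(K)
```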

Guesswork is not a substitute for Entropy

2005

Shannon entropy is often considered as a measure of uncertainty. It is commonly believed that entropy is a good measure of how many guesses it will take to correctly guess a single value generated by a source. This belief is not well founded. We summarise some work in this area, explore how this belief may have arisen via the asymptotic equipartition property and outline a hands-on calculation for guesswork asymptotics.
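A small computation (mine, not from the paper; the two-part source is an assumption for illustration) of the gap in question: a source can have well under one bit of Shannon entropy and still require thousands of guesses on average under the optimal guessing strategy.

```python
import math

# A source (assumed for illustration): value 0 with probability 0.99,
# otherwise uniform over 2**20 other values.
M = 2 ** 20
p = [0.99] + [0.01 / M] * M

H = -sum(x * math.log2(x) for x in p)                   # Shannon entropy, in bits

# Optimal guessing: try values in decreasing order of probability.
ranked = sorted(p, reverse=True)
expected_guesses = sum((i + 1) * x for i, x in enumerate(ranked))

print(f"entropy            : {H:.2f} bits")                    # well under 1 bit
print(f"expected guesswork : {expected_guesses:.0f} guesses")  # several thousand
```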

Relative Entropy and Statistics

Computing Research Repository (CoRR), 2008

Formalising the confrontation of opinions (models) with observations (data) is the task of Inferential Statistics. Information Theory provides us with a basic functional, the relative entropy (or Kullback-Leibler divergence), an asymmetrical measure of dissimilarity between the empirical and the theoretical distributions. The formal properties of the relative entropy turn out to capture every aspect of Inferential Statistics, as illustrated here, for simplicity, on dice (i.e. an i.i.d. process with finitely many outcomes): refutability (strict or probabilistic) and the asymmetry between data and models; small deviations and the rejection of a single hypothesis; competition between hypotheses and model selection; maximum likelihood, model inference and its limits; maximum entropy and the reconstruction of partially observed data; the EM algorithm; flow data and gravity modelling; determining the order of a Markov chain.
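A minimal sketch (mine; the roll counts are invented for illustration) of the paper's basic quantity on its running example: the relative entropy between the empirical distribution of die rolls and a theoretical model. In nats, 2N times this divergence is the log-likelihood-ratio (G-test) statistic used to confront the model with the data.

```python
import math

# Invented counts for N = 60 rolls of a die, tested against the fair-die model.
counts = [6, 8, 11, 9, 14, 12]
N = sum(counts)
empirical = [c / N for c in counts]
model = [1 / 6] * 6

# Relative entropy D(empirical || model) in nats; asymmetric by construction.
D = sum(f * math.log(f / q) for f, q in zip(empirical, model) if f > 0)

# Log-likelihood-ratio (G-test) statistic; ~ chi-square with 5 df under the model.
G = 2 * N * D
print(f"D(empirical || model) = {D:.4f} nats, G = {G:.2f}")
```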

Bayesian Inference and Maximum Entropy Methods in Science and Engineering

Springer Proceedings in Mathematics & Statistics, 2018

This book series features volumes composed of selected contributions from workshops and conferences in all areas of current research in mathematics and statistics, including operations research and optimization. In addition to an overall evaluation of the interest, scientific quality, and timeliness of each proposal by the publisher, individual contributions are all refereed to the high quality standards of leading journals in the field. Thus, this series provides the research community with well-edited, authoritative reports on developments in the most exciting areas of mathematical and statistical research today.

Information vs. Entropy vs. Probability

Information, entropy, probability: these three terms are closely interconnected in the prevalent understanding of statistical mechanics, both when this field is taught to students at an introductory level and in advanced research into the field's foundations. This paper examines the interconnection between these three notions in light of recent research in the foundations of statistical mechanics. It disentangles these concepts and highlights their differences, at the same time explaining why they came to be so closely linked in the literature. In the literature, the term 'information' is often linked to entropy and probability in discussions of Maxwell's Demon and its attempted exorcism by the Landauer-Bennett thesis, and in analyses of the spin echo experiments. The direction taken in the present paper is a different one. Here I discuss the mechanical underpinning of the notions of probability and entropy, and this constructive approach shows that information plays no fundamental role in these concepts, although it can be conveniently used in a sense that I will specify.

Information theory and statistical mechanics. II

Physical Review, 1957

Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum-entropy estimate. It is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information. If one considers statistical mechanics as a form of statistical inference rather than as a physical theory, it is found that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle. In the resulting "subjective statistical mechanics," the usual rules are thus justified independently of any physical argument, and in particular independently of experimental verification; whether or not the results agree with experiment, they still represent the best estimates that could have been made on the basis of the information available.
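A small sketch (mine, not from the paper; the energy levels and target mean are assumptions) of the computational rule the abstract points to: the maximum-entropy distribution consistent with a prescribed mean energy is the Gibbs form p_i = exp(-lambda * E_i) / Z(lambda), with Z the partition function and lambda fixed by the constraint.

```python
import math

E = [0.0, 1.0, 2.0, 3.0]      # energy levels (assumed for illustration)
target_mean = 1.2             # prescribed expectation value (assumed)

def mean_energy(lam):
    """Mean energy under the maximum-entropy (Gibbs) distribution with multiplier lam."""
    weights = [math.exp(-lam * e) for e in E]
    Z = sum(weights)                      # partition function
    return sum(e * w for e, w in zip(E, weights)) / Z

# Mean energy decreases monotonically in lam, so fix lam by bisection.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > target_mean:
        lo = mid
    else:
        hi = mid

lam = 0.5 * (lo + hi)
Z = sum(math.exp(-lam * e) for e in E)
p = [math.exp(-lam * e) / Z for e in E]
print(f"lambda = {lam:.4f}, p = {[round(x, 4) for x in p]}, mean = {mean_energy(lam):.4f}")
```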