Maximum entropy inference as a special case of conditionalization

Relative Entropy and Inductive Inference

AIP Conference Proceedings, 2004

We discuss how the method of maximum entropy, MaxEnt, can be extended beyond its original scope, as a rule to assign a probability distribution, to a full-fledged method for inductive inference. The main concept is the (relative) entropy S[p|q], which is designed as a tool to update from a prior probability distribution q to a posterior probability distribution p when new information in the form of a constraint becomes available. The extended method goes beyond the mere selection of a single posterior p: it also addresses the question of how much less probable other distributions might be. Our approach clarifies how the entropy S[p|q] is used while avoiding the question of its meaning. Ultimately, entropy is a tool for induction which needs no interpretation. Finally, since entropy is a tool for generalization from special examples, we ask whether its functional form depends on the choice of the examples, and we find that it does. The conclusion is that there is no single general theory of inductive inference and that alternative expressions for the entropy are possible.
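The update described in this abstract can be illustrated with a minimal sketch. It assumes a discrete sample space, a single expectation constraint, and the logarithmic form S[p|q] = -Σ p_i ln(p_i/q_i); the function names, the prior, and the target value are hypothetical choices for illustration and do not come from the paper. Under these assumptions the maximizer has the exponential form p_i ∝ q_i exp(λ f_i), and λ is found by bisection.

```python
import math

def relative_entropy(p, q):
    """S[p|q] = -sum_i p_i ln(p_i / q_i); terms with p_i == 0 contribute zero."""
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tilt(q, f, lam):
    """Return the normalized distribution p_i proportional to q_i * exp(lam * f_i)."""
    weights = [qi * math.exp(lam * fi) for qi, fi in zip(q, f)]
    z = sum(weights)
    return [w / z for w in weights]

def me_update(q, f, target, lo=-50.0, hi=50.0, iters=200):
    """Maximize S[p|q] subject to sum_i p_i f_i = target (illustrative solver).

    The constrained mean is increasing in lam, so bisection is enough.
    """
    def mean(lam):
        return sum(pi * fi for pi, fi in zip(tilt(q, f, lam), f))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return tilt(q, f, 0.5 * (lo + hi))

# Hypothetical example: a six-sided die with a non-uniform prior (mean 2.8),
# updated on the constraint that the expected face value is 3.5.
prior = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
faces = [1, 2, 3, 4, 5, 6]
posterior = me_update(prior, faces, target=3.5)

# The uniform distribution also satisfies the constraint, but it has lower
# relative entropy with respect to this prior than the selected posterior.
uniform = [1.0 / 6.0] * 6
```

Comparing `relative_entropy(posterior, prior)` with `relative_entropy(uniform, prior)` shows the ranking of feasible distributions that the method induces: both satisfy the constraint, but the posterior stays closer to the prior.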

On the axiomatic approach to the maximum entropy principle of inference

Pramana, 1986

Recent axiomatic derivations of the maximum entropy principle from consistency conditions are critically examined. We show that proper application of consistency conditions alone allows a wider class of functionals, essentially of the form $\int dx\, p(x)\,[p(x)/o(x)]^{s}$, for some real number $s$, to be used for inductive inference and the commonly used form $\int dx\, p(x) \ln [p(x)/o(x)]$ is only a particular case. The role of the prior density $o(x)$ is clarified. It is possible to regard it as a geometric factor, describing the coordinate system used and it does not represent information of the same kind as obtained by measurements on the system in the form of expectation values.
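One way to see how the logarithmic form can arise as a particular case of the power family is a small-s expansion. This is a sketch only: subtracting 1 and dividing by s is an assumed normalization that the abstract does not state, and p(x) is taken to be normalized.

```latex
\left[\frac{p(x)}{o(x)}\right]^{s}
  = \exp\!\left(s \ln \frac{p(x)}{o(x)}\right)
  = 1 + s \ln \frac{p(x)}{o(x)} + O(s^{2}),
\qquad\text{so, with } \int dx\, p(x) = 1,
\qquad
\frac{1}{s}\left(\int dx\, p(x)\left[\frac{p(x)}{o(x)}\right]^{s} - 1\right)
  = \int dx\, p(x) \ln \frac{p(x)}{o(x)} + O(s).
```

Under this assumed normalization, the logarithmic functional is recovered in the limit s → 0, while any fixed nonzero s gives a different member of the family.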

Entropy, Information, and the Updating of Probabilities

Entropy

This paper is a review of a particular approach to the method of maximum entropy as a general framework for inference. The discussion emphasizes pragmatic elements in the derivation. An epistemic notion of information is defined in terms of its relation to the Bayesian beliefs of ideally rational agents. The method of updating from a prior to posterior probability distribution is designed through an eliminative induction process. The logarithmic relative entropy is singled out as a unique tool for updating (a) that is of universal applicability, (b) that recognizes the value of prior information, and (c) that recognizes the privileged role played by the notion of independence in science. The resulting framework—the ME method—can handle arbitrary priors and arbitrary constraints. It includes the MaxEnt and Bayes’ rules as special cases and, therefore, unifies entropic and Bayesian methods into a single general inference scheme. The ME method goes beyond the mere selection of a single...

On The Relationship between Bayesian and Maximum Entropy Inference

AIP Conference Proceedings, 2004

We investigate Bayesian and Maximum Entropy methods for doing inference under uncertainty. This investigation is primarily through concrete examples that have been previously investigated in the literature. We find that it is possible to do Bayesian and MaxEnt inference using the same information, despite claims to the contrary, and that they lead to different results. We find that these differences are due to the Bayesian inference not assuming anything beyond the given prior probabilities and the data, whereas MaxEnt implicitly makes strong independence assumptions, and assumes that the given constraints are the only ones operating. We also show that maximum likelihood and maximum a posteriori estimators give different and misleading estimates in our examples compared to posterior mean estimates. We generalize the classic method of maximum entropy inference to allow for uncertainty in the constraint values. This generalized MaxEnt (GME) makes MaxEnt inference applicable to a much wider range of problems, and makes direct comparison between Bayesian and MaxEnt inference possible. Also, we show that MaxEnt is a generalized principle of independence, and this property is what makes it the preferred inference method in many cases.

Generalising the Maximum Entropy Inference Process to the Aggregation of Probabilistic Beliefs

This formulation ensures that linear constraint conditions such as w(θ) = a, w(φ | ψ) = b, and w(ψ | θ) ≤ c, where a, b, c ∈ [0, 1] and θ, φ, and ψ are Boolean combinations of the α_j's, are all permissible in K provided that the resulting constraint set K is consistent. Here a conditional constraint such as w(ψ | θ) ≤ c is interpreted as w(ψ ∧ θ) ≤ c w(θ), which is always a well-defined linear constraint, albeit vacuous when w(θ) = 0. See e.g.
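The linear reading of a conditional constraint can be checked mechanically. The sketch below assumes a language with two propositional variables, a belief function w stored as a dictionary over the four atoms, and sentences given as Python predicates; all names and numbers are hypothetical and chosen only for illustration.

```python
from itertools import product

# The four atoms are the truth assignments to two propositional variables (a1, a2).
ATOMS = list(product([False, True], repeat=2))

def prob(w, sentence):
    """w(sentence): total weight of the atoms at which the sentence is true."""
    return sum(w[atom] for atom in ATOMS if sentence(*atom))

def satisfies_conditional_bound(w, psi, theta, c, tol=1e-12):
    """Check w(psi | theta) <= c in its linear form w(psi & theta) <= c * w(theta)."""
    joint = prob(w, lambda a1, a2: psi(a1, a2) and theta(a1, a2))
    return joint <= c * prob(w, theta) + tol

def theta(a1, a2):
    return a1

def psi(a1, a2):
    return a2

# Hypothetical belief function with w(theta) = 0.5 and w(psi & theta) = 0.2.
w = {(False, False): 0.4, (False, True): 0.1, (True, False): 0.3, (True, True): 0.2}

# A belief function with w(theta) = 0: the constraint holds for every c in [0, 1].
w_zero = {(False, False): 0.5, (False, True): 0.5, (True, False): 0.0, (True, True): 0.0}
```

With these values, `satisfies_conditional_bound(w, psi, theta, 0.5)` holds and the same call with `c = 0.3` fails, while `w_zero` passes even for `c = 0.0`. The linear form avoids dividing by w(θ), which is why the constraint stays well defined in the vacuous case.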

Objective Bayesianism and the Maximum Entropy Principle

Entropy, 2013

Objective Bayesian epistemology invokes three norms: the strengths of our beliefs should be probabilities; they should be calibrated to our evidence of physical probabilities; and they should otherwise equivocate sufficiently between the basic propositions that we can express. The three norms are sometimes explicated by appealing to the maximum entropy principle, which says that a belief function should be a probability function, from all those that are calibrated to evidence, that has maximum entropy. However, the three norms of objective Bayesianism are usually justified in different ways. In this paper, we show that the three norms can all be subsumed under a single justification in terms of minimising worst-case expected loss. This, in turn, is equivalent to maximising a generalised notion of entropy. We suggest that requiring language invariance, in addition to minimising worst-case expected loss, motivates maximisation of standard entropy as opposed to maximisation of other instances of generalised entropy. Our argument also provides a qualified justification for updating degrees of belief by Bayesian conditionalisation. However, conditional probabilities play a less central part in the objective Bayesian account than they do under the subjective view of Bayesianism, leading to a reduced role for Bayes' Theorem.

Maximum entropy and conditional probability

IEEE Transactions on Information Theory, 1981

It is well known that maximum entropy distributions, subject to appropriate moment constraints, arise in physics and mathematics. In an attempt to find a physical reason for the appearance of maximum entropy distributions, the following theorem is offered. The conditional distribution of $X_1$, given the empirical observation $(1/n)\sum_{i=1}^{n} h(X_i) = \alpha$, where $X_1, X_2, \ldots$ are independent identically distributed random variables with common density $g$, converges to $f_\lambda(x) = e^{\lambda' h(x)} g(x)$ (suitably normalized), where $\lambda$ is chosen to satisfy $\int f_\lambda(x) h(x)\, dx = \alpha$. Thus the conditional distribution of a given random variable X is the (normalized) product of the maximum entropy distribution and the initial distribution. This distribution is the maximum entropy distribution when g is uniform. The proof of this and related results relies heavily on the work of Zabell and Lanford.
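The theorem can be checked numerically in a simple discrete case. The sketch below assumes fair six-sided dice (g uniform on 1..6), a scalar h(x) = x, and the hypothetical values n = 60 and α = 4.5. It computes the exact conditional distribution of the first die given the sum by counting, and the exponentially tilted limit for comparison. The function names are illustrative, and the discrete setting is a stand-in for the densities in the abstract.

```python
import math

FACES = range(1, 7)

def tilted(lam):
    """Limit distribution proportional to exp(lam * x) * g(x), with g uniform on 1..6."""
    weights = [math.exp(lam * k) for k in FACES]
    z = sum(weights)
    return [w / z for w in weights]

def solve_lambda(alpha, lo=-20.0, hi=20.0, iters=200):
    """Find lam such that the tilted mean equals alpha (bisection; mean increases with lam)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(k * p for k, p in zip(FACES, tilted(mid)))
        if mean < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sum_counts(n):
    """counts[s] = number of ways n dice can sum to s, as exact integers."""
    counts = [1]  # zero dice: the empty sum 0 occurs in exactly one way
    for _ in range(n):
        new = [0] * (len(counts) + 6)
        for s, c in enumerate(counts):
            if c:
                for k in FACES:
                    new[s + k] += c
        counts = new
    return counts

def conditional_first_die(n, total):
    """Exact P(X_1 = k | X_1 + ... + X_n = total) for fair dice."""
    rest = sum_counts(n - 1)
    weights = [rest[total - k] if 0 <= total - k < len(rest) else 0 for k in FACES]
    z = sum(weights)
    return [w / z for w in weights]

n, alpha = 60, 4.5
exact = conditional_first_die(n, int(n * alpha))
limit = tilted(solve_lambda(alpha))
max_gap = max(abs(e - l) for e, l in zip(exact, limit))
```

By exchangeability the exact conditional mean of the first die is exactly α for every n, while `max_gap` measures how far the finite-n conditional distribution still is from the tilted limit. Increasing n should shrink it.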

Entropy and Inference, Revisited

Advances in Neural Information Processing Systems 14, 2002

We study properties of popular near-uniform (Dirichlet) priors for learning undersampled probability distributions on discrete nonmetric spaces and show that they lead to disastrous results. However, an Occam-style phase space argument expands the priors into their infinite mixture and resolves most of the observed problems. This leads to a surprisingly good estimator of entropies of discrete distributions.
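The sensitivity to the prior in the undersampled regime can be seen with a small experiment. This sketch is not the estimator proposed in the paper: it simply computes the entropy of the posterior-mean distribution under a symmetric Dirichlet prior with parameter `beta`, for a hypothetical data set of 10 samples spread over K = 1000 bins. All names and numbers are illustrative assumptions.

```python
import math

def dirichlet_posterior_mean(counts, beta):
    """Posterior mean probabilities under a symmetric Dirichlet(beta) prior."""
    total = sum(counts) + beta * len(counts)
    return [(n + beta) / total for n in counts]

def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute zero."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical undersampled data: K = 1000 bins, 10 samples, each in a distinct bin.
K = 1000
counts = [1] * 10 + [0] * (K - 10)

# The same data give very different entropy estimates for different beta,
# so in this regime the answer is driven mostly by the prior.
estimates = {
    beta: entropy(dirichlet_posterior_mean(counts, beta))
    for beta in (0.001, 0.1, 1.0)
}
```

With `beta = 1.0` the estimate sits close to the maximum value ln K, and with `beta = 0.001` it is far lower, even though the data are identical.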

Topics in Bayesian statistics and maximum entropy

1998

Notions of Bayesian decision theory and maximum entropy methods are reviewed with particular emphasis on probabilistic inference and Bayesian modeling. The axiomatic approach is considered the best justification of Bayesian analysis and of the maximum entropy principle as applied in the natural sciences. Particular emphasis is put on solving the inverse problem in digital image restoration and Bayesian modeling of neural networks. Further topics addressed briefly include language modeling, neutron scattering, multiuser detection and channel equalization in digital communications, genetic information, and Bayesian court decision-making.