Entropic Priors for Short-Term Stochastic Process Classification
Related papers
Maximum entropy and Bayesian data analysis: Entropic prior distributions
Physical Review E, 2004
The problem of assigning probability distributions which objectively reflect the prior information available about experiments is one of the major stumbling blocks in the use of Bayesian methods of data analysis. In this paper the method of Maximum (relative) Entropy (ME) is used to translate the information contained in the known form of the likelihood into a prior distribution for Bayesian inference. The argument is inspired and guided by intuition gained from the successful use of ME methods in statistical mechanics. For experiments that cannot be repeated the resulting "entropic prior" is formally identical with the Einstein fluctuation formula. For repeatable experiments, however, the expected value of the entropy of the likelihood turns out to be relevant information that must be included in the analysis. The important case of a Gaussian likelihood is treated in detail.
AIP Conference Proceedings, 2004
The method of Maximum (relative) Entropy (ME) is used to translate the information contained in the known form of the likelihood into a prior distribution for Bayesian inference. The argument is guided by intuition gained from the successful use of ME methods in statistical mechanics. For experiments that cannot be repeated the resulting "entropic prior" is formally identical with the Einstein fluctuation formula. For repeatable experiments, however, the expected value of the entropy of the likelihood turns out to be relevant information that must be included in the analysis. As an example the entropic prior for a Gaussian likelihood is calculated.
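Both abstracts above describe the same construction. As a rough sketch of the form it takes (the symbols α, Z, and μ below are our notation for the illustration, not taken from the papers), the entropic prior weights each parameter value by the exponential of the entropy of its likelihood:

```latex
S(\theta) = -\int p(x\mid\theta)\,\log\frac{p(x\mid\theta)}{\mu(x)}\,dx,
\qquad
\pi(\theta) = \frac{1}{Z}\,e^{\,\alpha S(\theta)} .
```

For repeatable experiments, as the abstracts note, a constraint on the expected entropy of the likelihood enters the analysis; roughly speaking, it is this constraint that pins down a multiplier such as α.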
Objective priors from maximum entropy in data classification
Information Fusion, 2013
Lack of knowledge of the prior distribution in classification problems that operate on small data sets may make the application of Bayes' rule questionable. Uniform or arbitrary priors may provide classification answers that, even in simple examples, end up contradicting our common sense about the problem. Entropic priors (EPs), via application of the maximum entropy (ME) principle, seem to provide good objective answers in practical cases, leading to more conservative Bayesian inferences. EPs are derived and applied to classification tasks when only the likelihood functions are available. In this paper, when inference is based on only one sample, we review the use of the EP also in comparison to priors obtained by maximizing the mutual information between observations and classes. This last criterion coincides with the maximization of the KL divergence between posteriors and priors, which for large sample sets leads to the well-known reference (or Bernardo's) priors. Our comparison on single samples puts both approaches in perspective and clarifies their differences and potential. A combinatorial justification for EPs, inspired by Wallis' combinatorial argument for the definition of entropy, is also included. The application of EPs to sequences (multiple samples), which may be affected by excessive domination of the maximum-entropy class, is also considered, with a solution that guarantees posterior consistency. An explicit iterative algorithm is proposed for EP determination solely from knowledge of the likelihood functions. Simulations that compare EPs with uniform priors on short sequences are also included.
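As an illustration of the idea (a minimal sketch, not the paper's iterative algorithm: the two-class Gaussian setup and all numbers are assumptions chosen for the example), an entropic prior of the form P(c) ∝ exp(H_c), with H_c the differential entropy of the class-c likelihood, assigns more prior mass to the class whose likelihood is less informative:

```python
import numpy as np

# Minimal sketch (assumed two-class Gaussian setup, arbitrary numbers):
# entropic prior P(c) proportional to exp(H_c), where H_c is the
# differential entropy of the class-c likelihood.
sigmas = np.array([1.0, 3.0])                     # class-conditional std devs
H = 0.5 * np.log(2 * np.pi * np.e * sigmas**2)    # Gaussian differential entropies (nats)
prior = np.exp(H) / np.exp(H).sum()               # normalize exp(entropy) over classes
print(prior)  # the broader, less informative class receives the larger prior mass
```

Relative to a uniform prior, this shifts mass toward the higher-entropy class, in line with the "more conservative" inferences mentioned in the abstract.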
Data fusion with entropic priors
20th Workshop on Neural Networks (WIRN 2010), 2010
In classification problems, lack of knowledge of the prior distribution may make the application of Bayes' rule inadequate. Uniform or arbitrary priors may often provide classification answers that, even in simple examples, end up contradicting our common sense about the problem. Entropic priors, via application of the maximum entropy principle, seem to provide a much better answer and can be easily derived and applied to classification tasks when no more than the likelihood functions are available. In this paper we present an application example in which the use of the entropic priors is compared to the results of applying Dempster-Shafer theory.
On the method of likelihood-induced priors
ArXiv, 2019
We demonstrate that the functional form of the likelihood contains a sufficient amount of information for constructing a prior for the unknown parameters. We develop a four-step algorithm by invoking the information entropy as the measure of uncertainty and show how the information gained from coarse-graining and resolving power of the likelihood can be used to construct the likelihood-induced priors. As a consequence, we show that if the data model density belongs to the exponential family, the likelihood-induced prior is the conjugate prior to the corresponding likelihood.
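For reference, the conjugacy claim can be read against standard exponential-family notation (this is textbook material, not the paper's four-step algorithm): if the data model has natural parameter η(θ), sufficient statistic T(x), and log-partition A(θ), the conjugate prior shares the same functional form,

```latex
p(x\mid\theta)=h(x)\exp\{\eta(\theta)^{\top}T(x)-A(\theta)\},
\qquad
\pi(\theta\mid\tau,\nu)\propto\exp\{\eta(\theta)^{\top}\tau-\nu A(\theta)\},
```

so a likelihood-induced prior that lands in this family is conjugate by construction.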
Bayesian Inference and Maximum Entropy Methods in Science and Engineering
Springer Proceedings in Mathematics & Statistics, 2018
This book series features volumes composed of selected contributions from workshops and conferences in all areas of current research in mathematics and statistics, including operations research and optimization. In addition to an overall evaluation by the publisher of the interest, scientific quality, and timeliness of each proposal, individual contributions are all refereed to the high quality standards of leading journals in the field. Thus, the series provides the research community with well-edited, authoritative reports on developments in the most exciting areas of mathematical and statistical research today.
Entropy and Inference, Revisited
Advances in Neural Information Processing Systems 14, 2002
We study properties of popular near-uniform (Dirichlet) priors for learning undersampled probability distributions on discrete nonmetric spaces and show that they lead to disastrous results. However, an Occam-style phase space argument expands the priors into their infinite mixture and resolves most of the observed problems. This leads to a surprisingly good estimator of entropies of discrete distributions.
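The problem can be made concrete with a standard calculation (a hedged sketch; the bin count K and the concentration values are arbitrary choices, and the prior-expected entropy of a symmetric Dirichlet is quoted in its usual digamma form):

```python
import numpy as np
from scipy.special import digamma

# Prior-expected entropy (in nats) of a symmetric Dirichlet(beta) over K bins:
# E[H | beta] = psi(K*beta + 1) - psi(beta + 1).
K = 1000                                # number of bins (arbitrary choice)
for beta in (0.001, 0.02, 1.0, 100.0):  # concentration parameters to probe
    EH = digamma(K * beta + 1) - digamma(beta + 1)
    print(f"beta={beta:g}: E[H] = {EH:.3f} nats (uniform max = {np.log(K):.3f})")
```

Because each fixed β pins the expected entropy almost completely before any data are seen, the remedy sketched in the abstract is to mix over β so that the induced prior on the entropy itself is approximately flat.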
On The Relationship between Bayesian and Maximum Entropy Inference
AIP Conference Proceedings, 2004
We investigate Bayesian and Maximum Entropy methods for doing inference under uncertainty. This investigation is primarily through concrete examples that have been previously investigated in the literature. We find that it is possible to do Bayesian and MaxEnt inference using the same information, despite claims to the contrary, and that they lead to different results. We find that these differences are due to the Bayesian inference not assuming anything beyond the given prior probabilities and the data, whereas MaxEnt implicitly makes strong independence assumptions, and assumes that the given constraints are the only ones operating. We also show that maximum likelihood and maximum a posteriori estimators give different and misleading estimates in our examples compared to posterior mean estimates. We generalize the classic method of maximum entropy inference to allow for uncertainty in the constraint values. This generalized MaxEnt (GME) makes MaxEnt inference applicable to a much wider range of problems, and makes direct comparison between Bayesian and MaxEnt inference possible. Also, we show that MaxEnt is a generalized principle of independence, and this property is what makes it the preferred inference method in many cases.
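For readers unfamiliar with the "classic method of maximum entropy inference" being generalized here, a minimal sketch (the six-sided-die support and target mean are our illustrative assumptions, not the paper's examples): maximizing entropy subject to a fixed mean yields an exponential-family pmf whose Lagrange multiplier can be found numerically.

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy pmf on {1,...,6} with a prescribed mean;
# the solution has the form p(x) proportional to exp(-lam * x).
x = np.arange(1, 7)
target_mean = 4.5

def mean_given(lam):
    w = np.exp(-lam * x)
    p = w / w.sum()
    return p @ x

# Solve for the multiplier that matches the mean constraint.
lam = brentq(lambda l: mean_given(l) - target_mean, -5.0, 5.0)
p = np.exp(-lam * x)
p /= p.sum()
print(p, p @ x)  # MaxEnt pmf and its mean (equals the target)
```

The generalized MaxEnt discussed in the abstract relaxes the hard constraint above so that the constraint value itself carries uncertainty.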
2008
Notions of Bayesian decision theory and maximum entropy methods are reviewed with particular emphasis on probabilistic inference and Bayesian modeling. The axiomatic approach is considered as the best justification of Bayesian analysis and maximum entropy principle applied in natural sciences. Solving the inverse problem in digital image restoration and Bayesian modeling of neural networks are discussed in detail. Further topics addressed briefly include language modeling, neutron scattering, multiuser detection and channel equalization in digital communications, genetic information, and Bayesian court decision-making.