On the Sample Information About Parameter and Prediction

On Dual Expression of Prior Information in Bayesian Parameter Estimation

In Bayesian parameter estimation, a priori information can be used to shape the prior density of the unknown parameters of the model. When chosen in a conjugate, self-reproducing form, the prior density of the parameters is nothing but a model-based transform of a certain "prior" density of observed data. This observation suggests two possible ways of expressing a priori knowledge: in terms of the parameters of a particular model, or in terms of the data entering the model. The latter way turns out to be useful when dealing with statistical models whose parameters lack a direct physical interpretation. In practice, the amount of a priori information is usually not sufficient for a complete specification of the prior density of data. The paper shows an information-based way of converting such incomplete information into the prior density of the unknown parameters.
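
As a minimal illustration of this "prior data" reading of a conjugate prior (an example added here, not taken from the paper), consider a Bernoulli model with success probability \theta: for integer \alpha,\beta \ge 1, a Beta(\alpha,\beta) prior is proportional to the likelihood of a fictitious sample of \alpha-1 successes and \beta-1 failures, so prior knowledge stated as "roughly n_0 earlier observations with \alpha-1 successes" translates directly into a choice of (\alpha,\beta):

p(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}
         \propto \prod_{i=1}^{n_0} \theta^{y_i}(1-\theta)^{1-y_i},
\qquad n_0 = \alpha+\beta-2, \quad \sum_{i=1}^{n_0} y_i = \alpha-1 .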

Statistical Problem Classes and Their Links to Information Theory

Econometric Reviews, 2014

ABSTRACT We begin by recalling the tripartite division of statistical problems into three classes, M-closed, M-complete, and M-open, and then review the key ideas of introductory Shannon theory. Focusing on the related but distinct goals of model selection and prediction, we argue that different techniques for these two goals are appropriate for the three different problem classes. For M-closed problems we give a relative entropy justification that the Bayes information criterion (BIC) is appropriate for model selection and that the Bayes model average is information optimal for prediction. For M-complete problems, we discuss the principle of maximum entropy and a way to use the rate distortion function to bypass the inaccessibility of the true distribution. For prediction in the M-complete class, little work has been done on information-based model averaging, so we discuss the Akaike information criterion (AIC) and its properties and variants. For the M-open class, we argue that essentially only predictive criteria are suitable. Thus, as an analog to model selection, we present the key ideas of prediction along a string under a codelength criterion and propose a general form of this criterion. Since little work appears to have been done on information methods for general prediction in the M-open class of problems, we mention the field of information-theoretic learning in certain general function spaces.
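
For concreteness (standard definitions, not quoted from the article), the two M-closed tools mentioned above take the following forms, where \hat\theta_m is the maximum likelihood estimate and d_m the number of free parameters of model m:

\mathrm{BIC}(m) = -2\,\log p\bigl(x_{1:n}\mid \hat\theta_m, m\bigr) + d_m \log n,
\qquad
p\bigl(x_{n+1}\mid x_{1:n}\bigr) = \sum_{m} p\bigl(m\mid x_{1:n}\bigr)\, p\bigl(x_{n+1}\mid x_{1:n}, m\bigr),

with the first used for model selection and the second, the Bayes model average, used for prediction.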

Bayes Estimate and Inference for Entropy and Information Index of Fit

Econometric Reviews, 2008

Kullback-Leibler information is widely used for developing indices of distributional fit. The most celebrated of such indices is Akaike's AIC, which is derived as an estimate of the minimum Kullback-Leibler information between the unknown data-generating distribution and a parametric model. In the derivation of AIC, the entropy of the data-generating distribution is bypassed because it is free of the parameters. Consequently, AIC-type measures provide criteria for model comparison purposes only, and do not provide diagnostic information about the model fit. A nonparametric estimate of the entropy of the data-generating distribution is needed for assessing the model fit. Several entropy estimates are available and have been used for frequentist inference about information fit indices. A few entropy-based fit indices have been suggested for Bayesian inference. This paper develops a class of entropy estimates and provides a procedure for Bayesian inference on the entropy and a fit index. For the continuous case, we define a quantized entropy that approximates and converges to the entropy integral. The quantized entropy includes some well-known measures of sample entropy and the existing Bayes entropy estimates as its special cases. For inference about the fit, we use the candidate model as the expected distribution in the Dirichlet process prior and derive the posterior mean of the quantized entropy as the Bayes estimate. The maximum entropy characterization of the candidate model is then used to derive the prior and posterior distributions for the Kullback-Leibler information index of fit. The consistency of the proposed Bayes estimates for the entropy and for the information index is shown. As by-products, the procedure also produces priors and posteriors for the model parameters and the moments.
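
As a rough sketch of the quantization idea only (a generic plug-in version, not the paper's Bayes estimator), one can discretize the data into cells, estimate cell probabilities by relative frequencies, and approximate the entropy integral by the cell-width-corrected discrete entropy:

import numpy as np

def quantized_entropy(x, n_bins=30):
    # Histogram-based plug-in estimate of differential entropy:
    # -sum_j p_j * log(p_j / delta_j), where p_j are empirical cell
    # probabilities and delta_j the cell widths.  Illustrative only;
    # the paper instead takes the posterior mean of the quantized
    # entropy under a Dirichlet process prior centred at the candidate model.
    counts, edges = np.histogram(x, bins=n_bins)
    delta = np.diff(edges)                 # cell widths
    p = counts / counts.sum()              # empirical cell probabilities
    nz = p > 0                             # skip empty cells (0 * log 0 = 0)
    return float(-(p[nz] * np.log(p[nz] / delta[nz])).sum())

# For N(0,1) data the target value is 0.5*log(2*pi*e), about 1.42.
x = np.random.default_rng(0).standard_normal(5000)
print(quantized_entropy(x))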

Information optimality and Bayesian modelling

Journal of Econometrics, 2007


Shannon's Entropy and Its Generalisations Towards Statistical Inference in Last Seven Decades

International Statistical Review, 2020

Summary: Starting from the pioneering works of Shannon and Wiener in 1948, a plethora of works on entropy have been reported in different directions. Entropy-related review work in the direction of statistical inference has, to the best of our knowledge, not been reported so far. Here, we have tried to collect all possible works in this direction from the last seven decades so that people interested in entropy, especially new researchers, may benefit.

On the Small Sample Size Behavior of Bayesian and Information-Theoretic Approaches for Predictive Inference

1998

In this work we focus on discrete prediction problems in a decision-theoretic setting, where the task is to compute the predictive distribution over a finite set of possible alternatives. This question is first addressed within a general framework, where we consider a set of probability distributions defined by some parametric model class. The fully Bayesian predictive distribution is obtained by integrating over all the individual parameter instantiations. As an alternative to this standard Bayesian formulation, we consider two information-theoretically motivated approaches: the Minimum Description Length (MDL) principle by Rissanen, and the Minimum Message Length (MML) principle by Wallace et al. For implementing the MDL approach, we use an estimator based on Rissanen's recent new definition of stochastic complexity, which can be shown to asymptotically approach the fully Bayesian predictive distribution with Jeffreys' prior. For the MML approach, we use a novel volume-wise optimal MML estimator, which is closely related to Rissanen's MDL estimator. However, as the similarities between the resulting predictive distributions are all asymptotic in nature, it is not a priori clear how the different methods behave in practice with small sample sizes. In order to be able to study this question empirically, we realize each of the predictive distributions for a model family of practical relevance, the family of Bayesian networks. In the experimental part of the paper, we compare the predictive accuracy of the different approaches by using artificially generated data sets. This experimental setup allows us to examine the behavior of the methods not only as a function of the sample size, but also as a function of the plausibility of the assumptions made about the problem domain.
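
A single-parameter illustration of the fully Bayesian predictive distribution referred to above (added here for concreteness; the paper's experiments use Bayesian networks rather than this toy model): for a Bernoulli model with Jeffreys' Beta(1/2, 1/2) prior, after observing s ones in n trials,

p(x_{n+1}=1 \mid x_{1:n})
   = \int_0^1 \theta \, p(\theta \mid x_{1:n})\, d\theta
   = \frac{s + 1/2}{n + 1},

and, according to the abstract, the stochastic-complexity-based MDL predictive approaches this Bayes/Jeffreys predictive only asymptotically, which is why the small-sample comparison is of interest.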

Information theoretic methods in parameter estimation

2013

In the present communication, the entropy optimization principles, namely the maximum entropy principle and the minimum cross-entropy principle, are defined, and a critical overview of parameter estimation methods based on entropy optimization is given in brief. The maximum entropy principle and its applications in deriving other known parameter estimation methods are discussed. The relation between maximum likelihood estimation and the maximum entropy principle is derived, and the relation between the minimum divergence information principle and the classical minimum chi-square method is studied. A comparative study of Fisher's measure of information and the minimum divergence measure is made, and the equivalence of classical parameter estimation methods and information-theoretic methods is examined. An application to parameter estimation when interval proportions are given is discussed with a numerical example. Key words: Parameter estimation, maximum entropy principle, minimum divergence...
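
For readers unfamiliar with the principle, its standard constrained form (a textbook statement, not specific to this communication) is: among all densities satisfying given moment constraints, choose the one of maximal entropy, which is a member of an exponential family,

\max_{p} \; -\int p(x)\log p(x)\,dx
\quad \text{subject to} \quad
\int p(x)\,dx = 1, \;\; \int T_j(x)\,p(x)\,dx = t_j, \; j=1,\dots,k,
\qquad \Longrightarrow \qquad
p^{*}(x) \propto \exp\Bigl(\sum_{j=1}^{k} \lambda_j T_j(x)\Bigr),

with the Lagrange multipliers \lambda_j fixed by the constraints; one standard form of the link to maximum likelihood arises because fitting these multipliers to sample moments coincides with maximum likelihood estimation within that exponential family.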

Topics in Bayesian statistics and maximum entropy

1998

Notions of Bayesian decision theory and maximum entropy methods are reviewed, with particular emphasis on probabilistic inference and Bayesian modeling. The axiomatic approach is considered the best justification of Bayesian analysis and of the maximum entropy principle as applied in the natural sciences. Particular emphasis is put on solving the inverse problem in digital image restoration and on Bayesian modeling of neural networks. Further topics addressed briefly include language modeling, neutron scattering, multiuser detection and channel equalization in digital communications, genetic information, and Bayesian court decision-making.

Some Information Theoretic Ideas Useful in Statistical Inference

Methodology and Computing in Applied Probability, 2007

In this paper we discuss four information-theoretic ideas and present their implications for statistical inference: (1) Fisher information and divergence generating functions, (2) information optimum unbiased estimators, (3) the information content of various statistics, and (4) characterizations based on Fisher information.
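
To fix notation for items (1), (2), and (4) (standard definitions added for the reader's convenience, not quoted from the paper), the Fisher information of a regular parametric family f(x;\theta) and the resulting Cramér-Rao benchmark for an unbiased estimator \hat\theta based on n i.i.d. observations are

I(\theta) = \mathbb{E}_{\theta}\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right]
          = -\,\mathbb{E}_{\theta}\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\log f(X;\theta)\right],
\qquad
\operatorname{Var}_{\theta}\bigl(\hat\theta\bigr) \ge \frac{1}{n\,I(\theta)},

and estimators attaining this bound are one natural reading of the "information optimum unbiased estimators" in item (2).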