A Tutorial on Fisher information

Some Information Theoretic Ideas Useful in Statistical Inference

Methodology and Computing in Applied Probability, 2007

In this paper we discuss four information theoretic ideas and present their implications for statistical inference: (1) Fisher information and divergence generating functions, (2) information optimum unbiased estimators, (3) information content of various statistics, (4) characterizations based on Fisher information.

Review of the Book Titled "Information and Complexity in Statistical Modeling" by Jorma Rissanen, Published by Springer, N. Y., 2007

Electronic Journal of Applied Statistical Analysis, 2011

No statistical model is right or wrong, true or false in a strict sense. We only evaluate and compare their contributions. Based on this theme, Jorma Rissanen has written a short but beautiful book titled "Information and Complexity in Statistical Modeling" (Springer, 2007), where modeling is done primarily by extracting the information from the data that can be learned with suggested classes of probability models. The note reviews this book and on the way rediscovers the chain information-knowledge-wisdom.

On the Measure of the Information in a Statistical Experiment

2008

Abstract. Setting aside experimental costs, the choice of an experiment is usually formulated in terms of the maximization of a measure of information, often presented as an optimality design criterion. However, there does not seem to be universal agreement on what objects can qualify as a valid measure of the information in an experiment. In this article we explicitly state a minimal set of requirements that must be satisfied by all such measures. Under that framework, the measure of the information in an experiment is equivalent to the measure of the variability of its likelihood ratio statistics or, equivalently, to the measure of the variability of its posterior to prior ratio statistics and to the measure of the variability of the distribution of the posterior distributions yielded by it. The larger that variability, the more peaked the likelihood functions and posterior distributions that tend to be yielded by the experiment, and the more informative the...

From evidence to understanding: a commentary on Fisher (1922) 'On the mathematical foundations of theoretical statistics'

Philosophical transactions. Series A, Mathematical, physical, and engineering sciences, 2015

The nature of statistics has changed over time. It was originally concerned with descriptive 'matters of state': summarizing population numbers, economic strength and social conditions. But during the course of the twentieth century its aim broadened to include inference: how to use data to shed light on underlying mechanisms, about what might happen in the future, about what would happen if certain actions were taken. Central to this development was Ronald Fisher. Over the course of his life he was responsible for many of the major conceptual advances in statistics. This is particularly illustrated by his 1922 paper, in which he introduced many of the concepts which remain fundamental to our understanding of how to extract meaning from data, right to the present day. It is no exaggeration to say that Fisher's work, as illustrated by the ideas he described and developed in this paper, underlies all modern science, and much more besides. This commentary was written to ...

Vanishing Fisher Information

2003

There are consistently estimable parameters of interest whose semiparametric Fisher information vanishes at some points of the model in question. Here we investigate how bad this is for estimation.

Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method

Cognitive Psychology, 2010

In the field of cognitive psychology, the p-value hypothesis test has established a stranglehold on statistical reporting. This is unfortunate, as the p-value provides at best a rough estimate of the evidence that the data provide for the presence of an experimental effect. An alternative and arguably more appropriate measure of evidence is conveyed by a Bayesian hypothesis test, which prefers the model with the highest average likelihood. One of the main problems with this Bayesian hypothesis test, however, is that it often requires relatively sophisticated numerical methods for its computation. Here we draw attention to the Savage-Dickey density ratio method, a method that can be used to compute the result of a Bayesian hypothesis test for nested models and under certain plausible restrictions on the parameter priors. Practical examples demonstrate the method's validity, generality, and flexibility.
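
The Savage-Dickey idea described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the binomial model, the Beta prior, and the function names `beta_pdf` and `savage_dickey_bf01` are assumptions chosen for illustration. For a point null nested in a model with a Beta prior on a success rate, the Bayes factor in favor of the null is the posterior density at the null value divided by the prior density at the same value.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def savage_dickey_bf01(k, n, theta0=0.5, a=1.0, b=1.0):
    """Bayes factor for H0: theta = theta0 against H1: theta ~ Beta(a, b).

    Illustrative assumption: k successes in n binomial trials, so the
    posterior under H1 is Beta(a + k, b + n - k) by conjugacy.
    """
    prior_at_null = beta_pdf(theta0, a, b)
    posterior_at_null = beta_pdf(theta0, a + k, b + n - k)
    # Savage-Dickey density ratio: posterior height over prior height at theta0.
    return posterior_at_null / prior_at_null


if __name__ == "__main__":
    # 5 successes in 10 trials under a uniform prior: the data favor theta = 0.5.
    print(savage_dickey_bf01(k=5, n=10))
```

With a uniform prior and 5 successes in 10 trials, the ratio equals 2772/1024, about 2.71, which matches the ratio of the two marginal likelihoods computed directly. No numerical integration is needed, which is the practical appeal the abstract points to.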

Bayes not Bust! Why Simplicity is no Problem for Bayesians

The British Journal for the Philosophy of Science, 2007

The advent of formal definitions of the simplicity of a theory has important implications for model selection. But what is the best way to define simplicity? Forster and Sober ([1994]) advocate the use of Akaike's Information Criterion (AIC), a non-Bayesian formalisation of the notion of simplicity. This forms an important part of their wider attack on Bayesianism in the philosophy of science. We defend a Bayesian alternative: the simplicity of a theory is to be characterised in terms of Wallace's Minimum Message Length (MML). We show that AIC is inadequate for many statistical problems where MML performs well. Whereas MML is always defined, AIC can be undefined. Whereas MML is not known ever to be statistically inconsistent, AIC can be. Even when defined and consistent, AIC performs worse than MML on small sample sizes. MML is statistically invariant under 1-to-1 re-parametrisation, thus avoiding a common criticism of Bayesian approaches. We also show that MML provides answers to many of Forster's objections to Bayesianism. Hence an important part of the attack on Bayesianism fails. 
5 The Minimum Message Length (MML) Principle
  5.1 The Strict MML estimator
  5.2 An example: The binomial distribution
  5.3 Properties of the SMML estimator
    5.3.1 Bayesianism
    5.3.2 Language invariance
    5.3.3 Generality
    5.3.4 Consistency and efficiency
  5.4 Similarity to false oracles
  5.5 Approximations to SMML
6 Criticisms of AIC
  6.1 Problems with ML
    6.1.1 Small sample bias in a Gaussian distribution
    6.1.2 The von Mises circular and von Mises-Fisher spherical distributions
    6.1.3 The Neyman-Scott problem
    6.1.4 Neyman-Scott, predictive accuracy and minimum expected KL distance
  6.2 Other problems with AIC
    6.2.1 Univariate polynomial regression
    6.2.2 Autoregressive econometric time series
    6.2.3 Multivariate second-order polynomial model selection
    6.2.4 Gap or no gap: a clustering-like problem for AIC
  6.3 Conclusions from the comparison of MML and AIC
7 Meeting Forster's objections to Bayesianism
  7.1 The sub-family problem
  7.2 The problem of approximation, or, which framework for statistics?
8 Conclusion
A Details of the derivation of the Strict MML estimator
B MML, AIC and the Gap vs. No Gap Problem
  B.1 Expected size of the largest gap
  B.2 Performance of AIC on the gap vs. no gap problem
  B.3 Performance of MML in the gap vs. no gap problem

Replicability, confidence, and priors

2005

All commentaries on p-rep in this issue of Psychological Science concern priors. Cumming graphically demonstrates the implications of our ignorance of d. Doros and Geier improve my argument with a Bayesian account. Macdonald notes that my program is like Fisher's, Fisher's is like the Bayesians', and the Bayesians' is incoherent. These commentaries strengthen the foundation of this predictive inferential technique while leaving all conclusions intact.

Fisher information distance: A geometrical reading

Discrete Applied Mathematics, 2014

This paper is a strongly geometrical approach to the Fisher distance, which is a measure of dissimilarity between two probability distribution functions. The Fisher distance, like other divergence measures, is also used in many applications to establish a proper data average. The main purpose is to widen the range of possible interpretations and relations of the Fisher distance and its associated geometry for the prospective applications. It focuses on statistical models of the normal probability distribution functions and takes advantage of the connection with the classical hyperbolic geometry to derive closed forms for the Fisher distance in several cases. Connections with the well-known Kullback-Leibler divergence measure are also devised.
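
As an illustration of the kind of closed form this abstract mentions, the sketch below computes the Fisher distance between two univariate normal distributions. It assumes the standard Fisher metric on the (mu, sigma) half-plane, ds^2 = (d mu^2 + 2 d sigma^2) / sigma^2; the paper's own notation is not visible in the abstract, and the function name `fisher_distance_normal` is hypothetical. Rescaling mu by 1/sqrt(2) turns this metric into sqrt(2) times the hyperbolic distance on the Poincaré half-plane.

```python
import math


def fisher_distance_normal(mu1, sigma1, mu2, sigma2):
    """Fisher distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Assumes the metric ds^2 = (d mu^2 + 2 d sigma^2) / sigma^2. Mapping
    (mu, sigma) to (mu / sqrt(2), sigma) gives the Poincare half-plane,
    whose distance is arccosh(1 + |p - q|^2 / (2 * y_p * y_q)).
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    # Squared Euclidean distance between the rescaled points.
    squared_gap = (mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + squared_gap / (2.0 * sigma1 * sigma2))


if __name__ == "__main__":
    # Same mean, standard deviations 1 and e: distance is sqrt(2) * |ln(e / 1)|.
    print(fisher_distance_normal(0.0, 1.0, 0.0, math.e))
```

When the two means coincide, the expression reduces to sqrt(2) times the absolute log-ratio of the standard deviations, so the distance depends only on relative scale, not absolute scale.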