A class of models for Bayesian predictive inference

Bayesian Predictive Inference Without a Prior

Statistica Sinica, 2024

Let $(X_n : n \ge 1)$ be a sequence of random observations. Let $\sigma_n(\cdot) = P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n)$ be the $n$-th predictive distribution and $\sigma_0(\cdot) = P(X_1 \in \cdot)$ the marginal distribution of $X_1$. In a Bayesian framework, to make predictions on $(X_n)$, one only needs the collection $\sigma = (\sigma_n : n \ge 0)$. Because of the Ionescu-Tulcea theorem, $\sigma$ can be assigned directly, without passing through the usual prior/posterior scheme. One main advantage is that no prior probability has to be selected. In this paper, $\sigma$ is subjected to two requirements: (i) The resulting sequence $(X_n)$ is conditionally identically distributed, in the sense of [4]; (ii) Each $\sigma_{n+1}$ is a simple recursive update of $\sigma_n$. Various new $\sigma$ satisfying (i)-(ii) are introduced and investigated. For such $\sigma$, the asymptotics of $\sigma_n$, as $n \to \infty$, is determined. In some cases, the probability distribution of $(X_n)$ is also evaluated.
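To make requirement (ii) concrete, here is a minimal Python sketch using a classical strategy rather than any of the new ones introduced in the paper: the Pólya-urn (Blackwell-MacQueen) predictive $\sigma_n = (\theta \sigma_0 + \sum_{i \le n} \delta_{X_i}) / (\theta + n)$, which yields an exchangeable, hence conditionally identically distributed, sequence. The choice $\sigma_0 = \mathrm{Uniform}(0,1)$, the parameter name `theta`, and all function names are assumptions made for illustration.

```python
import random

def sample_sequence(n_steps, theta=1.0, seed=0):
    """Draw X_1, ..., X_n one at a time from the predictives sigma_0, sigma_1, ..."""
    rng = random.Random(seed)
    xs = []
    for n in range(n_steps):
        # sigma_n puts mass theta/(theta+n) on sigma_0 and 1/(theta+n) on each past value.
        if rng.random() < theta / (theta + n):
            xs.append(rng.random())      # fresh draw from sigma_0 = Uniform(0,1) (assumed)
        else:
            xs.append(rng.choice(xs))    # repeat a past observation
    return xs

def predictive_prob(xs, in_B, base_prob_B, theta=1.0):
    """Closed form of sigma_n(B) given the observations xs."""
    n = len(xs)
    hits = sum(1 for x in xs if in_B(x))
    return (theta * base_prob_B + hits) / (theta + n)

def recursive_update(prev_prob_B, n, x_new_in_B, theta=1.0):
    """sigma_{n+1}(B) = q_n * sigma_n(B) + (1 - q_n) * 1_B(X_{n+1})."""
    q = (theta + n) / (theta + n + 1)
    return q * prev_prob_B + (1.0 - q) * (1.0 if x_new_in_B else 0.0)
```

Starting from `sigma_0(B)` and applying `recursive_update` once per observation reproduces `predictive_prob`, so no prior is ever written down: the sequence is generated from the predictives alone.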

A probabilistic view on predictive constructions for Bayesian learning

arXiv (Cornell University), 2022

Given a sequence $X = (X_1, X_2, \ldots)$ of random observations, a Bayesian forecaster aims to predict $X_{n+1}$ based on $(X_1, \ldots, X_n)$ for each $n \ge 0$. To this end, in principle, she only needs to select a collection $\sigma = (\sigma_0, \sigma_1, \ldots)$, called "strategy" in what follows, where $\sigma_0(\cdot) = P(X_1 \in \cdot)$ is the marginal distribution of $X_1$ and $\sigma_n(\cdot) = P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n)$ the $n$-th predictive distribution. Because of the Ionescu-Tulcea theorem, $\sigma$ can be assigned directly, without passing through the usual prior/posterior scheme. One main advantage is that no prior probability has to be selected. In a nutshell, this is the predictive approach to Bayesian learning. A concise review of the latter is provided in this paper. We try to put such an approach in the right framework, to clear up a few misunderstandings, and to provide a unifying view. Some recent results are discussed as well. In addition, some new strategies are introduced and the corresponding distribution of the data sequence $X$ is determined. The strategies concern generalized Pólya urns, random change points, covariates and stationary sequences.

On predictive distributions and Bayesian networks

Statistics and …, 2000

Statistics and Computing (2000) 10, 39–54. On predictive distributions and Bayesian networks. P. Kontkanen, P. Myllymäki, T. Silander, H. Tirri and P. Grünwald ... Keywords: Bayesian networks, predictive inference, MDL, MML, Jeffreys' prior. 1. Introduction ...

Computing Bayesian predictive distributions: The K-square and K-prime distributions

The computation of two Bayesian predictive distributions which are discrete mixtures of incomplete beta functions is considered. The number of iterations can easily become large for these distributions, so the accuracy of the result can be questionable. Therefore, existing algorithms for that class of mixtures are improved by introducing round-off error calculation into the stopping rule. A further simple modification is proposed to deal with possible underflows that may prevent the recurrence from working properly.

Predictive construction of priors in Bayesian nonparametrics

Brazilian Journal of Probability and Statistics, 2012

The characterization of models and priors through a predictive approach is a fundamental problem in Bayesian statistics. In the last decades, it has received renewed interest, as the basis of important developments in Bayesian nonparametrics and in machine learning. In this paper, we review classical and recent work based on the predictive approach in these areas. Our focus is on the predictive construction of priors for Bayesian nonparametric inference, for exchangeable and partially exchangeable sequences. Some results are revisited to shed light on theoretical connections among them.

Rate of convergence of predictive distributions for dependent data

Bernoulli, 2009

This paper deals with empirical processes of the type $C_n(B) = \sqrt{n}\,\{\mu_n(B) - P(X_{n+1} \in B \mid X_1, \ldots, X_n)\}$, where $(X_n)$ is a sequence of random variables and $\mu_n = (1/n) \sum_{i=1}^{n} \delta_{X_i}$ the empirical measure. Conditions for $\sup_B |C_n(B)|$ to converge stably (in particular, in distribution) are given, where $B$ ranges over a suitable class of measurable sets. These conditions apply when $(X_n)$ is exchangeable or, more generally, conditionally identically distributed (in the sense of Berti et al. [Ann. Probab. 32 (2004) 2029-2052]). By such conditions, in some relevant situations, one obtains that $\sup_B |C_n(B)| \xrightarrow{P} 0$ or even that $\sqrt{n}\, \sup_B |C_n(B)|$ converges a.s. Results of this type are useful in Bayesian statistics.
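As a hedged illustration of the quantity $\sup_B |C_n(B)|$, the sketch below works it out in one special exchangeable case, not in the generality of the paper. It assumes the Pólya-urn predictive $P(X_{n+1} \in B \mid X_1, \ldots, X_n) = (\theta \sigma_0(B) + n \mu_n(B)) / (\theta + n)$ with $\sigma_0 = \mathrm{Uniform}(0,1)$, and restricts $B$ to intervals $[0, t]$. Under these assumptions $C_n(B) = \sqrt{n}\, \theta\, (\mu_n(B) - \sigma_0(B)) / (\theta + n)$, so the supremum is a rescaled Kolmogorov-Smirnov distance and is bounded by $\sqrt{n}\, \theta / (\theta + n)$, which tends to 0. All names are illustrative.

```python
import math
import random

def polya_sequence(n_steps, theta=1.0, seed=0):
    """Sample from the Polya-urn predictives with sigma_0 = Uniform(0,1) (assumed)."""
    rng = random.Random(seed)
    xs = []
    for n in range(n_steps):
        if rng.random() < theta / (theta + n):
            xs.append(rng.random())      # new value from sigma_0
        else:
            xs.append(rng.choice(xs))    # repeat a past value
    return xs

def sup_abs_Cn(xs, theta=1.0):
    """sup over B = [0, t] of |C_n(B)| for the Polya-urn predictive."""
    n = len(xs)
    s = sorted(xs)
    # Kolmogorov-Smirnov distance between the empirical measure mu_n and Uniform(0,1);
    # the usual order-statistic formula remains valid when the sample has ties.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(s))
    # C_n(B) = sqrt(n) * theta * (mu_n(B) - sigma_0(B)) / (theta + n)
    return math.sqrt(n) * theta / (theta + n) * d

for n in (10, 100, 1000):
    print(n, sup_abs_Cn(polya_sequence(n, seed=n)))
```

The printed values depend on the seed, but each is at most $\sqrt{n}\, \theta / (\theta + n)$, which is the deterministic bound behind convergence to 0 in this toy case.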

Model-free objective Bayesian prediction

1999

Probabilistic prediction of the value of a given observable quantity given a random sample of past observations of that quantity is a frequent problem in the sciences, but one without a commonly agreed solution. In this paper, Bayesian statistical methods and information theory are used to propose a new procedure which is model-free, in that no assumption is required about an underlying statistical model, and objective, in that a reference non-subjective prior distribution is used. The proposed method may be seen as a Bayesian analogue to conventional kernel density estimation, but one with an appropriate predictive behaviour not previously available. The procedure is illustrated with the analysis of some published astronomical data. Research partially funded with grant PB97-1403 of the DGICYT, Madrid, Spain. This predictive interpretation, central to most scientific data analysis, is not justifiable from a conventional kernel density estimation viewpoint.

Bayesian predictive inference under informative sampling and transformation

2006

We have considered the problem in which a biased sample is selected from a finite population, and this finite population itself is a random sample from an infinitely large population, called the superpopulation. The parameters of the superpopulation and the finite population are of interest. There is some information about the selection mechanism in that the selection probabilities are linearly related to the measurements. This is typical of establishment surveys where the selection probabilities are taken to be proportional to the previous year's characteristics. When all the selection probabilities are known, as in our problem, inference about the finite population can be made, but inference about the distribution is not so clear. For continuous measurements, one might assume that the values are normally distributed, but as a practical issue normality can be tenuous. In such a situation a transformation to normality may be useful, but this transformation will destroy the linearity between the selection probabilities and the values. The purpose of this work is to address this issue. In this light we have constructed two models, an ignorable selection model and a nonignorable selection model. We use the Gibbs sampler and the sampling importance resampling (SIR) algorithm to fit the nonignorable selection model. We have emphasized estimation of the finite population parameters, although within this framework other quantities can be estimated easily. We have found that our nonignorable selection model can correct the bias due to unequal selection probabilities, and it provides improved precision over the estimates from the ignorable selection model. In addition, we have described the case in which all the selection probabilities are unknown. This is useful because many agencies (e.g., government) tend to hide these selection probabilities when public-use data are constructed. Also, we have given an extensive theoretical discussion on Poisson sampling, an underlying sampling scheme in our models especially useful in the case in which the selection probabilities are unknown.
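The resampling step mentioned in this abstract can be sketched generically. The Python below is a minimal sampling importance resampling routine on a toy one-dimensional target; it is not the authors' nonignorable selection model, and the target $N(1, 0.5^2)$, the proposal $N(0, 2^2)$, and every function name are assumptions chosen only to make the example runnable.

```python
import math
import random

def sir(log_target, proposal_sampler, log_proposal, n_draws, n_keep, seed=0):
    """Sampling importance resampling: draw, weight by target/proposal, resample."""
    rng = random.Random(seed)
    draws = [proposal_sampler(rng) for _ in range(n_draws)]
    # Log importance weights; normalising constants cancel in the resampling step.
    log_w = [log_target(x) - log_proposal(x) for x in draws]
    m = max(log_w)                       # subtract the max to avoid underflow
    weights = [math.exp(lw - m) for lw in log_w]
    # Resample with replacement, with probability proportional to the weights.
    return rng.choices(draws, weights=weights, k=n_keep)

def log_target(x):
    """Unnormalised log density of the toy target N(1, 0.5^2) (assumed)."""
    return -0.5 * ((x - 1.0) / 0.5) ** 2

def log_proposal(x):
    """Unnormalised log density of the proposal N(0, 2^2) (assumed)."""
    return -0.5 * (x / 2.0) ** 2

sample = sir(log_target, lambda rng: rng.gauss(0.0, 2.0), log_proposal,
             n_draws=20000, n_keep=2000)
print(sum(sample) / len(sample))  # should be close to the target mean 1.0
```

The proposal is deliberately wider than the target so that the importance weights stay bounded; a proposal with lighter tails than the target would make the weights unstable.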

Application of a predictive distribution formula to Bayesian computation for incomplete data models

Statistics and Computing, 2005

We consider exact and approximate Bayesian computation in the presence of latent variables or missing data. Specifically we explore the application of a posterior predictive distribution formula derived in , which is a particular form of Laplace approximation, both as an importance and a proposal distribution. We show that this formula provides a stable importance function for use within poor man's data augmentation schemes and that it can also be used as a proposal distribution within a Metropolis-Hastings algorithm for models that are not analytically tractable. We illustrate both uses in the case of a censored regression model and a normal hierarchical model, with both normal and Student t distributed random effects. Although the predictive distribution formula is motivated by regular asymptotic theory, it is not necessary that the likelihood has a closed form or that it possesses a local maximum.