Power-expected-posterior priors for variable selection in Gaussian linear models
Imaginary training samples are often used in Bayesian statistics to develop prior distributions, with appealing interpretations, for use in model comparison. Expected-posterior priors are defined via imaginary training samples coming from a common underlying predictive distribution m*, using an initial baseline prior distribution. These priors can have both subjective and default Bayesian implementations, based on different choices of m* and of the baseline prior. One of the main advantages of expected-posterior priors is that impropriety of the baseline prior causes no indeterminacy of Bayes factors; at the same time, however, they depend strongly on the selection and the size of the training sample. Here we combine ideas from the power-prior and unit-information-prior methodologies to greatly diminish the effect of training samples on Bayesian variable selection under the expected-posterior prior approach: we raise the likelihood involved in the expected-posterior prior distribution to a power that produces a prior information content equivalent to one data point. The result is that in practice our power-expected-posterior (PEP) methodology is sufficiently insensitive to the size n* of the training sample that one may take n* equal to the full-data sample size and dispense with training samples altogether; this promotes stability of the resulting Bayes factors, removes the arbitrariness arising from individual training-sample selections, and greatly increases computational speed, allowing many more models to be compared within a fixed CPU budget. Here we focus on Gaussian linear models and develop our method under two different baseline prior choices: the independence Jeffreys prior and the Zellner g-prior. The method's performance is compared, in simulation studies and a real example involving prediction of air-pollutant concentrations from meteorological covariates, with a variety of previously defined variants of Bayes factors for variable selection. We find that the variable-selection procedure using our PEP prior (1) is systematically more parsimonious than the original expected-posterior prior with a minimal training sample, while sacrificing no desirable performance characteristics to achieve this parsimony; (2) is robust to the size of the training sample, thus enjoying the advantages described above arising from the avoidance of training samples altogether; and (3) identifies maximum-a-posteriori models that achieve good out-of-sample predictive performance.
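To make the tempering idea concrete, the following is a minimal sketch of the construction in standard expected-posterior-prior notation; the symbols (model M_l with parameters theta_l, sampling density f_l, baseline prior pi_l^N, imaginary data y* of size n*, and power parameter delta) are assumptions of this sketch and are not spelled out in the abstract above. The PEP prior raises the likelihood of the imaginary data to the power 1/delta inside the expected-posterior construction,

$$
\pi^{\mathrm{PEP}}_{\ell}(\theta_\ell \mid \delta)
= \int \pi^{N}_{\ell}(\theta_\ell \mid y^{*}, \delta)\, m^{*}(y^{*} \mid \delta)\, dy^{*},
\qquad
\pi^{N}_{\ell}(\theta_\ell \mid y^{*}, \delta)
\propto f_{\ell}(y^{*} \mid \theta_\ell)^{1/\delta}\, \pi^{N}_{\ell}(\theta_\ell),
$$

where m*(y* | delta) is the predictive distribution of the imaginary data under the baseline prior and the same power-likelihood. Choosing delta = n* makes the imaginary sample contribute prior information roughly equivalent to a single observation (the unit-information idea), and the insensitivity to n* reported above then permits setting n* = n, so the imaginary design matrix can be taken equal to the observed one and no training-sample selection is needed.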