Variance Prior Forms for High-Dimensional Bayesian Variable Selection (original) (raw)

On variance estimation for Bayesian variable selection

arXiv (Cornell University), 2018

Consider the problem of high dimensional variable selection for the Gaussian linear model when the unknown error variance is also of interest. In this paper, we show that the use of conjugate shrinkage priors for Bayesian variable selection can have detrimental consequences for such variance estimation. Such priors are often motivated by the invariance argument of Jeffreys (1961). Revisiting this work, however, we highlight a caveat that Jeffreys himself noticed; namely that biased estimators can result from inducing dependence between parameters a priori. In a similar way, we show that conjugate priors for linear regression, which induce prior dependence, can lead to such underestimation in the Bayesian high-dimensional regression setting. Following Jeffreys, we recommend as a remedy to treat regression coefficients and the error variance as independent a priori. Using such an independence prior framework, we extend the Spike-and-Slab Lasso of Ročková and George (2018) to the unknown variance case. This extended procedure outperforms both the fixed variance approach and alternative penalized likelihood methods on simulated data. On the protein activity dataset of Clyde and Parmigiani (1998), the Spike-and-Slab Lasso with unknown variance achieves lower cross-validation error than alternative penalized likelihood methods, demonstrating the gains in predictive accuracy afforded by simultaneous error variance estimation.

On Bayesian Lasso Variable Selection and the Specification of the Shrinkage Parameter

stat-athens.aueb.gr

In this work, we propose a Bayesian implementation of the Lasso regression that accomplishes both shrinkage and variable selection. We focus on the appropriate specification for the shrinkage parameter λ through Bayes factors that evaluate the inclusion of each covariate in the model formulation. We associate this parameter with the values of Pearson and partial correlation at the limits between significance and insignificance as defined by Bayes factors. By this way, a meaningful interpretation of λ is achieved that leads to a simple specification of this parameter which is of prominent importance in Lasso literature.

BayesSUR: An R Package for High-Dimensional Multivariate Bayesian Variable and Covariance Selection in Linear Regression

Journal of Statistical Software

In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with highdimensional genomic and other omics data, a problem that can be studied with highdimensional multi-response regression, where the response variables are potentially highly correlated. To this purpose, we recently introduced several multivariate Bayesian variable and covariance selection models, e.g., Bayesian estimation methods for sparse seemingly unrelated regression for variable and covariance selection. Several variable selection priors have been implemented in this context, in particular the hotspot detection prior for latent variable inclusion indicators, which results in sparse variable selection for associations between predictors and multiple phenotypes. We also propose an alternative, which uses a Markov random field (MRF) prior for incorporating prior knowledge about the dependence structure of the inclusion indicators. Inference of Bayesian seemingly unrelated regression (SUR) by Markov chain Monte Carlo methods is made computationally feasible by factorisation of the covariance matrix amongst the response variables. In this paper we present BayesSUR, an R package, which allows the user to easily specify and run a range of different Bayesian SUR models, which have been implemented in C++ for computational efficiency. The R package allows the specification of the models in a modular way, where the user chooses the priors for variable selection and for covariance selection separately. We demonstrate the performance of sparse SUR models with the hotspot prior and spike-and-slab MRF prior on synthetic and real data sets representing eQTL or mQTL studies and in vitro anti-cancer drug screening studies as examples for typical applications.

Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions

Journal of the American Statistical Association, 2012

For high-dimensional data, particularly when the number of predictors greatly exceeds the sample size, selection of relevant predictors for regression is a challenging problem. Methods such as sure screening, forward selection, or penalized regressions are commonly used. Bayesian variable selection methods place prior distributions on the parameters along with a prior over model space, or equivalently, a mixture prior on the parameters having mass at zero. Since exhaustive enumeration is not feasible, posterior model probabilities are often obtained via long MCMC runs. The chosen model can depend heavily on various choices for priors and also posterior thresholds. Alternatively, we propose a conjugate prior only on the full model parameters and use sparse solutions within posterior credible regions to perform selection. These posterior credible regions often have closed-form representations, and it is shown that these sparse solutions can be computed via existing algorithms. The approach is shown to outperform common methods in the high-dimensional setting, particularly under correlation. By searching for a sparse solution within a joint credible region, consistent model selection is established. Furthermore, it is shown that, under certain conditions, the use of marginal credible intervals can give consistent selection up to the case where the dimension grows exponentially in the sample size. The proposed approach successfully accomplishes variable selection in the high-dimensional setting, while avoiding pitfalls that plague typical Bayesian variable selection methods.

Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings

Journal of Computational and Graphical Statistics

We develop a Bayesian variable selection method, called SVEN, based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space. Sparsity is achieved by using degenerate spike priors on inactive variables, whereas Gaussian slab priors are placed on the coefficients for the important predictors making the posterior probability of a model available in explicit form (up to a normalizing constant). The strong model selection consistency is shown to be attained when the number of predictors grows nearly exponentially with the sample size and even when the norm of mean effects solely due to the unimportant variables diverge, which is a novel attractive feature. An appealing byproduct of SVEN is the construction of novel model weight adjusted prediction intervals. Embedding a unique model based screening and using fast Cholesky updates, SVEN produces a highly scalable computational framework to explore gigantic model spaces, rapidly identify the regions of high posterior probabilities and make fast inference and prediction. A temperature schedule guided by our model selection consistency derivations is used to further mitigate multimodal posterior distributions. The performance of SVEN is demonstrated through a number of simulation experiments and a real data example from a genome wide association study with over half a million markers.

Penalized regression, standard errors, and Bayesian lassos

Bayesian Analysis, 2010

Penalized regression methods for simultaneous variable selection and coefficient estimation, especially those based on the lasso of , have received a great deal of attention in recent years, mostly through frequentist models. Properties such as consistency have been studied, and are achieved by different lasso variations. Here we look at a fully Bayesian formulation of the problem, which is flexible enough to encompass most versions of the lasso that have been previously considered. The advantages of the hierarchical Bayesian formulations are many. In addition to the usual ease-of-interpretation of hierarchical models, the Bayesian formulation produces valid standard errors (which can be problematic for the frequentist lasso), and is based on a geometrically ergodic Markov chain. We compare the performance of the Bayesian lassos to their frequentist counterparts using simulations, data sets that previous lasso papers have used, and a difficult modeling problem for predicting the collapse of governments around the world. In terms of prediction mean squared error, the Bayesian lasso performance is similar to and, in some cases, better than, the frequentist lasso.

Nearly optimal Bayesian Shrinkage for High Dimensional Regression

arXiv: Statistics Theory, 2017

During the past decade, shrinkage priors have received much attention in Bayesian analysis of high-dimensional data. In this paper, we study the problem for high-dimensional linear regression models. We show that if the shrinkage prior has a heavy and flat tail, and allocates a sufficiently large probability mass in a very small neighborhood of zero, then its posterior properties are as good as those of the spike-and-slab prior. While enjoying its efficiency in Bayesian computation, the shrinkage prior can lead to a nearly optimal contraction rate and selection consistency as the spike-and-slab prior. Our numerical results show that under posterior consistency, Bayesian methods can yield much better results in variable selection than the regularization methods, such as Lasso and SCAD. We also establish a Bernstein von-Mises type results comparable to Castillo et al (2015), this result leads to a convenient way to quantify uncertainties of the regression coefficient estimates, which ...

Dirichlet Lasso: A Bayesian approach to variable selection

Sage

Selection of the most important predictor variables in regression analysis is one of the key problems statistical research has been concerned with for long time. In this article, we propose the methodology, Dirichlet Lasso (abbreviated as DLASSO) to address this issue in a Bayesian framework. In many modern regression settings, large set of predictor variables are grouped and the coefficients belonging to any one of these groups are either all redundant or all important in predicting the response; we say in those cases that the predictors exhibit a group structure. We show that DLASSO is particularly useful where the group structure is not fully known. We exploit the clustering property of Dirichlet Process priors to infer the possibly missing group information. The Dirichlet Process has the advantage of simultaneously clustering the variable coefficients and selecting the best set of predictor variables. We compare the predictive performance of DLASSO to Group Lasso and ordinary Lasso with real data and simulation studies. Our results demonstrate that the predictive performance of DLASSO is almost as good as that of Group Lasso when group label information is given; and superior to the ordinary Lasso for missing group information. For high dimensional data (e.g., genetic data) with missing group information, DLASSO will be a powerful approach of variable selection since it provides a superior predictive performance and higher statistical accuracy.

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

2012

Inspired by analysis of genomic data, the primary quest is to identify associations between studied traits and genetic markers where number of markers is typically much larger than sample size. Bayesian variable selection methods with Markov chain Monte Carlo (MCMC) are extensively applied to analyze such high-dimensional data. However, MCMC is often slow to converge with large number of candidate predictors. In this study, we examine the empirical Bayes variable selection with a sparse prior on the unknown coefficients. An iterated conditional modes/medians (ICM/M) algorithm is proposed for implementation by iteratively minimizing a conditional loss function in high-dimensional linear regression model. Attention is then directed to extend the algorithm to a generalized linear model. The performances of our approach are evaluated through simulation study.