Bayesian variable selection in linear regression models with instrumental variables

Heterogeneous Variable Selection in Nonlinear Panel Data Models: A Semi-Parametric Bayesian Approach

SSRN Electronic Journal

In this paper, we develop a general method for heterogeneous variable selection in Bayesian nonlinear panel data models. Heterogeneous variable selection refers to the possibility that subsets of units are unaffected by certain variables. It may be present in applications as diverse as health treatments, consumer choice, macroeconomics, and operations research. Our method additionally allows for other forms of cross-sectional heterogeneity. We consider a two-group approach for the model's unit-specific parameters: each unit-specific parameter is either equal to zero (heterogeneous variable selection) or comes from a Dirichlet process (DP) mixture of multivariate normals (other cross-sectional heterogeneity). We develop our approach for general nonlinear panel data models, encompassing multinomial logit and probit models, Poisson and negative binomial count models, and exponential models, among many others. For inference, we develop an efficient Bayesian MCMC sampler. In a Monte Carlo study, we find that our approach is able to capture heterogeneous variable selection whereas a "standard" DP mixture is not. In an empirical application, we find that accounting for heterogeneous variable selection and for non-normality of the continuous heterogeneity leads to improved in-sample and out-of-sample performance and to interesting insights. These findings illustrate the usefulness of our approach.
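The two-group prior described in this abstract can be illustrated with a short simulation. The sketch below is not the paper's sampler: it only draws unit-specific coefficients from a point mass at zero mixed with a finite normal mixture (standing in for the DP mixture), and all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_unit_params(n_units, p_zero=0.3, weights=(0.5, 0.5),
                     means=(-1.0, 2.0), sds=(0.5, 1.0)):
    """Draw unit-specific coefficients from a two-group prior:
    with probability p_zero the coefficient is exactly zero
    (heterogeneous variable selection); otherwise it comes from a
    finite normal mixture, a stand-in for the DP mixture
    (other cross-sectional heterogeneity)."""
    is_zero = rng.random(n_units) < p_zero
    comp = rng.choice(len(weights), size=n_units, p=np.asarray(weights))
    draws = rng.normal(np.take(means, comp), np.take(sds, comp))
    return np.where(is_zero, 0.0, draws)

theta = draw_unit_params(1000)
print(np.mean(theta == 0.0))  # share of units with an exact zero
```

The point mass is what a purely continuous ("standard") DP mixture cannot reproduce: it places zero probability on any exact value, which is why the abstract's Monte Carlo study distinguishes the two.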

Bayesian model averaging in the instrumental variable regression model

Journal of Econometrics, 2012

This paper considers the instrumental variable regression model when there is uncertainty about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainty can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very flexible and can be easily adapted to analyze any of the different priors that have been proposed in the Bayesian instrumental variables literature. We show how to calculate the probability of any relevant restriction (e.g. the posterior probability that over-identifying restrictions hold) and discuss diagnostic checking using the posterior distribution of discrepancy vectors. We illustrate our methods in a returns-to-schooling application.

Estimation of semiparametric models in the presence of endogeneity and sample selection

Journal of Computational and Graphical …, 2009

We analyze a semiparametric model for data that suffer from the problems of incidental truncation, where some of the data are observed for only part of the sample with a probability that depends on a selection equation, and of endogeneity, where a covariate is correlated with the disturbance term. The introduction of nonparametric functions in the model permits significant flexibility in the way covariates affect response variables. We present a Bayesian method for the analysis of such models that allows us to consider general systems of outcome variables and endogenous regressors that are continuous, binary, censored, or ordered.

On consistency of Bayesian variable selection procedures

2012

In this paper we extend the pairwise consistency of the Bayesian procedure to the entire class of linear models when the number of regressors grows as the sample size grows, and it is seen that for establishing consistency both the prior over the model parameters and the prior over the models now play an important role. We will show that commonly used Bayesian procedures with non-fully Bayes priors for models and for model parameters are inconsistent, and that fully Bayes versions of these priors correct this undesirable behavior.

Consistency of Bayesian procedures for variable selection

2009

It has long been known that for the comparison of pairwise nested models, a decision based on the Bayes factor produces a consistent model selector (in the frequentist sense). Here we go beyond the usual consistency for nested pairwise models, and show that for a wide class of prior distributions, including intrinsic priors, the corresponding Bayesian procedure for variable selection in normal regression is consistent in the entire class of normal linear models. We find that the asymptotics of the Bayes factors for intrinsic priors are equivalent to those of the Schwarz (BIC) criterion. Also, recall that the Jeffreys–Lindley paradox refers to the well-known fact that a point null hypothesis on the normal mean parameter is always accepted when the variance of the conjugate prior goes to infinity. This implies that some limiting forms of proper prior distributions are not necessarily suitable for testing problems. Intrinsic priors are limits of proper prior distributions, and for finite sample sizes they have been proved to behave extremely well for variable selection in regression; a consequence of our results is that for intrinsic priors Lindley's paradox does not arise.
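The stated equivalence with the Schwarz criterion can be made concrete. For a normal linear model M_γ with p_γ regressors compared against the null model M_0, a standard Laplace expansion (a textbook result, not taken from this paper) gives

```latex
\log B_{\gamma 0}(y) \;=\; \frac{n}{2}\,\log\frac{\mathrm{RSS}_0}{\mathrm{RSS}_\gamma}
\;-\; \frac{p_\gamma}{2}\,\log n \;+\; O_p(1),
```

which is half the BIC difference between the two models; the abstract's result is that Bayes factors under intrinsic priors share these leading-order asymptotics.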

A novel Bayesian approach for variable selection in linear regression models

Computational Statistics & Data Analysis

We propose a novel Bayesian approach to the problem of variable selection in multiple linear regression models. In particular, we present a hierarchical setting which allows for direct specification of a-priori beliefs about the number of nonzero regression coefficients as well as a specification of beliefs that given coefficients are nonzero. To guarantee numerical stability, we adopt a g-prior with an additional ridge parameter for the unknown regression coefficients. In order to simulate from the joint posterior distribution, we propose an intelligent random walk Metropolis-Hastings algorithm that is able to switch between different models. Testing our algorithm on real and simulated data illustrates that it performs at least on par with, and often better than, other well-established methods. Finally, we prove that under some nominal assumptions, the presented approach is consistent in terms of model selection.
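The core ingredients described here, g-prior marginal likelihoods and a Metropolis-Hastings walk over inclusion indicators, can be sketched in a few lines. This is an illustrative simplification, not the paper's algorithm: it uses the standard Zellner g-prior (unit-information g = n) without the ridge extension, a uniform prior over models rather than the paper's hierarchical specification, and hypothetical function names.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_bf(y, X, gamma, g):
    """Log Bayes factor of the submodel indexed by the boolean vector
    gamma against the intercept-only model under Zellner's g-prior
    (intercept handled by centering)."""
    n = len(y)
    p = int(gamma.sum())
    if p == 0:
        return 0.0
    yc = y - y.mean()
    Xg = X[:, gamma]
    Xg = Xg - Xg.mean(axis=0)
    beta_hat, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
    rss = np.sum((yc - Xg @ beta_hat) ** 2)
    r2 = 1.0 - rss / np.sum(yc ** 2)
    return (0.5 * (n - 1 - p) * np.log1p(g)
            - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))

def mh_variable_selection(y, X, n_iter=3000, g=None):
    """Random walk over models: propose flipping one inclusion
    indicator, accept with the Metropolis-Hastings probability under
    a uniform prior over models. Returns inclusion frequencies."""
    n, k = X.shape
    g = n if g is None else g
    gamma = np.zeros(k, dtype=bool)
    cur = log_bf(y, X, gamma, g)
    incl = np.zeros(k)
    for _ in range(n_iter):
        j = rng.integers(k)
        prop = gamma.copy()
        prop[j] = ~prop[j]
        new = log_bf(y, X, prop, g)
        if np.log(rng.random()) < new - cur:
            gamma, cur = prop, new
        incl += gamma
    return incl / n_iter

# Simulated check: only the first two of five regressors matter
n, k = 200, 5
X = rng.normal(size=(n, k))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)
pip = mh_variable_selection(y, X)
print(np.round(pip, 2))
```

On this strong-signal example the two relevant regressors get posterior inclusion probabilities near one, while the spurious ones are penalized by the (1+g)^((n-1-p)/2) factor. The paper's ridge parameter would additionally stabilize `log_bf` when the selected design matrix is nearly collinear.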

Posterior Model Consistency in Variable Selection as the Model Dimension Grows

Statistical Science, 2015

Most of the consistency analyses of Bayesian procedures for variable selection in regression refer to pairwise consistency, that is, consistency of Bayes factors. However, variable selection in regression is carried out in a given class of regression models where a natural variable selector is the posterior probability of the models. In this paper we analyze the consistency of the posterior model probabilities when the number of potential regressors grows as the sample size grows. The novelty in the posterior model consistency is that it depends not only on the priors for the model parameters through the Bayes factor, but also on the model priors, so that it is a useful tool for choosing priors for both models and model parameters. We have found that some classes of priors typically used in variable selection yield posterior model inconsistency, while mixtures of these priors improve this undesirable behavior. For moderate sample sizes, we evaluate Bayesian pairwise variable selection procedures by comparing their frequentist Type I and II error probabilities. This provides valuable information to discriminate between the priors for the model parameters commonly used for variable selection.

Comparison of Bayesian objective procedures for variable selection in linear regression

TEST, 2008

In the objective Bayesian approach to variable selection in regression a crucial point is the encompassing of the underlying nonnested linear models. Once the models have been encompassed one can define objective priors for the multiple testing problem involved in the variable selection problem. There are two natural ways of encompassing: one way is to encompass all models into the model containing all possible regressors, and the other one is to encompass the model containing the intercept only into any other. In this paper we compare the variable selection procedures that result from each of the two mentioned ways of encompassing by analysing their theoretical properties and their behavior in simulated and real data. Relations with frequentist criteria for model selection, such as those based on the adjusted R² and on Mallows' Cp, are provided incidentally.

Bayesian Variable Selection in Linear Regression and a Comparison

Hacettepe Journal of Mathematics and Statistics, 2001

In this study, Bayesian approaches, such as Zellner, Occam's Window and Gibbs sampling, have been compared in terms of selecting the correct subset for the variable selection in a linear regression model. The aim of this comparison is to analyze Bayesian variable selection and the behavior of classical criteria by taking into consideration different values of β and σ and different prior expectation levels.

Variable Selection in Causal Inference Using Penalization

In the causal adjustment setting, variable selection techniques based on either the outcome or treatment allocation model can result in the omission of confounders or the inclusion of spurious variables in the propensity score. We propose a variable selection method based on a penalized likelihood which considers the response and treatment assignment models simultaneously. We show that under some conditions our method attains the oracle property. The selected variables are used to form a double robust regression estimator of the treatment effect. Simulation results are presented and data from the National Supported Work Demonstration are analyzed.
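The penalized selection step itself is beyond a short sketch, but the second stage the abstract describes, a doubly robust (AIPW) estimator of the treatment effect built from an already-selected covariate set, can be illustrated with a numpy-only example on simulated data. Everything here is hypothetical (names, data-generating process) and not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_logistic(X, t, n_iter=50):
    """Propensity scores P(T=1|X) via Newton-Raphson logistic regression."""
    Xd = np.column_stack([np.ones(len(t)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ b))
        H = (Xd.T * (p * (1 - p))) @ Xd + 1e-8 * np.eye(Xd.shape[1])
        b += np.linalg.solve(H, Xd.T @ (t - p))
    return 1.0 / (1.0 + np.exp(-Xd @ b))

def fit_ols_predict(X, y, Xnew):
    """OLS outcome model fit on (X, y), predicted at Xnew."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta

def aipw_ate(y, t, X):
    """Doubly robust (AIPW) average treatment effect: consistent if
    either the outcome models or the propensity model is correct."""
    e = fit_logistic(X, t)
    m1 = fit_ols_predict(X[t == 1], y[t == 1], X)
    m0 = fit_ols_predict(X[t == 0], y[t == 0], X)
    return np.mean(m1 - m0
                   + t * (y - m1) / e
                   - (1 - t) * (y - m0) / (1 - e))

# Simulated check: one confounder x, true treatment effect 2.0
n = 5000
x = rng.normal(size=n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y = 2.0 * t + 1.5 * x + rng.normal(size=n)
ate = aipw_ate(y, t, x.reshape(-1, 1))
print(round(ate, 2))
```

The abstract's point is about which columns enter `X`: selecting on the outcome model alone can drop variables needed in the propensity model (and vice versa), which is what the joint penalized likelihood is designed to avoid.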