Ioannis Ntzoufras | Athens University of Economics and Business

Papers by Ioannis Ntzoufras

Power-expected-posterior priors for variable selection in Gaussian linear models

Imaginary training samples are often used in Bayesian statistics to develop prior distributions, with appealing interpretations, for use in model comparison. Expected-posterior priors are defined via imaginary training samples coming from a common underlying predictive distribution m*, using an initial baseline prior distribution. These priors can have subjective and also default Bayesian implementations, based on different choices of m* and of the baseline prior. One of the main advantages of expected-posterior priors is that impropriety of baseline priors causes no indeterminacy of Bayes factors; at the same time, however, they depend strongly on the selection and the size of the training sample. Here we combine ideas from the power-prior and unit-information-prior methodologies to greatly diminish the effect of training samples on a Bayesian variable-selection problem using the expected-posterior prior approach: we raise the likelihood involved in the expected-posterior prior distribution to a power that produces a prior information content equivalent to one data point. The result is that in practice our power-expected-posterior (PEP) methodology is sufficiently insensitive to the size n* of the training sample that one may take n* equal to the full-data sample size and dispense with training samples altogether; this promotes stability of the resulting Bayes factors, removes the arbitrariness arising from individual training-sample selections, and greatly increases computational speed, allowing many more models to be compared within a fixed CPU budget. Here we focus on Gaussian linear models and develop our method under two different baseline prior choices: the independence Jeffreys prior and the Zellner g-prior. The method's performance is compared, in simulation studies and a real example involving prediction of air-pollutant concentrations from meteorological covariates, with a variety of previously defined variants on Bayes factors for variable selection. We find that the variable-selection procedure using our PEP prior (1) is systematically more parsimonious than the original expected-posterior prior with a minimal training sample, while sacrificing no desirable performance characteristics to achieve this parsimony; (2) is robust to the size of the training sample, thus enjoying the advantages described above arising from the avoidance of training samples altogether; and (3) identifies maximum-a-posteriori models that achieve good out-of-sample predictive performance.
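As a rough schematic of the construction described above (not the paper's exact notation), the PEP prior for the parameters $\theta_k$ of model $M_k$ is an expected-posterior prior in which the imaginary-data likelihood is raised to the power $1/\delta$:

$$
\pi_k^{\mathrm{PEP}}(\theta_k \mid \delta) = \int \pi_k^{N}(\theta_k \mid \mathbf{y}^*, \delta)\, m^{*}(\mathbf{y}^* \mid \delta)\, d\mathbf{y}^*,
\qquad
\pi_k^{N}(\theta_k \mid \mathbf{y}^*, \delta) \propto f(\mathbf{y}^* \mid \theta_k)^{1/\delta}\, \pi_k^{N}(\theta_k),
$$

with $\delta = n^*$ chosen so that the prior carries information worth roughly one data point; setting $n^* = n$ then removes the need to select training samples.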

Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R

In this paper we present an R package called bivpois for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models. An Expectation-Maximization (EM) algorithm is implemented. Inflated models allow for modelling both over-dispersion (or under-dispersion) and negative correlation and thus are appropriate for a wide range of applications. Extensions of the algorithms to several other models are also discussed. Detailed guidance on and implementation for simulated and real data sets using the bivpois package is provided.
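As a self-contained illustration of the model behind the package (a base-R sketch, not the bivpois API), the bivariate Poisson can be built by trivariate reduction, $X = X_1 + X_3$ and $Y = X_2 + X_3$ with independent Poisson components, which induces the positive covariance $\mathrm{Cov}(X, Y) = \lambda_3$:

```r
## Minimal base-R sketch (not the bivpois API): the bivariate Poisson via
## trivariate reduction, X = X1 + X3 and Y = X2 + X3 with independent
## Poisson components, so that Cov(X, Y) = lambda3 > 0.
set.seed(1)

rbivpois <- function(n, lambda1, lambda2, lambda3) {
  x3 <- rpois(n, lambda3)                     # shared component
  cbind(x = rpois(n, lambda1) + x3,
        y = rpois(n, lambda2) + x3)
}

## Joint pmf: P(X = x, Y = y) = exp(-l1-l2-l3) *
##   sum_k l1^(x-k) l2^(y-k) l3^k / ((x-k)! (y-k)! k!),  k = 0..min(x, y)
dbivpois <- function(x, y, lambda1, lambda2, lambda3, log = FALSE) {
  k <- 0:min(x, y)
  terms <- (x - k) * log(lambda1) + (y - k) * log(lambda2) + k * log(lambda3) -
           lfactorial(x - k) - lfactorial(y - k) - lfactorial(k)
  out <- -(lambda1 + lambda2 + lambda3) + log(sum(exp(terms)))
  if (log) out else exp(out)
}

xy <- rbivpois(500, lambda1 = 2, lambda2 = 1, lambda3 = 0.8)
cov(xy[, "x"], xy[, "y"])                      # close to lambda3 = 0.8
sum(mapply(dbivpois, xy[, "x"], xy[, "y"],
           MoreArgs = list(lambda1 = 2, lambda2 = 1, lambda3 = 0.8, log = TRUE)))
```

The EM algorithm mentioned above treats the unobserved common component $X_3$ as missing data, while the diagonal-inflated variants additionally mix in extra probability mass on the diagonal $x = y$.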

High-dimensional variable selection via low-dimensional adaptive learning

arXiv: Computation, 2019

A stochastic search method, the so-called Adaptive Subspace (AdaSub) method, is proposed for variable selection in high-dimensional linear regression models. The method aims at finding the best model with respect to a certain model selection criterion and is based on the idea of adaptively solving low-dimensional sub-problems in order to provide a solution to the original high-dimensional problem. Any of the usual $\ell_0$-type model selection criteria can be used, such as Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC) or the Extended BIC (EBIC), with the last being particularly suitable for high-dimensional cases. The limiting properties of the new algorithm are analysed and it is shown that, under certain conditions, AdaSub converges to the best model according to the considered criterion. In a simulation study, the performance of AdaSub is investigated in comparison to alternative methods. The effectiveness of the proposed method is illustrated...
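A heavily simplified sketch of the adaptive-subspace idea (not the authors' implementation: plain BIC instead of EBIC, exhaustive search inside each small subset, and an ad hoc smoothing rule for the adaptation) might look as follows:

```r
## Simplified sketch of the adaptive-subspace idea (not the authors' AdaSub code):
## repeatedly draw a small random subset of variables, find the best sub-model
## inside it by BIC via exhaustive enumeration, and adapt each variable's
## sampling probability towards its empirical "survival" frequency.
set.seed(2)
n <- 100; p <- 40
X <- matrix(rnorm(n * p), n, p); colnames(X) <- paste0("x", 1:p)
beta <- c(2, -1.5, 1, rep(0, p - 3))
y <- drop(X %*% beta + rnorm(n))

best_subset_bic <- function(y, X, vars) {        # exhaustive search within a small subset
  best <- list(bic = BIC(lm(y ~ 1)), sel = integer(0))
  for (m in 1:length(vars)) {
    for (idx in combn(vars, m, simplify = FALSE)) {
      b <- BIC(lm(y ~ X[, idx, drop = FALSE]))
      if (b < best$bic) best <- list(bic = b, sel = idx)
    }
  }
  best$sel
}

q <- 5                                           # size of each random subspace
prob <- rep(q / p, p)                            # initial sampling probabilities
hits <- draws <- rep(0, p)
for (t in 1:300) {
  S <- sample(1:p, size = q, prob = prob)        # low-dimensional sub-problem
  sel <- best_subset_bic(y, X, S)
  draws[S] <- draws[S] + 1
  hits[sel] <- hits[sel] + 1
  prob <- (q / p + hits) / (1 + draws)           # adaptive update (Laplace-style smoothing)
  prob <- pmin(pmax(prob, 1e-3), 1 - 1e-3)
}
order(hits / pmax(draws, 1), decreasing = TRUE)[1:5]   # x1, x2, x3 should rank highest
```

The point of the construction is that each iteration only ever solves a $q$-dimensional best-subset problem, while the adaptive sampling probabilities steer later subsets towards variables that keep being selected.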

Bayesian Analysis of Marginal Log-Linear Graphical Models for Three Way Contingency Tables

arXiv: Methodology, 2008

This paper deals with the Bayesian analysis of graphical models of marginal independence for three-way contingency tables. We use a marginal log-linear parametrization, under which the model is defined through suitable zero-constraints on the interaction parameters calculated within marginal distributions. We undertake a comprehensive Bayesian analysis of these models, involving suitable choices of prior distributions, estimation, model determination, as well as the allied computational issues. The methodology is illustrated with reference to two real data sets.

A Metropolized adaptive subspace algorithm for high-dimensional Bayesian variable selection

A simple and efficient adaptive Markov Chain Monte Carlo (MCMC) method, called the Metropolized Adaptive Subspace (MAdaSub) algorithm, is proposed for sampling from high-dimensional posterior model distributions in Bayesian variable selection. The MAdaSub algorithm is based on an independent Metropolis-Hastings sampler, where the individual proposal probabilities of the explanatory variables are updated after each iteration using a form of Bayesian adaptive learning, so that they eventually converge to the respective covariates' posterior inclusion probabilities. We prove the ergodicity of the algorithm and present a parallel version of MAdaSub with an adaptation scheme for the proposal probabilities based on the combination of information from multiple chains. The effectiveness of the algorithm is demonstrated via various simulated and real data examples, including a high-dimensional problem with more than 20,000 covariates.
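The following is a rough base-R sketch of an adaptive independent Metropolis-Hastings sampler over models in the spirit described above (not the authors' MAdaSub code): model scores are approximated by $-\mathrm{BIC}/2$, the proposal is a product of independent Bernoullis, and its inclusion probabilities drift towards the running inclusion rates of each covariate.

```r
## Rough sketch of an adaptive independent Metropolis-Hastings sampler over
## models (in the spirit of MAdaSub, not the authors' implementation).
set.seed(3)
n <- 120; p <- 15
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(1.5, -1, 0.8, rep(0, p - 3)) + rnorm(n))

log_score <- function(g) {                          # BIC-based surrogate for log p(model | y)
  if (sum(g) == 0) return(-BIC(lm(y ~ 1)) / 2)
  -BIC(lm(y ~ X[, g == 1, drop = FALSE])) / 2
}
log_prop <- function(g, q) sum(dbinom(g, 1, q, log = TRUE))  # product-Bernoulli proposal

q_prop <- rep(0.5, p); incl <- rep(0, p)
g_cur <- rbinom(p, 1, q_prop); s_cur <- log_score(g_cur)
n_iter <- 2000
for (t in 1:n_iter) {
  g_new <- rbinom(p, 1, q_prop)                     # independent proposal
  s_new <- log_score(g_new)
  log_alpha <- (s_new - s_cur) + (log_prop(g_cur, q_prop) - log_prop(g_new, q_prop))
  if (log(runif(1)) < log_alpha) { g_cur <- g_new; s_cur <- s_new }
  incl <- incl + g_cur
  q_prop <- pmin(pmax((1 + incl) / (2 + t), 0.01), 0.99)  # adapt towards inclusion rates
}
round(incl / n_iter, 2)                             # estimated posterior inclusion probabilities
```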

On the identifiability of Bayesian factor analytic models

A well known identifiability issue in factor analytic models is the invariance with respect to orthogonal transformations. This problem burdens inference under a Bayesian setup, where Markov chain Monte Carlo (MCMC) methods are used to generate samples from the posterior distribution. We introduce a post-processing scheme in order to deal with rotation, sign and permutation invariance of the MCMC sample. The exact version of the contributed algorithm requires solving $2^q$ assignment problems per (retained) MCMC iteration, where $q$ denotes the number of factors of the fitted model. For large numbers of factors, two approximate schemes based on simulated annealing are also discussed. We demonstrate that the proposed method leads to interpretable posterior distributions using synthetic and publicly available data from typical factor analytic models as well as mixtures of factor analyzers. An R package is available online on the CRAN web page.

Bayesian epidemic models for spatially aggregated count data

Statistics in Medicine, Jan 12, 2017

Epidemic data often possess certain characteristics, such as the presence of many zeros, the spatial nature of the disease spread mechanism, environmental noise, serial correlation and dependence on time-varying factors. This paper addresses these issues via suitable Bayesian modelling. In doing so, we utilize a general class of stochastic regression models appropriate for spatio-temporal count data with an excess number of zeros. The developed regression framework incorporates serial correlation and time-varying covariates through an Ornstein-Uhlenbeck process formulation. In addition, we explore the effect of different priors, including default options and variations of mixtures of g-priors. The effect of different distance kernels for the epidemic model component is investigated. We proceed by developing branching process-based methods for testing scenarios for disease control, thus linking traditional epidemiological models with stochastic epidemic processes, useful in polic...
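As a tiny illustration of two of the ingredients named above, and nothing more (no spatial structure, covariates or epidemic component), the sketch below simulates counts whose log-intensity follows an Ornstein-Uhlenbeck process and which contain structural zeros:

```r
## Tiny illustration of two ingredients mentioned above (not the full model):
## counts whose log-intensity follows an Ornstein-Uhlenbeck process (serial
## correlation) and which are zero-inflated (excess zeros).
set.seed(4)
T_len <- 200; theta <- 0.8; mu <- 1.0; sigma <- 0.5; dt <- 1
p_zero <- 0.3                                    # structural-zero probability

x <- numeric(T_len); x[1] <- mu
for (t in 2:T_len) {                             # exact OU transition
  m <- mu + (x[t - 1] - mu) * exp(-theta * dt)
  s <- sigma * sqrt((1 - exp(-2 * theta * dt)) / (2 * theta))
  x[t] <- rnorm(1, m, s)
}
counts <- ifelse(runif(T_len) < p_zero, 0L, rpois(T_len, exp(x)))
mean(counts == 0)                                # excess zeros beyond the Poisson ones
acf(x, plot = FALSE)$acf[2]                      # serial correlation in the latent intensity
```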

Bayesian Model and Variable Evaluation

Power-Expected-Posterior Priors as Mixtures of g-Priors in Normal Linear Models

Bayesian Analysis

One of the main approaches used to construct prior distributions for objective Bayes methods is the concept of random imaginary observations. Under this setup, the expected-posterior prior (EPP) offers several advantages, among which are a simple and appealing interpretation and an effective way to establish compatibility of priors among models. In this paper, we study the power-expected-posterior prior as a generalization of the EPP in objective Bayesian model selection under normal linear models. We prove that it can be represented as a mixture of g-priors, like a wide range of prior distributions under normal linear models, and thus posterior distributions and Bayes factors are derived in closed form, thereby retaining computational tractability. Comparisons with other mixtures of g-priors are made, and emphasis is given to the posterior distribution of g and its effect on Bayesian model selection and model averaging.
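In generic form (not the specific mixing density derived in the paper), a mixture of g-priors for model $\mathcal{M}_\gamma$ with design matrix $X_\gamma$ takes

$$
\boldsymbol{\beta}_\gamma \mid g, \sigma^2, \mathcal{M}_\gamma \sim \mathrm{N}\big(\mathbf{0},\, g\,\sigma^2 (X_\gamma^{T} X_\gamma)^{-1}\big),
\qquad \pi(\sigma^2) \propto \sigma^{-2}, \qquad g \sim \pi(g),
$$

so that marginal likelihoods and Bayes factors reduce to one-dimensional integrals over $g$; the paper's result identifies the particular mixing density $\pi(g)$ induced by the PEP prior.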

A Bayesian quest for finding a unified model for predicting volleyball games

Journal of the Royal Statistical Society: Series C (Applied Statistics)

Volleyball is a team sport with unique and specific characteristics. We introduce a new two-level hierarchical Bayesian model which accounts for these volleyball-specific characteristics. In the first level, we model the set outcome with a simple logistic regression model. Conditionally on the winner of the set, in the second level, we use a truncated negative binomial distribution for the points earned by the losing team. An additional Poisson-distributed inflation component is introduced to model the extra points played when the two teams' point difference is less than two points. The number of points of the winner within each set is deterministically specified by the winner of the set and the points of the inflation component. Team-specific abilities and the home effect are used as covariates on all layers of the model (set, point, and extra inflated points). The implementation of the proposed model on the Italian Superlega 2017/2018 data shows an exceptional reproducibility of the final league table and a satisfactory predictive ability.
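A toy simulator following the layered structure just described is sketched below; the specific constants (sets won at 25 points, the loser's score truncated at 23 in a regular set, a fixed chance of reaching 24-24 followed by a Poisson number of extra points) are illustrative assumptions and not the paper's parameterization.

```r
## Toy simulator following the layered structure described above; the constants
## used here are illustrative assumptions, not the paper's parameterization.
set.seed(5)

simulate_set <- function(ability_home, ability_away, home_adv = 0.3,
                         nb_size = 8, nb_mu = 16, p_deuce = 0.08, extra_rate = 2) {
  ## Level 1: the set winner from a logistic regression on abilities + home effect
  p_home_wins <- plogis(home_adv + ability_home - ability_away)
  home_wins <- runif(1) < p_home_wins

  if (runif(1) < p_deuce) {
    ## Inflation component: the set reaches 24-24 and a Poisson number of extra
    ## points is played; the winner finishes two points ahead
    extra <- rpois(1, extra_rate)
    winner_pts <- 26 + extra; loser_pts <- 24 + extra
  } else {
    ## Level 2: losing team's points from a negative binomial truncated to 0..23
    repeat { loser_pts <- rnbinom(1, size = nb_size, mu = nb_mu); if (loser_pts <= 23) break }
    winner_pts <- 25
  }
  if (home_wins) c(home = winner_pts, away = loser_pts) else c(home = loser_pts, away = winner_pts)
}

replicate(5, simulate_set(ability_home = 0.4, ability_away = -0.2))
```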

Bayesian Model Averaging Using Power-Expected-Posterior Priors

Econometrics

This paper focuses on Bayesian model averaging (BMA) using the power-expected-posterior prior in objective Bayesian variable selection under normal linear models. We derive a BMA point estimate of a predicted value, and present computation and evaluation strategies for the prediction accuracy. We compare the performance of our method with that of similar approaches in a simulated and a real data example from economics.
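In its generic form, the BMA point prediction referred to above averages model-specific predictions using posterior model probabilities as weights,

$$
\hat{y}_{\mathrm{new}}^{\mathrm{BMA}} = \sum_{\gamma} p(\mathcal{M}_\gamma \mid \mathbf{y})\, \mathrm{E}\big(y_{\mathrm{new}} \mid \mathbf{y}, \mathcal{M}_\gamma\big),
$$

with the weights here computed under the power-expected-posterior prior.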

Bayesian variable selection using the hyper-g prior in WinBUGS

Wiley Interdisciplinary Reviews: Computational Statistics

Prior Distributions for Objective Bayesian Analysis

Bayesian Analysis

We provide a review of prior distributions for objective Bayesian analysis. We start by examining some foundational issues and then organize our exposition into priors for: i) estimation or prediction; ii) model selection; iii) high-dimensional models. With regard to i), we present some basic notions, and then move to more recent contributions on discrete parameter spaces, hierarchical models, nonparametric models, and penalizing complexity priors. Point ii) is the focus of this paper: it discusses principles for objective Bayesian model comparison, and singles out some major concepts for building priors, which are subsequently illustrated in some detail for the classic problem of variable selection in normal linear models. We also present some recent contributions in the area of objective priors on model space. With regard to point iii) we only provide a short summary of some default priors for high-dimensional models, a rapidly growing area of research.

Probability Based Independence Sampler for Bayesian Quantitative Learning in Graphical Log-Linear Marginal Models

Bayesian Analysis

We introduce a novel Bayesian approach for quantitative learning for graphical log-linear marginal models. These models belong to curved exponential families that are difficult to handle from a Bayesian perspective. The likelihood cannot be analytically expressed as a function of the marginal log-linear interactions, but only in terms of cell counts or probabilities. Posterior distributions cannot be directly obtained, and Markov Chain Monte Carlo (MCMC) methods are needed. Finally, a well-defined model requires parameter values that lead to compatible marginal probabilities. Hence, any MCMC should account for this important restriction. We construct a fully automatic and efficient MCMC strategy for quantitative learning for such models that handles these problems. While the prior is expressed in terms of the marginal log-linear interactions, we build an MCMC algorithm that employs a proposal on the probability parameter space. The corresponding proposal on the marginal log-linear interactions is obtained via parameter transformation. We exploit a conditional conjugate setup to build an efficient proposal on probability parameters. The proposed methodology is illustrated by a simulation study and a real dataset.

Information consistency of the Jeffreys power-expected-posterior prior in Gaussian linear models

METRON

Power-expected-posterior (PEP) priors have recently been introduced as generalized versions of the expected-posterior priors (EPPs) for variable selection in Gaussian linear models. They are minimally informative priors that reduce the effect of training samples under the EPP approach, by combining ideas from the power-prior and unit-information-prior methodologies. In this paper we prove the information consistency of the PEP methodology, when using the independence Jeffreys prior as a baseline, for the variable selection problem in normal linear models.

Thermodynamic Bayesian model comparison

Statistics and Computing, 2016

Bayesian Variable Selection Using the Gibbs Sampler

Specification of the linear predictor for a generalised linear model requires determining which variables to include. We consider Bayesian strategies for performing this variable selection. In particular we focus on approaches based on the Gibbs sampler. Such approaches may be implemented using the publicly available software BUGS. We illustrate the methods using a simple example. BUGS code is provided in an appendix.
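The sketch below is not the BUGS code from the paper's appendix, but a base-R Gibbs sampler over inclusion indicators in the same spirit: each indicator is drawn from its full conditional, with model weights taken (purely as an illustration) from the closed-form Zellner g-prior Bayes factor against the intercept-only model, $g = n$, and a uniform prior over models.

```r
## Base-R sketch (not the paper's BUGS code): Gibbs sampling over inclusion
## indicators gamma_j, each drawn from its full conditional. Model weights use
## the closed-form Zellner g-prior Bayes factor against the intercept-only
## model, with g = n and a uniform prior over the model space.
set.seed(6)
n <- 100; p <- 10
X <- scale(matrix(rnorm(n * p), n, p))
y <- drop(1 + X %*% c(2, -1, rep(0, p - 2)) + rnorm(n))
g <- n

log_bf_null <- function(gam) {                      # log BF(M_gamma : intercept-only)
  pg <- sum(gam)
  if (pg == 0) return(0)
  r2 <- summary(lm(y ~ X[, gam, drop = FALSE]))$r.squared
  0.5 * (n - 1 - pg) * log(1 + g) - 0.5 * (n - 1) * log(1 + g * (1 - r2))
}

gam <- rep(FALSE, p); incl <- rep(0, p); n_iter <- 1000
for (it in 1:n_iter) {
  for (j in sample(1:p)) {                          # sweep the indicators in random order
    g1 <- gam; g1[j] <- TRUE
    g0 <- gam; g0[j] <- FALSE
    log_odds <- log_bf_null(g1) - log_bf_null(g0)   # full-conditional odds of gamma_j = 1
    gam[j] <- runif(1) < plogis(log_odds)
  }
  incl <- incl + gam
}
round(incl / n_iter, 2)                             # posterior inclusion probabilities
```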

Stochastic search variable selection for log-linear models

https://doi.org/10.1080/00949650008812054, Mar 20, 2007

We develop a Markov chain Monte Carlo algorithm, based on 'stochastic search variable selection' (George and McCulloch, 1993), for identifying promising log-linear models. The method may be used in the analysis of multi-way contingency tables where the set of plausible models is very large.

Bayesian Analysis of Marginal Log-Linear Graphical Models for Three Way Contingency Tables

arXiv:0807.1001, Jul 7, 2008

This paper deals with the Bayesian analysis of graphical models of marginal independence for three-way contingency tables. We use a marginal log-linear parametrization, under which the model is defined through suitable zero-constraints on the interaction parameters calculated within marginal distributions. We undertake a comprehensive Bayesian analysis of these models, involving suitable choices of prior distributions, estimation, model determination, as well as the allied computational issues. The methodology is illustrated with reference to two real data sets.

Bayesian Analysis of Graphical Models of Marginal Independence for Three Way Contingency Tables

This paper deals with the Bayesian analysis of graphical models of marginal independence for three-way contingency tables. Each marginal independence model corresponds to a particular factorization of the cell probabilities, and a conjugate analysis based on a Dirichlet prior can be performed. We illustrate a comprehensive Bayesian analysis of such models, involving suitable choices of prior parameters, estimation, model determination, as well as the allied computational issues. The posterior distributions of the marginal log-linear parameters are indirectly obtained using simple Monte Carlo schemes. The methodology is illustrated using two real data sets.
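A two-way miniature of such a "simple Monte Carlo scheme" (an illustration only, not the paper's three-way analysis): draw the cell probabilities of a 2x2 marginal table from their conjugate Dirichlet posterior and transform each draw into the log odds ratio, which is proportional to the highest-order marginal log-linear interaction.

```r
## Two-way miniature of the "simple Monte Carlo schemes" mentioned above:
## sample the cell probabilities of a 2x2 (marginal) table from their Dirichlet
## posterior and transform each draw into the log odds ratio.
set.seed(7)
counts <- matrix(c(25, 10, 8, 30), 2, 2)          # hypothetical 2x2 cell counts
prior_a <- matrix(1, 2, 2)                        # Dirichlet(1, ..., 1) prior

rdirichlet <- function(n, a) {                    # base-R Dirichlet sampler via gammas
  g <- matrix(rgamma(n * length(a), shape = a), n, length(a), byrow = TRUE)
  g / rowSums(g)
}

draws <- rdirichlet(10000, as.vector(counts + prior_a))   # posterior is Dirichlet(a + n)
log_or <- log(draws[, 1] * draws[, 4] / (draws[, 2] * draws[, 3]))
quantile(log_or, c(0.025, 0.5, 0.975))            # posterior summary of the interaction
```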

Bayesian Inference and Model Selection for Association Models in Contingency Tables: Sensitivity of Bayes Factor

Aims: In this work we develop novel hypothesis tests for association models for two-way contingency tables. We focus on conjugate analysis for the uniform, row and column effect models, which can be considered as Poisson log-linear or multinomial logit models. For the row-column model we will develop an MCMC-based approach which will try to explore conditional conjugacy structures of the model. Finally, we will thoroughly examine the sensitivity of these approaches to the prior parameters and will explore possibilities to implement objective Bayes techniques.

Notation: in an $I \times J$ contingency table,

• $n_{ij}$: the observed cell counts
• $r_i = \sum_j n_{ij}$: the row totals
• $c_j = \sum_i n_{ij}$: the column totals
• $n = \sum_i \sum_j n_{ij}$: the grand total,

for all $i = 1, 2, \dots, I$ and $j = 1, 2, \dots, J$.

$H_0$: there is no association between the two categorical variables; $H_1$: there is association between the two categorical variables.

$M_0$: $\mathbf{n} \mid \pi_{i+}, \pi_{+j} \sim \mathrm{Multinomial}(n, \boldsymbol{\pi})$, with $\boldsymbol{\pi} = (\pi_{ij})$, $\pi_{ij} = \pi_{i+} \times \pi_{+j}$, and $\pi_{i+} \sim \mathrm{Dirichlet}$...
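A minimal base-R illustration of the conjugate comparison sketched in the notation above: the Bayes factor of the independence model (independent Dirichlet priors on the row and column margins) against the saturated multinomial model (a Dirichlet prior on all $I \times J$ cells). This is the standard Dirichlet-multinomial calculation and not necessarily the exact prior specification used in this work.

```r
## Bayes factor of independence (Dirichlet priors on row and column margins)
## versus the saturated multinomial model (Dirichlet prior on all cells).
## The multinomial coefficient of the data cancels between the two models.
log_beta_fn <- function(a) sum(lgamma(a)) - lgamma(sum(a))   # log multivariate beta

log_bf_indep_vs_sat <- function(n_ij, a_cell = 1, a_row = 1, a_col = 1) {
  r <- rowSums(n_ij); c_ <- colSums(n_ij)
  a_sat <- matrix(a_cell, nrow(n_ij), ncol(n_ij))
  log_m0 <- (log_beta_fn(a_row + r) - log_beta_fn(rep(a_row, length(r)))) +
            (log_beta_fn(a_col + c_) - log_beta_fn(rep(a_col, length(c_))))
  log_m1 <- log_beta_fn(a_sat + n_ij) - log_beta_fn(a_sat)
  log_m0 - log_m1                                 # log BF of H0 (independence) vs H1
}

tab <- matrix(c(30, 10, 12, 28), 2, 2)            # hypothetical 2x2 table
log_bf_indep_vs_sat(tab)                          # > 0 favours independence, < 0 association
```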