Ioannis Ntzoufras | Athens University of Economics and Business
Papers by Ioannis Ntzoufras
Imaginary training samples are often used in Bayesian statistics to develop prior distributions, with appealing interpretations, for use in model comparison. Expected-posterior priors are defined via imaginary training samples coming from a common underlying predictive distribution m*, using an initial baseline prior distribution. These priors can have subjective and also default Bayesian implementations, based on different choices of m* and of the baseline prior. One of the main advantages of the expected-posterior priors is that impropriety of baseline priors causes no indeterminacy of Bayes factors; but at the same time they strongly depend on the selection and the size of the training sample. Here we combine ideas from the power-prior and the unit-information-prior methodologies to greatly diminish the effect of training samples on a Bayesian variable-selection problem using the expected-posterior prior approach: we raise the likelihood involved in the expected-posterior prior distribution to a power that produces a prior information content equivalent to one data point. The result is that in practice our power-expected-posterior (PEP) methodology is sufficiently insensitive to the size n* of the training sample that one may take n* equal to the full-data sample size and dispense with training samples altogether; this promotes stability of the resulting Bayes factors, removes the arbitrariness arising from individual training-sample selections, and greatly increases computational speed, allowing many more models to be compared within a fixed CPU budget. Here we focus on Gaussian linear models and develop our method under two different baseline prior choices: the independence Jeffreys prior and the Zellner g-prior.
The method's performance is compared, in simulation studies and a real example involving prediction of air-pollutant concentrations from meteorological covariates, with a variety of previously-defined variants on Bayes factors for variable selection. We find that the variable-selection procedure using our PEP prior (1) is systematically more parsimonious than the original expected-posterior prior with minimal training sample, while sacrificing no desirable performance characteristics to achieve this parsimony; (2) is robust to the size of the training sample, thus enjoying the advantages described above arising from the avoidance of training samples altogether; and (3) identifies maximum-a-posteriori models that achieve good out-of-sample predictive performance.
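The construction described in this abstract can be written compactly; the following is a sketch consistent with the description above, where y* denotes the imaginary training sample of size n*, δ the power parameter, π^N_ℓ the baseline prior of model M_ℓ, and m* the predictive distribution of y*:

```latex
% PEP prior for model M_l: average the power-posterior over imaginary samples y*
\pi^{PEP}_{\ell}(\theta_\ell \mid \delta) \;=\;
  \int \pi^{N}_{\ell}(\theta_\ell \mid y^{*}, \delta)\, m^{*}(y^{*})\,\mathrm{d}y^{*},
\qquad
\pi^{N}_{\ell}(\theta_\ell \mid y^{*}, \delta) \;\propto\;
  f(y^{*} \mid \theta_\ell)^{1/\delta}\, \pi^{N}_{\ell}(\theta_\ell).
```

Raising the likelihood to the power 1/δ with δ = n* scales the prior information content of the n* imaginary observations down to that of a single data point, which is what lets n* be taken equal to the full-data sample size.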
In this paper we present an R package called bivpois for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models. An Expectation-Maximization (EM) algorithm is implemented. Inflated models allow for modelling both over-dispersion (or under-dispersion) and negative correlation and thus are appropriate for a wide range of applications. Extensions of the algorithms to several other models are also discussed. Detailed guidance and implementation on simulated and real data sets using the bivpois package are provided.
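The bivariate Poisson model underlying bivpois can be illustrated with a minimal simulation (the helper below is hypothetical, not part of the package): via trivariate reduction, X = X1 + X3 and Y = X2 + X3 share a common Poisson shock, so Cov(X, Y) = λ3 is necessarily non-negative — the diagonal inflated variants mentioned above exist precisely to reach negative correlation.

```python
import math
import random

def rbivpois(n, lam1, lam2, lam3, seed=1):
    """Simulate a bivariate Poisson sample by trivariate reduction:
    X = X1 + X3 and Y = X2 + X3 share the common shock X3 ~ Poisson(lam3),
    so Cov(X, Y) = lam3 >= 0.  (Illustrative helper, not the bivpois API.)"""
    rng = random.Random(seed)

    def rpois(lam):
        # Knuth's multiplicative method; adequate for small lam
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    sample = []
    for _ in range(n):
        x3 = rpois(lam3)
        sample.append((rpois(lam1) + x3, rpois(lam2) + x3))
    return sample
```

With lam1 = 1, lam2 = 2, lam3 = 1.5 the marginal means are 2.5 and 3.5 and the sample covariance concentrates near 1.5, confirming the positive-dependence constraint of the plain bivariate Poisson model.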
arXiv: Computation, 2019
A stochastic search method, the so-called Adaptive Subspace (AdaSub) method, is proposed for variable selection in high-dimensional linear regression models. The method aims at finding the best model with respect to a certain model selection criterion and is based on the idea of adaptively solving low-dimensional sub-problems in order to provide a solution to the original high-dimensional problem. Any of the usual ℓ0-type model selection criteria can be used, such as Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC) or the Extended BIC (EBIC), with the last being particularly suitable for high-dimensional cases. The limiting properties of the new algorithm are analysed and it is shown that, under certain conditions, AdaSub converges to the best model according to the considered criterion. In a simulation study, the performance of AdaSub is investigated in comparison to alternative methods. The effectiveness of the proposed method is illustrated...
arXiv: Methodology, 2008
This paper deals with the Bayesian analysis of graphical models of marginal independence for three way contingency tables. We use a marginal log-linear parametrization, under which the model is defined through suitable zero-constraints on the interaction parameters calculated within marginal distributions. We undertake a comprehensive Bayesian analysis of these models, involving suitable choices of prior distributions, estimation, model determination, as well as the allied computational issues. The methodology is illustrated with reference to two real data sets.
A simple and efficient adaptive Markov chain Monte Carlo (MCMC) method, called the Metropolized Adaptive Subspace (MAdaSub) algorithm, is proposed for sampling from high-dimensional posterior model distributions in Bayesian variable selection. The MAdaSub algorithm is based on an independent Metropolis-Hastings sampler, where the individual proposal probabilities of the explanatory variables are updated after each iteration using a form of Bayesian adaptive learning, so that they eventually converge to the respective covariates' posterior inclusion probabilities. We prove the ergodicity of the algorithm and present a parallel version of MAdaSub with an adaptation scheme for the proposal probabilities based on the combination of information from multiple chains. The effectiveness of the algorithm is demonstrated via various simulated and real data examples, including a high-dimensional problem with more than 20,000 covariates.
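The adaptation idea can be sketched on a toy target. Everything below — the two-variable model space, the weights, and the simple frequency-based update — is an illustrative assumption, not the paper's exact scheme: an independent Metropolis-Hastings sampler proposes each variable's inclusion with probability π_j, and after each iteration π_j is nudged toward the chain's running inclusion frequency, which under ergodicity approaches the posterior inclusion probability.

```python
import random

# Toy unnormalized posterior over models gamma in {0,1}^2 (illustrative numbers,
# chosen so the posterior inclusion probabilities are 0.6 and 0.7).
WEIGHT = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.4}

def proposal_density(gamma, pi):
    """Independent proposal: each variable included with probability pi[j]."""
    out = 1.0
    for j, g in enumerate(gamma):
        out *= pi[j] if g else 1.0 - pi[j]
    return out

def madasub_toy(iters=100000, seed=3):
    rng = random.Random(seed)
    pi = [0.5, 0.5]          # adaptive proposal inclusion probabilities
    counts = [0, 0]          # running inclusion counts of the chain
    gamma = (0, 0)
    eps = 0.01               # keep proposals bounded away from 0 and 1
    for t in range(1, iters + 1):
        prop = tuple(int(rng.random() < pi[j]) for j in range(2))
        # Independent Metropolis-Hastings acceptance ratio
        accept = (WEIGHT[prop] * proposal_density(gamma, pi)) / \
                 (WEIGHT[gamma] * proposal_density(prop, pi))
        if rng.random() < accept:
            gamma = prop
        for j in range(2):
            counts[j] += gamma[j]
            # Adapt toward the running inclusion frequency (diminishing adaptation)
            pi[j] = min(1.0 - eps, max(eps, counts[j] / t))
    return pi
```

After enough iterations the adapted proposal probabilities settle near (0.6, 0.7), the inclusion probabilities of the toy target; bounding π_j away from 0 and 1 keeps the proposal valid throughout adaptation.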
A well known identifiability issue in factor analytic models is the invariance with respect to orthogonal transformations. This problem burdens the inference under a Bayesian setup, where Markov chain Monte Carlo (MCMC) methods are used to generate samples from the posterior distribution. We introduce a post-processing scheme in order to deal with rotation, sign and permutation invariance of the MCMC sample. The exact version of the contributed algorithm requires solving 2^q assignment problems per (retained) MCMC iteration, where q denotes the number of factors of the fitted model. For large numbers of factors two approximate schemes based on simulated annealing are also discussed. We demonstrate that the proposed method leads to interpretable posterior distributions using synthetic and publicly available data from typical factor analytic models as well as mixtures of factor analyzers. An R package is available online from CRAN.
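Setting rotation aside, the sign-and-permutation part of the alignment can be sketched by brute force for small q: flip column signs and permute columns of an MCMC loadings draw so as to minimize the squared distance to a pivot matrix. The function below is a toy stand-in for the paper's algorithm (full enumeration over the 2^q sign patterns and q! permutations, rather than solving assignment problems).

```python
import itertools

def align(draw, pivot):
    """Brute-force sign/permutation alignment of a p x q loadings draw
    (list of rows) to a pivot matrix of the same shape."""
    p, q = len(draw), len(draw[0])
    best, best_cost = None, float("inf")
    for signs in itertools.product((1, -1), repeat=q):
        for perm in itertools.permutations(range(q)):
            # column perm[k] of `draw`, with sign signs[k], becomes column k
            cost = sum((signs[k] * draw[i][perm[k]] - pivot[i][k]) ** 2
                       for i in range(p) for k in range(q))
            if cost < best_cost:
                best_cost = cost
                best = [[signs[k] * draw[i][perm[k]] for k in range(q)]
                        for i in range(p)]
    return best
```

For example, a draw whose columns have been swapped and sign-flipped relative to the pivot is mapped back onto the pivot exactly, which is the invariance the post-processing scheme removes.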
Statistics in Medicine, Jan 12, 2017
Epidemic data often possess certain characteristics, such as the presence of many zeros, the spatial nature of the disease spread mechanism, environmental noise, serial correlation and dependence on time-varying factors. This paper addresses these issues via suitable Bayesian modelling. In doing so, we utilize a general class of stochastic regression models appropriate for spatio-temporal count data with an excess number of zeros. The developed regression framework incorporates serial correlation and time-varying covariates through an Ornstein-Uhlenbeck process formulation. In addition, we explore the effect of different priors, including default options and variations of mixtures of g-priors. The effect of different distance kernels for the epidemic model component is investigated. We proceed by developing branching process-based methods for testing scenarios for disease control, thus linking traditional epidemiological models with stochastic epidemic processes, useful in polic...
Bayesian Analysis
One of the main approaches used to construct prior distributions for objective Bayes methods is the concept of random imaginary observations. Under this setup, the expected-posterior prior (EPP) offers several advantages, among which are a simple and natural interpretation and an effective way to establish compatibility of priors among models. In this paper, we study the power-expected-posterior prior as a generalization of the EPP in objective Bayesian model selection under normal linear models. We prove that it can be represented as a mixture of g-priors, like a wide range of prior distributions under normal linear models, and thus posterior distributions and Bayes factors are derived in closed form, thereby retaining computational tractability. Comparisons with other mixtures of g-priors are made and emphasis is placed on the posterior distribution of g and its effect on Bayesian model selection and model averaging.
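The mixture-of-g-priors representation referred to above takes the familiar form (a sketch; X_ℓ denotes the design matrix of model M_ℓ):

```latex
% Zellner's g-prior, conditionally on g:
\beta_\ell \mid g, \sigma^2 \sim
  \mathrm{N}\!\left(0,\; g\,\sigma^2 \left(X_\ell^{\top} X_\ell\right)^{-1}\right),
% and a mixture of g-priors integrates over a hyper-prior pi(g):
\pi(\beta_\ell \mid \sigma^2) \;=\; \int_0^{\infty}
  \mathrm{N}\!\left(\beta_\ell \,;\, 0,\; g\,\sigma^2 \left(X_\ell^{\top} X_\ell\right)^{-1}\right)
  \pi(g)\,\mathrm{d}g.
```

Different hyper-priors π(g) recover different members of this family, which is why the representation keeps posterior quantities and Bayes factors in closed form.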
Journal of the Royal Statistical Society: Series C (Applied Statistics)
Volleyball is a team sport with unique and specific characteristics. We introduce a new two-level hierarchical Bayesian model which accounts for these volleyball-specific characteristics. In the first level, we model the set outcome with a simple logistic regression model. Conditionally on the winner of the set, in the second level, we use a truncated negative binomial distribution for the points earned by the losing team. An additional Poisson-distributed inflation component is introduced to model the extra points played when the two teams' point difference is less than two points. The number of points of the winner within each set is deterministically specified by the winner of the set and the points of the inflation component. The team-specific abilities and the home effect are used as covariates on all layers of the model (set, point, and extra inflated points). The implementation of the proposed model on the Italian Superlega 2017/2018 data shows an exceptional reproducibility of the final league table and a satisfactory predictive ability.
Econometrics
This paper focuses on Bayesian model averaging (BMA) using the power-expected-posterior prior in objective Bayesian variable selection under normal linear models. We derive a BMA point estimate of a predicted value, and present computation and evaluation strategies for the prediction accuracy. We compare the performance of our method with that of similar approaches in a simulated and a real data example from economics.
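The BMA point prediction referred to above has the standard form (a sketch, with m_ℓ(y) the marginal likelihood of model M_ℓ):

```latex
% Model-averaged point prediction at a new design point (sketch):
\hat{y}_{\mathrm{BMA}} \;=\; \sum_{\ell} p(M_\ell \mid y)\,
  \mathrm{E}\!\left[\,y_{\mathrm{new}} \mid y, M_\ell\,\right],
\qquad
p(M_\ell \mid y) \;=\; \frac{m_\ell(y)\, p(M_\ell)}{\sum_{k} m_k(y)\, p(M_k)}.
```

Under the power-expected-posterior prior the marginal likelihoods m_ℓ(y) are available in closed form, which is what makes the weights, and hence the averaged prediction, computable without simulation over models.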
Wiley Interdisciplinary Reviews: Computational Statistics
Bayesian Analysis
We provide a review of prior distributions for objective Bayesian analysis. We start by examining some foundational issues and then organize our exposition into priors for: i) estimation or prediction; ii) model selection; iii) high-dimensional models. With regard to i), we present some basic notions, and then move to more recent contributions on discrete parameter space, hierarchical models, nonparametric models, and penalizing complexity priors. Point ii) is the focus of this paper: it discusses principles for objective Bayesian model comparison, and singles out some major concepts for building priors, which are subsequently illustrated in some detail for the classic problem of variable selection in normal linear models. We also present some recent contributions in the area of objective priors on model space. With regard to point iii) we only provide a short summary of some default priors for high-dimensional models, a rapidly growing area of research.
Bayesian Analysis
We introduce a novel Bayesian approach for quantitative learning for graphical log-linear marginal models. These models belong to curved exponential families that are difficult to handle from a Bayesian perspective. The likelihood cannot be analytically expressed as a function of the marginal log-linear interactions, but only in terms of cell counts or probabilities. Posterior distributions cannot be directly obtained, and Markov Chain Monte Carlo (MCMC) methods are needed. Finally, a well-defined model requires parameter values that lead to compatible marginal probabilities. Hence, any MCMC should account for this important restriction. We construct a fully automatic and efficient MCMC strategy for quantitative learning for such models that handles these problems. While the prior is expressed in terms of the marginal log-linear interactions, we build an MCMC algorithm that employs a proposal on the probability parameter space. The corresponding proposal on the marginal log-linear interactions is obtained via parameter transformation. We exploit a conditional conjugate setup to build an efficient proposal on probability parameters. The proposed methodology is illustrated by a simulation study and a real dataset.
METRON
Power-expected-posterior (PEP) priors have been recently introduced as generalized versions of the expected-posterior priors (EPPs) for variable selection in Gaussian linear models. They are minimally-informative priors that reduce the effect of training samples under the EPP approach, by combining ideas from the power-prior and unit-information-prior methodologies. In this paper we prove the information consistency of the PEP methodology, when using the independence Jeffreys as a baseline prior, for the variable selection problem in normal linear models.
Statistics and Computing, 2016
Specification of the linear predictor for a generalised linear model requires determining which variables to include. We consider Bayesian strategies for performing this variable selection. In particular we focus on approaches based on the Gibbs sampler. Such approaches may be implemented using the publicly available software BUGS. We illustrate the methods using a simple example. BUGS code is provided in an appendix.
https://doi.org/10.1080/00949650008812054, Mar 20, 2007
We develop a Markov chain Monte Carlo algorithm, based on 'stochastic search variable selection' (George and McCulloch, 1993), for identifying promising log-linear models. The method may be used in the analysis of multi-way contingency tables where the set of plausible models is very large.
arXiv:0807.1001, Jul 7, 2008
This paper deals with the Bayesian analysis of graphical models of marginal independence for three way contingency tables. We use a marginal log-linear parametrization, under which the model is defined through suitable zero-constraints on the interaction parameters calculated within marginal distributions. We undertake a comprehensive Bayesian analysis of these models, involving suitable choices of prior distributions, estimation, model determination, as well as the allied computational issues. The methodology is illustrated with reference to two real data sets.
This paper deals with the Bayesian analysis of graphical models of marginal independence for three way contingency tables. Each marginal independence model corresponds to a particular factorization of the cell probabilities and a conjugate analysis based on the Dirichlet prior can be performed. We illustrate a comprehensive Bayesian analysis of such models, involving suitable choices of prior parameters, estimation, model determination, as well as the allied computational issues. The posterior distributions of the marginal log-linear parameters are indirectly obtained using simple Monte Carlo schemes. The methodology is illustrated using two real data sets.
Aims: In this work we develop novel hypothesis tests for association models for two-way contingency tables. We focus on conjugate analysis for the uniform, row and column effect models, which can be considered as Poisson log-linear or multinomial logit models. For the row-column model we develop an MCMC-based approach which explores the conditional conjugacy structures of the model. Finally, we thoroughly examine the sensitivity of these approaches to the prior parameters and explore possibilities for implementing objective Bayes techniques.
Notation: in an I × J contingency table, for all i = 1, 2, ..., I and j = 1, 2, ..., J:
• n_ij: the observed cell counts
• r_i = Σ_j n_ij: the row totals
• c_j = Σ_i n_ij: the column totals
• n = Σ_i Σ_j n_ij: the grand total
H0: there is no association between the two categories; H1: there is association between the two categories. Under the independence model M0, n | π ~ Multinomial(n, π) with π = (π_ij), π_ij = π_i+ × π_+j, and π_i+ ∼ Dirichlet