Compositional Data in the Presence of Covariate and Correlated Errors: A Bayesian Approach

Mixture Models in Measurement Error Problems, with Reference to Epidemiological Studies

Journal of the Royal Statistical Society Series A: Statistics in Society, 2002

Summary. The paper focuses on a Bayesian treatment of measurement error problems and on the question of the specification of the prior distribution of the unknown covariates. It presents a flexible semiparametric model for this distribution based on a mixture of normal distributions with an unknown number of components. Implementation of this prior model as part of a full Bayesian analysis of measurement error problems is described in classical set-ups that are encountered in epidemiological studies: logistic regression between unknown covariates and outcome, with a normal or log-normal error model and a validation group. The feasibility of this combined model is tested and its performance is demonstrated in a simulation study that includes an assessment of the influence of misspecification of the prior distribution of the unknown covariates and a comparison with the semiparametric maximum likelihood method of Roeder, Carroll and Lindsay. Finally, the methodology is illustrated on a d...

Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol-lowering drugs

Statistics in Medicine, 2013

In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Markov chain Monte Carlo (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs, where the goal is to jointly model the three-dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and is in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology.
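As a rough illustration of giving each response its own Box-Cox transformation, the sketch below applies the standard Box-Cox formula componentwise. The function names, the sample measurements, and the lambda values are hypothetical and chosen only for illustration; in the paper the transformation parameters are estimated within the Bayesian model, which this sketch does not attempt.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        # The lam -> 0 limit of (y**lam - 1) / lam is log(y).
        return math.log(y)
    return (y ** lam - 1.0) / lam


def transform_responses(row, lambdas):
    """Apply a separate Box-Cox parameter to each component of one response vector."""
    return [box_cox(y, lam) for y, lam in zip(row, lambdas)]


# Hypothetical (LDL-C, HDL-C, TG) measurements and hypothetical lambdas.
row = (130.0, 45.0, 150.0)
lambdas = (0.5, 0.0, -0.25)
transformed = transform_responses(row, lambdas)
print(transformed)
```

Setting a lambda to 1 only shifts the response by one, while a lambda of 0 gives the log transform, so a single family covers both the untransformed and the log-scale analyses.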

Analysis of compositional data using Dirichlet covariate models

Typescript. Thesis (Ph. D.) -- American University, 2003. American University, Dept. of Mathematics and Statistics. Dissertation advisor: Robert W. Jernigan. Includes bibliographical references (leaves 148-151). Dissertation Abstracts: 64:1788B, Oct. 2003. University Microfilms, Inc. order no. 30-87067.

A Bayesian approach to joint analysis of longitudinal measurements and competing risks failure time data

Statistics in Medicine, 2009

In this paper, we develop a Bayesian method for joint analysis of longitudinal measurements and competing risks failure time data. The model allows one to analyze the longitudinal outcome with nonignorable missing data induced by multiple types of events, to analyze survival data with dependent censoring for the key event, and to draw inferences on multiple endpoints simultaneously. Compared with the likelihood approach, the Bayesian method has several advantages: it is computationally more tractable for high-dimensional random effects, it makes drawing inference convenient, and it provides a means to incorporate prior information that may help to improve estimation accuracy. An illustration is given using data from a clinical trial of scleroderma lung disease. The performance of our method is evaluated by simulation studies.

Time Series Analysis of Compositional Data Using a Dynamic Linear Model Approach

Compositional time series data consist of multivariate observations that, at each time point, are proportions of a whole quantity. This kind of data occurs frequently in many disciplines such as economics, geology and ecology. The usual multivariate statistical procedures available in the literature are not applicable to the analysis of such data, since they ignore the inherently constrained nature of these observations as parts of a whole. This article describes new techniques for modeling compositional time series data in a hierarchical Bayesian framework. Modified dynamic linear models are fit to compositional data via Markov chain Monte Carlo techniques. The distribution of the underlying errors is assumed to be a scale mixture of multivariate normals, of which the multivariate normal, multivariate t, multivariate logistic, etc., are special cases. In particular, multivariate normal and Student-t error structures are considered and compared through predictive distributions. The approach is illustrated on a data set.
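The scale-mixture representation mentioned above can be sketched for the Student-t case: conditional on a gamma-distributed mixing weight, the error vector is multivariate normal with its covariance divided by that weight. The sketch assumes zero location and identity scale matrix, and the function name, seed, and sample size are illustrative choices, not taken from the article.

```python
import math
import random


def sample_multivariate_t(dim, nu, rng):
    """Draw one standard multivariate t vector (zero location, identity scale)
    with nu degrees of freedom, using its scale mixture of normals form."""
    # Mixing weight w ~ Gamma(shape=nu/2, rate=nu/2).
    # random.gammavariate takes (shape, scale), so the scale is 2/nu.
    w = rng.gammavariate(nu / 2.0, 2.0 / nu)
    # Conditional on w, the vector is normal with covariance I / w.
    return [rng.gauss(0.0, 1.0) / math.sqrt(w) for _ in range(dim)]


rng = random.Random(0)
nu = 10.0
draws = [sample_multivariate_t(2, nu, rng) for _ in range(20000)]

# For nu > 2 each component has variance nu / (nu - 2), here 1.25.
var_first = sum(d[0] ** 2 for d in draws) / len(draws)
print(var_first)
```

Because both components of a draw share the same weight, they are uncorrelated but not independent, which is what distinguishes the multivariate t from a vector of independent univariate t variables.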

Bayesian analysis for a skew extension of the multivariate null intercept measurement error model

Journal of Applied Statistics, 2008

The skew-normal distribution is a class of distributions which includes the normal distribution as a special case. In this paper, we explore the use of Markov chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of the multivariate null intercept measurement error model (Aoki et al., 2003b), where the unobserved value of the covariate (latent variable) follows a skew-normal distribution. The results and methods are applied to data from a real dental clinical trial presented in Hadgu and Koch (1999).
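To make the "normal as a special case" statement concrete, the sketch below evaluates the univariate skew-normal density in Azzalini's standard form, 2 φ(z) Φ(αz) / scale. This parameterization is an assumption made for illustration; the paper may use a different but equivalent one, and the function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, loc=0.0, scale=1.0, shape=0.0):
    """Skew-normal density (Azzalini form); shape=0 recovers the normal density."""
    z = (x - loc) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(shape * z)


# With shape = 0 the factor 2 * Phi(0) equals 1, leaving the normal density.
print(skew_normal_pdf(0.3, shape=0.0), normal_pdf(0.3))
# A positive shape moves probability mass to the right of the location.
print(skew_normal_pdf(1.0, shape=3.0), skew_normal_pdf(-1.0, shape=3.0))
```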

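Análisis bayesiano en presencia de covariables para datos de sobrevivencia multivariados: un ejemplo de aplicación [Bayesian analysis in the presence of covariates for multivariate survival data: an application example]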

2011

In this paper, we introduce a Bayesian analysis for multivariate survival data in the presence of a covariate vector and censored observations. Different "frailties" or latent variables are considered to capture the correlation among the survival times for the same individual. We assume Weibull or generalized Gamma distributions for right-censored lifetime data. We develop the Bayesian analysis using Markov chain Monte Carlo (MCMC) methods.

A Bayesian approach to case-control studies with errors in covariables

Biostatistics, 2002

We develop Bayesian methodology for the analysis of case-control data with covariate imprecision. The pretense that the distribution of the imprecisely measured covariate is discrete on a heuristically chosen support set leads to a method which is reasonably simple to implement, and can be applied to different study designs. The methodological development emphasizes the interplay between retrospective and prospective analysis. We illustrate the method on simulated data, and on data from a cancer study where smoking history is the imprecisely measured covariate.

Bayesian modeling of multivariate loss reserving data based on scale mixtures of multivariate normal distributions: estimation and case influence diagnostics

Communications in Statistics - Theory and Methods, 2018

One of the most important problems in general insurance is estimating the loss reserve distribution. In this article, we develop Bayesian multivariate loss reserving models for cases where losses and random effects are assumed to follow scale mixtures of multivariate normal (SMMN) distributions. This class of distributions, which contains heavy-tailed multivariate distributions such as the Student's t, Pearson type VII, variance-gamma, slash and contaminated normal distributions, can often be used for robust inference when the assumption of normality becomes questionable. The hierarchical structure of the SMMN representation has the advantage that, under a Bayesian paradigm, parameter estimation is simplified by sampling from the multivariate normal distribution using Markov chain Monte Carlo (MCMC) methods. Bayesian case-deletion influence diagnostics based on q-divergence measures are also presented. Further, simulated and real data sets are analyzed, where we show that the models under the Pearson type VII and the variance-gamma distributions outperform the usual normal models.

A Bayesian analysis for pseudo-compositional data with spatial structure

Statistical Methods in Medical Research, 2019

We proposed a Bayesian analysis of pseudo-compositional data in the presence of a latent factor, assuming a spatial structure. This development was motivated by a dataset containing information on the number of newborns of primiparous mothers living in each of the microregions of the state of Sao Paulo, Brazil, in 2015, stratified by the age of the mothers (15-18, 19-29 and 30 years or more). Considering that data on newborns are not stochastically distributed among the three age groups, but are explained in relation to the women's population structure, we adopted the expression "pseudo-compositional data" to refer to this data structure. The hypothesis of interest is that the age at first pregnancy is associated with the economic conditions of the geographic area where the mother lives. The incidence of poverty was included as an independent variable. Additive log-ratio (alr) and isometric log-ratio (ilr) transformations were considered, as is usually done in the analysis of compositional data. The model included a random effect representing the spatial effect, assumed to have a conditional autoregressive structure. A Bayesian Markov chain Monte Carlo (MCMC) simulation procedure was used to obtain the posterior summaries of interest. The model based on the ilr transformation fitted the data well, showing that in the microregions with the highest incidence of poverty, there are higher proportions of women who have their first child in adolescence, while in the microregions with the lowest incidence of poverty, there are higher proportions of women who have their first child after the age of 30 years. From these results it is possible to conclude that this Bayesian approach was very useful in the estimation of the parameters of the proposed model. The proposed method should have a broad application to other problems involving pseudo-compositional data.
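The two log-ratio transformations named in the abstract can be sketched as follows. The alr uses the last part as the reference, and the ilr uses one common choice of orthonormal basis (a sequential, Helmert-type contrast of each part against the geometric mean of the parts before it); both choices are assumptions made for illustration, since the abstract does not state which reference part or basis the authors used. The counts are hypothetical.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]


def alr(parts):
    """Additive log-ratio: log of each part relative to the last part."""
    x = closure(parts)
    return [math.log(xi / x[-1]) for xi in x[:-1]]


def ilr(parts):
    """Isometric log-ratio with a sequential (Helmert-type) orthonormal basis."""
    logs = [math.log(p) for p in closure(parts)]
    coords = []
    for i in range(1, len(logs)):
        # Contrast the mean log of the first i parts with part i + 1.
        mean_first = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords


# Hypothetical counts of newborns in three maternal age groups.
counts = [120, 540, 340]
print(alr(counts))
print(ilr(counts))
```

A composition with D parts maps to D - 1 unconstrained coordinates under either transformation, which is what allows a normal model with covariates and random effects to be placed on the transformed scale.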