Bayesian Nonparametric Longitudinal Data Analysis

Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data

Biometrics, 2010

We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a non-zero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a post-processing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
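The claim that a DP draw has a non-zero mean even under a zero-mean base measure can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a truncated stick-breaking representation of the DP, and the function name, the truncation level, the placeholder intercept value, and the simple centering step (shift the atoms by the realized mean and absorb the shift into the intercept) are illustrative assumptions rather than the post-processing procedure detailed in the paper.

```python
import numpy as np


def draw_dp_stick_breaking(alpha, base_sd, n_atoms, rng):
    """Truncated stick-breaking draw from DP(alpha, N(0, base_sd^2)).

    Returns (weights, atoms) describing the discrete random measure
    G = sum_k weights[k] * delta(atoms[k]).
    """
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    weights = v * remaining
    atoms = rng.normal(0.0, base_sd, size=n_atoms)  # base measure has mean zero
    return weights, atoms


rng = np.random.default_rng(0)
weights, atoms = draw_dp_stick_breaking(alpha=1.0, base_sd=1.0, n_atoms=200, rng=rng)

# The mean of the realized measure G is a weighted sum of atoms and is
# generally not zero, although the base measure is centered at zero.
realized_mean = float(np.sum(weights * atoms))

# Illustrative centering (an assumption, not the paper's exact procedure):
# recenter the random-effect distribution and move the shift to the intercept.
intercept_draw = 2.0  # hypothetical sampled intercept, for illustration only
centered_atoms = atoms - realized_mean
adjusted_intercept = intercept_draw + realized_mean

print(f"mean of realized G: {realized_mean:.4f}")
print(f"mean after centering: {np.sum(weights * centered_atoms):.2e}")
```

Because the shift is added to the intercept and subtracted from the random effects, the fitted values are unchanged; only the split between fixed and random components moves.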

A nonparametric Bayesian model for inference in related longitudinal studies

Journal of the Royal Statistical Society: Series C (Applied Statistics), 2005

We discuss a method for combining different but related longitudinal studies to improve predictive precision. The motivation is to borrow strength across clinical studies in which the same measurements are collected at different frequencies. Key features of the data are heterogeneous populations and an unbalanced design across three studies of interest. The first two studies are phase I studies with very detailed observations on a relatively small number of patients. The third study is a large phase III study with over 1500 enrolled patients, but with relatively few measurements on each patient. Patients receive different doses of several drugs in the studies, with the phase III study containing significantly less toxic treatments. Thus, the main challenges for the analysis are to accommodate heterogeneous population distributions and to formalize borrowing strength across the studies and across the various treatment levels. We describe a hierarchical extension over suitable semiparametric longitudinal data models to achieve the inferential goal. A nonparametric random-effects model accommodates the heterogeneity of the population of patients. A hierarchical extension allows borrowing strength across different studies and different levels of treatment by introducing dependence across these nonparametric random-effects distributions. Dependence is introduced by building an analysis of variance (ANOVA) like structure over the random-effects distributions for different studies and treatment combinations. Model structure and parameter interpretation are similar to standard ANOVA models. Instead of the unknown normal means as in standard ANOVA models, however, the basic objects of inference are random distributions, namely the unknown population distributions under each study. The analysis is based on a mixture of Dirichlet processes model as the underlying semiparametric model.

A Bayesian semiparametric model for bivariate sparse longitudinal data

Statistics in Medicine, 2013

Mixed effects models have recently become popular for analyzing sparse longitudinal data that arise naturally in biological, agricultural and biomedical studies. Traditional approaches assume independent residuals over time and explain the longitudinal dependence by random effects. However, when bivariate or multivariate traits are measured longitudinally, this fundamental assumption is likely to be violated because of inter-trait dependence over time. We provide a more general framework where the dependence of the observations from the same subject over time is not assumed to be explained completely by the random effects of the model. We propose a novel, mixed model based approach and estimate the error-covariance structure nonparametrically under a generalized linear model framework. Penalized splines are used to model the general effect of time and we consider a Dirichlet process mixture of normals prior for the random effects distribution. We analyze blood pressure data from the Framingham Heart Study where body mass index, gender and time are treated as covariates. We compare our method with traditional methods, including parametric modeling of the random effects and independent residual errors over time. Extensive simulation studies are conducted to investigate the practical usefulness of the proposed method. The current approach is very helpful in analyzing bivariate irregular longitudinal traits.

A Semiparametric Bayesian Approach for Analyzing Longitudinal Data from Multiple Related Groups

Biological and/or clinical experiments often result in longitudinal data from multiple related groups. The analysis of such data is challenging because groups might share information on the mean and/or covariance functions. In this article, we consider a Bayesian semiparametric approach to modeling the mean trajectories for longitudinal responses coming from multiple related groups. We consider matrix stick-breaking process priors on the group mean parameters, which allow information sharing on the mean trajectories across the groups. Simulation studies are performed to demonstrate the effectiveness of the proposed approach compared to the more traditional approaches. We analyze data from a one-year follow-up of nutrition education for hypercholesterolemic children with three different treatments where the children are from different age groups. Our analysis provides more clinically useful information than the previous analysis of the same dataset. The proposed approach will be a very powerful tool for analyzing data from clinical trials and other medical experiments.

Bayesian models for longitudinal data

2010

Longitudinal data, in which observations are repeatedly measured over time or age, provide the foundation for the analysis of processes which evolve over time; models for such data can be referred to as growth or trajectory models. One of the traditional ways of looking at growth models is to employ either linear or polynomial functional forms to model trajectory shape, and account for variation around an overall mean trend with the inclusion of random effects or individual variation on the functional shape parameters. The identification of distinct subgroups or sub-classes (latent classes) within these trajectory models which are not based on some pre-existing individual classification provides an important methodology with substantive implications. The identification of subgroups or classes has a wide application in the medical arena where responder/non-responder identification based on distinctly differing trajectories delivers further information for clinical processes. This thesis develops Bayesian statistical models and techniques for the identification of subgroups in the analysis of longitudinal data where the number of time intervals is limited. These models are then applied to a single case study which investigates the neuropsychological cognition for early stage breast cancer patients undergoing adjuvant chemotherapy treatment from the Cognition in Breast Cancer Study undertaken by the Wesley Research Institute of Brisbane, Queensland. Alternative formulations to the linear or polynomial approach are taken which use piecewise linear models with a single turning point, change-point or knot at a known time point and latent basis models for the non-linear trajectories found for the verbal memory domain of cognitive function before and after chemotherapy treatment.
Hierarchical Bayesian random effects models are used as a starting point for the latent class modelling process and are extended with the incorporation of covariates in the trajectory profiles and as predictors of class membership.
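The piecewise linear trajectory with a single knot at a known time point can be written as a linear model with one extra "hinge" column. The sketch below is only an illustration under stated assumptions: the function name, the toy time grid, and the coefficient values are hypothetical, and ordinary least squares stands in for the hierarchical Bayesian fitting used in the thesis.

```python
import numpy as np


def piecewise_linear_design(times, knot):
    """Design matrix [1, t, (t - knot)_+] for a trajectory with one known knot."""
    times = np.asarray(times, dtype=float)
    hinge = np.maximum(times - knot, 0.0)  # zero before the knot, linear after it
    return np.column_stack([np.ones_like(times), times, hinge])


# Toy, noise-free example (hypothetical values, not data from the study).
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
knot = 2.0
X = piecewise_linear_design(times, knot)

true_coef = np.array([10.0, -1.5, 2.5])  # intercept, slope before knot, change in slope
y = X @ true_coef

# Least squares as a stand-in for the Bayesian fit.
coef_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_before = coef_hat[1]
slope_after = coef_hat[1] + coef_hat[2]  # slopes add after the knot

print("slope before knot:", round(float(slope_before), 3))
print("slope after knot:", round(float(slope_after), 3))
```

In a random effects version, subject-specific deviations would be placed on these three coefficients, and latent class membership would select among class-specific coefficient vectors.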

Bayesian regression analysis of data with random effects covariates from nonlinear longitudinal measurements

Journal of Multivariate Analysis, 2015

Joint models for a wide class of response variables and longitudinal measurements consist of a mixed-effects model to fit longitudinal trajectories whose random effects enter as covariates in a generalized linear model for the primary response. They provide a useful way to assess association between these two kinds of data, which in clinical studies are often collected jointly on a series of individuals and may help in understanding, for instance, the mechanisms of recovery from a certain disease or the efficacy of a given therapy. The most common joint model in this framework is based on a linear mixed model for the longitudinal data. However, for complex datasets the linearity assumption may be too restrictive. Some works have considered generalizing this setting with the use of a nonlinear mixed-effects model for the longitudinal trajectories, but the proposed estimation procedures based on likelihood approximations have been shown [De la Cruz et al., 2011] to exhibit some computational efficiency problems. In this article we propose an MCMC-based estimation procedure in the joint model with a nonlinear mixed-effects model for the longitudinal data and a generalized linear model for the primary response. Moreover, we consider that the errors in the longitudinal model may be correlated. We apply our method to the analysis of hormone levels measured at the early stages of pregnancy that can be used to predict normal versus abnormal pregnancy outcomes. We also conduct a simulation study to assess the importance of modelling correlated errors and quantify the consequences of model misspecification.
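The two-part structure described in this abstract can be summarized schematically. The notation below is assumed for illustration and is not taken from the paper: y_{ij} is the j-th longitudinal measurement on subject i at time t_{ij}, f is a nonlinear trajectory function, b_i are the subject-level random effects, z_i is the primary response, x_i are other covariates, and g is a link function.

```latex
% Longitudinal submodel: nonlinear mixed-effects trajectory, errors possibly correlated
y_{ij} = f(t_{ij};\, \beta, b_i) + \varepsilon_{ij}, \qquad
\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{i n_i})^\top \sim N(0, \Sigma_i)

% Primary-response submodel: GLM in which the random effects act as covariates
g\bigl(\mathbb{E}[z_i \mid b_i]\bigr) = x_i^\top \gamma + b_i^\top \delta
```

A non-diagonal \Sigma_i corresponds to the correlated longitudinal errors the abstract mentions, and \delta measures the association between the trajectory features and the primary response.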

Semiparametric Bayesian classification with longitudinal markers

Journal of the Royal Statistical Society: Series C (Applied Statistics), 2007

We analyse data from a study involving 173 pregnant women. The data are observed values of the β human chorionic gonadotropin hormone measured during the first 80 days of gestational age, including from one up to six longitudinal responses for each woman. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from data that are available at the early stages of pregnancy. We achieve the desired classification with a semiparametric hierarchical model. Specifically, we consider a Dirichlet process mixture prior for the distribution of the random effects in each group. The unknown random-effects distributions are allowed to vary across groups but are made dependent by using a design vector to select different features of a single underlying random probability measure. The resulting model is an extension of the dependent Dirichlet process model, with an additional probability model for group classification. The model is shown to perform better than an alternative model which is based on independent Dirichlet processes for the groups. Relevant posterior distributions are summarized by using Markov chain Monte Carlo methods.

Bayesian inferences on shape constrained hormone trajectories in the menstrual cycle

2003

In many biomedical applications, one can assume that the mean of an outcome variable increases monotonically with increases in a predictor to an unknown peak and decreases thereafter. To account for dependency in outcome measurements, one can apply a hierarchical model with random effects and autocorrelated errors. In the absence of shape constraints, Bayesian computation can proceed via a Gibbs sampling algorithm. Unfortunately, standard approaches for incorporating parameter constraints in Bayesian analyses cannot be used when the constraints are on higher level parameters in the hierarchy. To solve this problem, this article proposes a transformation approach in which samples from the unconstrained posterior density for the higher level parameters are transformed to the restricted space using a minimal distance projection. This approach is shown to suggest limited bias induced by the order constraint as well as a potential improvement in efficiency relative to unconstrained analyses and analyses that place constraints on the population parameters. The methods are illustrated through application to progesterone data from the literature.
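The transformation idea, mapping an unconstrained posterior draw to the nearest point that satisfies the shape constraint, can be sketched for a single vector of mean values. This is a simplified illustration under assumptions, not the article's algorithm: it uses unweighted Euclidean distance, treats the constraint as "non-decreasing up to an unknown peak, non-increasing afterwards", and the function names are hypothetical. It relies on the fact that every such umbrella-shaped vector is non-decreasing on some prefix and non-increasing on the remaining suffix, so the projection can be found by trying every split point.

```python
import numpy as np


def pava_increasing(y):
    """Least-squares projection of y onto non-decreasing sequences (pool adjacent violators)."""
    blocks = []  # each block is [mean, size]
    for value in y:
        blocks.append([float(value), 1])
        # Merge neighbouring blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    if not blocks:
        return np.empty(0)
    return np.concatenate([np.full(n, m) for m, n in blocks])


def project_umbrella(y):
    """Closest (Euclidean) vector that rises to an unknown peak and then falls."""
    y = np.asarray(y, dtype=float)
    best_fit, best_dist = None, np.inf
    for split in range(len(y) + 1):
        left = pava_increasing(y[:split])               # non-decreasing part
        right = pava_increasing(y[split:][::-1])[::-1]  # non-increasing part
        fit = np.concatenate([left, right])
        dist = float(np.sum((fit - y) ** 2))
        if dist < best_dist:
            best_fit, best_dist = fit, dist
    return best_fit


# Hypothetical unconstrained draw that violates the shape constraint at positions 2-3.
draw = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 1.0])
projected = project_umbrella(draw)
print(projected)
```

Applying such a map to each unconstrained draw yields draws that satisfy the constraint; draws that already satisfy it are left unchanged.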

Flexible Bayesian semiparametric mixed-effects model for skewed longitudinal data

BMC Medical Research Methodology, 2024

Background: In clinical trials and epidemiological research, mixed-effects models are commonly used to examine population-level and subject-specific trajectories of biomarkers over time. Despite their increasing popularity and application, the specification of these models necessitates a great deal of care when analysing longitudinal data with non-linear patterns and asymmetry. Parametric (linear) mixed-effect models may not capture these complexities flexibly and adequately. Additionally, assuming a Gaussian distribution for random effects and/or model errors may be overly restrictive, as it lacks robustness against deviations from symmetry. Methods: This paper presents a semiparametric mixed-effects model with flexible distributions for complex longitudinal data in the Bayesian paradigm. The non-linear time effect on the longitudinal response was modelled using a spline approach. The multivariate skew-t distribution, which is a more flexible distribution, is utilized to relax the normality assumptions associated with both random effects and model errors. Results: To assess the effectiveness of the proposed methods in various model settings, simulation studies were conducted. We then applied these models to chronic kidney disease (CKD) data and assessed the relationship between covariates and estimated glomerular filtration rate (eGFR). First, we compared the proposed semiparametric partially linear mixed-effect (SPPLM) model with the fully parametric one (FPLM), and the results indicated that the SPPLM model outperformed the FPLM model. We then further compared four different SPPLM models, each assuming different distributions for the random effects and model errors. The model with a skew-t distribution exhibited a superior fit to the CKD data compared to the Gaussian model. The findings from the application revealed that hypertension, diabetes, and follow-up time had a substantial association with kidney function, specifically leading to a decrease in GFR estimates. Conclusions: The application and simulation studies have demonstrated that our work has made a significant contribution towards a more robust and adaptable methodology for modeling intricate longitudinal data. We achieved this by proposing a semiparametric Bayesian modeling approach with a spline smoothing function and a skew-t distribution.