A Note on the Use of Unbiased Estimating Equations to Estimate Correlation in Analysis of Longitudinal Trials
Related papers
2006
It is well known that the correlation among binary outcomes is constrained by the marginal means, yet approaches such as generalized estimating equations (GEE) do not check that the constraints for the correlations are satisfied. We explore this issue for Markovian dependence in the context of a GEE analysis of a clinical trial that compares Venlafaxine with Lithium in the prevention of major depressive episode. We obtain simplified expressions for the constraints for the logistic model and the equicorrelated and first-order autoregressive correlation structures. We then obtain the limiting values of the GEE and quasi-least squares (QLS) estimates of the correlation parameter when the working structure has been misspecified and prove that misidentification can lead to a severe violation of bounds. As a result, we suggest that violation of bounds can provide additional evidence in ruling out application of a particular working correlation structure. For a structure that is otherwise plausible and results in only a minor violation, we propose an iterative algorithm that yields an estimate that satisfies the constraints. We compare our algorithm with two other approaches for estimation of the correlation that have been proposed to avoid a violation of bounds and demonstrate that it estimates the correlation parameter and bivariate probabilities with smaller mean square error and bias, especially when the correlation is large.
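To make the constraint mentioned in this abstract concrete, here is a minimal sketch of the standard pairwise bounds on the correlation between two Bernoulli outcomes with known marginal means. It is not the simplified expressions or the iterative algorithm from the paper; the function names `binary_correlation_bounds` and `violates_bounds` are hypothetical and chosen only for illustration.

```python
import math


def binary_correlation_bounds(p1, p2):
    """Return (lower, upper) bounds on corr(Y1, Y2) for Bernoulli means p1 and p2.

    These are the usual pairwise bounds implied by requiring all four joint
    cell probabilities of (Y1, Y2) to be non-negative.
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("marginal means must lie strictly between 0 and 1")
    q1, q2 = 1.0 - p1, 1.0 - p2
    lower = max(-math.sqrt(p1 * p2 / (q1 * q2)),
                -math.sqrt(q1 * q2 / (p1 * p2)))
    upper = min(math.sqrt(p1 * q2 / (p2 * q1)),
                math.sqrt(p2 * q1 / (p1 * q2)))
    return lower, upper


def violates_bounds(rho, p1, p2):
    """True if a candidate correlation rho is infeasible for means p1 and p2."""
    lower, upper = binary_correlation_bounds(p1, p2)
    return rho < lower or rho > upper


if __name__ == "__main__":
    # Illustrative means only: very different means leave a narrow feasible range.
    print(binary_correlation_bounds(0.1, 0.5))
    print(violates_bounds(0.6, 0.1, 0.5))
```

In a GEE analysis, a check of this kind could be run for each pair of fitted means against the estimated working correlation; a structure that fails it for many pairs would be a candidate for the "severe violation" the abstract describes.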
Gaussian estimation and joint modeling of dispersions and correlations in longitudinal data
Computer Methods and Programs in Biomedicine, 2006
Keywords: Extended generalized estimating equations; Link functions; Model formulation; Quasi-least squares; Quasilikelihood functions; Unconstrained parameterization; Working correlation matrix.
Abstract: Analysis of longitudinal, spatial and epidemiological data often requires modelling dispersions and dependence among the measurements. Moreover, data involving counts or proportions usually exhibit greater variation than would be predicted by the Poisson and binomial models. We propose a strategy for the joint modelling of mean, dispersion and correlation matrix of nonnormal multivariate correlated data. The parameter estimation for dispersions and correlations is based on Whittle's [P. Whittle, Gaussian estimation in stationary time series, Bull Inst. Statist. Inst. 39 (1962) 105-129] Gaussian likelihood of the partially standardized data, which eliminates the mean parameters. The model formulation for the dispersions and correlations relies on a recent unconstrained parameterization of covariance matrices and a graphical method [M. Pourahmadi, Joint mean-covariance models with applications to longitudinal data: unconstrained parameterization, Biometrika 86 (1999) 677-690] similar to the correlogram in time series analysis. We show that the estimating equations for the regression and dependence parameters derived from a modified Gaussian likelihood (involving two distinct covariance matrices) are broad enough to include generalized estimating equations and its many recent extensions and improvements. The results are illustrated using two datasets.
Biometrics, 2010
Ye et al. (2008) proposed a joint model for longitudinal measurements and time-to-event data in which the longitudinal measurements are modeled with a semiparametric mixed model to allow for the complex patterns in longitudinal biomarker data. They proposed a two-stage regression calibration approach which is simpler to implement than a joint modeling approach. In the first stage of their approach, the mixed model is fit without regard to the time-to-event data. In the second stage, the posterior expectations of an individual's random effects from the mixed model are included as covariates in a Cox model. Although Ye et al. (2008) acknowledged that their regression calibration approach may cause bias due to the problem of informative dropout and measurement error, they argued that the bias is small relative to alternative methods. In this article, we show that this bias may be substantial. We show how to alleviate much of this bias with an alternative regression calibration approach which can be applied for both discrete and continuous time-to-event data. Through simulations, the proposed approach is shown to have substantially less bias than the regression calibration approach proposed by Ye et al. (2008). In agreement with the methodology proposed by Ye et al., an advantage of our proposed approach over joint modeling is that it can be implemented with standard statistical software and does not require complex estimation techniques.
Marginal Correlation in Longitudinal Binary Data Based on Generalized Linear Mixed Models
Communications in Statistics - Theory and Methods, 2010
This work aims at investigating marginal correlation within and between longitudinal data sequences. Useful and intuitive approximate expressions are derived based on generalized linear mixed models. Data from four double-blind randomized clinical trials are used to estimate the intra-class coefficient of reliability for a binary response. Additionally, the correlation between such a binary response and a continuous response is derived to evaluate the criterion validity of the binary response variable and the established continuous response variable.
Selection of Working Correlation Structure and Best Model in GEE Analyses of Longitudinal Data
Communications in Statistics, Simulation and Computation, 2007
The Generalized Estimating Equations (GEE) method is one of the most commonly used statistical methods for the analysis of longitudinal data in epidemiological studies. This method requires a working correlation structure to be specified for the repeated measures of the outcome variable of a subject. However, statistical criteria for selecting the best correlation structure and the best subset of explanatory variables in GEE have become available only recently, because the GEE method was developed on the basis of quasi-likelihood theory. Maximum likelihood based model selection methods, such as the widely used Akaike Information Criterion (AIC), are not directly applicable to GEE. Pan (2001) proposed a selection method called QIC which can be used to select the best correlation structure and the best subset of explanatory variables. Based on the QIC method, we developed a computing program to calculate the QIC value for a range of different distributions, link functions and correlation structures. This program was written in Stata software. In this article, we introduce this program and demonstrate how to use it to select the most parsimonious model in GEE analyses of longitudinal data through several representative examples. Communications in Statistics, Simulation and Computation 2007; 36(5):987-996.
Communications in Statistics - Simulation and Computation, 2013
The generalized estimating equations method is a well-known and very popular approach for analyzing longitudinal data. Selecting an appropriate correlation structure in the generalized estimating equations framework is a key step for estimating parameters efficiently and deriving reliable statistical inferences. We present two new criteria for selecting the best among candidates with arbitrary structures, even for irregularly timed measurements. The simulation results demonstrate that the new criteria perform more similarly to EAIC and EBIC as the sample size becomes large. However, their performance is much enhanced when the sample size is small and the number of measurements is large. Finally, three real datasets are used to illustrate the proposed criteria.
Statistical Methods in Medical Research, 2015
Different types of outcomes (e.g. binary, count, continuous) can be simultaneously modeled with multivariate generalized linear mixed models by assuming: (1) same or different link functions, (2) same or different conditional distributions, and (3) conditional independence given random subject effects. Others have used this approach for determining simple associations between subject-specific parameters (e.g. correlations between slopes). We demonstrate how more complex associations (e.g. partial regression coefficients between slopes adjusting for intercepts, time lags of maximum correlation) can be estimated. Reparameterizing the model to directly estimate coefficients allows us to compare standard errors based on the inverse of the Hessian matrix with more usual standard errors approximated by the delta method; a mathematical proof demonstrates their equivalence when the gradient vector approaches zero. Reparameterization also allows us to evaluate significance of coefficients with likelihood ratio tests and to compare this approach with more usual Wald-type t-tests and Fisher's z transformations. Simulations indicate that the delta method and inverse Hessian standard errors are nearly equivalent and consistently overestimate the true standard error. Only the likelihood ratio test based on the reparameterized model has an acceptable type I error rate and is therefore recommended for testing associations between stochastic parameters. Online supplementary materials include our medical data example, annotated code, and simulation details.
Efficient parameter estimation in longitudinal data analysis using a hybrid GEE method
Biostatistics, 2009
The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.
Statistical Analysis of Correlated Data Using Generalized Estimating Equations: An Orientation
American Journal of Epidemiology, 2003
The method of generalized estimating equations (GEE) is often used to analyze longitudinal and other correlated response data, particularly if responses are binary. However, few descriptions of the method are accessible to epidemiologists. In this paper, the authors use small worked examples and one real data set, involving both binary and quantitative response data, to help end-users appreciate the essence of the method. The examples are simple enough to see the behind-the-scenes calculations and the essential role of weighted observations, and they allow nonstatisticians to imagine the calculations involved when the GEE method is applied to more complex multivariate data.
Figure 6. Estimates of (part a) mean height µ (measured as the number of standard deviations above US norms) and (part b) the proportion P of short children calculated using data from households with a socioeconomic status index of 5 or lower.
Figure 7. Comparison of (part a) estimated mean height µ (measured as the number of standard deviations above US norms) and (part b) the proportion P of short children among children of lower and higher socioeconomic status.
Longitudinal data analysis using generalized linear models
Biometrika, 1986
This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood.
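For orientation, the class of estimating equations described in this abstract is usually written in the following form. The notation is an assumption based on common textbook presentations, not a quotation from the paper: K subjects, response vector Y_i with mean \mu_i(\beta), A_i the diagonal matrix of variance functions, R(\alpha) the working correlation matrix, and \phi a scale parameter whose placement varies between presentations.

```latex
% GEE for the regression parameter \beta (notation assumed, see lead-in)
U(\beta) \;=\; \sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) \;=\; 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} R(\alpha) A_i^{1/2}.

% Two of the working structures named in the abstract:
% independence:  R(\alpha) = I
% exchangeable:  R_{jk}(\alpha) = \alpha \quad (j \neq k), \qquad R_{jj} = 1
```

Under the independence structure the equation reduces to the ordinary quasi-likelihood score equation for independent observations, which is one way to see the connection to quasi-likelihood noted in the abstract's last sentence.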