Applying linear mixed models to estimate reliability in clinical trial data with repeated measurements

Estimating the reliability of repeatedly measured endpoints based on linear mixed-effects models. A tutorial

Pharmaceutical Statistics, 2016

There are various settings in which researchers are interested in the assessment of the correlation between repeated measurements that are taken within the same subject (i.e., reliability). For example, the same rating scale may be used to assess the symptom severity of the same patients by multiple physicians, or the same outcome may be measured repeatedly over time in the same patients. Reliability can be estimated in various ways, e.g., using the classical Pearson correlation or the intra-class correlation in clustered data. However, contemporary data often have a complex structure that goes well beyond the restrictive assumptions that are needed with the more conventional methods to estimate reliability. In the current paper, we propose a general and flexible modeling approach that allows for the derivation of reliability estimates, standard errors, and confidence intervals appropriately taking hierarchies and covariates in the data into account. Our methodology is developed for continuous outcomes together with covariates of an arbitrary type. The methodology is illustrated in a case study, and a Web Appendix is provided which details the computations using the R package CorrMixed and the SAS software.

Generalized reliability estimation using repeated measurements

British Journal of Mathematical and Statistical Psychology, 2006

Reliability can be studied in a generalized way using repeated measurements. Linear mixed models are used to derive generalized test-retest reliability measures. The method allows for repeated measures with a different mean structure due to correction for covariate effects. Furthermore, different variance-covariance structures between measurements can be implemented. When the variance structure reduces to a random intercept (compound symmetry), classical methods are recovered. With more complex variance structures (e.g. including random slopes of time and/or serial correlation), time-dependent reliability functions are obtained. The effect of time lag between measurements on reliability estimates can be evaluated. The methodology is applied to a psychiatric scale for schizophrenia.
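As a minimal sketch of the time-dependent reliability described above, the following Python function computes the model-implied correlation between two measurements on the same subject under a linear mixed model with a random intercept and an optional random slope of time. It assumes independent residual errors (no serial correlation). The function name, argument names, and numeric values are illustrative assumptions and are not taken from the paper.

```python
import math

def retest_correlation(d00, sigma2, s, t, d01=0.0, d11=0.0):
    """Correlation between measurements at two distinct times s and t.

    d00, d01, d11: variance of the random intercept, intercept-slope
    covariance, and variance of the random slope (hypothetical names).
    sigma2: residual (measurement error) variance.
    """
    # Covariance between the two time points: (1, s) D (1, t)'
    cov_st = d00 + d01 * (s + t) + d11 * s * t
    # Marginal variances add the residual variance on the diagonal
    var_s = d00 + 2.0 * d01 * s + d11 * s * s + sigma2
    var_t = d00 + 2.0 * d01 * t + d11 * t * t + sigma2
    return cov_st / math.sqrt(var_s * var_t)

# Random intercept only: the value does not depend on the time points
# and equals the classical intraclass correlation d00 / (d00 + sigma2).
icc = retest_correlation(d00=4.0, sigma2=1.0, s=0.0, t=1.0)

# Adding a random slope makes the correlation depend on s and t.
lagged = retest_correlation(d00=4.0, sigma2=1.0, s=0.0, t=1.0, d11=1.0)
```

With a random intercept only, the result is the same for any pair of time points, which is the sense in which the classical case is recovered.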

A Measure for the Reliability of a Rating Scale Based on Longitudinal Clinical Trial Data

Psychometrika, 2007

A new measure for reliability of a rating scale is introduced, based on the classical definition of reliability, as the ratio of the true score variance and the total variance. Clinical trial data can be employed to estimate the reliability of the scale in use, whenever repeated measurements are taken. The reliability is estimated from the covariance parameters obtained from a linear mixed model. The method provides a single number to express the reliability of the scale, but allows for the study of the reliability’s time evolution. The method is illustrated using a case study in schizophrenia.
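To illustrate how a single reliability value can be obtained from the covariance parameters of a fitted linear mixed model, the sketch below contrasts the total error variance with the total marginal variance over all measurement occasions. This trace-based summary is an assumption chosen for illustration, and the exact measure defined in the paper may differ. The model (random intercept and slope, independent errors), the names, and the values are hypothetical.

```python
def trace_reliability(times, d00, d01, d11, sigma2):
    """One minus the ratio of total error variance to total variance.

    times: measurement occasions shared by all subjects.
    d00, d01, d11: random-intercept variance, intercept-slope covariance,
    random-slope variance. sigma2: residual variance.
    """
    # Marginal variance at each occasion: (1, t) D (1, t)' + sigma2
    total = sum(d00 + 2.0 * d01 * t + d11 * t * t + sigma2 for t in times)
    # Error variance summed over the same occasions
    error = len(times) * sigma2
    return 1.0 - error / total

# Hypothetical variance components, as if taken from a fitted model
r_slope = trace_reliability([0.0, 1.0, 2.0], d00=4.0, d01=0.0, d11=1.0, sigma2=1.0)
r_intercept = trace_reliability([0.0, 1.0, 2.0], d00=4.0, d01=0.0, d11=0.0, sigma2=1.0)
```

In the random-intercept case this summary reduces to the usual ratio of true score variance to total variance.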

Generalizability in Non-Gaussian Longitudinal Clinical Trial Data Based on Generalized Linear Mixed Models

Journal of Biopharmaceutical Statistics, 2008

This work investigates how generalizability, an extension of reliability, can be defined and estimated based on longitudinal data sequences resulting from, for example, clinical studies. Useful and intuitive approximate expressions are derived based on generalized linear mixed models. Data from four double-blind randomized clinical trials in schizophrenia motivate the research and are used to estimate generalizability for a binary response parameter.

Using longitudinal data from a clinical trial in depression to assess the reliability of its outcome scales

Journal of Psychiatric Research, 2009

Longitudinal studies are permeating clinical trials in psychiatry. Additionally, in the same field, rating scales are frequently used to evaluate the status of the patients and the efficacy of new therapeutic procedures. Therefore, it is of utmost importance to study the psychometric properties of these instruments within a longitudinal framework. In the area of depression, the Hamilton Depression Rating Scale (HAMD) is regularly used for antidepressant treatment evaluation. However, the use of HAMD has not been exempt from criticism, which has led to the development of new scales that are expected to be more sensitive to change, such as the Montgomery-Åsberg Depression Rating Scale (MADRS). In general, the reliability of these scales has been extensively studied using classical methods for reliability estimation, developed for specifically designed reliability studies. Unfortunately, the settings customarily considered in these reliability studies are usually far from the practical conditions in which these scales are applied in clinical trials and practice. In the present paper we assess the reliability of these instruments in a more realistic scenario by using longitudinal data from clinical studies. Newly developed methodology, based on an extended concept of reliability, allows us to use longitudinal data for reliability estimation. This new approach not only helps to avoid bias by offering better control of disturbing factors, but also produces more precise estimates by taking advantage of the large sample sizes available in clinical trials. Further, it offers practical guidelines for an optimal use of a rating scale in order to achieve a particular level of reliability. The merits of this new approach are illustrated by applying it to two clinical trials in depression to assess the reliability of three outcome scales: HAMD, MADRS, and the Hamilton Anxiety Rating Scale (HAMA).

Investigation of mixed model repeated measures analyses and non-linear random coefficient models in the context of long-term efficacy data

Pharmaceutical Statistics, 2018

The longitudinal data from 2 published clinical trials in adult subjects with upper limb spasticity (a randomized placebo-controlled study [NCT01313299] and its long-term open-label extension [NCT01313312]) were combined. Their study designs involved repeat intramuscular injections of abobotulinumtoxinA (Dysport®), and efficacy endpoints were collected accordingly. With the objective of characterizing the pattern of response across cycles, Mixed Model Repeated Measures analyses and Non-Linear Random Coefficient (NLRC) analyses were performed and their results compared. The Mixed Model Repeated Measures analyses, commonly used in the context of repeated measures with missing dependent data, did not involve any parametric shape for the curve of changes over time. Based on clinical expectations, the NLRC included a negative exponential function of the number of treatment cycles, with its asymptote and rate included as random coefficients in the model. Our analysis focused on 2 specific...

A family of measures to evaluate scale reliability in a longitudinal setting

Journal of the Royal Statistical Society: Series A (Statistics in Society), 2009

The concept of reliability denotes one of the most important psychometric properties of a measurement scale. Reliability refers to the capacity of the scale to discriminate between subjects in a given population. In classical test theory, it is often estimated using the intraclass correlation coefficient based on two replicate measurements. However, the modelling framework used in this theory is often too narrow when applied in practical situations. Generalizability theory has extended reliability theory to a much broader framework, but is confronted with some limitations when applied in a longitudinal setting. In this paper, we explore how the definition of reliability can be generalized to a setting where subjects are measured repeatedly over time. Based on four defining properties for the concept of reliability, we propose a family of reliability measures, which circumscribes the area in which reliability measures should be sought. It is shown how different members assess different aspects of the problem and that the reliability of the instrument can depend on the way it is used. The methodology is motivated by and illustrated on data from a clinical study on schizophrenia. Based on this study, we estimate and compare the reliabilities of two different rating scales to evaluate the severity of the disorder.
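For reference, the classical estimate mentioned above can be sketched as a one-way ANOVA intraclass correlation computed from replicate measurements per subject. This is a generic textbook estimator shown under the assumption of a balanced design with no covariates. It is not the family of measures proposed in the paper, and the data are made up for illustration.

```python
def icc_oneway(replicates):
    """One-way ANOVA intraclass correlation for balanced replicates.

    replicates: list of per-subject lists, each holding k measurements.
    """
    n = len(replicates)      # number of subjects
    k = len(replicates[0])   # replicates per subject
    grand = sum(sum(row) for row in replicates) / (n * k)
    means = [sum(row) / k for row in replicates]
    # Between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(replicates, means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Three hypothetical subjects, two replicate measurements each
icc = icc_oneway([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

The estimator needs a balanced design and a common mean, which is the kind of restriction the abstract describes as too narrow for longitudinal data.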

Marginal Correlation in Longitudinal Binary Data Based on Generalized Linear Mixed Models

Communications in Statistics - Theory and Methods, 2010

This work aims at investigating marginal correlation within and between longitudinal data sequences. Useful and intuitive approximate expressions are derived based on generalized linear mixed models. Data from four double-blind randomized clinical trials are used to estimate the intra-class coefficient of reliability for a binary response. Additionally, the correlation between such a binary response and a continuous response is derived to evaluate the criterion validity of the binary response variable and the established continuous response variable.

The Estimation of Reliability in Longitudinal Models

International Journal of Behavioral Development, 1998

Despite the increasing attention devoted to the study and analysis of longitudinal data, relatively little consideration has been directed toward understanding the issues of reliability and measurement error. Perhaps one reason for this neglect has been that traditional methods of estimation (e.g. generalisability theory) require assumptions that are often not tenable in longitudinal designs. This paper first examines applications of generalisability theory to the estimation of measurement error and reliability in longitudinal research, and notes how factors such as missing data, correlated errors, and true score instability prohibit traditional variance component estimation. Next, we discuss how estimation methods using restricted maximum likelihood can account for these factors, thereby providing many advantages over traditional estimation methods. Finally, we provide a substantive example illustrating these advantages, and include brief discussions of programming and software...