Fitting Flexible Marginalized Models for Binary Correlated Outcomes (original) (raw)
The next two sections explain how different specifications of mTLV models can be fitted using the binaryMM
package. The data used are part of the package.
The Madras Longitudinal Schizophrenia Study
madras
contains a subset of the data from the Madras Longitudinal Schizophrenia Study Diggle et al. (2002), which collected monthly symptom data on 86 schizophrenia patients after their initial hospitalization. The dataframe has 922 observations on 86 patients and includes the variables:
though
. An indicator for thought disordersage
. An indicator for age-at-onset \(\geq\) 20 yearsgender
. An indicator for female gendermonth
. Months since hospitalizationid
. A unique patient identifiers
The primary question of interest is whether subjects with an older age-at-onset tend to recover more or less quickly, and whether female patients recover more or less quickly. Recovery is measured by a reduction in the presentation of symptoms.
data(madras)
str(madras)
#> 'data.frame': 922 obs. of 5 variables:
#> $ id : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ thought: int 1 1 1 1 1 0 0 0 0 0 ...
#> $ month : int 0 1 2 3 4 5 6 7 8 9 ...
#> $ gender : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ age : num 1 1 1 1 1 1 1 1 1 1 ...
The marginal mean model is defined as:
\[logit(\mu_{ij}^m) = \beta_0 + \beta_1month_{ij} + \beta_2age_i + \beta_3gender_i + \beta_4 age_i \times month_{ij} + \beta_5 gender_i \times month_{ij}\]
Multiple dependence models are explored to demonstrate how themm
function can be used. The different dependence models are declared by changing the t.formula
andlv.formula
arguments. Note that by default formula both are initially assigned NULL
and if neither association models are specified, then an error is returned.
- Case 1. A dependence model with a transition term only:
\[logit(\mu_{ij}^c) = \Delta_{ij}(\boldsymbol{X}_i) + \gamma Y_{i(j-1)}\]
mod.mt <- mm(thought ~ month*gender + month*age, t.formula = ~1,
data = madras, id = id)
summary(mod.mt)
#>
#> Class:
#> MMLong
#>
#> Call:
#> mm(mean.formula = thought ~ month * gender + month * age, t.formula = ~1,
#> id = id, data = madras)
#>
#> Information Criterion:
#> AIC BIC logLik Deviance
#> 688.3789 705.5594 -337.1895 674.3789
#>
#> Marginal Mean Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> (Intercept) 1.183683 0.444318 7.0971 0.007721
#> month -0.342857 0.081841 17.5501 2.798e-05
#> gender -0.141884 0.416152 0.1162 0.733147
#> age -0.649770 0.449183 2.0925 0.148021
#> month:gender -0.143788 0.081853 3.0859 0.078975
#> month:age 0.111555 0.085896 1.6867 0.194040
#>
#> Dependence Model Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> gamma:(Intercept) 3.16583 0.23014 189.23 < 2.2e-16
#>
#> Number of clusters: 86
#> Maximum cluster size: 12
#> Convergence status (nlm code): 1
#> Number of iterations: 22
- Case 2. A dependence model with a latent term only:
\[logit(\mu_{ij}^c) = \Delta_{ij}(\boldsymbol{X}_i) + \sigma U_i\]
mod.mlv <- mm(thought ~ month*gender + month*age, lv.formula = ~1,
data = madras, id = id)
summary(mod.mlv)
#>
#> Class:
#> MMLong
#>
#> Call:
#> mm(mean.formula = thought ~ month * gender + month * age, lv.formula = ~1,
#> id = id, data = madras)
#>
#> Information Criterion:
#> AIC BIC logLik Deviance
#> 750.5767 767.7571 -368.2883 736.5767
#>
#> Marginal Mean Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> (Intercept) 1.569015 0.399400 15.4326 8.550e-05
#> month -0.399007 0.062845 40.3107 2.166e-10
#> gender -0.539932 0.355893 2.3016 0.12924
#> age -0.911165 0.393487 5.3621 0.02058
#> month:gender -0.081899 0.060179 1.8521 0.17354
#> month:age 0.140215 0.063107 4.9366 0.02629
#>
#> Dependence Model Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> log(sigma):(Intercept) 0.81289 0.12764 40.559 1.908e-10
#>
#> Number of clusters: 86
#> Maximum cluster size: 12
#> Convergence status (nlm code): 1
#> Number of iterations: 42
- Case 3. A dependence model with a transition and a latent term:
\[logit(\mu_{ij}^c) = \Delta_{ij}(\boldsymbol{X}_i) + \gamma Y_{i(j-1)} + \sigma U_i\]
mod.mtlv <- mm(thought ~ month*gender + month*age,
t.formula = ~1, lv.formula = ~1,
data = madras, id = id)
summary(mod.mtlv)
#>
#> Class:
#> MMLong
#>
#> Call:
#> mm(mean.formula = thought ~ month * gender + month * age, lv.formula = ~1,
#> t.formula = ~1, id = id, data = madras)
#>
#> Information Criterion:
#> AIC BIC logLik Deviance
#> 680.1283 699.7631 -332.0642 664.1283
#>
#> Marginal Mean Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> (Intercept) 1.327440 0.434588 9.3299 0.002255
#> month -0.367564 0.077443 22.5268 2.072e-06
#> gender -0.282497 0.402015 0.4938 0.482241
#> age -0.732176 0.436180 2.8177 0.093228
#> month:gender -0.111740 0.078306 2.0362 0.153591
#> month:age 0.117705 0.080870 2.1184 0.145536
#>
#> Dependence Model Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> gamma:(Intercept) 2.511119 0.303963 68.2486 <2e-16
#> log(sigma):(Intercept) 0.074494 0.244870 0.0925 0.761
#>
#> Number of clusters: 86
#> Maximum cluster size: 12
#> Convergence status (nlm code): 1
#> Number of iterations: 50
- Case 4. A dependence model with a transition term that is modified by gender.
\[logit(\mu_{ij}^c) = \Delta_{ij}(\boldsymbol{X}_i) + (\gamma_0 + \gamma_1 gender_i) Y_{i(j-1)}\]
mod.mtgender <- mm(thought ~ month*gender + month*age,
t.formula = ~gender, data = madras, id = id)
summary(mod.mtgender)
#>
#> Class:
#> MMLong
#>
#> Call:
#> mm(mean.formula = thought ~ month * gender + month * age, t.formula = ~gender,
#> id = id, data = madras)
#>
#> Information Criterion:
#> AIC BIC logLik Deviance
#> 690.2915 709.9263 -337.1458 674.2915
#>
#> Marginal Mean Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> (Intercept) 1.175157 0.444200 6.9990 0.008156
#> month -0.342431 0.081702 17.5662 2.775e-05
#> gender -0.137202 0.416448 0.1085 0.741809
#> age -0.635415 0.452122 1.9752 0.159901
#> month:gender -0.143961 0.082080 3.0763 0.079443
#> month:age 0.110312 0.085973 1.6464 0.199456
#>
#> Dependence Model Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> gamma:(Intercept) 3.11501 0.28599 118.634 <2e-16
#> gamma:gender 0.14321 0.48550 0.087 0.768
#>
#> Number of clusters: 86
#> Maximum cluster size: 12
#> Convergence status (nlm code): 1
#> Number of iterations: 34
- Case 5. A dependence model with a latent term that is modified by gender. Note that because \(\sigma\) is a positive quantity, to fit a mTLV model where the latent term is modified by gender, we need to specify two indicator variables: I0 for gender = 0, and I1 for gender = 1. The model to be specified in
lv.fomula
will take the form:~0+I0+I1
.
\[logit(\mu_{ij}^c) = \Delta_{ij}(\boldsymbol{X}_i) + [\sigma_0I(gender_i == 0) + \sigma_1I(gender_i == 1)]U_i\]
# set-up two new indicator variables for gender
madras$g0 <- ifelse(madras$gender == 0, 1, 0)
madras$g1 <- ifelse(madras$gender == 1, 1, 0)
mod.mlvgender <- mm(thought ~ month*gender + month*age,
lv.formula = ~0+g0+g1, data = madras, id = id)
summary(mod.mlvgender)
#>
#> Class:
#> MMLong
#>
#> Call:
#> mm(mean.formula = thought ~ month * gender + month * age, lv.formula = ~0 +
#> g0 + g1, id = id, data = madras)
#>
#> Information Criterion:
#> AIC BIC logLik Deviance
#> 752.4992 772.1340 -368.2496 736.4992
#>
#> Marginal Mean Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> (Intercept) 1.566764 0.400938 15.2705 9.316e-05
#> month -0.395507 0.064395 37.7227 8.155e-10
#> gender -0.515176 0.366086 1.9804 0.15935
#> age -0.921331 0.395394 5.4296 0.01980
#> month:gender -0.091526 0.069640 1.7273 0.18875
#> month:age 0.140228 0.063187 4.9251 0.02647
#>
#> Dependence Model Parameters:
#> Estimate Model SE Chi Square Pr(>Chi)
#> log(sigma):g0 0.84429 0.17128 24.297 8.257e-07
#> log(sigma):g1 0.77199 0.19523 15.636 7.678e-05
#>
#> Number of clusters: 86
#> Maximum cluster size: 12
#> Convergence status (nlm code): 1
#> Number of iterations: 50
The parameters from the marginal mean model have the same interpretation regardless of the dependence model used. Overall, older individuals tend to have slower recovery time than younger subjects, while females recover quicker than males.
Weighted Likelihood
The binaryMM
package allows user to add sampling weights and estimates the parameters of interest in those cases where the available sample might not be representative of the target population (i.e., survey data). This section shows how the sampling weights can be added in the mm
syntax using the datarand
dataframe.
The dataframe has 24,999 observation on 2,500 subjects and includes the variables:
id
. A unique patient identifierY
. A binary longitudinal outcometime
. A continuous time-varying covariate indicating time of each follow-upbinary
. A binary time-fixed covariate indicating whether a patient was assigned to a treatment arm (1) or a control arm (0)
data(datrand)
str(datrand)
#> 'data.frame': 24999 obs. of 4 variables:
#> $ id : int 1 1 1 1 1 1 1 1 1 2 ...
#> $ Y : int 0 0 1 1 0 0 0 0 0 0 ...
#> $ time : num 0 1 2 3 4 5 6 7 8 0 ...
#> $ binary: num 0 0 0 0 0 0 0 0 0 0 ...
From datarand
a biased sampled can be created by assuming that complete data are available only for 1) every one who experienced the event Y
at least once, and 2) 20% of the subjects who never experienced the event Y
.
# create the sampling scheme
Ymean <- tapply(datrand$Y, FUN = mean, INDEX = datrand$id)
some.id <- names(Ymean[Ymean != 0])
none.id <- names(Ymean)[!(names(Ymean) %in% some.id)]
samp.some <- some.id[rbinom(length(none.id), 1, 1) == 1]
samp.none <- none.id[rbinom(length(none.id), 1, 0.20) == 1]
# sample subjects and create a weight vector
datrand$sampled <- ifelse(datrand$id %in% c(samp.none, samp.some), 1, 0)
dat.small <- subset(datrand, sampled == 1)
wt <- ifelse(dat.small$id %in% samp.none, 1/1, 1/0.2)
# fit the mTLV model
mod.wt <- mm(Y ~ time*binary, t.formula = ~1, data = dat.small,
id = id, weight = wt)
summary(mod.wt)
#> Warning in summary.MMLong(mod.wt): When performing a weighted likelihood
#> analysis (by specifying the weight argument), robust standard errors are
#> reported. Model based standard errors will not be correct and should not be
#> used.
#>
#> Class:
#> MMLong
#>
#> Call:
#> mm(mean.formula = Y ~ time * binary, t.formula = ~1, id = id,
#> data = dat.small, weight = wt)
#>
#> Information Criterion:
#> AIC BIC logLik Deviance
#> 74768.35 74795.57 -37379.17 74758.35
#>
#> Marginal Mean Parameters:
#> Estimate Robust SE Chi Square Pr(>Chi)
#> (Intercept) -1.014456 0.045303 501.435 < 2.2e-16
#> time -0.161408 0.010246 248.177 < 2.2e-16
#> binary 0.320099 0.072732 19.369 1.077e-05
#> time:binary 0.158889 0.013982 129.130 < 2.2e-16
#>
#> Dependence Model Parameters:
#> Estimate Robust SE Chi Square Pr(>Chi)
#> gamma:(Intercept) 1.043764 0.042677 598.17 < 2.2e-16
#>
#> Number of clusters: 1712
#> Maximum cluster size: 15
#> Convergence status (nlm code): 1
#> Number of iterations: 27
Note that when the weight
argument is specified, model-based standard error will not be correct and should not be reported. Thus, the software will return robust standard errors only together with a warning message.