A Novel Method for Identifying a Parsimonious and Accurate Predictive Model for Multiple Clinical Outcomes - PubMed

A Novel Method for Identifying a Parsimonious and Accurate Predictive Model for Multiple Clinical Outcomes

L Grisell Diaz-Ramirez et al. Comput Methods Programs Biomed. 2021 Jun.

Abstract

Background and objective: Most methods for developing clinical prognostic models focus on identifying parsimonious and accurate models to predict a single outcome; however, patients and providers often want to predict multiple outcomes simultaneously. As an example, for older adults one is often interested in predicting nursing home admission as well as mortality. We propose and evaluate a novel predictor-selection computing method for multiple outcomes and provide the code for its implementation.

Methods: Our proposed algorithm selects the best subset of common predictors based on the minimum average normalized Bayesian Information Criterion (BIC) across outcomes: the Best Average BIC (baBIC) method. We compared the predictive accuracy (Harrell's C-statistic) and parsimony (number of predictors) of the model obtained with the baBIC method against those of: 1) a subset of common predictors obtained from the union of the optimal models for each outcome (Union method), 2) a subset obtained from the intersection of the optimal models for each outcome (Intersection method), and 3) a model with no variable selection (Full method). We used case-study data from the Health and Retirement Study (HRS) to demonstrate our method and conducted a simulation study to investigate its performance.
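The quantity the baBIC method minimizes can be written out explicitly. The block below gives Schwarz's BIC for outcome k (k = 1, …, K) fitted on a predictor subset S with d_S estimated coefficients, then one plausible normalization and the average across outcomes. The min-max normalization over the set 𝒞 of subsets being compared, and the choice of n_k (number of subjects, or number of events as is sometimes used for Cox models), are assumptions for illustration; the abstract does not define them.

```latex
\begin{aligned}
\mathrm{BIC}_k(S) &= -2\log\hat{L}_k(S) + d_S \log n_k, \\
\mathrm{nBIC}_k(S) &= \frac{\mathrm{BIC}_k(S) - \min_{S'\in\mathcal{C}}\mathrm{BIC}_k(S')}
                          {\max_{S'\in\mathcal{C}}\mathrm{BIC}_k(S') - \min_{S'\in\mathcal{C}}\mathrm{BIC}_k(S')}, \\
\overline{\mathrm{nBIC}}(S) &= \frac{1}{K}\sum_{k=1}^{K}\mathrm{nBIC}_k(S),
\qquad S^{*} = \arg\min_{S\in\mathcal{C}} \overline{\mathrm{nBIC}}(S).
\end{aligned}
```

Normalizing within each outcome puts the four outcomes on a common 0 to 1 scale before averaging, so an outcome with a larger sample or more events does not dominate the choice of S*.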

Results: In the case-study data and simulations, the average Harrell's C-statistics across outcomes of the models obtained with the baBIC and Union methods were comparable. Despite this similar discrimination, the baBIC method produced more parsimonious models than the Union method. In contrast, the Intersection method selected the most parsimonious models but had the worst predictive accuracy, and the opposite was true for the Full method. In the simulations, the baBIC method performed well: most of the time it identified many of the predictors selected in the case-study baBIC model, and in the majority of simulations it excluded those that were not selected.
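Harrell's C-statistic, the discrimination measure compared above, can be computed directly from its pairwise definition. The sketch below is a plain O(n²) Python version for a single outcome, plus a helper that averages C across outcomes. The function names are hypothetical, and skipping pairs with tied event times is one common convention, not necessarily the one used in the paper.

```python
from typing import Mapping, Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's C for right-censored data (naive O(n^2) pairwise version).

    A pair (i, j) is usable when times[i] < times[j] and subject i had the
    event (events[i] == 1). It is concordant when the earlier failure has the
    higher predicted risk; tied risk scores count as 0.5. Pairs with tied
    times are skipped (an assumed convention).
    """
    concordant = 0.0
    usable = 0
    for t_i, e_i, r_i in zip(times, events, risk):
        if e_i != 1:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for t_j, r_j in zip(times, risk):
            if t_i < t_j:
                usable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def average_c(c_by_outcome: Mapping[str, float]) -> float:
    """Unweighted mean of per-outcome C-statistics (e.g. ADL, IADL, WALK, DEATH)."""
    return sum(c_by_outcome.values()) / len(c_by_outcome)


# Toy data (not from the study): the third subject is censored at t = 3.
print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.5, 0.6, 0.1]))  # 0.8
```

In the toy example, 5 pairs are usable and 4 are concordant; the one discordant pair is the subject failing at t = 2 with a lower risk score than the subject censored at t = 3.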

Conclusions: Our method identified a common subset of variables to predict multiple clinical outcomes with a better balance between parsimony and predictive accuracy than current methods.

Keywords: Bayesian Information Criterion; backward elimination; prognostic models; survival analysis; variable selection.


Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no competing interests.

Figures

Fig. 1.

Overview of the Algorithm for Selecting the Subset of (p-1) Predictors with the Minimum Average Normalized BIC across 4 Outcomes. ADL: time to first Activities of Daily Living (ADL) dependence. DEATH: time to death. IADL: time to first Instrumental Activities of Daily Living (IADL) difficulty. nBIC: normalized Bayesian Information Criterion. p: number of predictors. Subset1, Subset2, ..., Subset37: combinations of predictors obtained by removing 1 predictor at a time. *Subset37: maximum number of subsets of predictors fitted in the first step of backward elimination. In the first step, the full model has 39 predictors, 2 of which are forced into all models; the initial pool therefore contains 37 removable predictors, so at most 37 subsets are fitted by removing one predictor at a time. WALK: time to first mobility dependence.
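One backward-elimination step of the kind shown in Fig. 1 can be sketched in Python as follows. The `bic` callback is a hypothetical stand-in for fitting one model per outcome on a candidate subset (in the case study, survival models for ADL, IADL, WALK, and DEATH) and returning its BIC; the min-max normalization within each outcome across the candidates of a step is an assumption, since the caption does not specify the normalization.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Subset = FrozenSet[str]
BicFn = Callable[[str, Subset], float]  # hypothetical: (outcome, subset) -> BIC


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]; if all values are equal, map them to 0 (assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def elimination_step(current: Subset, forced: Subset, outcomes: Sequence[str],
                     bic: BicFn) -> Tuple[Subset, float]:
    """One backward-elimination step: drop each non-forced predictor in turn
    and keep the candidate subset with the minimum average normalized BIC."""
    removable = sorted(current - forced)
    if not removable:
        raise ValueError("only forced predictors remain")
    candidates = [current - {x} for x in removable]
    # One row per outcome: BICs of all candidates, normalized within that outcome.
    nbic_rows = [min_max_normalize([bic(o, c) for c in candidates]) for o in outcomes]
    # Column-wise mean across outcomes gives the average nBIC of each candidate.
    avg = [sum(col) / len(outcomes) for col in zip(*nbic_rows)]
    best = min(range(len(candidates)), key=avg.__getitem__)
    return candidates[best], avg[best]
```

With 39 predictors, 2 of them forced, this step builds 37 candidate subsets, so with 4 outcomes the `bic` callback is called 148 times, matching the count of subsets in the caption.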

Fig. 2.

Overview of the Algorithm for Selecting the Final Subset of Predictors with the Minimum Average Normalized BIC across 4 Outcomes. nBIC: normalized Bayesian Information Criterion. p: number of predictors.
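After backward elimination has produced one best subset per model size, the final subset is the one with the smallest average normalized BIC along that path. A minimal Python sketch, assuming the per-outcome BICs of every subset on the path are already available and that normalization is min-max across the whole path for each outcome (an assumption; the caption does not state the rule). The function name is hypothetical.

```python
from typing import FrozenSet, Mapping, Sequence


def select_final_subset(path: Sequence[FrozenSet[str]],
                        bic_by_outcome: Mapping[str, Sequence[float]]) -> FrozenSet[str]:
    """Pick the final subset from the backward-elimination path.

    path[i] is the subset kept after step i (full model first, p decreasing);
    bic_by_outcome[o][i] is the BIC of path[i] for outcome o. Each outcome's
    BICs are min-max normalized across the whole path (an assumed rule),
    averaged across outcomes, and the subset with the smallest mean wins.
    """
    n = len(path)
    avg = [0.0] * n
    for outcome, values in bic_by_outcome.items():
        if len(values) != n:
            raise ValueError(f"{outcome}: expected {n} BIC values, got {len(values)}")
        lo, hi = min(values), max(values)
        for i, v in enumerate(values):
            nbic = 0.0 if hi == lo else (v - lo) / (hi - lo)
            avg[i] += nbic / len(bic_by_outcome)
    best = min(range(n), key=avg.__getitem__)
    return path[best]
```

Normalizing across the whole path, rather than within a single step, keeps the averages of subsets of different sizes on one scale so they can be compared directly.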

Fig. 3.

Subsets of Predictors Selected with the Individual Outcome, Union, baBIC, and Intersection Methods using the Case-study Data. ADL: time to first Activities of Daily Living (ADL) dependence. baBIC Method: best Average BIC method, selects the best subset of predictors based on the minimum average normalized BIC across the 4 outcomes. BIC: Bayesian Information Criterion. COGDLRC3G: number of words from a 10-word list recalled correctly after 5 minutes. dAGE: age decile groups. DEATH: time to death. DIABETES: whether has diabetes, with or without medication. DRIVE: whether able to drive. EDU: education 12+ years. EXERCISE: exercise frequency. FEMALE: whether female. HEARAID: whether wears a hearing aid. HEARTFAILURE: whether has heart failure or other heart problems (e.g. angina, heart attack, heart disease). HYPERTENSION: whether has hypertension. IADL: time to first Instrumental Activities of Daily Living (IADL) difficulty. INCOTINENCE: whether has incontinence. Individual Outcome Method: selects the final subset of predictors based on the minimum BIC for each individual outcome. Intersection Method: selects the final subset of predictors that were in all 4 final subsets based on the minimum BIC for each individual outcome. LUNG: chronic lung disease. MSTAT: marital status. OTHERARM: having difficulty reaching above shoulder. OTHERCLIM3G: having difficulty climbing stairs. OTHERLIFT: having difficulty lifting weights over 10 pounds. OTHERPUSH: having difficulty pushing large objects. OTHERSIT: having difficulty sitting for 2 hours. OTHERSTOOP: having difficulty stooping, kneeling, or crouching. OTHERWALK: having difficulty walking one block or in the room. p: number of predictors. qBMI: body mass index quintile groups. SMOKING: whether smokes. Union Method: selects the final subset of all predictors that were in at least 1 of the 4 final subsets based on the minimum BIC for each individual outcome. VOLUNTEER: whether helps as a volunteer. WALK: time to first mobility dependence.
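Given the outcome-specific subsets chosen by the Individual Outcome method, the Union and Intersection methods described in the caption reduce to ordinary set operations. A minimal Python sketch; the function names and the toy subsets with placeholder predictor names are illustrative only and are not the study's results.

```python
from functools import reduce
from typing import Mapping, Set


def union_method(selected: Mapping[str, Set[str]]) -> Set[str]:
    """Predictors that appear in at least one outcome's min-BIC subset."""
    return set().union(*selected.values())


def intersection_method(selected: Mapping[str, Set[str]]) -> Set[str]:
    """Predictors that appear in every outcome's min-BIC subset."""
    if not selected:
        return set()
    return set(reduce(lambda a, b: a & b, selected.values()))


# Toy per-outcome selections with placeholder names (not the study's results).
toy = {"ADL": {"A", "B"}, "IADL": {"B"}, "WALK": {"B", "D"}, "DEATH": {"B", "C"}}
print(sorted(union_method(toy)))         # ['A', 'B', 'C', 'D']
print(sorted(intersection_method(toy)))  # ['B']
```

The toy output illustrates the trade-off reported in the abstract: the union grows with every outcome-specific predictor, while the intersection keeps only predictors shared by all outcomes.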

Fig. 4.

Selection with the baBIC Method and Individual Outcome Methods in the Case-study Data. ADL: time to first Activities of Daily Living (ADL) dependence. baBIC Method: best Average BIC method, selects best subset of predictors based on the minimum average normalized BIC across the 4 outcomes. BIC: Bayesian Information Criterion. DEATH: time to death. IADL: time to first Instrumental Activities of Daily Living (IADL) difficulty. Individual Outcome Method: selects final subset of predictors based on the minimum BIC for each individual outcome. nBIC: normalized Bayesian Information Criterion. WALK: time to first mobility dependence.

Fig. 5.

Predicted Cumulative Incidence by Outcome at the Mean of the Predictors Selected with the baBIC Method in the Case-study Data using simulations (lighter color) of Scenario 1 with Case-study Levels of Censoring (darker color: mean of simulations). ADL: time to first Activities of Daily Living (ADL) dependence. Case-study levels of censoring: ADL= 66.55%, IADL= 64.98%, WALK=81.90%, DEATH=31.87%. DEATH: time to death. IADL: time to first Instrumental Activities of Daily Living (IADL) difficulty. Scenario 1: simulated data generated using 15 non-zero coefficients corresponding to the common subset of predictors obtained with the baBIC method in the case-study data. WALK: time to first mobility dependence.
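If each outcome is modeled with a Cox proportional hazards model (an assumption here; the caption does not name the model), the predicted cumulative incidence at the mean of the predictors is 1 − S0(t)^exp(β·x̄), where S0 is the baseline survival at x = 0. The sketch below computes this curve for one simulated data set and averages curves pointwise across simulations, mirroring the lighter (single-simulation) and darker (mean) lines. It treats each outcome separately and ignores competing risks such as death; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cumulative_incidence_at_mean(baseline_surv: Sequence[float],
                                 beta: Sequence[float],
                                 x_mean: Sequence[float]) -> List[float]:
    """1 - S0(t) ** exp(beta . x_mean) at each time point of the baseline survival curve."""
    if len(beta) != len(x_mean):
        raise ValueError("beta and x_mean must have the same length")
    hazard_ratio = math.exp(sum(b * x for b, x in zip(beta, x_mean)))
    return [1.0 - s ** hazard_ratio for s in baseline_surv]


def mean_curve(curves: Sequence[Sequence[float]]) -> List[float]:
    """Pointwise mean of curves evaluated on a common time grid."""
    return [sum(col) / len(curves) for col in zip(*curves)]
```

Both functions assume the curves share one time grid; curves from different simulations would first need to be evaluated at common time points.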

Fig. 6.

Predicted Cumulative Incidence by Outcome at the Mean of the Predictors Selected with the baBIC Method in the Case-study Data using simulations (lighter color) of Scenario 1 with 25% Censoring (darker color: mean of simulations). ADL: time to first Activities of Daily Living (ADL) dependence. DEATH: time to death. IADL: time to first Instrumental Activities of Daily Living (IADL) difficulty. Scenario 1: simulated data generated using 15 non-zero coefficients corresponding to the common subset of predictors obtained with the baBIC method in the case-study data. WALK: time to first mobility dependence.

Fig. 7.

Comparison of Number of Predictors Selected (mean, 2.5th to 97.5th percentiles) Across Case-study Bootstrap Data and Simulations with Case-study Levels of Censoring and 25% Censoring. ADL: time to first Activities of Daily Living (ADL) dependence. baBIC Method: best Average BIC method, selects best subset of predictors based on the minimum average normalized BIC across the 4 outcomes. BIC: Bayesian Information Criterion. Case-study levels of censoring: ADL= 66.55%, IADL= 64.98%, WALK=81.90%, DEATH=31.87%. DEATH: time to death. Full Method: includes all 39 candidate predictors of the case-study data. IADL: time to first Instrumental Activities of Daily Living (IADL) difficulty. Individual Outcome Method: selects final subset of predictors based on the minimum BIC for each individual outcome. Intersection Method: selects final subset of predictors that were in all 4 final subsets based on the minimum BIC for each individual outcome. Scenario 1: simulated data generated using 15 non-zero coefficients corresponding to the common subset of predictors obtained with the baBIC method in the case-study data. Scenario 2: simulated data generated using the outcome-specific non-zero coefficients corresponding to those obtained with the Individual Outcome method in the case-study data. Scenario 3: simulated data generated using non-zero coefficients for all 39 candidate predictors using estimates from the case-study data. Union Method: selects final subset of all the predictors that were in at least 1 of the 4 final subsets based on the minimum BIC for each individual outcome. WALK: time to first mobility dependence.
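The summary reported in Fig. 7 (mean and 2.5th to 97.5th percentiles of the number of selected predictors across bootstrap or simulated replicates) can be computed from the per-replicate counts as sketched below. This is a minimal standard-library version; the percentile rule (linear interpolation between order statistics, the same as NumPy's default) is an assumption, since the caption does not say which definition was used, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) by linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize_counts(n_predictors: Sequence[int]) -> Tuple[float, float, float]:
    """Mean, 2.5th and 97.5th percentiles of predictor counts across replicates."""
    mean = sum(n_predictors) / len(n_predictors)
    return mean, percentile(n_predictors, 2.5), percentile(n_predictors, 97.5)
```

Different percentile definitions can shift the interval endpoints slightly when the number of replicates is small, so the rule should match whatever the original analysis used.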

Fig. 8.

Comparison of Mean Harrell’s C-statistic Across Case-study Bootstrap Data and Simulations with Case-study Levels of Censoring and 25% Censoring. ADL: time to first Activities of Daily Living (ADL) dependence. baBIC Method: best Average BIC method, selects best subset of predictors based on the minimum average normalized BIC across the 4 outcomes. BIC: Bayesian Information Criterion. Case-study levels of censoring: ADL= 66.55%, IADL= 64.98%, WALK=81.90%, DEATH=31.87%. DEATH: time to death. Full Method: includes all 39 candidate predictors of the case-study data. IADL: time to first Instrumental Activities of Daily Living (IADL) difficulty. Individual Outcome Method: selects final subset of predictors based on the minimum BIC for each individual outcome. Intersection Method: selects final subset of predictors that were in all 4 final subsets based on the minimum BIC for each individual outcome. Scenario 1: simulated data generated using 15 non-zero coefficients corresponding to the common subset of predictors obtained with the baBIC method in the case-study data. Scenario 2: simulated data generated using the outcome specific non-zero coefficients corresponding to those obtained with the Individual Outcome method in the case-study data. Scenario 3: simulated data generated using non-zero coefficients for all 39 candidate predictors using estimates from the case-study data. Union Method: selects final subset of all the predictors that were in at least 1 of the 4 final subsets based on the minimum BIC for each individual outcome. WALK: time to first mobility dependence.

