Kenney Ng - Academia.edu
Papers by Kenney Ng
medRxiv (Cold Spring Harbor Laboratory), Jan 15, 2022
medRxiv preprint. NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
IBM journal of research and development, 2018
The volume and variety of health-related data continue to grow, spurred by the increasing adoption and use of electronic health records (EHRs), the explosion of "omics" data, and the proliferation of user-generated health data. User-generated data spans a wide spectrum and includes data related to activity, diet, exercise, sleep, symptoms, treatments, and outcomes that are collected by the patient outside of clinical settings. Examples include data generated from wearables, mobile applications, sensors, online surveys, and social media. These data offer unprecedented opportunities for the analysis of complex multidimensional interactions at the user level that can enable broader insights about the individual. In particular, these data allow the capturing of events that extend beyond the clinical or institutional setting and experiences that are not filtered through the lens of the healthcare provider or payer. At the same time, using these data for research purposes also brings challenges. Since such data are typically created at the discretion of the individual, they are prone to inconsistencies and other quality issues that are very difficult to regulate. Additionally, as with EHRs, provenance, privacy, security, and linking of disparate data sets continue to be a challenge. Beyond the data challenges, understanding how insights generated from the data can be made useful for key stakeholders (e.g., payers, providers, and patients) is essential for creating actual value from the data. This issue of the IBM Journal of Research and Development emphasizes new methods, models, capabilities, and technologies that focus on the collection, processing, privacy, curation, analysis, interpretation, and use of insights from user-generated health data.
The first paper of this issue, by Bocu and Costache, concerns a homomorphic encryption-based system for securely managing personal health metrics data. The authors note that hardware and software solutions for the collection of personal health information continue to evolve. The reliable gathering of personal health information, previously usually possible only in dedicated medical settings, has recently become possible through wearable specialized medical devices. Among other drawbacks, these devices usually do not store the data locally and offer, at most, limited basic data processing features and few advanced processing capabilities for the collected personal health data. This paper describes an integrated personal health information system, which allows secure storage and processing of medical data in the cloud by using a comprehensive homomorphic encryption model to preserve the data privacy. The system collects the user data through a client application module, which is usually installed on the user's smartphone or smartwatch,
medRxiv (Cold Spring Harbor Laboratory), Nov 9, 2021
medRxiv preprint. NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
Nature Genetics, Jun 1, 2022
Nature Communications, Jun 30, 2022
For any given level of overall adiposity, individuals vary considerably in fat distribution. The inherited basis of fat distribution in the general population is not fully understood. Here, we study up to 38,965 UK Biobank participants with MRI-derived visceral (VAT), abdominal subcutaneous (ASAT), and gluteofemoral (GFAT) adipose tissue volumes. Because these fat depot volumes are highly correlated with BMI, we additionally study six local adiposity traits: VAT adjusted for BMI and height (VATadj), ASATadj, GFATadj, VAT/ASAT, VAT/GFAT, and ASAT/GFAT. We identify 250 independent common variants (39 newly identified) associated with at least one trait, with many associations more pronounced in female participants. Rare variant association studies extend prior evidence for PDE3B as an important modulator of fat distribution. Local adiposity traits (1) highlight depot-specific genetic architecture and (2) enable construction of depot-specific polygenic scores that have divergent associations with type 2 diabetes and coronary artery disease. These results, using MRI-derived, BMI-independent measures of local adiposity, confirm fat distribution as a highly heritable trait with important implications for cardiometabolic health outcomes.
arXiv (Cornell University), Sep 13, 2022
(A) Subgroup builder shows the distribution of patients by variables that can be used to create subgroups; (B) Risk score distribution displays the summary of risk scores and the estimated disease onset rate; (C) Subgroup summary presents the model performance (top) and the fairness of subgroups (bottom); (D) Model behavior explanations provide feature contributions to the risk scores; (E) Feature distributions show the mean and confidence intervals of a selected subgroup across variables.
Diabetes, Jun 1, 2022
Our previous data-driven analysis from five large-scale prospective studies discovered three trajectories (TR1, TR2, and TR3) composed of latent states for evolving patterns of islet autoantibodies (IAbs): IAA, GADA, and IA-2A. Here we examined the evolution of IAb levels within these trajectories for 2145 IAb-positive participants, followed from early life, and compared those who progressed to T1D (n=643) to those who remained undiagnosed (n=1502). Using threshold values determined by 5-year T1D risk, four levels were defined for each IAb (L0: negative; L1: lowest; L2: middle; L3: highest) and overlaid onto each visit (Figure). In the diagnosed participants, high IAA levels were seen in TR1 and TR2 at ages <3 years, whereas IAA remained at lower levels in the undiagnosed. Proportions of dwell times at the four IAb levels significantly differed between the diagnosed and undiagnosed for GADA and IA-2A in all three trajectories (p<0.001), but for IAA dwell times differed only within TR2 (p<0.05). Overall, undiagnosed participants more frequently had low IAb levels and later appearance of IAb than those who were diagnosed (Figure). Better characterization considering both timing and levels within distinct IAb trajectories towards T1D may lead to more personalized approaches to risk prediction and intervention. Disclosure: B. Kwon: Employee; IBM. P. Achenbach: None. V. Anand: Employee; IBM. W. Hagopian: Research Support; Janssen Research & Development, LLC. J. Hu: Employee; IBM. E. Koski: Employee; IBM. Å. Lernmark: None. K. Ng: Employee; IBM. R. Veijola: None. B.I. Frohnert: Advisory Panel; Provention Bio, Inc.
Funding: JDRF (IBM: 1-RSC-2017-368-I-X, 1-IND-2019-717-I-X), (DAISY: 1-SRA-2019-722-I-X, 1-RSC-2017-517-I-X, 5-ECR-2017-388-A-N), (DiPiS: 1-SRA-2019-720-I-X, 1-RSC-2017-526-I-X), (DIPP: 1-RSC-2018-555-I-X, 1-SRA-2016-342-M-R, 1-SRA-2019-732-M-B), (DEW-IT: 1-SRA-2019-719-I-X, 1-RSC-2017-516-I-X); NIH (DAISY: DK032493, DK032083, DK104351, and DK116073; DiPiS: DK26190); CDC (DEW-IT: UR6/CCU017247). European Union (DIPP: BMH4-CT98-3314); Novo Nordisk Foundation; Academy of Finland (Decision 292538 and Centre of Excellence in Molecular Systems Immunology and Physiology Research 2012-2017, Decision No. 250114); Special Research Funds for University Hospitals in Finland; Diabetes Research Foundation, Finland; and Sigrid Juselius Foundation, Finland. German Federal Ministry of Education and Research to the German Center for Diabetes Research. Swedish Research Council (grant no. 14064), Swedish Childhood Diabetes Foundation, Swedish Diabetes Association, Nordisk Insulin Fund, SUS funds, Lion Club International, district 101-S, The Royal Physiographic Society, Skåne County Council Foundation for Research and Development as well as LUDC-IRC/EXODIAB funding from the Swedish Foundation for Strategic Research (Dnr IRC15-0067) and Swedish Research Council (Dnr 2009-1039). Hussman Foundation and by the Washington State Life Science Discovery Fund.
Research Square (Research Square), Jan 2, 2020
Background: Nonalcoholic fatty liver disease (NAFLD) is a highly prevalent yet under-diagnosed and under-discussed disease. Given that NAFLD has not been explored sufficiently compared with other diseases, opportunities abound for scientists to discover new biomarkers (such as laboratory observations, current comorbidities, behavioral descriptors) that can be linked to the development of conditions and complications that may develop at a later stage of the patient's life. Methods: We analyzed IBM Explorys, a repository that contains electronic medical records (EMRs) of more than 60
Circulation, Nov 16, 2021
Circulation, Nov 17, 2020
Background: Individuals of South Asian ancestry represent 23% of the world population and experience substantially increased risk of coronary artery disease (CAD) compared to most other ethnicities. AHA/ACC practice guidelines recognize South Asian ancestry as an important ‘risk-enhancing’ factor. The magnitude of increased risk despite contemporary care, extent to which it is captured by existing risk estimators, and its potential mechanisms warrant further study. Methods: We studied 9,310 individuals of South Asian ancestry and 456,454 individuals of European ancestry free of baseline CAD from the UK Biobank prospective cohort study. Results: Over a median follow-up of 8.1 years, we confirm a striking increase in incident CAD events in individuals of South Asian ancestry, occurring in 352 (3.8%) of South Asians versus 10,165 (2.2%) of European ancestry individuals. After adjusting for age, sex, and enrollment site, this corresponded to a hazard ratio of 2.25 (95% CI 2.02 - 2.52; p<0.001). Importantly, this increased risk was not predicted by the AHA/ACC Pooled Cohorts Equation, which estimates similar 10-year risk for South Asians and European ancestry individuals. Adjustment for a broad range of clinical, anthropometric, and lifestyle risk factors led to modest attenuation of the hazard ratio to 1.73 (95% CI 1.43 - 2.09; p<0.001). By analyzing the population attributable fractions (PAF) of various risk factors, we observe that diabetes accounts for an outsized proportion of risk in South Asians (PAF 0.20, 95% CI 0.14 - 0.25 vs. 0.07, 95% CI 0.06 - 0.08 in European ancestry individuals) and current smoking accounted for a higher proportion of risk in individuals of European ancestry (PAF of 0.08, 95% CI 0.07 - 0.09 vs. 0.02, 95% CI -0.02 - 0.05 in South Asians).
Conclusion: In the largest prospective study to date, we confirm and extend prior observations of a substantially increased risk of CAD among South Asians that is not captured by current clinical risk estimators.
Journal of General Internal Medicine, Aug 24, 2022
BACKGROUND: The first surge of the COVID-19 pandemic entirely altered healthcare delivery. Whether this also altered the receipt of high- and low-value care is unknown. OBJECTIVE: To test the association between the April through June 2020 surge of COVID-19 and various high- and low-value care measures to determine how the delivery of care changed. DESIGN: Difference in differences analysis, examining the difference in quality measures between the April through June 2020 surge quarter and the January through March 2020 quarter with the same 2 quarters' difference the year prior. PARTICIPANTS: Adults in the MarketScan® Commercial Database and Medicare Supplemental Database. MAIN MEASURES: Fifteen low-value and 16 high-value quality measures aggregated into 8 clinical quality composites (4 of these low-value). KEY RESULTS: We analyzed 9,352,569 adults. Mean age was 44 years (SD, 15.03), 52% were female, and 75% were employed. Receipt of nearly every type of low-value care decreased during the surge. For example, low-value cancer screening decreased 0.86% (95% CI, −1.03 to −0.69). Use of opioid medications for back and neck pain (DiD +0.94 [95% CI, +0.82 to +1.07]) and use of opioid medications for headache (DiD +0.38 [95% CI, 0.07 to 0.69]) were the only two measures to increase. Nearly all high-value care measures also decreased. For example, high-value diabetes care decreased 9.75% (95% CI, −10.79 to −8.71). CONCLUSIONS: The first COVID-19 surge was associated with receipt of less low-value care and substantially less high-value care for most measures, with the notable exception of increases in low-value opioid use.
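The difference-in-differences design described in this abstract can be illustrated with a minimal arithmetic sketch. The function name and all four rates below are hypothetical, chosen only to show the calculation; they are not taken from the study, and the published analysis presumably used a regression framework with confidence intervals rather than this raw subtraction.

```python
def difference_in_differences(pre_surge_year, post_surge_year,
                              pre_prior_year, post_prior_year):
    """Change across quarters in the surge year minus the same change a year earlier."""
    return (post_surge_year - pre_surge_year) - (post_prior_year - pre_prior_year)


# Hypothetical rates (percent of eligible adults receiving some measure).
q1_2019, q2_2019 = 10.0, 10.2  # prior year: Jan-Mar, Apr-Jun
q1_2020, q2_2020 = 10.1, 9.4   # surge year: Jan-Mar, Apr-Jun

did = difference_in_differences(q1_2020, q2_2020, q1_2019, q2_2019)
print(round(did, 2))  # -0.9 percentage points under these made-up inputs
```

Subtracting the prior year's quarter-to-quarter change removes ordinary seasonal variation, so the remaining estimate reflects the change associated with the surge quarter.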
arXiv (Cornell University), Apr 26, 2019
Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and make inferences of health states for patients. Despite the advantages of using the algorithms for discovering interesting patterns, it still remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and clinically make sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal to investigate disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis, which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this study, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups.
PubMed, 2020
Analyzing disease progression patterns can provide useful insights into the disease processes of many chronic conditions. These analyses may help inform recruitment for prevention trials or the development and personalization of treatments for those affected. We learn disease progression patterns using Hidden Markov Models (HMM) and distill them into distinct trajectories using visualization methods. We apply this method to the domain of Type 1 Diabetes (T1D) using large longitudinal observational data from the T1DI study group. Our method discovers distinct disease progression trajectories that corroborate recently published findings. In this paper, we describe the iterative process of developing the model. These methods may also be applied to other chronic conditions that evolve over time.
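As a rough illustration of how an HMM assigns hidden health states to a sequence of observations, the sketch below decodes the most likely state path with the Viterbi algorithm. It is a simplified discrete-time example, not the authors' model: the two state names, the observation labels, and every probability are hypothetical, and the abstract does not say which inference procedure was used.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM, computed in log space."""
    # Each column maps state -> (best log-probability ending there, previous state).
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = columns[-1]
        column = {}
        for s in states:
            score, source = max(
                (prev[r][0] + math.log(trans_p[r][s]), r) for r in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), source)
        columns.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model: all names and numbers are illustrative only.
states = ["early", "late"]
start_p = {"early": 0.8, "late": 0.2}
trans_p = {"early": {"early": 0.7, "late": 0.3},
           "late": {"early": 0.1, "late": 0.9}}
emit_p = {"early": {"neg": 0.9, "pos": 0.1},
          "late": {"neg": 0.2, "pos": 0.8}}

path = viterbi(["neg", "neg", "pos", "pos"], states, start_p, trans_p, emit_p)
print(path)  # ['early', 'early', 'late', 'late']
```

Working in log space avoids numerical underflow on long visit sequences; it requires every probability in the model to be nonzero.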
The Journal of Clinical Endocrinology and Metabolism, Mar 4, 2022
Context: Rapid growth has been suggested to promote islet autoimmunity and progression to type 1 diabetes (T1D). Childhood growth has not been analyzed separately from the infant growth period in most previous studies, but it may have distinct features due to differences between the stages of development. Objective: We aimed to analyze the association of childhood growth with development of islet autoimmunity and progression to T1D diagnosis in children 1 to 8 years of age. Methods: Longitudinal data of childhood growth and development of islet autoimmunity and T1D were analyzed in a prospective cohort study including 10 145 children from Finland, Germany, Sweden, and the United States, 1-8 years of age with at least 3 height and weight measurements and at least 1 measurement of islet autoantibodies. The primary outcome was the appearance of islet autoimmunity and progression from islet autoimmunity to T1D. Results: Rapid increase in height (cm/year) was associated with increased risk of seroconversion to glutamic acid decarboxylase autoantibody, insulin autoantibody, or insulinoma-like antigen-2 autoantibody (hazard ratio [HR] = 1.26 [95% CI = 1.05, 1.51] for 1-3 years of age and HR = 1.48 [95% CI = 1.28, 1.73] for >3 years of age). Furthermore, height rate was positively associated with development of T1D (HR = 1.80 [95% CI = 1.15, 2.81]) in the analyses from seroconversion with insulin autoantibody to diabetes. Conclusion: Rapid height growth rate in childhood is associated with increased risk of islet autoimmunity and progression to T1D. Further work is needed to investigate the biological mechanism that may explain this association.
Nature Communications, Mar 21, 2022
Development of islet autoimmunity precedes the onset of type 1 diabetes in children; however, the presence of autoantibodies does not necessarily lead to manifest disease and the onset of clinical symptoms is hard to predict. Here we show, by longitudinal sampling of islet autoantibodies (IAb) to insulin, glutamic acid decarboxylase and islet antigen-2, that disease progression follows distinct trajectories. Of the combined Type 1 Data Intelligence cohort of 24662 participants, 2172 individuals fulfill the criteria of two or more follow-up visits and IAb positivity at least once, with 652 progressing to type 1 diabetes during the 15-year course of the study. Our Continuous-Time Hidden Markov Models, which are developed to discover and visualize latent states based on the collected data and clinical characteristics of the patients, show that the health state of participants progresses from 11 distinct latent states as per three trajectories (TR1, TR2 and TR3), with associated 5-year cumulative diabetes-free survival of 40% (95% confidence interval [CI], 35% to 47%), 62% (95% CI, 57% to 67%), and 88% (95% CI, 85% to 91%), respectively (p < 0.0001). Age, sex, and HLA-DR status further refine the progression rates within trajectories, enabling clinically useful prediction of disease onset.
BMC Medical Genomics, Oct 1, 2021
Background: Polygenic scores, which quantify inherited risk by integrating information from many common sites of DNA variation, may enable a tailored approach to clinical medicine. However, alongside considerable enthusiasm, we and others have highlighted a lack of standardized approaches for score disclosure. Here, we review the landscape of polygenic score reporting and describe a generalizable approach for development of a polygenic score disclosure tool for coronary artery disease. Methods: We assembled a working group of clinicians, geneticists, data visualization specialists, and software developers. The group reviewed existing polygenic score reports and then designed a two-page mock report for coronary artery disease. We then conducted a qualitative user-experience study with this report using an interview guide focused on comprehension, experience, and attitudes. Interviews were transcribed and analyzed for theme identification to inform report revision. Results: Review of nine existing polygenic score reports from commercial and academic groups demonstrated significant heterogeneity, reinforcing the need for additional efforts to study and standardize score disclosure. Using a newly developed mock score report, we conducted interviews with ten adult individuals (50% females, 70% without prior genetic testing experience, age range 20-70 years) recruited via an online platform.
We identified three themes from interviews: (1) visual elements, such as color and simple graphics, enable participants to interpret, relate to, and contextualize their polygenic score, (2) word-based descriptions of risk and polygenic scores presented as percentiles were the best recognized and understood, (3) participants had varying levels of interest in understanding complex genomic information and therefore would benefit from additional resources that can adapt to their individual needs in real time. In response to user feedback, colors used for communicating risk were modified to minimize unintended color associations and odds ratios were removed. All 10 participants expressed interest in receiving a polygenic score report based on their personal genomic information.
Nature Communications, Aug 20, 2020
Genetic variation can predispose to disease both through (i) monogenic risk variants that disrupt a physiologic pathway with large effect on disease and (ii) polygenic risk that involves many variants of small effect in different pathways. Few studies have explored the interplay between monogenic and polygenic risk. Here, we study 80,928 individuals to examine whether polygenic background can modify penetrance of disease in tier 1 genomic conditions: familial hypercholesterolemia, hereditary breast and ovarian cancer, and Lynch syndrome. Among carriers of a monogenic risk variant, we estimate substantial gradients in disease risk based on polygenic background: the probability of disease by age 75 years ranged from 17% to 78% for coronary artery disease, 13% to 76% for breast cancer, and 11% to 80% for colon cancer. We propose that accounting for polygenic background is likely to increase accuracy of risk estimation for individuals who inherit a monogenic risk variant.
arXiv (Cornell University), May 3, 2023
In empirical studies with time-to-event outcomes, investigators often leverage observational data to conduct causal inference on the effect of exposure when randomized controlled trial data is unavailable. Model misspecification and lack of overlap are common issues in observational studies, and they often lead to inconsistent and inefficient estimators of the average treatment effect. Estimators targeting overlap weighted effects have been proposed to address the challenge of poor overlap, and methods enabling flexible machine learning for nuisance models address model misspecification. However, the approaches that allow machine learning for nuisance models have not been extended to the setting of weighted average treatment effects for time-to-event outcomes when there is poor overlap. In this work, we propose a class of one-step cross-fitted double/debiased machine learning estimators for the weighted cumulative causal effect as a function of restriction time. We prove that the proposed estimators are consistent, asymptotically linear, and reach semiparametric efficiency bounds under regularity conditions. Our simulations show that the proposed estimators using nonparametric machine learning nuisance models perform as well as established methods that require correctly-specified parametric
Diabetes Care
OBJECTIVE To estimate the risk of progression to stage 3 type 1 diabetes based on varying definitions of multiple islet autoantibody positivity (mIA). RESEARCH DESIGN AND METHODS Type 1 Diabetes Intelligence (T1DI) is a combined prospective data set of children from Finland, Germany, Sweden, and the U.S. who have an increased genetic risk for type 1 diabetes. Analysis included 16,709 infants-toddlers enrolled by age 2.5 years and comparison between groups using Kaplan-Meier survival analysis. RESULTS Of 865 (5%) children with mIA, 537 (62%) progressed to type 1 diabetes. The 15-year cumulative incidence of diabetes varied from the most stringent definition (mIA/Persistent/2: two or more islet autoantibodies positive at the same visit with two or more antibodies persistent at next visit; 88% [95% CI 85–92%]) to the least stringent (mIA/Any: positivity for two islet autoantibodies without co-occurring positivity or persistence; 18% [5–40%]). Progression in mIA/Persistent/2 was signifi...
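The Kaplan-Meier analysis named in this abstract can be sketched with the standard product-limit estimator. The follow-up times and event flags below are hypothetical toy values, not T1DI data, and the function is an illustrative plain-Python version rather than the software the authors used.

```python
def kaplan_meier(durations, events):
    """Product-limit survival estimate at each distinct event time.

    durations: follow-up time for each participant
    events: 1 if the outcome was observed at that time, 0 if censored
    Returns a list of (time, estimated survival probability).
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        observed = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        # Participants with an event or censoring at t leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up in years; 1 = diagnosed, 0 = censored.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for time, surv in curve:
    print(time, round(surv, 2), "cumulative incidence:", round(1 - surv, 2))
```

Cumulative incidence, the quantity reported in the abstract, is one minus the estimated survival at the chosen time horizon.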
medRxiv (Cold Spring Harbor Laboratory), Jan 15, 2022
doi: medRxiv preprint NOTE: This preprint reports new research that has not been certified by pee... more doi: medRxiv preprint NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
IBM journal of research and development, 2018
The volume and variety of health-related data continues to grow, spurred by the increasing adopti... more The volume and variety of health-related data continues to grow, spurred by the increasing adoption and use of electronic health records (EHRs), the explosion of "omics" data, and the proliferation of user-generated health data. User-generated data spans a wide spectrum and includes data related to activity, diet, exercise, sleep, symptoms, treatments, and outcomes that are collected by the patient outside of clinical settings. Examples include data generated from wearables, mobile applications, sensors, online surveys, and social media. These data offer unprecedented opportunities for the analysis of complex multidimensional interactions at the user level that can enable broader insights about the individual. In particular, these data allow the capturing of events that extend beyond the clinical or institutional setting and experiences that are not filtered through the lens of the healthcare provider or payer. At the same time, using these data for research purposes also brings challenges. Since such data are typically created at the discretion of the individual, it is prone to inconsistencies and other quality issues that are very difficult to regulate. Additionally, as with EHRs, provenance, privacy, security, and linking of disparate data sets continue to be a challenge. Beyond the data challenges, understanding how insights generated from the data can be made useful for key stakeholders (e.g., payers, providers, and patients) is essential for creating actual value from the data. This issue of the IBM Journal of Research and Development emphasizes new methods, models, capabilities, and technologies that focus on the collection, processing, privacy, curation, analysis, interpretation, and use of insights from user-generated health data. 
The first paper of this issue, by Bocu and Costache, concerns a homomorphic encryption-based system for securely managing personal health metrics data. The authors note that hardware and software solutions for the collection of personal health information continue to evolve. The reliable gathering of personal health information, previously usually possible only in dedicated medical settings, has recently become possible through wearable specialized medical devices. Among other drawbacks, these devices usually do not store the data locally and offer, at most, limited basic data processing features and few advanced processing capabilities for the collected personal health data. This paper describes an integrated personal health information system, which allows secure storage and processing of medical data in the cloud by using a comprehensive homomorphic encryption model to preserve the data privacy. The system collects the user data through a client application module, which is usually installed on the user's smartphone or smartwatch,
medRxiv (Cold Spring Harbor Laboratory), Nov 9, 2021
doi: medRxiv preprint NOTE: This preprint reports new research that has not been certified by pee... more doi: medRxiv preprint NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
Nature Genetics, Jun 1, 2022
Take-down policy If you believe that this document breaches copyright please contact us providing... more Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.
Nature Communications, Jun 30, 2022
For any given level of overall adiposity, individuals vary considerably in fat distribution. The ... more For any given level of overall adiposity, individuals vary considerably in fat distribution. The inherited basis of fat distribution in the general population is not fully understood. Here, we study up to 38,965 UK Biobank participants with MRI-derived visceral (VAT), abdominal subcutaneous (ASAT), and gluteofemoral (GFAT) adipose tissue volumes. Because these fat depot volumes are highly correlated with BMI, we additionally study six local adiposity traits: VAT adjusted for BMI and height (VATadj), ASATadj, GFATadj, VAT/ASAT, VAT/GFAT, and ASAT/GFAT. We identify 250 independent common variants (39 newly-identified) associated with at least one trait, with many associations more pronounced in female participants. Rare variant association studies extend prior evidence for PDE3B as an important modulator of fat distribution. Local adiposity traits (1) highlight depot-specific genetic architecture and (2) enable construction of depot-specific polygenic scores that have divergent associations with type 2 diabetes and coronary artery disease. These resultsusing MRI-derived, BMI-independent measures of local adiposityconfirm fat distribution as a highly heritable trait with important implications for cardiometabolic health outcomes.
arXiv (Cornell University), Sep 13, 2022
Subgroup builder shows the distribution of patients by variables that can be used to create subgr... more Subgroup builder shows the distribution of patients by variables that can be used to create subgroups; (B) Risk score distribution displays the summary of risk scores and the estimated disease onset rate; (C) Subgroup summary presents the model performance (top) and the fairness of subgroups (bottom); (D) Model behavior explanations provide feature contributions to the risk scores; (E) Feature distributions show the mean and confidence intervals of a selected subgroup across variables.
Diabetes, Jun 1, 2022
Our previous data-driven analysis from five large-scale prospective studies discovered three traj... more Our previous data-driven analysis from five large-scale prospective studies discovered three trajectories (TR1, TR2, and TR3) composed of latent states for evolving patterns of islet autoantibodies (IAbs) : IAA, GADA and IA-2A. Here we examined the evolution of IAb levels within these trajectories for 2145 IAb positive participants, followed from early life, and compared those who progressed to T1D (n=643) to those who remained undiagnosed (n=1502) . Using threshold values determined by 5-year T1D risk, four levels were defined for each IAb (L0: negative; L1: lowest; L2: middle; L3: highest) and overlayed onto each visit (Figure) . In the diagnosed participants, high IAA levels were seen in TR1 and TR2 at ages <3 years, whereas IAA remained at lower levels in the undiagnosed. Proportions of dwell times at the four IAb levels significantly differed between the diagnosed and undiagnosed for GADA and IA-2A in all three trajectories (p<0.001) , but for IAA dwell times differed only within TR2 (p<0.05) . Overall, undiagnosed participants more frequently had low IAb levels and later appearance of IAb than those who were diagnosed (Figure) . Better characterization considering both timing and levels within distinct IAb trajectories towards T1D may lead to more personalized approaches to risk prediction and intervention. Disclosure B. Kwon: Employee; IBM. P. Achenbach: None. V. Anand: Employee; IBM, IBM. W. Hagopian: Research Support; Janssen Research & Development, LLC. J. Hu: Employee; IBM. E. Koski: Employee; IBM. Å. Lernmark: None. K. Ng: Employee; IBM. R. Veijola: None. B.I. Frohnert: Advisory Panel; Provention Bio, Inc. 
Funding: JDRF (IBM: 1-RSC-2017-368-I-X, 1-IND-2019-717-I-X), (DAISY: 1-SRA-2019-722-I-X, 1-RSC-2017-517-I-X, 5-ECR-2017-388-A-N), (DiPiS: 1-SRA-2019-720-I-X, 1-RSC-2017-526-I-X), (DIPP: 1-RSC-2018-555-I-X, 1-SRA-2016-342-M-R, 1-SRA-2019-732-M-B), (DEW-IT: 1-SRA-2019-719-I-X, 1-RSC-2017-516-I-X); NIH (DAISY: DK032493, DK032083, DK104351, and DK116073; DiPiS: DK26190); CDC (DEW-IT: UR6/CCU017247). European Union (DIPP: BMH4-CT98-3314); Novo Nordisk Foundation; Academy of Finland (Decision 292538 and Centre of Excellence in Molecular Systems Immunology and Physiology Research 2012-2017, Decision No. 250114); Special Research Funds for University Hospitals in Finland; Diabetes Research Foundation, Finland; and Sigrid Juselius Foundation, Finland. German Federal Ministry of Education and Research to the German Center for Diabetes Research. Swedish Research Council (grant no. 14064), Swedish Childhood Diabetes Foundation, Swedish Diabetes Association, Nordisk Insulin Fund, SUS funds, Lion Club International, district 101-S, The Royal Physiographic Society, Skåne County Council Foundation for Research and Development as well as LUDC-IRC/EXODIAB funding from the Swedish Foundation for Strategic Research (Dnr IRC15-0067) and Swedish Research Council (Dnr 2009-1039). Hussman Foundation and by the Washington State Life Science Discovery Fund.
Research Square (Research Square), Jan 2, 2020
Background: Nonalcoholic fatty liver disease (NAFLD) is a highly prevalent yet under-diagnosed and under-discussed disease. Given that NAFLD has not been explored sufficiently compared with other diseases, opportunities abound for scientists to discover new biomarkers (such as laboratory observations, current comorbidities, behavioral descriptors) that can be linked to the development of conditions and complications that may develop at a later stage of the patient's life. Methods: We analyzed IBM Explorys, a repository that contains electronic medical records (EMRs) of more than 60
Circulation, Nov 16, 2021
Circulation, Nov 17, 2020
Background: Individuals of South Asian ancestry represent 23% of the world population and experience substantially increased risk of coronary artery disease (CAD) compared to most other ethnicities. AHA/ACC practice guidelines recognize South Asian ancestry as an important ‘risk-enhancing’ factor. The magnitude of increased risk despite contemporary care, extent to which it is captured by existing risk estimators, and its potential mechanisms warrant further study. Methods: We studied 9,310 individuals of South Asian ancestry and 456,454 individuals of European ancestry free of baseline CAD from the UK Biobank prospective cohort study. Results: Over a median follow-up of 8.1 years, we confirm a striking increase in incident CAD events in individuals of South Asian ancestry, occurring in 352 (3.8%) of South Asians versus 10,165 (2.2%) of European ancestry individuals. After adjusting for age, sex, and enrollment site, this corresponded to a hazard ratio of 2.25 (95% CI 2.02 - 2.52; p<0.001). Importantly, this increased risk was not predicted by the AHA/ACC Pooled Cohorts Equation, which estimates similar 10-year risk for South Asians and European ancestry individuals. Adjustment for a broad range of clinical, anthropometric, and lifestyle risk factors led to modest attenuation of the hazard ratio to 1.73 (95% CI 1.43 - 2.09; p<0.001). By analyzing the population attributable fractions (PAF) of various risk factors, we observe that diabetes accounts for an outsized proportion of risk in South Asians (PAF 0.20, 95% CI 0.14 - 0.25 vs. 0.07, 95% CI 0.06 - 0.08 in European ancestry individuals) and current smoking accounted for a higher proportion of risk in individuals of European ancestry (PAF of 0.08, 95% CI 0.07 - 0.09 vs. 0.02, 95% CI -0.02 - 0.05 in South Asians).
Conclusion: In the largest prospective study to date, we confirm and extend prior observations of a substantially increased risk of CAD among South Asians that is not captured by current clinical risk estimators.
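As a quick arithmetic check, the crude event proportions quoted in the Results can be recomputed from the reported counts. The sketch below uses only the numbers in the abstract; the function name `crude_risk` is a hypothetical helper. The crude ratio it produces is unadjusted and is not a hazard ratio, so it is not expected to match the adjusted estimates of 2.25 or 1.73.

```python
def crude_risk(events: int, total: int) -> float:
    """Proportion of individuals with an incident event (no adjustment)."""
    return events / total


# Counts reported in the abstract.
south_asian = crude_risk(352, 9_310)
european = crude_risk(10_165, 456_454)

# Unadjusted ratio of the two proportions (not a hazard ratio).
crude_ratio = south_asian / european

print(f"South Asian ancestry: {south_asian:.1%}")
print(f"European ancestry:    {european:.1%}")
print(f"Crude risk ratio:     {crude_ratio:.2f}")
```

The gap between this crude ratio and the reported hazard ratio of 2.25 reflects the adjustment for age, sex, and enrollment site described in the abstract.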
Journal of General Internal Medicine, Aug 24, 2022
BACKGROUND: The first surge of the COVID-19 pandemic entirely altered healthcare delivery. Whether this also altered the receipt of high- and low-value care is unknown. OBJECTIVE: To test the association between the April through June 2020 surge of COVID-19 and various high- and low-value care measures to determine how the delivery of care changed. DESIGN: Difference in differences analysis, examining the difference in quality measures between the April through June 2020 surge quarter and the January through March 2020 quarter with the same 2 quarters' difference the year prior. PARTICIPANTS: Adults in the MarketScan® Commercial Database and Medicare Supplemental Database. MAIN MEASURES: Fifteen low-value and 16 high-value quality measures aggregated into 8 clinical quality composites (4 of these low-value). KEY RESULTS: We analyzed 9,352,569 adults. Mean age was 44 years (SD, 15.03), 52% were female, and 75% were employed. Receipt of nearly every type of low-value care decreased during the surge. For example, low-value cancer screening decreased 0.86% (95% CI, −1.03 to −0.69). Use of opioid medications for back and neck pain (DiD +0.94 [95% CI, +0.82 to +1.07]) and use of opioid medications for headache (DiD +0.38 [95% CI, 0.07 to 0.69]) were the only two measures to increase. Nearly all high-value care measures also decreased. For example, high-value diabetes care decreased 9.75% (95% CI, −10.79 to −8.71). CONCLUSIONS: The first COVID-19 surge was associated with receipt of less low-value care and substantially less high-value care for most measures, with the notable exception of increases in low-value opioid use.
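The difference-in-differences design described above can be sketched as a simple calculation: the change from the January-March quarter to the April-June quarter in 2020, minus the same change one year earlier. The function name and the rates below are hypothetical illustrations, not values from the study, and the sketch omits the regression modeling and confidence intervals that a real analysis would need.

```python
def did_estimate(pre_2019: float, post_2019: float,
                 pre_2020: float, post_2020: float) -> float:
    """Difference in differences of a quality-measure rate (percentage points).

    pre_*  : rate in the January-March quarter
    post_* : rate in the April-June quarter
    """
    change_2020 = post_2020 - pre_2020   # change across the surge year
    change_2019 = post_2019 - pre_2019   # same seasonal change, prior year
    return change_2020 - change_2019


# Hypothetical rates (percent of eligible adults receiving the service).
example = did_estimate(pre_2019=20.0, post_2019=21.0,
                       pre_2020=20.5, post_2020=12.0)
print(example)  # -9.5
```

Subtracting the prior-year change removes ordinary seasonal variation between the two quarters, so a negative result indicates a drop beyond what the calendar alone would predict.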
arXiv (Cornell University), Apr 26, 2019
Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and make inferences of health states for patients. Despite the advantages of using the algorithms for discovering interesting patterns, it still remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and clinically make sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal to investigate disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this study, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups.
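To make the idea of inferring health states from an HMM concrete, the sketch below decodes the most likely state sequence for one patient with the Viterbi algorithm. The abstract does not say which inference procedure DPVis uses, so this is only an illustration of a standard technique; the two-state model, its probabilities, and the observation coding are all hypothetical.

```python
import math


def viterbi(observations, start_prob, trans_prob, emit_prob):
    """Most likely hidden-state path for a discrete HMM (log-space Viterbi)."""
    n_states = len(start_prob)
    # Log-probability of the best path ending in each state at the first visit.
    score = [math.log(start_prob[s]) + math.log(emit_prob[s][observations[0]])
             for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + math.log(trans_prob[p][s]))
            new_score.append(score[best_prev]
                             + math.log(trans_prob[best_prev][s])
                             + math.log(emit_prob[s][obs]))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical model: state 0 = early stage, state 1 = advanced stage.
start = [0.9, 0.1]
trans = [[0.8, 0.2],   # early -> early / advanced
         [0.1, 0.9]]   # advanced -> early / advanced
emit = [[0.9, 0.1],    # early: P(marker negative), P(marker positive)
        [0.2, 0.8]]    # advanced
visits = [0, 0, 1, 1]  # observed marker at four visits (0 = negative, 1 = positive)

decoded = viterbi(visits, start, trans, emit)
print(decoded)  # [0, 0, 1, 1]
```

Working in log space avoids numerical underflow on long visit sequences; all probabilities in the toy model are nonzero so the logarithms are defined.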
PubMed, 2020
Analyzing disease progression patterns can provide useful insights into the disease processes of many chronic conditions. These analyses may help inform recruitment for prevention trials or the development and personalization of treatments for those affected. We learn disease progression patterns using Hidden Markov Models (HMM) and distill them into distinct trajectories using visualization methods. We apply it to the domain of Type 1 Diabetes (T1D) using large longitudinal observational data from the T1DI study group. Our method discovers distinct disease progression trajectories that corroborate with recently published findings. In this paper, we describe the iterative process of developing the model. These methods may also be applied to other chronic conditions that evolve over time.
The Journal of Clinical Endocrinology and Metabolism, Mar 4, 2022
Context: Rapid growth has been suggested to promote islet autoimmunity and progression to type 1 diabetes (T1D). Childhood growth has not been analyzed separately from the infant growth period in most previous studies, but it may have distinct features due to differences between the stages of development. Objective: We aimed to analyze the association of childhood growth with development of islet autoimmunity and progression to T1D diagnosis in children 1 to 8 years of age. Methods: Longitudinal data of childhood growth and development of islet autoimmunity and T1D were analyzed in a prospective cohort study including 10 145 children from Finland, Germany, Sweden, and the United States, 1-8 years of age with at least 3 height and weight measurements and at least 1 measurement of islet autoantibodies. The primary outcome was the appearance of islet autoimmunity and progression from islet autoimmunity to T1D. Results: Rapid increase in height (cm/year) was associated with increased risk of seroconversion to glutamic acid decarboxylase autoantibody, insulin autoantibody, or insulinoma-like antigen-2 autoantibody (hazard ratio [HR] = 1.26 [95% CI = 1.05, 1.51] for 1-3 years of age and HR = 1.48 [95% CI = 1.28, 1.73] for >3 years of age). Furthermore, height rate was positively associated with development of T1D (HR = 1.80 [95% CI = 1.15, 2.81]) in the analyses from seroconversion with insulin autoantibody to diabetes. Conclusion: Rapid height growth rate in childhood is associated with increased risk of islet autoimmunity and progression to T1D. Further work is needed to investigate the biological mechanism that may explain this association.
Nature Communications, Mar 21, 2022
the T1DI Study Group* Development of islet autoimmunity precedes the onset of type 1 diabetes in children; however, the presence of autoantibodies does not necessarily lead to manifest disease and the onset of clinical symptoms is hard to predict. Here we show, by longitudinal sampling of islet autoantibodies (IAb) to insulin, glutamic acid decarboxylase and islet antigen-2, that disease progression follows distinct trajectories. Of the combined Type 1 Data Intelligence cohort of 24662 participants, 2172 individuals fulfill the criteria of two or more follow-up visits and IAb positivity at least once, with 652 progressing to type 1 diabetes during the 15-year course of the study. Our Continuous-Time Hidden Markov Models, which were developed to discover and visualize latent states based on the collected data and clinical characteristics of the patients, show that the health state of participants progresses from 11 distinct latent states as per three trajectories (TR1, TR2 and TR3), with associated 5-year cumulative diabetes-free survival of 40% (95% confidence interval [CI], 35% to 47%), 62% (95% CI, 57% to 67%), and 88% (95% CI, 85% to 91%), respectively (p < 0.0001). Age, sex, and HLA-DR status further refine the progression rates within trajectories, enabling clinically useful prediction of disease onset.
BMC Medical Genomics, Oct 1, 2021
Background: Polygenic scores, which quantify inherited risk by integrating information from many common sites of DNA variation, may enable a tailored approach to clinical medicine. However, alongside considerable enthusiasm, we and others have highlighted a lack of standardized approaches for score disclosure. Here, we review the landscape of polygenic score reporting and describe a generalizable approach for development of a polygenic score disclosure tool for coronary artery disease. Methods: We assembled a working group of clinicians, geneticists, data visualization specialists, and software developers. The group reviewed existing polygenic score reports and then designed a two-page mock report for coronary artery disease. We then conducted a qualitative user-experience study with this report using an interview guide focused on comprehension, experience, and attitudes. Interviews were transcribed and analyzed for theme identification to inform report revision. Results: Review of nine existing polygenic score reports from commercial and academic groups demonstrated significant heterogeneity, reinforcing the need for additional efforts to study and standardize score disclosure. Using a newly developed mock score report, we conducted interviews with ten adult individuals (50% females, 70% without prior genetic testing experience, age range 20-70 years) recruited via an online platform.
We identified three themes from interviews: (1) visual elements, such as color and simple graphics, enable participants to interpret, relate to, and contextualize their polygenic score, (2) word-based descriptions of risk and polygenic scores presented as percentiles were the best recognized and understood, (3) participants had varying levels of interest in understanding complex genomic information and therefore would benefit from additional resources that can adapt to their individual needs in real time. In response to user feedback, colors used for communicating risk were modified to minimize unintended color associations and odds ratios were removed. All 10 participants expressed interest in receiving a polygenic score report based on their personal genomic information.
Nature Communications, Aug 20, 2020
Genetic variation can predispose to disease both through (i) monogenic risk variants that disrupt a physiologic pathway with large effect on disease and (ii) polygenic risk that involves many variants of small effect in different pathways. Few studies have explored the interplay between monogenic and polygenic risk. Here, we study 80,928 individuals to examine whether polygenic background can modify penetrance of disease in tier 1 genomic conditions: familial hypercholesterolemia, hereditary breast and ovarian cancer, and Lynch syndrome. Among carriers of a monogenic risk variant, we estimate substantial gradients in disease risk based on polygenic background: the probability of disease by age 75 years ranged from 17% to 78% for coronary artery disease, 13% to 76% for breast cancer, and 11% to 80% for colon cancer. We propose that accounting for polygenic background is likely to increase accuracy of risk estimation for individuals who inherit a monogenic risk variant.
arXiv (Cornell University), May 3, 2023
In empirical studies with time-to-event outcomes, investigators often leverage observational data to conduct causal inference on the effect of exposure when randomized controlled trial data is unavailable. Model misspecification and lack of overlap are common issues in observational studies, and they often lead to inconsistent and inefficient estimators of the average treatment effect. Estimators targeting overlap weighted effects have been proposed to address the challenge of poor overlap, and methods enabling flexible machine learning for nuisance models address model misspecification. However, the approaches that allow machine learning for nuisance models have not been extended to the setting of weighted average treatment effects for time-to-event outcomes when there is poor overlap. In this work, we propose a class of one-step cross-fitted double/debiased machine learning estimators for the weighted cumulative causal effect as a function of restriction time. We prove that the proposed estimators are consistent, asymptotically linear, and reach semiparametric efficiency bounds under regularity conditions. Our simulations show that the proposed estimators using nonparametric machine learning nuisance models perform as well as established methods that require correctly-specified parametric
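The overlap weighting idea mentioned in this abstract can be illustrated in a much simpler setting than the one the paper addresses. In the standard formulation, a treated unit with propensity score e(x) receives weight 1 - e(x) and a control unit receives weight e(x), which down-weights units whose treatment was nearly certain. The sketch below assumes an uncensored outcome and known propensity scores; it is not the cross-fitted double/debiased estimator proposed in the paper, and all names and data are hypothetical.

```python
def overlap_weights(treated, propensity):
    """Overlap weight per unit: 1 - e(x) if treated, e(x) if control."""
    return [1.0 - e if t == 1 else e for t, e in zip(treated, propensity)]


def weighted_mean_difference(outcome, treated, weights):
    """Weighted mean outcome among treated minus weighted mean among controls."""
    def wmean(group):
        num = sum(w * y for y, t, w in zip(outcome, treated, weights) if t == group)
        den = sum(w for t, w in zip(treated, weights) if t == group)
        return num / den
    return wmean(1) - wmean(0)


# Hypothetical data: treatment indicator, known propensity score, outcome.
treated = [1, 1, 0, 0]
propensity = [0.9, 0.5, 0.5, 0.1]
outcome = [3.0, 2.0, 1.0, 0.0]

weights = overlap_weights(treated, propensity)
effect = weighted_mean_difference(outcome, treated, weights)
print(round(effect, 4))
```

Units with propensity scores near 0.5 dominate the estimate, which is why this weighting stays stable when some units have propensity scores close to 0 or 1.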
Diabetes Care
OBJECTIVE To estimate the risk of progression to stage 3 type 1 diabetes based on varying definitions of multiple islet autoantibody positivity (mIA). RESEARCH DESIGN AND METHODS Type 1 Diabetes Intelligence (T1DI) is a combined prospective data set of children from Finland, Germany, Sweden, and the U.S. who have an increased genetic risk for type 1 diabetes. Analysis included 16,709 infants-toddlers enrolled by age 2.5 years and comparison between groups using Kaplan-Meier survival analysis. RESULTS Of 865 (5%) children with mIA, 537 (62%) progressed to type 1 diabetes. The 15-year cumulative incidence of diabetes varied from the most stringent definition (mIA/Persistent/2: two or more islet autoantibodies positive at the same visit with two or more antibodies persistent at next visit; 88% [95% CI 85–92%]) to the least stringent (mIA/Any: positivity for two islet autoantibodies without co-occurring positivity or persistence; 18% [5–40%]). Progression in mIA/Persistent/2 was signifi...
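The Kaplan-Meier analysis named in the methods can be sketched with a small product-limit estimator. Cumulative incidence at a given time is then one minus the estimated survival, assuming no competing risks. The follow-up times and event flags below are hypothetical toy data, not values from the T1DI data set, and the sketch omits confidence intervals and group comparisons.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each participant
    events : 1 if the event (e.g. diagnosis) occurred at that time, 0 if censored
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all participants whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up in years; 0 marks a censored participant.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 0, 1])
for t, s in curve:
    print(f"t={t}: survival={s:.2f}, cumulative incidence={1 - s:.2f}")
```

Censored participants contribute to the at-risk count until they leave follow-up, which is what distinguishes this estimate from a simple proportion of events.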