Longitudinal Screening for Diabetic Retinopathy in a Nationwide Screening Program: Comparing Deep Learning and Human Graders

Deep learning versus human graders for classifying diabetic retinopathy severity in a nationwide screening program

npj Digital Medicine, 2019

Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population, and compare the algorithm performance with that of human graders. A total of 25,326 gradable retinal images of patients with diabetes from the community-based, nationwide screening program of DR in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. Relative to human graders, for detecting referable DR (moderate NPDR or worse), the deep learning algorithm had significantly higher sensitivity (0.97 vs. 0.74, p < 0.001), and a slightly lower specificity (0.96 vs. 0.98, p < 0.001). Higher sensitivity of the algorithm was also observed for each of the categories of severe or worse NPDR, PDR, and DME (p < 0.001 for all comparisons). The quadratic-weighted kappa for determination of DR severity levels by the algorithm and human graders was 0.85 and 0.78 respectively (p < 0.001 for the difference). Across different severity levels of DR for determining referable disease, deep learning significantly reduced the false negative rate (by 23%) at the cost of slightly higher false positive rates (2%). Deep learning algorithms may serve as a valuable tool for DR screening.
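The metrics reported in this abstract can be illustrated with a short sketch. It is a minimal, dependency-free Python example, not the study's own evaluation code: the function names, the toy labels, and the choice of five severity classes (encoded 0 to 4) are assumptions made for illustration. It also shows how the stated reduction in false negatives follows from the sensitivities, since the false negative rate is 1 minus sensitivity (0.26 for human graders vs. 0.03 for the algorithm).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = referable DR."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def quadratic_weighted_kappa(rater_a, rater_b, n_classes=5):
    """Quadratic-weighted Cohen's kappa for ordinal grades in 0..n_classes-1.

    n_classes=5 is an assumption (a five-level DR severity scale).
    """
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(rater_a, rater_b):
        observed[i][j] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_a[i] * hist_b[j] / n
    if den == 0:
        raise ValueError("kappa is undefined when all grades fall in one class")
    return 1.0 - num / den


# Toy data (hypothetical, not from the study).
reference = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(reference, predicted)

# False negative rates implied by the sensitivities quoted in the abstract.
fnr_human = 1 - 0.74
fnr_algorithm = 1 - 0.97
fnr_reduction = round(fnr_human - fnr_algorithm, 2)
```

On real data, a library routine such as scikit-learn's `cohen_kappa_score` with `weights="quadratic"` would normally be preferred over a hand-written loop.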

Deep Learning vs. Human Graders for Classifying Severity Levels of Diabetic Retinopathy in a Real-World Nationwide Screening Program

ArXiv, 2018

Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population, and compare the algorithm performance with that of human graders. 25,326 gradable retinal images of patients with diabetes from the community-based, nation-wide screening program of DR in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. Across different severity levels of DR for determining referable disease, deep learning significantly reduced the false negative rate (by 23%) at the cost of slightly higher false positive rates (2%). Deep learning algorithms may serve as a valuable tool for DR screening.

Deep Learning for Automated Diabetic Retinopathy Screening Fused With Heterogeneous Data From EHRs Can Lead to Earlier Referral Decisions

Translational Vision Science & Technology, 2021

Purpose Fundus images are typically used as the sole training input for automated diabetic retinopathy (DR) classification. In this study, we considered several well-known DR risk factors and attempted to improve the accuracy of DR screening. Methods Fusing nonimage data (e.g., age, gender, smoking status, International Classification of Disease code, and laboratory tests) with data from fundus images can enable an end-to-end deep learning architecture for DR screening. We propose a neural network that trains on heterogeneous data simultaneously and increases the performance of DR classification in terms of sensitivity and specificity. In the current retrospective study, 13,410 fundus images and their corresponding nonimage data were collected from the Chung Shan Medical University Hospital in Taiwan. The images were classified as either nonreferable or referable for DR by a panel of ophthalmologists. Cross-validation was used to train the models and to evaluate the classification performance. Results The proposed fusion model achieved 97.96% area under the curve with 96.84% sensitivity and 89.44% specificity for determining referable DR from multimodal data, and significantly outperformed the models that used image or nonimage information separately. Conclusions The fusion model with heterogeneous data has the potential to improve referable DR screening performance for earlier referral decisions. Translational Relevance Artificial intelligence fused with heterogeneous data from electronic health records could provide earlier referral decisions from DR screening.

Deep learning algorithm predicts diabetic retinopathy progression in individual patients

The global burden of diabetic retinopathy (DR) continues to worsen and DR remains a leading cause of vision loss worldwide. Here, we describe an algorithm to predict DR progression by means of deep learning (DL), using as input color fundus photographs (CFPs) acquired at a single visit from a patient with DR. The proposed DL models were designed to predict future DR progression, defined as 2-step worsening on the Early Treatment Diabetic Retinopathy Study Diabetic Retinopathy Severity Scale, and were trained against DR severity scores assessed after 6, 12, and 24 months from the baseline visit by masked, well-trained, human reading center graders. One of these models (prediction at month 12) achieved an area under the curve of 0.79. Interestingly, our results highlight the importance of the predictive signal located in the peripheral retinal fields, not routinely collected for DR assessments, and the importance of microvascular abnormalities. Our findings show the feasibility of predicting future DR progression by leveraging CFPs of a patient acquired at a single visit. Upon further development on larger and more diverse datasets, such an algorithm could enable early diagnosis and referral to a retina specialist for more frequent monitoring and even consideration of early intervention. Moreover, it could also improve patient recruitment for clinical trials targeting DR. npj Digital Medicine (2019) 2:92 ; https://doi.

Deep learning in estimating prevalence and systemic risk factors for diabetic retinopathy: a multi-ethnic study

npj Digital Medicine, 2019

In any community, the key to understanding the burden of a specific condition is to conduct an epidemiological study. The deep learning system (DLS) recently showed promising diagnostic performance for diabetic retinopathy (DR). This study aims to use DLS as the grading tool, instead of human assessors, to determine the prevalence and the systemic cardiovascular risk factors for DR on fundus photographs, in patients with diabetes. This is a multi-ethnic (5 races), multi-site (8 datasets from Singapore, USA, Hong Kong, China and Australia), cross-sectional study involving 18,912 patients (n = 93,293 images). We compared these results and the time taken for DR assessment by DLS versus 17 human assessors (10 retinal specialists/ophthalmologists and 7 professional graders). The estimation of DR prevalence between DLS and human assessors is comparable for any DR, referable DR and vision-threatening DR (VTDR) (human assessors: 15.9%, 6.5% and 4.1%; DLS: 16.1%, 6.4% and 3.7%). Both assessment methods identified similar risk factors (with comparable AUCs), including younger age, longer diabetes duration, increased HbA1c and systolic blood pressure, for any DR, referable DR and VTDR (p > 0.05). The total time taken for DLS to evaluate DR from 93,293 fundus photographs was ~1 month compared to 2 years for human assessors. In conclusion, the prevalence and systemic risk factors for DR in a multi-ethnic population could be determined accurately using a DLS, in significantly less time than human assessors. This study highlights the potential use of AI for future epidemiology or clinical trials for DR grading in the global communities.

Artificial Intelligence Screening for Diabetic Retinopathy: the Real-World Emerging Application

Current Diabetes Reports, 2019

Purpose of Review This paper systematically reviews the recent progress in diabetic retinopathy screening. It provides an integrated overview of the current state of knowledge of emerging techniques using artificial intelligence integration in national screening programs around the world. Existing methodological approaches and research insights are evaluated. An understanding of existing gaps and future directions is created. Recent Findings Over the past decades, artificial intelligence has emerged into the scientific consciousness with breakthroughs that are sparking increasing interest among computer science and medical communities. Specifically, machine learning and deep learning (a subtype of machine learning) applications of artificial intelligence are spreading into areas that previously were thought to be only the purview of humans, and a number of applications in the field of ophthalmology have been explored. Multiple studies all around the world have demonstrated that such systems can perform on par with clinical experts, with robust diagnostic performance in diabetic retinopathy diagnosis. However, only a few tools have been evaluated in prospective clinical studies. Summary Given the rapid and impressive progress of artificial intelligence technologies, the implementation of deep learning systems into routinely practiced diabetic retinopathy screening could represent a cost-effective alternative to help reduce the incidence of preventable blindness around the world. Valentina Bellemo and Gilbert Lim contributed equally to this work. This article is part of the Topical Collection on Microvascular Complications-Retinopathy.

A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy

Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems

Deep learning algorithms promise to improve clinician workflows and patient outcomes. However, these gains have yet to be fully demonstrated in real-world clinical settings. In this paper, we describe a human-centered study of a deep learning system used in clinics for the detection of diabetic eye disease. From interviews and observation across eleven clinics in Thailand, we characterize current eye-screening workflows, user expectations for an AI-assisted screening process, and post-deployment experiences. Our findings indicate that several socio-environmental factors impact model performance, nursing workflows, and the patient experience. We draw on these findings to reflect on the value of conducting human-centered evaluative research alongside prospective evaluations of model accuracy.

Application of deep learning image assessment software VeriSee™ for diabetic retinopathy screening

Journal of the Formosan Medical Association, 2021

Purpose: To develop a deep learning image assessment software VeriSee™ and to validate its accuracy in grading the severity of diabetic retinopathy (DR). Methods: Diabetic patients who underwent single-field, nonmydriatic, 45-degree color retinal fundus photography at National Taiwan University Hospital between July 2007 and June 2017 were retrospectively recruited. A total of 7524 judgeable color fundus images were collected and were graded for the severity of DR by ophthalmologists. Among these pictures, 5649 along with another 31,612 color fundus images from the EyePACS dataset were used for model training of VeriSee™. The other 1875 images were used for validation and were graded for the severity of DR by VeriSee™, ophthalmologists, and internal physicians. Area under the receiver operating characteristic curve (AUC) for VeriSee™, and the sensitivities and specificities for VeriSee™, ophthalmologists, and internal physicians in diagnosing DR were calculated. Results: The AUCs for VeriSee™ in diagnosing any DR, referable DR and proliferative diabetic retinopathy (PDR) were 0.955, 0.955 and 0.984, respectively. VeriSee™ had better sensitivities

Cost-Utility Analysis of Deep Learning and Trained Human Graders for Diabetic Retinopathy Screening in a Nationwide Program

Ophthalmology and Therapy

Introduction: Deep learning (DL) for screening diabetic retinopathy (DR) has the potential to address limited healthcare resources by enabling expanded access to healthcare. However, health economic evaluation on this subject remains limited, particularly in low- and middle-income countries, leaving little evidence to aid decision-making for DL adoption. Methods: In the context of a middle-income country (MIC), using Thailand as a model, we constructed a decision tree-Markov hybrid model to estimate lifetime costs and outcomes of Thailand's national DR screening program via DL and trained human graders (HG). We calculated the incremental cost-effectiveness ratio (ICER) between the two strategies. Sensitivity analyses were performed to probe the influence of modeling parameters.
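The ICER mentioned above is the difference in cost between two strategies divided by the difference in their health outcomes. The following is a minimal Python sketch of that ratio only. The function name and all numbers are hypothetical placeholders, not values from the study, and the use of quality-adjusted life years as the outcome unit is an assumption based on the cost-utility framing of the title. The decision tree-Markov model that would produce the lifetime costs and outcomes is not reproduced here.

```python
def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Incremental cost-effectiveness ratio of a new strategy vs. a reference.

    Costs are in any single currency; effects are in any single outcome
    unit (assumed here to be quality-adjusted life years).
    """
    delta_cost = cost_new - cost_ref
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both strategies have equal effect")
    return delta_cost / delta_effect


# Hypothetical per-patient lifetime values, for illustration only.
cost_dl, qaly_dl = 1200.0, 10.5   # deep learning screening
cost_hg, qaly_hg = 1000.0, 10.0   # trained human graders

ratio = icer(cost_dl, qaly_dl, cost_hg, qaly_hg)
print(f"ICER: {ratio:.1f} cost units per additional QALY")
```

A one-way sensitivity analysis of the kind described could be sketched by re-running `icer` while varying one input at a time across a plausible range.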