Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks (original) (raw)

Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions

Annals of Oncology, 2020

Background: Convolutional neural networks (CNNs) efficiently differentiate skin lesions by image analysis. Studies comparing a market-approved CNN in a broad range of diagnoses to dermatologists working under less artificial conditions are lacking. Materials and methods: One hundred cases of pigmented/non-pigmented skin cancers and benign lesions were used for a two-level reader study in 96 dermatologists (level I: dermoscopy only; level II: clinical close-up images, dermoscopy, and textual information). Additionally, dermoscopic images were classified by a CNN approved for the European market as a medical device (Moleanalyzer Pro, FotoFinder Systems, Bad Birnbach, Germany). Primary endpoints were the sensitivity and specificity of the CNN's dichotomous classification in comparison with the dermatologists' management decisions. Secondary endpoints included the dermatologists' diagnostic decisions, their performance according to their level of experience, and the CNN's area under the curve (AUC) of receiver operating characteristics (ROC). Results: The CNN revealed a sensitivity, specificity, and ROC AUC with corresponding 95% confidence intervals (CI) of 95.0% (95% CI 83.5% to 98.6%), 76.7% (95% CI 64.6% to 85.6%), and 0.918 (95% CI 0.866e0.970), respectively. In level I, the dermatologists' management decisions showed a mean sensitivity and specificity of 89.0% (95% CI 87.4% to 90.6%) and 80.7% (95% CI 78.8% to 82.6%). With level II information, the sensitivity significantly improved to 94.1% (95% CI 93.1% to 95.1%; P < 0.001), while the specificity remained unchanged at 80.4% (95% CI 78.4% to 82.4%; P ¼ 0.97). When fixing the CNN's specificity at the mean specificity of the dermatologists' management decision in level II (80.4%), the CNN's sensitivity was almost equal to that of human raters, at 95% (95% CI 83.5% to 98.6%) versus 94.1% (95% CI 93.1% to 95.1%); P ¼ 0.1. In contrast, dermatologists were outperformed by the CNN in their level I management decisions and level I and II diagnostic decisions. More experienced dermatologists frequently surpassed the CNN's performance. Conclusions: Under less artificial conditions and in a broader spectrum of diagnoses, the CNN and most dermatologists performed on the same level. Dermatologists are trained to integrate information from a range of sources rendering comparative studies that are solely based on one single case image inadequate.

Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists

Annals of Oncology, 2018

Background: Deep learning convolutional neural networks (CNN) may facilitate melanoma detection, but data comparing a CNN's diagnostic performance to larger groups of dermatologists are lacking. Methods: Google's Inception v4 CNN architecture was trained and validated using dermoscopic images and corresponding diagnoses. In a comparative cross-sectional reader study a 100-image test-set was used (level-I: dermoscopy only; level-II: dermoscopy plus clinical information and images). Main outcome measures were sensitivity, specificity and area under the curve (AUC) of receiver operating characteristics (ROC) for diagnostic classification (dichotomous) of lesions by the CNN versus an international group of 58 dermatologists during level-I or-II of the reader study. Secondary end points included the dermatologists' diagnostic performance in their management decisions and differences in the diagnostic performance of dermatologists during level-I and-II of the reader study. Additionally, the CNN's performance was compared with the top-five algorithms of the 2016 International Symposium on Biomedical Imaging (ISBI) challenge. Results: In level-I dermatologists achieved a mean (6standard deviation) sensitivity and specificity for lesion classification of 86.6% (69.3%) and 71.3% (611.2%), respectively. More clinical information (level-II) improved the sensitivity to 88.9% (69.6%, P ¼ 0.19) and specificity to 75.7% (611.7%, P < 0.05). The CNN ROC curve revealed a higher specificity of 82.5% when compared with dermatologists in level-I (71.3%, P < 0.01) and level-II (75.7%, P < 0.01) at their sensitivities of 86.6% and 88.9%, respectively. The CNN ROC AUC was greater than the mean ROC area of dermatologists (0.86 versus 0.79, P < 0.01). The CNN scored results close to the top three algorithms of the ISBI 2016 challenge. Conclusions: For the first time we compared a CNN's diagnostic performance with a large international group of 58 dermatologists, including 30 experts. Most dermatologists were outperformed by the CNN. Irrespective of any physicians' experience, they may benefit from assistance by a CNN's image classification.

Association Between Surgical Skin Markings in Dermoscopic Images and Diagnostic Performance of a Deep Learning Convolutional Neural Network for Melanoma Recognition

JAMA Dermatology, 2019

IMPORTANCE Deep learning convolutional neural networks (CNNs) have shown a performance at the level of dermatologists in the diagnosis of melanoma. Accordingly, further exploring the potential limitations of CNN technology before broadly applying it is of special interest. OBJECTIVE To investigate the association between gentian violet surgical skin markings in dermoscopic images and the diagnostic performance of a CNN approved for use as a medical device in the European market. DESIGN AND SETTING A cross-sectional analysis was conducted from August 1, 2018, to November 30, 2018, using a CNN architecture trained with more than 120 000 dermoscopic images of skin neoplasms and corresponding diagnoses. The association of gentian violet skin markings in dermoscopic images with the performance of the CNN was investigated in 3 image sets of 130 melanocytic lesions each (107 benign nevi, 23 melanomas). EXPOSURES The same lesions were sequentially imaged with and without the application of a gentian violet surgical skin marker and then evaluated by the CNN for their probability of being a melanoma. In addition, the markings were removed by manually cropping the dermoscopic images to focus on the melanocytic lesion. MAIN OUTCOMES AND MEASURES Sensitivity, specificity, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve for the CNN's diagnostic classification in unmarked, marked, and cropped images. RESULTS In all, 130 melanocytic lesions (107 benign nevi and 23 melanomas) were imaged. In unmarked lesions, the CNN achieved a sensitivity of 95.7% (95% CI, 79%-99.2%) and a specificity of 84.1% (95% CI, 76.0%-89.8%). The ROC AUC was 0.969. In marked lesions, an increase in melanoma probability scores was observed that resulted in a sensitivity of 100% (95% CI, 85.7%-100%) and a significantly reduced specificity of 45.8% (95% CI, 36.7%-55.2%, P < .001). The ROC AUC was 0.922. Cropping images led to the highest sensitivity of 100% (95% CI, 85.7%-100%), specificity of 97.2% (95% CI, 92.1%-99.0%), and ROC AUC of 0.993. Heat maps created by vanilla gradient descent backpropagation indicated that the blue markings were associated with the increased false-positive rate. CONCLUSIONS AND RELEVANCE This study's findings suggest that skin markings significantly interfered with the CNN's correct diagnosis of nevi by increasing the melanoma probability scores and consequently the false-positive rate. A predominance of skin markings in melanoma training images may have induced the CNN's association of markings with a melanoma diagnosis. Accordingly, these findings suggest that skin markings should be avoided in dermoscopic images intended for analysis by a CNN.

A convolutional neural network trained with dermoscopic images performed on par with 145 dermatologists in a clinical melanoma image classification task

European Journal of Cancer, 2019

Background: Recent studies have demonstrated the use of convolutional neural networks (CNNs) to classify images of melanoma with accuracies comparable to those achieved by board-certified dermatologists. However, the performance of a CNN exclusively trained with dermoscopic images in a clinical image classification task in direct competition with a large number of dermatologists has not been measured to date. This study compares the performance of a convolutional neuronal network trained with dermoscopic images exclusively for identifying melanoma in clinical photographs with the manual grading of the same images by dermatologists.

The Possibility of Selective Skin Lesion Classification in Convolutional Neural Networks

International Journal of Sciences: Basic and Applied Research, 2020

Selective classification of skin lesion images and uncertainty estimation is examined to increase the adoption of convolutional neural networks(CNNs) in automated skin cancer diagnostic systems. Research on the application of deep learning models to skin cancer diagnosis has shown success as models outperform medical experts [1]. However, concerns on uncertainty in classifiers and difficulty in approximating uncertainty has caused limited adoption of CNNs in Computer-aided diagnostic systems (CADs) in health care. This research propose selective classification to increase confidence in CNN models for skin cancer diagnosis. The methodology is based on SoftMax response(SR), MC dropout and risk-coverage performance evaluation metric. Risk-coverage curves gives physicians and dermatologist information about the expected rate of misclassification by a model. This enable them to measure the reliability of the classifier's predictions and inform their decision during skin cancer diagnosis. MC dropout uncertainty estimate was shown to increase accuracy for Melanoma detection by 1.48%. The proposed selective classifier achieved increase melanoma detection. The sensitivity of melanoma increased by 9.91% and 9.73% after selective classification at a coverage of 0.7. This study showed that selective classification and uncertainty estimation can be combined to promote adoption of CNNs in CADs for skin lesions classification.

Artificial Intelligence and Its Effect on Dermatologists’ Accuracy in Dermoscopic Melanoma Image Classification: Web-Based Survey Study

Journal of Medical Internet Research, 2020

Background: Early detection of melanoma can be lifesaving but this remains a challenge. Recent diagnostic studies have revealed the superiority of artificial intelligence (AI) in classifying dermoscopic images of melanoma and nevi, concluding that these algorithms should assist a dermatologist's diagnoses. Objective: The aim of this study was to investigate whether AI support improves the accuracy and overall diagnostic performance of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. Methods: Twelve board-certified dermatologists were presented disjoint sets of 100 unique dermoscopic images of melanomas and nevi (total of 1200 unique images), and they had to classify the images based on personal experience alone (part I) and with the support of a trained convolutional neural network (CNN, part II). Additionally, dermatologists were asked to rate their confidence in their final decision for each image. Results: While the mean specificity of the dermatologists based on personal experience alone remained almost unchanged (70.6% vs 72.4%; P=.54) with AI support, the mean sensitivity and mean accuracy increased significantly (59.4% vs 74.6%;

ARTIFICIAL INTELLIGENCE TRAINED DERMATOLOGIST-LEVEL CLASSIFICATION OF SKIN CANCER

Skin cancer, the most common human malignancy, is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination .Recently, there has been great interest in developing Artificial Intelligence (AI) enabled computer-aided diagnostics solutions for the diagnosis of skin cancer. Today, computer aided diagnosis is a common occurrence in hospitals. With image recognition, computers are able to detect signs of breast cancer and different kinds of lung diseases. For a convolutional neural network (CNN) that classifies images, the accuracy depends on the amount of data it is trained on and performs better as the amount of training data increase. This introduces a need for relevant images for the classes the classifier is supposed to differentiate between. However, when input data is increased, so does the computational cost, leading to a trade-off between accuracy and computational time. In a study by Cho et al. the accuracy improvement stagnates, when comparing the accuracy with different amounts of training data. With the increasing incidence of skin cancers, low awareness among a growing population, and a lack of adequate clinical expertise and services, there is an immediate need for AI systems to assist clinicians in this domain.

Deep neural networks are superior to dermatologists in melanoma image classification

European Journal of Cancer, 2019

Background: Melanoma is the most dangerous type of skin cancer but is curable if detected early. Recent publications demonstrated that artificial intelligence is capable in classifying images of benign nevi and melanoma with dermatologist-level precision. However, a statistically significant improvement compared with dermatologist classification has not been reported to date. Methods: For this comparative study, 4204 biopsy-proven images of melanoma and nevi (1:1) were used for the training of a convolutional neural network (CNN). New techniques of deep learning were integrated. For the experiment, an additional 804 biopsy-proven dermoscopic images of melanoma and nevi (1:1) were randomly presented to dermatologists of nine German university hospitals, who evaluated the quality of each image and stated their

Deep Neural Frameworks Improve the Accuracy of General Practitioners in the Classification of Pigmented Skin Lesions

Diagnostics

This study evaluated whether deep learning frameworks trained in large datasets can help non-dermatologist physicians improve their accuracy in categorizing the seven most common pigmented skin lesions. Open-source skin images were downloaded from the International Skin Imaging Collaboration (ISIC) archive. Different deep neural networks (DNNs) (n = 8) were trained based on a random dataset constituted of 8015 images. A test set of 2003 images was used to assess the classifiers’ performance at low (300 × 224 RGB) and high (600 × 450 RGB) image resolution and aggregated data (age, sex and lesion localization). We also organized two different contests to compare the DNN performance to that of general practitioners by means of unassisted image observation. Both at low and high image resolution, the DNN framework differentiated dermatological images with appreciable performance. In all cases, the accuracy was improved when adding clinical data to the framework. Finally, the least accura...