Staining Invariant Features for Improving Generalization of Deep Convolutional Neural Networks in Computational Pathology

Sebastian Otálora et al. Front Bioeng Biotechnol. 2019.

Abstract

One of the main obstacles to implementing deep convolutional neural networks (DCNNs) in the clinical pathology workflow is their limited ability to cope with variability in slide preparation and scanner configuration, which leads to changes in tissue appearance. Some of these variations may not be included in the training data, so the models risk generalizing poorly. Addressing such variations and evaluating them in reproducible scenarios makes it possible to understand when the models generalize better, which is crucial for performance improvements and better DCNN models. Staining normalization techniques (often based on color deconvolution and deep learning) and color augmentation approaches have improved generalization in classification tasks for several tissue types. Domain-invariant training of DCNNs is also a promising technique for training a single model across different domains, since it uses the source-domain information to guide training toward domain-invariant features, achieving state-of-the-art results in classification tasks. In this article, domain-adversarial neural network (DANN) training is applied to computational pathology and compared with widely used staining normalization and color augmentation methods in two challenging classification tasks. The classification tasks rely on two openly accessible datasets, targeting Gleason grading in prostate cancer and mitosis classification in breast tissue. Benchmarking the different techniques and their combinations in two DCNN architectures allows us to assess the generalization abilities and advantages of each method in the considered classification tasks. The code for reproducing our experiments and preprocessing the data is publicly available. Quantitative and qualitative results show that the use of DANN helps the models generalize to external datasets. The combination of several techniques to manage color heterogeneity suggests that using them together, such as color augmentation with DANN training, can improve generalization even further. The results do not show a single best technique among the considered methods, even when combining them. However, color augmentation and DANN training most often obtain the best results (alone or combined with color normalization and color augmentation). The statistical significance of the results and the embedding visualizations provide useful insights for designing DCNNs that generalize to unseen staining appearances. Furthermore, in this work we release, for the first time, code for DANN evaluation on open-access datasets for computational pathology. This work opens the possibility for further research on using DANN models together with techniques that can overcome tissue preparation differences across datasets to tackle limited generalization.

Keywords: adversarial neural networks; color augmentation; color normalization; digital pathology; domain shift; staining normalization.

Figures

Figure 1

Test images with different staining conditions can affect the performance of a DCNN model trained on images with a limited set of similar staining and preparation methods: Gleason pattern 3 (top row) and pattern 4 (bottom row) patches. The internal test-set probability (third column) can lead to biased estimates of the model's performance. The last column shows how the probability drops in the baseline DCNN when predicting the class of patches with a different staining.

Figure 2

GP3 patch locations (red bounding boxes) extracted from slide TCGA-2A-A8VL (Left), belonging to the training set, and from slide TCGA-EJ-7321 (Right), from the internal test set, using the heatmap resulting from Equation (2).

Figure 3

Mitotic figure examples with an original patch size of 96 × 96 pixels. Staining differences between the internal and external test sets are evident. These changes are also noticeable in the quantitative results of Section 4.

Figure 4

Example patches from each partition of the Gleason pattern classification task. In this case, the external test set differs considerably from the training, validation, and internal test partitions.

Figure 5

Staining normalization scheme. First, a target or template image is selected to extract its staining concentrations. Brightness standardization makes the images less dependent on illumination differences; it is performed by modifying the luminosity channel in the LAB color space so that at least 5% of the pixels are white. Then, the staining concentration matrix of the brightness-corrected template image is extracted using the Macenko method (Macenko et al., 2009). Finally, all the images in the dataset are normalized using the fixed template staining (* indicates pixelwise multiplication with the template).
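
A minimal Python sketch of this pipeline is given below; it is an illustration under stated assumptions rather than the authors' implementation. The helper names (`standardize_brightness`, `estimate_stain_matrix`, `normalize_to_template`) are hypothetical, and the optical-density threshold and percentile values are illustrative choices.

```python
# Hedged sketch of the Figure 5 pipeline: brightness standardization in LAB,
# Macenko-style stain-matrix estimation on the template, and normalization of
# every image with the fixed template stains. Thresholds/percentiles are
# illustrative assumptions, not the authors' exact settings.
import numpy as np
from skimage import color


def standardize_brightness(rgb):
    """Scale the LAB luminosity channel so that roughly 5% of pixels are white."""
    lab = color.rgb2lab(rgb)
    p95 = np.percentile(lab[..., 0], 95)        # luminosity exceeded by ~5% of pixels
    lab[..., 0] = np.clip(lab[..., 0] * (100.0 / max(p95, 1e-6)), 0, 100)
    return np.clip(color.lab2rgb(lab), 0, 1)


def estimate_stain_matrix(rgb, od_threshold=0.15, angle_percentile=1.0):
    """Macenko-style estimation of a 2x3 H&E stain matrix (rows = stains)."""
    od = -np.log(np.clip(rgb.reshape(-1, 3), 1e-6, None))   # optical density
    od = od[np.all(od > od_threshold, axis=1)]               # discard background pixels
    _, _, v = np.linalg.svd(od, full_matrices=False)         # top OD directions
    proj = od @ v[:2].T
    angles = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(angles, [angle_percentile, 100 - angle_percentile])
    stains = np.stack([np.cos([lo, hi]), np.sin([lo, hi])]).T @ v[:2]
    return stains / np.linalg.norm(stains, axis=1, keepdims=True)


def normalize_to_template(rgb, template_stains):
    """Re-express an image's stain concentrations with the fixed template stains."""
    od = -np.log(np.clip(rgb.reshape(-1, 3), 1e-6, None))
    source_stains = estimate_stain_matrix(rgb)
    conc, *_ = np.linalg.lstsq(source_stains.T, od.T, rcond=None)  # 2 x N concentrations
    od_norm = (template_stains.T @ conc).T                          # the "*" step in Figure 5
    return np.clip(np.exp(-od_norm).reshape(rgb.shape), 0, 1)


# Usage: estimate the template stains once, then normalize every dataset image.
# template = standardize_brightness(template_rgb_float01)
# stains = estimate_stain_matrix(template)
# normalized = normalize_to_template(standardize_brightness(image_rgb_float01), stains)
```

In this sketch, the template's stain matrix is estimated once and reused for every image, matching the fixed-template normalization described in the caption.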

Figure 6

Examples of random color augmentations for training patches induced by Equation (3).
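
Equation (3) is not reproduced on this page; as a rough illustration, the sketch below applies a common stain-channel jitter in the H&E(-DAB) deconvolution space, perturbing each channel as c'_i = α_i·c_i + β_i. The function name and the perturbation ranges are assumptions, not the paper's exact formulation.

```python
# Hypothetical color-augmentation sketch in the spirit of Equation (3):
# jitter each stain channel after color deconvolution, then recompose RGB.
# The ranges for alpha/beta are illustrative assumptions.
import numpy as np
from skimage.color import rgb2hed, hed2rgb


def random_color_augment(rgb, sigma_alpha=0.05, sigma_beta=0.05, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    hed = rgb2hed(rgb)                                              # stain deconvolution (H, E, DAB)
    alpha = rng.uniform(1 - sigma_alpha, 1 + sigma_alpha, size=3)   # multiplicative jitter
    beta = rng.uniform(-sigma_beta, sigma_beta, size=3)             # additive jitter
    return np.clip(hed2rgb(hed * alpha + beta), 0, 1)
```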

Figure 7

Domain adversarial scheme: a domain-balanced batch of images is passed as input to the network, which has two types of outputs: the task classification output and the domain classification output. The shared representation θ_f is optimized to be discriminative for the task classification while being unable to discriminate between the n domains.
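
A minimal sketch of this scheme, written in PyTorch as an assumption (the authors' released implementation may use a different framework): a gradient reversal layer leaves the forward pass unchanged but negates the gradient flowing from the domain classifier, so the shared features θ_f are pushed to solve the task while becoming uninformative about the domain.

```python
# Sketch of domain-adversarial training via a gradient reversal layer (GRL).
# PyTorch is an assumption about the framework; layer sizes are illustrative.
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)          # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Negate (and scale) the gradient so the feature extractor is trained
        # to *maximize* the domain-classification loss.
        return -ctx.lambd * grad_output, None


class DANNHeads(nn.Module):
    """Task and domain classifiers sharing the same feature representation."""

    def __init__(self, feat_dim=128, n_classes=2, n_domains=2):
        super().__init__()
        self.task_head = nn.Linear(feat_dim, n_classes)
        self.domain_head = nn.Linear(feat_dim, n_domains)

    def forward(self, features, lambd=1.0):
        task_logits = self.task_head(features)
        domain_logits = self.domain_head(GradientReversal.apply(features, lambd))
        return task_logits, domain_logits
```

Both heads are trained with cross-entropy on the domain-balanced batch described above; the reversed gradient from the domain head is what drives θ_f toward domain-invariant features.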

Figure 8

UMAP embedding of the 128-dimensional first fully connected layer features of the task branch. The points are 80 randomly sampled patches of the external test set using the baseline model with dropout: full disks correspond to mitotic embeddings, empty circles to non-mitotic ones; red elements come from a different center than the black ones. The baseline DCNN model (first cell) shows how same-center features are clustered (ellipses); the next cell shows how the baseline model with dropout drastically changes this, with better intra-class variability than the baseline feature embeddings, presumably linked to the regularization effect induced by dropout. Staining normalization alone shows an inter-class mixed embedding, which suggests possible overfitting to the training sources. Color augmentation also shows excellent intra-class mixing while nicely separating mitotic from non-mitotic samples; local clusters are visible among the non-mitotic samples. The joint color augmentation and staining normalization model displays a behavior similar to color augmentation but with less separated inter-class embeddings. Finally, the DANN embeddings show how the intra-class embeddings are mixed while retaining inter-class separability, showing that it is feasible to learn the desired property of staining-invariant features.
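
A sketch of how such an embedding can be produced is given below, assuming the 128-dimensional fully connected activations have been exported to a NumPy array; the random placeholder arrays stand in for the real activations, labels, and centers and are purely illustrative.

```python
# Sketch of the Figure 8 visualization with the umap-learn package. The random
# arrays below are placeholders for the exported 128-D task-branch activations,
# mitosis labels, and acquisition centers (illustrative only).
import numpy as np
import matplotlib.pyplot as plt
import umap

rng = np.random.default_rng(0)
features = rng.normal(size=(80, 128))       # placeholder for real activations
is_mitosis = rng.integers(0, 2, size=80)    # 1 = mitotic patch
center = rng.integers(0, 2, size=80)        # 0 = black center, 1 = red center

emb = umap.UMAP(n_components=2, random_state=0).fit_transform(features)
colors = np.where(center == 0, "black", "red")

mit = is_mitosis == 1
plt.scatter(emb[mit, 0], emb[mit, 1], c=colors[mit], label="mitotic")      # full disks
plt.scatter(emb[~mit, 0], emb[~mit, 1], facecolors="none",
            edgecolors=colors[~mit], label="non-mitotic")                   # empty circles
plt.legend()
plt.show()
```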
