Weakly-supervised High-resolution Segmentation of Mammography Images for Breast Cancer Diagnosis

Kangning Liu et al. Proc Mach Learn Res. 2021 Jul.

Abstract

In the last few years, deep learning classifiers have shown promising results in image-based medical diagnosis. However, interpreting the outputs of these models remains a challenge. In cancer diagnosis, interpretability can be achieved by localizing the region of the input image responsible for the output, i.e. the location of a lesion. Alternatively, segmentation or detection models can be trained with pixel-wise annotations indicating the locations of malignant lesions. Unfortunately, acquiring such labels is labor-intensive and requires medical expertise. To overcome this difficulty, weakly-supervised localization can be utilized. These methods allow neural network classifiers to output saliency maps highlighting the regions of the input most relevant to the classification task (e.g. malignant lesions in mammograms) using only image-level labels (e.g. whether the patient has cancer or not) during training. When applied to high-resolution images, existing methods produce low-resolution saliency maps. This is problematic in applications in which suspicious lesions are small in relation to the image size. In this work, we introduce a novel neural network architecture to perform weakly-supervised segmentation of high-resolution images. The proposed model selects regions of interest via coarse-level localization, and then performs fine-grained segmentation of those regions. We apply this model to breast cancer diagnosis with screening mammography, and validate it on a large clinically-realistic dataset. Measured by Dice similarity score, our approach outperforms existing methods by a large margin in terms of localization performance of benign and malignant lesions, relatively improving the performance by 39.6% and 20.0%, respectively. Code and the weights of some of the models are available at https://github.com/nyukat/GLAM.
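
Localization performance above is measured with the Dice similarity score between the predicted saliency map and the ground-truth lesion annotation. As a point of reference, below is a minimal sketch of how such a score can be computed for a binarized saliency map; the threshold value and tensor shapes are illustrative assumptions, not taken from the paper.

```python
import torch

def dice_score(saliency_map: torch.Tensor,
               lesion_mask: torch.Tensor,
               threshold: float = 0.5,
               eps: float = 1e-8) -> float:
    """Dice similarity between a binarized saliency map and a ground-truth mask.

    `saliency_map` holds values in [0, 1]; `lesion_mask` is a binary mask of the
    same spatial size. The 0.5 threshold is an illustrative choice.
    """
    pred = (saliency_map >= threshold).float()
    target = lesion_mask.float()
    intersection = (pred * target).sum()
    return (2.0 * intersection / (pred.sum() + target.sum() + eps)).item()

# Example: a 184x120 saliency map scored against a lesion mask of the same size.
score = dice_score(torch.rand(184, 120), (torch.rand(184, 120) > 0.9).float())
```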

Keywords: breast cancer screening; high-resolution medical images; weakly supervised learning.

Figures

Figure 6:

Architecture of the global module. Both the high-level feature map (h_2 ∈ ℝ^{46×30×256}) and the low-level feature maps (h_0 ∈ ℝ^{184×120×64}, h_1 ∈ ℝ^{92×60×128}) are utilized to obtain multi-scale saliency maps (S_0 ∈ ℝ^{184×120×2}, S_1 ∈ ℝ^{92×60×2} and S_2 ∈ ℝ^{46×30×2}). For each scale n ∈ {0, 1, 2}, we use top t% pooling to transform the saliency map S_n into a class prediction ỹ_n. A combined cross-entropy loss from the three scales is used for training. We compute the global saliency map S_g by combining the individual saliency maps from the three scales (S_0, S_1 and S_2). We apply max pooling over the spatial dimensions of h_2 ∈ ℝ^{46×30×256} to obtain the representation vector z_g ∈ ℝ^{256}, which is fed to the fusion module during joint training.
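
The caption describes how each saliency map S_n is converted into a class prediction ỹ_n by top t% pooling, i.e. averaging the t% highest-scoring locations per class. Below is a minimal sketch of that pooling step; the value of t and the tensor shapes are illustrative, and this is not the authors' implementation.

```python
import torch

def top_t_percent_pooling(saliency: torch.Tensor, t: float = 0.02) -> torch.Tensor:
    """Aggregate a saliency map of shape (H, W, C) into C class scores.

    For each class channel, the scores are flattened and the mean of the
    top-t fraction of locations is returned. t = 0.02 is an illustrative value.
    """
    h, w, c = saliency.shape
    flat = saliency.reshape(h * w, c)      # (H*W, C)
    k = max(1, int(t * h * w))             # number of locations kept per class
    topk, _ = flat.topk(k, dim=0)          # (k, C)
    return topk.mean(dim=0)                # (C,)

# Example: pooling the coarsest saliency map S_2 of shape 46x30x2 into y_tilde_2.
y_tilde_2 = top_t_percent_pooling(torch.rand(46, 30, 2))
```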

Figure 7:

Architecture of the local module. The backbone network (left) is applied to each of the selected patches x̃_k (k ∈ {1, …, K}) to extract a patch-level saliency map A_k and a feature map h_k. The patch-level saliency maps can be combined using two aggregation strategies: (1) concatenation-based aggregation (top right), where we concatenate the saliency maps spatially and apply top t% pooling; (2) attention-based aggregation (bottom right), where top t% pooling is used to obtain a patch-level prediction ŷ_k from each patch, and the predictions are then combined using attention weights α_k ∈ [0, 1] computed by the gated attention mechanism (Ilse et al., 2018). That is, the classification prediction is computed as ŷ_l = Σ_{k=1}^{K} α_k ŷ_k, and the representation vector as z_l = Σ_{k=1}^{K} α_k z_k.
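
The attention-based aggregation weights each patch-level prediction ŷ_k by a gated attention weight α_k (Ilse et al., 2018) and sums the results. A minimal sketch of this aggregation step follows; the feature and hidden dimensions are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

class GatedAttentionAggregation(nn.Module):
    """Combine K patch-level representations and predictions with gated attention.

    Attention weights follow Ilse et al. (2018):
    alpha_k ∝ w^T (tanh(V z_k) * sigmoid(U z_k)). Dimensions are illustrative.
    """
    def __init__(self, feature_dim: int = 512, hidden_dim: int = 128):
        super().__init__()
        self.V = nn.Linear(feature_dim, hidden_dim)
        self.U = nn.Linear(feature_dim, hidden_dim)
        self.w = nn.Linear(hidden_dim, 1)

    def forward(self, z: torch.Tensor, y_hat: torch.Tensor):
        # z: (K, feature_dim) patch representations; y_hat: (K, C) patch predictions.
        gate = torch.tanh(self.V(z)) * torch.sigmoid(self.U(z))  # (K, hidden_dim)
        alpha = torch.softmax(self.w(gate), dim=0)               # (K, 1), sums to 1
        y_local = (alpha * y_hat).sum(dim=0)                     # (C,) combined prediction
        z_local = (alpha * z).sum(dim=0)                         # (feature_dim,) combined representation
        return y_local, z_local, alpha.squeeze(-1)

# Example with K = 6 patches and 2 classes (benign, malignant).
agg = GatedAttentionAggregation()
y_l, z_l, alphas = agg(torch.randn(6, 512), torch.rand(6, 2))
```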

Figure 8:

Dice score of the local module when using one or three patches as its input during inference. The results correspond to 400 examples from the validation set. Using one patch leads to better performance for most images, but fails completely for some. Failure may occur when the patch selected by the global module for a positive example does not contain the lesion.

Figure 1:

Comparison of saliency maps generated by CAM (Zhou et al., 2016), GMIC (Shen et al., 2021), and the proposed method on a mammography image containing a malignant lesion (first row, red) and a benign lesion (second row, green). Both CAM and GMIC produce coarse saliency maps that fail to localize the lesion accurately. The proposed method generates a high-resolution saliency map that precisely localizes the lesions.

Figure 2:

Inference pipeline of GLAM. 1) The global network f_g is applied to the whole image x to obtain a coarse image-level segmentation map S_g. 2) Based on this coarse segmentation, several patches are extracted from the input image. 3) The local network f_l processes these patches to generate a high-resolution saliency map S_l.
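
A minimal sketch of this three-step inference flow is given below. Here `global_net`, `local_net`, and `select_patches` are placeholders for the global module, the local module, and an ROI-cropping helper that returns K crops together with their locations; the patch count and shape handling are illustrative assumptions, not the authors' implementation.

```python
import torch

def glam_inference(x, global_net, local_net, select_patches, num_patches: int = 6):
    """Coarse-to-fine inference: global saliency -> patch selection -> local saliency.

    `x` is a (1, C, H, W) image tensor; the callables are placeholders for the
    global module, local module, and an ROI-cropping helper, respectively.
    """
    # 1) Coarse, image-level saliency map from the global module.
    s_global, z_global = global_net(x)

    # 2) Crop the K most salient regions from the full-resolution image.
    patches, locations = select_patches(x, s_global, k=num_patches)

    # 3) Fine-grained, patch-level saliency maps from the local module,
    #    stitched back into a high-resolution map at the cropped locations.
    s_local = torch.zeros_like(x[:, :1])  # one-channel canvas at full resolution
    for patch, (top, left, h, w) in zip(patches, locations):
        patch_saliency = local_net(patch.unsqueeze(0))  # assumed (1, 1, h, w)
        s_local[..., top:top + h, left:left + w] = patch_saliency
    return s_global, s_local
```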

Figure 3:

Proposed training strategy. 1) Train the global module and select the best segmentation model. 2) Freeze the global module and use it to extract input patches for the local module. 3) Train the local module on the selected patches. 4) Joint training with the fusion module. We use a binary cross-entropy (BCE) loss and a sparsity loss to train the system.
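
The objective combines a binary cross-entropy term on the image-level predictions with a sparsity penalty that encourages compact saliency maps. Below is a minimal sketch of such a combined loss; the L1-style sparsity term and its weighting coefficient are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def glam_loss(y_pred: torch.Tensor,
              y_true: torch.Tensor,
              saliency_maps: list,
              sparsity_weight: float = 1e-5) -> torch.Tensor:
    """BCE classification loss plus a sparsity penalty on the saliency maps.

    `y_pred`/`y_true` are image-level predictions/labels in [0, 1];
    `saliency_maps` is a list of saliency tensors (e.g. S_0, S_1, S_2 or the
    local map). The L1 penalty and its weight are illustrative choices.
    """
    bce = F.binary_cross_entropy(y_pred, y_true)
    sparsity = sum(s.abs().mean() for s in saliency_maps)
    return bce + sparsity_weight * sparsity

# Example: two-class prediction with three multi-scale saliency maps.
loss = glam_loss(torch.rand(1, 2), torch.ones(1, 2),
                 [torch.rand(184, 120, 2), torch.rand(92, 60, 2), torch.rand(46, 30, 2)])
```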

Figure 4:

Scatter plot of Dice scores of the global (S_g) and local (S_l) modules for 400 validation examples. S_l outperforms S_g for most small lesions, but may miss some larger lesions and fails entirely if the wrong patches are selected as input.

Figure 5:

Two failure cases of the local module. (Top) The lesion is larger than the input patch, so S_l captures it only partially. (Bottom) The input patches to the local module (in blue) do not cover the lesion.
