Domain Adaptive Semantic Segmentation via Entropy-Ranking and Uncertain Learning-Based Self-Training

Self-Training for Class-Incremental Semantic Segmentation

IEEE Transactions on Neural Networks and Learning Systems, 2022

We study incremental learning for semantic segmentation, where we have no access to the labeled data of previous tasks when learning new classes. When incrementally learning new classes, deep neural networks suffer from catastrophic forgetting of previously learned knowledge. To address this problem, we propose a self-training approach that leverages unlabeled data for rehearsal of previous knowledge. Additionally, we propose conflict reduction to resolve the conflicts between pseudo labels generated by the old and new models. We show that maximizing self-entropy can further improve results by smoothing overconfident predictions. The experiments demonstrate state-of-the-art results: a relative gain of up to 114% on Pascal-VOC 2012 and 8.5% on the more challenging ADE20K compared to previous state-of-the-art methods.
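The two ingredients of this abstract — reconciling pseudo labels from the old and new models, and maximizing self-entropy to soften overconfident outputs — can be sketched in a few lines of numpy. The conflict-reduction rule below (keep agreements; otherwise trust the more confident model) is a plausible instantiation, not necessarily the paper's exact criterion:

```python
import numpy as np

def resolve_conflicts(probs_old, probs_new):
    """Merge pseudo labels from the old and new models.

    Where the argmax labels agree, keep them; where they conflict, keep the
    label of whichever model is more confident (illustrative rule only).
    probs_*: (..., C) per-pixel softmax outputs.
    """
    labels_old, labels_new = probs_old.argmax(-1), probs_new.argmax(-1)
    conf_old, conf_new = probs_old.max(-1), probs_new.max(-1)
    agree = labels_old == labels_new
    return np.where(agree | (conf_new >= conf_old), labels_new, labels_old)

def self_entropy(probs, eps=1e-8):
    """Mean per-pixel Shannon entropy; *maximizing* it smooths
    overconfident predictions during self-training."""
    return -(probs * np.log(probs + eps)).sum(-1).mean()
```

A uniform prediction has maximal self-entropy, so adding a negative-entropy term to the loss pulls overconfident pixels back toward softer distributions.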

Exploiting Negative Learning for Implicit Pseudo Label Rectification in Source-Free Domain Adaptive Semantic Segmentation

ArXiv, 2021

It is desirable to transfer the knowledge stored in a well-trained source model to a non-annotated target domain in the absence of source data. However, state-of-the-art methods for source-free domain adaptation (SFDA) are subject to strict limits: 1) access to the internal specifications of the source model is a must; and 2) pseudo labels must be clean during self-training, making critical tasks that rely on semantic segmentation unreliable. Aiming at these pitfalls, this study develops a domain adaptive solution to semantic segmentation with pseudo label rectification (namely PR-SFDA), which operates in two phases: 1) Confidence-regularized unsupervised learning: a maximum squares loss regularizes the target model to ensure confidence in prediction; and 2) Noise-aware pseudo label learning: negative learning enables tolerance to noisy pseudo labels in training, while positive learning achieves fast convergence. Extensive experiments have been performed on domain adaptive semantic segmentation...
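The two losses named in the abstract are both standard and compact. A minimal numpy sketch, assuming `probs` holds per-pixel softmax outputs and `complementary_labels` holds classes a pixel is known *not* to belong to (the hedged form of supervision used by negative learning):

```python
import numpy as np

def max_squares_loss(probs):
    """Maximum squares loss: minimizing -(1/2) * mean(sum_k p_k^2) pushes
    predictions toward confident outputs with a gentler gradient than
    entropy minimization on already-confident pixels."""
    return -0.5 * (probs ** 2).sum(-1).mean()

def negative_learning_loss(probs, complementary_labels, eps=1e-8):
    """Negative learning: each label says which class a pixel is NOT, so we
    minimize -log(1 - p_complementary) -- tolerant to noisy pseudo labels."""
    p_not = np.take_along_axis(probs, complementary_labels[..., None], -1)
    return -np.log(1.0 - p_not.squeeze(-1) + eps).mean()
```

Positive learning in phase 2 would be ordinary cross-entropy on the (rectified) pseudo labels; the negative branch supplies the noise tolerance.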

Self-training via Metric Learning for Source-Free Domain Adaptation of Semantic Segmentation

arXiv (Cornell University), 2022

Unsupervised source-free domain adaptation methods aim to train a model to be used in the target domain utilizing the pretrained source-domain model and unlabeled target-domain data, where the source data may not be accessible due to intellectual property or privacy issues. These methods frequently utilize self-training with pseudo-labeling thresholded by prediction confidence. In a source-free scenario, the only supervision comes from the target data, and thresholding limits the contribution of the self-training. In this study, we utilize self-training with a mean-teacher approach. The student network is trained with all predictions of the teacher network. Instead of thresholding the predictions, the gradients calculated from the pseudo-labels are weighted based on the reliability of the teacher's predictions. We propose a novel method that uses proxy-based metric learning to estimate reliability. We train a metric network on the encoder features of the teacher network. Since the teacher is updated with the moving average, the encoder feature space changes slowly. Therefore, the metric network can be updated at training time, which enables end-to-end training. We also propose a metric-based online ClassMix method to augment the input of the student network, where the patches to be mixed are decided based on the metric reliability. We evaluated our method in synthetic-to-real and cross-city scenarios. The benchmarks show that our method significantly outperforms the existing state-of-the-art methods.
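The key mechanic — replacing a hard confidence threshold with per-pixel gradient weighting — reduces to scaling each pixel's cross-entropy term by a reliability score. In the paper this score comes from a proxy-based metric network on the teacher's encoder features; in the sketch below, `reliability` is simply taken as an input array in [0, 1]:

```python
import numpy as np

def weighted_pseudo_ce(student_probs, teacher_probs, reliability, eps=1e-8):
    """Cross-entropy against the teacher's argmax pseudo-labels.

    Every pixel contributes, but its gradient is scaled by `reliability`
    instead of being hard-thresholded away (sketch of the weighting idea).
    """
    pseudo = teacher_probs.argmax(-1)
    p = np.take_along_axis(student_probs, pseudo[..., None], -1).squeeze(-1)
    return (reliability * -np.log(p + eps)).mean()
```

With reliability fixed at 0 or 1 this degenerates to hard thresholding, which makes the continuous weighting a strict generalization.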

ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real-world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging "synthetic-2-real" setups and show that the approach can also be used for detection.
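The first of ADVENT's two losses, direct entropy minimization, is straightforward to write down. A minimal numpy sketch, assuming `probs` is a (..., C) array of per-pixel softmax outputs (the adversarial variant instead trains a discriminator on these entropy maps):

```python
import numpy as np

def entropy_map(probs, eps=1e-8):
    """Pixel-wise Shannon entropy of the softmax prediction,
    normalized by log(C) so values lie in [0, 1]."""
    num_classes = probs.shape[-1]
    ent = -(probs * np.log(probs + eps)).sum(-1)
    return ent / np.log(num_classes)

def entropy_loss(probs):
    """Direct entropy-minimization objective: mean entropy over all pixels.
    Minimizing it pushes target-domain predictions toward confident,
    source-like low-entropy maps."""
    return entropy_map(probs).mean()
```

Intuitively, source-trained models produce low-entropy maps on source-like images and noisy, high-entropy maps on target images; minimizing this loss closes that gap.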

Exploiting Diverse Characteristics and Adversarial Ambivalence for Domain Adaptive Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence

Adapting semantic segmentation models to new domains is an important but challenging problem. Recently, enlightening progress has been made, but the performance of existing methods is unsatisfactory on real datasets where the new target domain comprises heterogeneous sub-domains (e.g. diverse weather characteristics). We point out that carefully reasoning about the multiple modalities in the target domain can improve the robustness of adaptation models. To this end, we propose a condition-guided adaptation framework that is empowered by a special attentive progressive adversarial training (APAT) mechanism and a novel self-training policy. The APAT strategy progressively performs condition-specific alignment and attentive global feature matching. The new self-training scheme exploits the adversarial ambivalences of easy and hard adaptation regions and the correlations among target sub-domains effectively. We evaluate our method (DCAA) on various adaptation scenarios where the target...

Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation

2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021

Recent studies imply that deep neural networks are vulnerable to adversarial examples, i.e., inputs with a slight but intentional perturbation are incorrectly classified by the network. Such vulnerability makes it risky for some security-related applications (e.g., semantic segmentation in autonomous cars) and raises serious concerns about model reliability. For the first time, we comprehensively evaluate the robustness of existing unsupervised domain adaptation (UDA) methods and propose a robust UDA approach. It is rooted in two observations: i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits model robustness in classification and recognition tasks, it fails to provide the critical supervision signals that are essential in semantic segmentation. These observations motivate us to propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space. Extensive empirical studies on commonly used benchmarks demonstrate that ASSUDA is resistant to adversarial attacks.
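The core objective — maximizing agreement between clean and adversarial predictions via a contrastive loss in the output space — can be sketched with an InfoNCE-style formulation. This is a simplified illustration: `clean` and `adv` here are flattened prediction vectors per image, whereas ASSUDA operates on dense segmentation outputs of true adversarial examples:

```python
import numpy as np

def output_space_contrastive(clean, adv, temperature=0.1):
    """InfoNCE-style agreement loss: each clean prediction vector should match
    the perturbed prediction of the *same* image (the positive) against the
    perturbed predictions of other images in the batch (the negatives).
    clean, adv: (N, D) arrays of prediction vectors."""
    c = clean / np.linalg.norm(clean, axis=-1, keepdims=True)
    a = adv / np.linalg.norm(adv, axis=-1, keepdims=True)
    logits = c @ a.T / temperature              # (N, N) cosine similarities
    logits = logits - logits.max(-1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    return -np.diag(log_prob).mean()            # positives on the diagonal
```

When clean and adversarial outputs already agree, the diagonal dominates and the loss is near zero; disagreement under attack drives the loss (and hence the robustness-promoting gradient) up.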

Unsupervised Domain Adaptation in Semantic Segmentation: A Review

Technologies

The aim of this paper is to give an overview of the recent advancements in the Unsupervised Domain Adaptation (UDA) of deep networks for semantic segmentation. This task is attracting wide interest since semantic segmentation models require a huge amount of labeled data, and the lack of data fitting specific requirements is the main limitation in the deployment of these techniques. This field has been recently explored and has rapidly grown with a large number of ad-hoc approaches. This motivates us to build a comprehensive overview of the proposed methodologies and to provide a clear categorization. In this paper, we start by introducing the problem, its formulation and the various scenarios that can be considered. Then, we introduce the different levels at which adaptation strategies may be applied: namely, at the input (image) level, at the internal feature representation and at the output level. Furthermore, we present a detailed overview of the literature in the field, dividing...

Cross-Loss Pseudo Labeling for Semi-Supervised Segmentation

IEEE Access

Training semantic segmentation models requires pixel-level annotations, leading to a significant labeling cost in dataset creation. To alleviate this issue, recent research has focused on semi-supervised learning, which utilizes only a small amount of annotation. In this context, pseudo labeling techniques are frequently employed to assign labels to unlabeled data based on the model's predictions. However, there are fundamental limitations associated with the widespread application of pseudo labeling in this regard. Since pseudo labels are generally determined by the model's predictions, these labels could be overconfidently assigned even for erroneous predictions, especially when the model has a confirmation bias. We observed that the overconfident prediction tendency of the cross-entropy loss exacerbates this issue, and to address it, we find that the focal loss, known for enabling more reliable confidence estimation, can complement the cross-entropy loss. The cross-entropy loss produces rich labels since it tends to be overconfident. On the other hand, the focal loss provides more conservative confidence and therefore produces a smaller number of pseudo labels compared to the cross-entropy. Based on such complementary mechanisms of the two loss functions, we propose a simple yet effective pseudo labeling technique, Cross-Loss Pseudo Labeling (CLP), that alleviates both the confirmation-bias problem and the shortage of pseudo labels. Intuitively, we can mitigate the overconfidence of the cross-entropy with the conservative predictions of the focal loss, while increasing the number of pseudo labels marked by the focal loss based on the cross-entropy. Additionally, CLP also contributes to improving the performance of the tail classes in class-imbalanced datasets through the class bias mitigation effect of the focal loss.
In experimental results, our simple CLP improves mIoU by up to +10.4%p compared to a supervised model when only 1/32 true labels are available on PASCAL VOC 2012, and it surpasses the performance of the state-of-the-art methods.
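The complementary-selection idea can be sketched as a simple labeling rule over the two heads' confidence maps. The thresholds and the exact combination below are illustrative, not the paper's published rule; `-1` plays the role of an ignore index:

```python
import numpy as np

def clp_pseudo_labels(probs_ce, probs_focal, tau_ce=0.9, tau_focal=0.5):
    """Cross-loss pseudo labeling sketch.

    A pixel receives a pseudo label if EITHER the cross-entropy head or the
    (more conservative) focal head is confident; pixels where both heads are
    confident but disagree are dropped. Returns labels with -1 = ignore.
    probs_*: (..., C) per-pixel softmax outputs of each head.
    """
    lab_ce, conf_ce = probs_ce.argmax(-1), probs_ce.max(-1)
    lab_f, conf_f = probs_focal.argmax(-1), probs_focal.max(-1)
    ok_ce, ok_f = conf_ce >= tau_ce, conf_f >= tau_focal
    labels = np.full(lab_ce.shape, -1)
    labels[ok_f] = lab_f[ok_f]
    labels[ok_ce & ~ok_f] = lab_ce[ok_ce & ~ok_f]
    labels[ok_ce & ok_f & (lab_ce != lab_f)] = -1  # drop conflicting pixels
    return labels
```

Note the asymmetric thresholds: because the focal head is already conservative, its threshold can be lower, which is how the union recovers pseudo labels the focal head alone would miss.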

CLUDA : Contrastive Learning in Unsupervised Domain Adaptation for Semantic Segmentation

2022

In this work, we propose CLUDA, a simple yet novel method for performing unsupervised domain adaptation (UDA) for semantic segmentation by incorporating contrastive losses into a student-teacher learning paradigm that makes use of pseudo-labels generated from the target domain by the teacher network. More specifically, we extract a multi-level fused-feature map from the encoder, and apply a contrastive loss across different classes and different domains via source-target mixing of images. We consistently improve performance on various feature encoder architectures and for different domain adaptation datasets in semantic segmentation. Furthermore, we introduce a learned-weighted contrastive loss to improve upon a state-of-the-art multi-resolution training approach in UDA. We produce state-of-the-art results on GTA → Cityscapes (74.4 mIoU, +0.6) and Synthia → Cityscapes (67.2 mIoU, +1.4) datasets. CLUDA effectively demonstrates contrastive learning in UDA as a generic method, which can be easily integrated into any existing UDA method for semantic segmentation. Please refer to the supplementary material for the details on implementation.
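A common way to realize "contrastive loss across different classes and different domains" on dense features is a class-prototype formulation: pull each pixel feature toward its class prototype (computed over pixels from both domains, pooled via the pseudo labels) and push it away from other classes' prototypes. A minimal numpy sketch under that assumption, with `feats` as flattened (N, D) pixel features and `labels` their (pseudo) class labels:

```python
import numpy as np

def class_contrastive_loss(feats, labels, temperature=0.1):
    """Prototype-based contrastive loss sketch: each pixel feature should be
    more similar to its own class prototype (mean feature over the batch,
    which may pool source and target pixels) than to other classes'."""
    feats = feats / np.linalg.norm(feats, axis=-1, keepdims=True)
    classes = np.unique(labels)
    protos = np.stack([feats[labels == c].mean(0) for c in classes])
    protos = protos / np.linalg.norm(protos, axis=-1, keepdims=True)
    logits = feats @ protos.T / temperature       # (N, num_classes)
    target = np.searchsorted(classes, labels)     # row index of own prototype
    logits = logits - logits.max(-1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    return -log_prob[np.arange(len(labels)), target].mean()
```

Because prototypes pool pixels from both domains, minimizing this loss aligns same-class features across the domain gap, which is the cross-domain half of the contrastive objective.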

Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation

Computer Vision – ECCV 2020, 2020

Unsupervised domain adaptation alleviates the need for pixel-wise annotation in semantic segmentation. One of the most common strategies is to translate images from the source domain to the target domain and then align their marginal distributions in the feature space using adversarial learning. However, source-to-target translation enlarges the bias in translated images and introduces extra computation, owing to the dominant data size of the source domain. Furthermore, consistency of the joint distribution in source and target domains cannot be guaranteed through global feature alignment. Here, we present an innovative framework designed to mitigate the image translation bias and align cross-domain features of the same category. This is achieved by 1) performing target-to-source translation and 2) reconstructing both source and target images from their predicted labels. Extensive experiments on adapting from synthetic to real urban scene understanding demonstrate that our framework competes favorably against existing state-of-the-art methods.