Incremental Multi-Target Domain Adaptation for Object Detection with Efficient Domain Transfer
Related papers
One-Shot Unsupervised Domain Adaptation for Object Detection
2020 International Joint Conference on Neural Networks (IJCNN), 2020
The existing unsupervised domain adaptation (UDA) methods require not only labeled source samples but also a large number of unlabeled target samples for domain adaptation. Collecting these target samples is generally time-consuming, which hinders the rapid deployment of these UDA methods in new domains. Moreover, most of these UDA methods are developed for image classification. In this paper, we address a new problem called one-shot unsupervised domain adaptation for object detection, where only one unlabeled target sample is available. To the best of our knowledge, this is the first time this problem has been investigated. To solve this problem, a one-shot feature alignment (OSFA) algorithm is proposed to align the low-level features of the source domain and the target domain. Specifically, the domain shift is reduced by aligning the average activation of the feature maps in the lower layer of the CNN. The proposed OSFA is evaluated under two scenarios: adapting from clear weather to foggy weather, and adapting from synthetic images to real-world images. Experimental results show that the proposed OSFA can significantly improve object detection performance in the target domain compared to the baseline model without domain adaptation.
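A minimal sketch of the alignment idea the abstract describes: match the average channel activations of a low-level feature map between a source batch and the single target image. The choice of L1 distance, the layer, and the loss weighting are assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def osfa_alignment_loss(src_feats: torch.Tensor, tgt_feats: torch.Tensor) -> torch.Tensor:
    """Align average channel activations of low-level feature maps.

    src_feats: (N, C, H, W) features from a source batch.
    tgt_feats: (1, C, H, W) features from the single unlabeled target image.
    """
    # Mean activation per channel, averaged over batch and spatial dimensions.
    src_mean = src_feats.mean(dim=(0, 2, 3))  # (C,)
    tgt_mean = tgt_feats.mean(dim=(0, 2, 3))  # (C,)
    return F.l1_loss(src_mean, tgt_mean)

# Illustrative usage: features from an early conv stage of the detector backbone.
# total_loss = detection_loss + lambda_align * osfa_alignment_loss(f_src, f_tgt)
```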
Adapting Object Detectors with Conditional Domain Normalization
Computer Vision – ECCV 2020
Real-world object detectors are often challenged by the domain gaps between different datasets. In this work, we present Conditional Domain Normalization (CDN) to bridge the domain distribution gap. CDN is designed to encode different domain inputs into a shared latent space, where the features from different domains carry the same domain attribute. To achieve this, we first disentangle the domain-specific attribute from the semantic features of the source domain via a domain embedding module, which learns a domain-vector to characterize the domain attribute information. Then this domain-vector is used to encode the features from the target domain through a conditional normalization, resulting in different domains' features carrying the same domain attribute. We incorporate CDN into various convolution stages of an object detector to adaptively address the domain shifts of different levels' representations. In contrast to existing adaptation works that conduct domain confusion learning on semantic features to remove domain-specific factors, CDN aligns different domain distributions by modulating the semantic features of target domains conditioned on the learned domain-vector of the source domain. Extensive experiments show that CDN outperforms existing methods remarkably on both real-to-real and synthetic-to-real adaptation benchmarks, including 2D image detection and 3D point cloud detection.
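A minimal sketch of a conditional normalization module in the spirit of CDN: a learned source-domain vector generates per-channel scale and shift for target features. The instance-norm base, embedding dimensionality, and module placement are assumptions:

```python
import torch
import torch.nn as nn

class ConditionalDomainNorm(nn.Module):
    """Normalize features, then re-modulate them with a learned domain vector.

    The paper inserts such modules at several conv stages of the detector; the
    exact normalization variant and generator layers here are illustrative.
    """
    def __init__(self, num_channels: int, domain_dim: int = 64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        # Map the source-domain vector to per-channel scale and shift.
        self.to_gamma = nn.Linear(domain_dim, num_channels)
        self.to_beta = nn.Linear(domain_dim, num_channels)

    def forward(self, x: torch.Tensor, domain_vec: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); domain_vec: (N, domain_dim), learned from the source domain.
        h = self.norm(x)
        gamma = self.to_gamma(domain_vec).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(domain_vec).unsqueeze(-1).unsqueeze(-1)
        return h * (1 + gamma) + beta
```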
Diversify and Match: A Domain Adaptive Representation Learning Paradigm for Object Detection
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
We introduce a novel unsupervised domain adaptation approach for object detection. We aim to alleviate the imperfect translation problem of pixel-level adaptations and the source-biased discriminativity problem of feature-level adaptations simultaneously. Our approach is composed of two stages, i.e., Domain Diversification (DD) and Multi-domain-invariant Representation Learning (MRL). At the DD stage, we diversify the distribution of the labeled data by generating various distinctive shifted domains from the source domain. At the MRL stage, we apply adversarial learning with a multi-domain discriminator to encourage features to be indistinguishable among the domains. DD addresses the source-biased discriminativity, while MRL mitigates the imperfect image translation. We construct a structured domain adaptation framework for our learning paradigm and introduce a practical way of implementing DD. Our method outperforms the state-of-the-art methods by a large margin of 3%–12% in terms of mean average precision (mAP) on various datasets.
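The MRL stage's multi-domain adversarial alignment can be sketched as a K-way domain discriminator trained through a gradient reversal layer. This is a generic rendition of the technique, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer commonly used for adversarial feature alignment."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip (and scale) the gradient flowing into the feature extractor.
        return -ctx.lambd * grad_output, None

class MultiDomainDiscriminator(nn.Module):
    """Classifies which of K domains (source + shifted copies) a feature came from.

    Training the feature extractor through the reversal layer pushes features
    toward being indistinguishable across all K domains.
    """
    def __init__(self, feat_dim: int, num_domains: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_domains),
        )

    def forward(self, feats: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
        return self.net(GradReverse.apply(feats, lambd))
```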
Targeted adversarial discriminative domain adaptation
Geospatial Informatics XI, 2021
Domain adaptation is a technology enabling aided target recognition and other algorithms for environments and targets where data, or labeled data, is scarce. Recent advances in unsupervised domain adaptation have demonstrated excellent performance, but only when the domain shift is relatively small. We propose targeted adversarial discriminative domain adaptation (T-ADDA), a semi-supervised domain adaptation method that extends the ADDA framework. By providing at least one labeled target image per class, used as a cue to guide the adaptation, T-ADDA significantly boosts the performance of ADDA and is applicable to the challenging scenario in which the sets of targets in the source and target domains are not the same. The efficacy of T-ADDA is demonstrated by cross-domain, cross-sensor, and cross-target experiments using the common digits datasets and several aerial image datasets. Results demonstrate an average improvement of 15% with T-ADDA over ADDA using just a few labeled images when adapting to a small domain shift, and a 60% improvement when adapting to large domain shifts.
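A hedged illustration of the "few labeled targets as cues" idea: the labeled target images define per-class anchors, and unlabeled target features are pulled toward the anchor of their most confident class. The loss form is hypothetical; T-ADDA's actual objective builds on the adversarial ADDA framework:

```python
import torch
import torch.nn.functional as F

def anchor_guidance_loss(tgt_feats, tgt_logits, anchor_feats, anchor_labels):
    """Pull unlabeled target features toward labeled target class anchors.

    Assumes at least one labeled target feature per class (as the abstract
    requires). This is an illustrative stand-in, not T-ADDA's objective.
    """
    num_classes = tgt_logits.size(1)
    # One anchor per class: mean feature of that class's labeled target images.
    anchors = torch.stack([
        anchor_feats[anchor_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                               # (num_classes, D)
    # Assign each unlabeled target feature to its most confident class anchor.
    probs = tgt_logits.softmax(dim=1)                # (N, num_classes)
    assigned = anchors[probs.argmax(dim=1)]          # (N, D)
    return F.mse_loss(tgt_feats, assigned)
```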
CoNMix for Source-free Single and Multi-target Domain Adaptation
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
This work introduces the novel task of Source-free Multi-target Domain Adaptation and proposes an adaptation framework comprising Consistency with Nuclear-Norm Maximization and MixUp knowledge distillation (CoNMix) as a solution to this problem. The main motive of this work is to solve Single and Multi-target Domain Adaptation (SMTDA) under the source-free paradigm, which enforces a constraint where the labeled source data is not available during target adaptation due to various privacy-related restrictions on data sharing. The source-free approach leverages target pseudo labels, which can be noisy, to improve the target adaptation. We introduce consistency between label-preserving augmentations and utilize pseudo label refinement methods to reduce noisy pseudo labels. Further, we propose a novel MixUp Knowledge Distillation (MKD) for better generalization on multiple target domains using various source-free STDA models. We also show that the Vision Transformer (VT) backbone gives better feature representation with improved domain transferability and class discriminability. Our proposed framework achieves state-of-the-art (SOTA) results in various paradigms of source-free STDA and MTDA settings on popular domain adaptation datasets like Office-Home, Office-Caltech, and DomainNet.
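The nuclear-norm maximization term is concrete enough to sketch: maximizing the nuclear norm of the batch prediction matrix encourages predictions that are both confident and class-diverse. Negating it so it can be minimized, and normalizing by batch size, are assumptions:

```python
import torch

def nuclear_norm_loss(logits: torch.Tensor) -> torch.Tensor:
    """Nuclear-norm maximization on a batch of softmax predictions.

    logits: (N, num_classes). The nuclear norm (sum of singular values) of the
    prediction matrix grows when rows are confident and span many classes, so
    we return its negation to use with a standard minimizer.
    """
    probs = logits.softmax(dim=1)  # (N, num_classes)
    return -torch.linalg.matrix_norm(probs, ord="nuc") / probs.size(0)
```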
Multi-Adversarial Domain Adaptation
2018
Recent advances in deep domain adaptation reveal that adversarial learning can be embedded into deep networks to learn transferable features that reduce distribution discrepancy between the source and target domains. Existing domain adversarial adaptation methods based on a single domain discriminator only align the source and target data distributions without exploiting the complex multimode structures. In this paper, we present a multi-adversarial domain adaptation (MADA) approach, which captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators. The adaptation can be achieved by stochastic gradient descent with the gradients computed by back-propagation in linear time. Empirical evidence demonstrates that the proposed model outperforms state-of-the-art methods on standard domain adaptation datasets.
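A sketch of MADA-style fine-grained alignment: one domain discriminator per class, each seeing features weighted by the classifier's probability for that class. In the full method these would be trained adversarially, e.g. through a gradient reversal layer as sketched earlier; the layer widths here are illustrative:

```python
import torch
import torch.nn as nn

class MADAHead(nn.Module):
    """One binary domain discriminator per class, class-probability weighted."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.discriminators = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))
            for _ in range(num_classes)
        ])

    def forward(self, feats: torch.Tensor, class_probs: torch.Tensor) -> torch.Tensor:
        # feats: (N, D); class_probs: (N, K). Each discriminator receives the
        # features modulated by its class probability, so alignment is
        # fine-grained per class rather than global.
        outs = [d(class_probs[:, k:k + 1] * feats)
                for k, d in enumerate(self.discriminators)]
        return torch.cat(outs, dim=1)  # (N, K) domain logits, one per class
```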
A Review of Single-Source Deep Unsupervised Visual Domain Adaptation
IEEE Transactions on Neural Networks and Learning Systems, 2020
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks. However, in many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data. To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain. Unfortunately, direct transfer across domains often performs poorly due to the presence of domain shift or dataset bias. Domain adaptation is a machine learning paradigm that aims to learn a model from a source domain that can perform well on a different (but related) target domain. In this paper, we review the latest single-source deep unsupervised domain adaptation methods focused on visual tasks and discuss new perspectives for future research. We begin with the definitions of different domain adaptation strategies and descriptions of existing benchmark datasets. We then summarize and compare different categories of single-source domain adaptation methods, including discrepancy-based methods, adversarial discriminative methods, adversarial generative methods, and self-supervision-based methods. Finally, we discuss future research directions with challenges and possible solutions.
Automatic Adaptation of Object Detectors to New Domains Using Self-Training
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This work addresses the unsupervised adaptation of an existing object detector to a new target domain. We assume that a large number of unlabeled videos from this domain are readily available. We automatically obtain labels on the target data by using high-confidence detections from the existing detector, augmented with hard (misclassified) examples acquired by exploiting temporal cues using a tracker. These automatically obtained labels are then used for retraining the original model. A modified knowledge distillation loss is proposed, and we investigate several ways of assigning soft labels to the training examples from the target domain. Our approach is empirically evaluated on challenging face and pedestrian detection tasks: a face detector trained on WIDER-Face, which consists of high-quality images crawled from the web, is adapted to a large-scale surveillance data set; a pedestrian detector trained on clear, daytime images from the BDD-100K driving data set is adapted to all other scenarios, such as rainy, foggy, and nighttime conditions. Our results demonstrate the usefulness of incorporating hard examples obtained from tracking, the advantage of using soft labels via the distillation loss versus hard labels, and show promising performance as a simple method for unsupervised domain adaptation of object detectors, with minimal dependence on hyper-parameters. Code and models are available at http://vis-www.cs.umass.edu/unsupVideo/. Face detection: WIDER → CS6. Pedestrian detection: BDD (clear, daytime) → BDD (rest).
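The two ingredients that transfer most directly to code are the confidence-gated pseudo-label selection and the soft-label distillation term. The threshold and temperature values below are placeholders, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def select_pseudo_labels(scores: torch.Tensor, boxes: torch.Tensor, thresh: float = 0.8):
    """Keep only high-confidence detections as pseudo ground truth."""
    keep = scores > thresh
    return boxes[keep], scores[keep]

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-label distillation term for retraining on automatically labeled data."""
    t = temperature
    soft_targets = (teacher_logits / t).softmax(dim=1)
    log_probs = (student_logits / t).log_softmax(dim=1)
    # Standard KD scaling by t^2 keeps gradient magnitudes comparable.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * (t * t)
```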
Reiterative Domain Aware Multi-target Adaptation
Lecture Notes in Computer Science, 2021
Most domain adaptation methods focus on single-source, single-target adaptation settings. Multi-target domain adaptation is a powerful extension in which a single classifier is learned for multiple unlabeled target domains. To build a multi-target classifier, it is important to have a feature extractor that generalizes well across domains, and effective aggregation of features from the labeled source and the different unlabeled target domains. Towards the first, we use the recently popular Transformer as a feature extraction backbone. Towards the second, we use a co-teaching-based approach with a dual-classifier head, one of which is based on a graph neural network. The proposed approach uses a sequential adaptation strategy that adapts to one domain at a time, starting from the target domains that are more similar to the source, assuming that the network finds it easier to adapt to such target domains. After adapting on each target, samples with a softmax-based confidence score greater than a threshold are added to the pseudo-source, thus aggregating knowledge from different domains. However, softmax is not entirely trustworthy as a confidence score and may generate high scores for unreliable samples if trained for many iterations. To mitigate this effect, we adopt a reiterative approach, in which we reduce the number of target adaptation iterations but reiterate multiple times over the target domains. Experimental evaluation on the Office-Home, Office-31 and DomainNet datasets shows significant improvement over existing methods; we achieve a 10.7% average improvement on the Office-Home dataset over state-of-the-art methods.
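The adaptation schedule reads naturally as a loop. The sketch below uses hypothetical helpers `adapt` and `confidence` as stand-ins for the paper's co-teaching update and softmax scoring; the round count and threshold are placeholders:

```python
def reiterative_adaptation(model, source_data, target_domains,
                           num_rounds: int = 3, conf_thresh: float = 0.9,
                           iters_per_target: int = 100):
    """Easy-to-hard, reiterative multi-target adaptation (structural sketch).

    `adapt(model, data, target, iters)` and `confidence(model, sample)` are
    hypothetical stand-ins for the method's training step and scoring.
    `target_domains` is assumed pre-ordered from most to least source-similar.
    """
    pseudo_source = list(source_data)
    for _ in range(num_rounds):              # few iterations per target, many rounds
        for target in target_domains:
            model = adapt(model, pseudo_source, target, iters=iters_per_target)
            # Confident target samples join the pseudo-source for later domains.
            for sample in target:
                score, pseudo_label = confidence(model, sample)
                if score > conf_thresh:
                    pseudo_source.append((sample, pseudo_label))
    return model
```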
Unsupervised Multi-Target Domain Adaptation Through Knowledge Distillation
2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021
Unsupervised domain adaptation (UDA) seeks to alleviate the problem of domain shift between the distribution of unlabeled data from the target domain and labeled data from the source domain. While the single-target domain scenario is well studied in the UDA literature, the Multi-Target Domain Adaptation (MTDA) setting remains largely unexplored despite its practical importance. For instance, in video surveillance applications, each camera of a distributed network corresponds to a different non-overlapping viewpoint (target domain). The MTDA problem can be addressed by adapting one specialized model per target domain, although this solution is too costly in many real-world applications. It has also been addressed by blending target data for multi-domain adaptation to train a common model, yet this may lead to a reduction in model specificity and accuracy. In this paper, we propose a new unsupervised MTDA approach to train a common CNN that can generalize well across multiple target domains. Our approach, Multi-Teacher MTDA (MT-MTDA), relies on multi-teacher knowledge distillation (KD) to distill target domain knowledge from multiple teachers to a common student. Inspired by a common education scenario, a different target domain is assigned to each teacher model for UDA, and these teachers alternately distill their knowledge to one common student model. The KD process is performed in a progressive manner, where the student is trained by each teacher on how to perform UDA, instead of directly learning domain-adapted features. Finally, instead of directly combining the knowledge from each teacher, MT-MTDA alternates between teachers that distill knowledge in order to preserve the specificity of each target (teacher) when learning to adapt the student. MT-MTDA is compared against state-of-the-art methods on the Office-Home, Office-31, and Digits-5 datasets, and empirical results show that our proposed model can provide a considerably higher level of accuracy across multiple target domains.
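A minimal sketch of the alternating multi-teacher distillation step: each teacher, specialized to one target domain, distills to the student on a batch from its own domain. The model interfaces, temperature, and plain logit-level KL objective are assumptions; the abstract describes a progressive distillation of the UDA process itself, which this simple step does not capture:

```python
import torch
import torch.nn.functional as F

def mt_mtda_step(student, teachers, target_loaders, optimizer, temperature: float = 2.0):
    """One alternating pass: each teacher distills on its own target domain.

    `student` and `teachers` are classification models; each loader is assumed
    to yield batches of images from one target domain.
    """
    for teacher, loader in zip(teachers, target_loaders):
        images = next(iter(loader))          # one batch from this teacher's domain
        with torch.no_grad():
            t_logits = teacher(images)
        s_logits = student(images)
        loss = F.kl_div(
            (s_logits / temperature).log_softmax(dim=1),
            (t_logits / temperature).softmax(dim=1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```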