Boosted Unsupervised Multi-Source Selection for Domain Adaptation

Boosting for transfer learning with multiple sources

2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010

Transfer learning allows leveraging the knowledge of source domains, available a priori, to help train a classifier for a target domain where the available data is scarce. The effectiveness of the transfer is affected by the relationship between source and target. Rather than improving the learning, brute-force leveraging of a source poorly related to the target may decrease the classifier's performance. One strategy to reduce this negative transfer is to import knowledge from multiple sources to increase the chance of finding one source closely related to the target. This work extends the boosting framework for transferring knowledge from multiple sources. Two new algorithms, MultiSource-TrAdaBoost and TaskTrAdaBoost, are introduced, analyzed, and applied to object category recognition and specific object detection. The experiments demonstrate their improved performance, greatly reducing negative transfer as the number of sources increases. TaskTrAdaBoost is a fast algorithm enabling rapid retraining over new targets.
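
A minimal numpy/scikit-learn sketch of the multi-source boosting mechanism this abstract describes, assuming all sources share the target's label space. The weak learner (a decision stump), the weight schedule, and the function name are simplifications for illustration, not the authors' implementation; the key idea shown is that each round trains one candidate per source and keeps the one with the lowest target error, so poorly related sources are rarely selected.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def multi_source_tradaboost(sources, X_tgt, y_tgt, n_rounds=10):
    """Sketch: sources is a list of (X_src, y_src) pairs sharing
    the target label space. Returns weak learners and vote weights."""
    w_tgt = np.ones(len(y_tgt)) / len(y_tgt)
    w_src = [np.ones(len(y)) / len(y) for _, y in sources]
    learners, alphas = [], []
    for _ in range(n_rounds):
        best = None
        for k, (X_s, y_s) in enumerate(sources):
            # One candidate weak learner per source, trained on that
            # source plus the target data with the current weights.
            X = np.vstack([X_s, X_tgt])
            y = np.concatenate([y_s, y_tgt])
            w = np.concatenate([w_src[k], w_tgt])
            h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            err = np.average(h.predict(X_tgt) != y_tgt, weights=w_tgt)
            if best is None or err < best[0]:
                best = (err, k, h)          # keep the best-transferring source
        err, k, h = best
        err = np.clip(err, 1e-10, 0.499)    # keep the update well defined
        alpha = 0.5 * np.log((1 - err) / err)
        beta_src = 1.0 / (1.0 + np.sqrt(2 * np.log(len(w_src[k])) / n_rounds))
        # Target weights grow on mistakes (AdaBoost-style); the chosen
        # source's weights shrink on mistakes (TrAdaBoost-style), which
        # progressively discounts instances that disagree with the target.
        miss_t = h.predict(X_tgt) != y_tgt
        w_tgt *= np.exp(alpha * miss_t)
        w_tgt /= w_tgt.sum()
        miss_s = h.predict(sources[k][0]) != sources[k][1]
        w_src[k] *= np.where(miss_s, beta_src, 1.0)
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas
```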

Sample Selection for Universal Domain Adaptation

2021

This paper studies the problem of unsupervised domain adaptation in the universal scenario, in which only some of the classes are shared between the source and target domains. We present a scoring scheme that is effective in identifying the samples of the shared classes. The score is used to select samples in the target domain for which to apply specific losses during training: pseudo-labels for high-scoring samples and confidence regularization for low-scoring samples. Taken together, our method is shown to outperform, by a sizeable margin, the current state of the art on the literature benchmarks.
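
A hedged sketch of the score-based loss assignment described above. The paper's actual scoring scheme is more elaborate; here the max softmax probability stands in for the score, and the thresholds are illustrative hyperparameters.

```python
import numpy as np

def universal_da_losses(probs, hi_thresh=0.8, lo_thresh=0.3):
    """probs: (N, C) softmax outputs of the current model on target data.
    High scorers are treated as shared-class samples and pseudo-labeled;
    low scorers are treated as target-private and pushed toward uniform."""
    score = probs.max(axis=1)
    pseudo = probs.argmax(axis=1)
    hi = score >= hi_thresh            # likely shared-class samples
    lo = score <= lo_thresh            # likely private-class samples
    eps = 1e-12
    # High scorers: cross-entropy against their own pseudo-labels.
    ce = -np.log(probs[hi, pseudo[hi]] + eps).mean() if hi.any() else 0.0
    # Low scorers: confidence regularization; this term is zero for a
    # uniform prediction and grows as the prediction becomes peaked.
    conf_reg = (np.log(probs[lo].max(axis=1) + eps)
                - np.log(1.0 / probs.shape[1])).mean() if lo.any() else 0.0
    return ce, conf_reg
```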

A Sample Selection Approach for Universal Domain Adaptation

2020

We study the problem of unsupervised domain adaptation in the universal scenario, in which only some of the classes are shared between the source and target domains. We present a scoring scheme that is effective in identifying the samples of the shared classes. The score is used to select which samples in the target domain to pseudo-label during training. Another loss term encourages diversity of labels within each batch. Taken together, our method is shown to outperform, by a sizable margin, the current state of the art on the literature benchmarks.
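
One common way to realize a batch-diversity term like the one mentioned above is to maximize the entropy of the batch-averaged prediction; a minimal sketch, not necessarily the paper's exact loss:

```python
import numpy as np

def batch_diversity_loss(probs):
    """probs: (B, C) softmax outputs on one target batch. Minimizing the
    negative entropy of the batch-averaged prediction discourages
    collapsing every sample onto the same pseudo-label."""
    mean_pred = probs.mean(axis=0)   # marginal label distribution in the batch
    entropy = -(mean_pred * np.log(mean_pred + 1e-12)).sum()
    return -entropy                  # lower loss == more diverse labels
```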

Boosted Multifeature Learning for Cross-Domain Transfer

ACM Transactions on Multimedia Computing, Communications, and Applications, 2015

Conventional learning algorithms assume that the training data and test data share a common distribution. However, this assumption greatly hinders the practical application of learned models to cross-domain data analysis in multimedia. To deal with this issue, transfer-learning-based techniques can be adopted. As a typical form of transfer learning, domain adaptation has been extensively studied recently due to its theoretical value and practical interest. In this article, we propose a boosted multifeature learning (BMFL) approach to iteratively learn multiple representations within a boosting procedure for unsupervised domain adaptation. The proposed BMFL method has a number of properties. (1) It reuses all instances with different weights assigned by the previous boosting iteration and avoids discarding labeled instances as in conventional methods. (2) It models the instance weight distribution effectively by considering the classification error and the domain similarity...
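
The abstract is truncated, so only the flavor of the weight update in property (2) can be sketched: a boosting-style reweighting that is modulated by a per-instance domain-similarity score. The exact BMFL update is not reproduced here, and `domain_sim` is assumed given (e.g., by a separate domain discriminator).

```python
import numpy as np

def update_instance_weights(w, miss, domain_sim, lr=1.0):
    """w: current instance weights; miss: boolean misclassification mask
    from the previous boosting round; domain_sim: per-instance similarity
    to the target domain in [0, 1] (assumed given)."""
    # Misclassified instances gain weight, as in boosting, but the gain
    # is damped for instances that look dissimilar to the target domain,
    # so irrelevant source instances do not dominate later rounds.
    w = w * np.exp(lr * miss.astype(float) * domain_sim)
    return w / w.sum()
```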

Multi-transfer: Transfer learning with multiple views and multiple sources

Statistical Analysis and Data Mining, 2014

Transfer learning, which aims to help the learning task in a target domain by leveraging knowledge from auxiliary domains, has been demonstrated to be effective in different applications, e.g., text mining, sentiment analysis, etc. In addition, in many real-world applications, auxiliary data are described from multiple perspectives and are usually carried by multiple sources. For example, to help classify videos on YouTube, which include three views/perspectives (image, voice, and subtitles), one may borrow data from Flickr, Last.fm, and Google News. Although any single instance in these domains can only cover a part of the views available on YouTube, the pieces of information they carry may complement each other. In this paper, we define this transfer learning problem as Transfer Learning with Multiple Views and Multiple Sources. As different sources may have different probability distributions, and different views may complement or be inconsistent with each other, merging all data in a simplistic manner will not give optimal results. Thus, we propose a novel algorithm to leverage knowledge from different views and sources collaboratively, letting different views from different sources complement each other through a co-training-style framework while revising the distribution differences across domains. We conduct empirical studies on several real-world datasets to show that the proposed approach can improve classification accuracy by up to 8% against different state-of-the-art baselines.
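
A minimal sketch of the classic co-training loop at the heart of such multi-view frameworks, assuming two feature views of the same instances and omitting the paper's source reweighting. Function and parameter names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_training(X1, X2, y, U1, U2, rounds=5, k=10):
    """X1, X2: two views of the labeled data with labels y;
    U1, U2: the same two views of the unlabeled data."""
    L1, L2, yL = X1, X2, y
    for _ in range(rounds):
        c1 = LogisticRegression(max_iter=1000).fit(L1, yL)
        c2 = LogisticRegression(max_iter=1000).fit(L2, yL)
        if len(U1) == 0:
            break
        p1, p2 = c1.predict_proba(U1), c2.predict_proba(U2)
        # Score each unlabeled instance by its most confident view and
        # promote the top-k instances to the labeled set.
        conf = np.maximum(p1.max(axis=1), p2.max(axis=1))
        pick = np.argsort(-conf)[:k]
        # The more confident view supplies the pseudo-label, so the two
        # views teach each other where they disagree in confidence.
        y_new = np.where(p1.max(axis=1)[pick] >= p2.max(axis=1)[pick],
                         p1.argmax(axis=1)[pick], p2.argmax(axis=1)[pick])
        L1 = np.vstack([L1, U1[pick]])
        L2 = np.vstack([L2, U2[pick]])
        yL = np.concatenate([yL, y_new])
        keep = np.setdiff1d(np.arange(len(U1)), pick)
        U1, U2 = U1[keep], U2[keep]
    return c1, c2
```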

On handling negative transfer and imbalanced distributions in multiple source transfer learning

Statistical Analysis and Data Mining: The ASA Data Science Journal, 2014

Transfer learning has benefited many real-world applications where labeled data are abundant in source domains but scarce in the target domain. As there are usually multiple relevant domains from which knowledge can be transferred, multiple source transfer learning (MSTL) has recently attracted much attention. However, we face two major challenges when applying MSTL. First, without knowledge about the difference between source and target domains, negative transfer occurs when knowledge is transferred from highly irrelevant sources. Second, the existence of imbalanced class distributions, where examples in one class dominate, can lead to improper judgment of the source domains' relevance to the target task. Since existing MSTL methods are usually designed to transfer from relevant sources with balanced distributions, they will fail in applications where these two challenges persist. In this article, we propose a novel two-phase framework to effectively transfer knowledge from ...
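
The abstract is truncated, so only the general idea can be sketched: weight each source by how well it transfers to the target, using a class-balanced metric so a dominant class cannot inflate a source's apparent relevance, and zero out sources at or below chance to avoid negative transfer. This assumes a small labeled target validation set and that every source covers the full label set; it is not the paper's two-phase algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

def weighted_source_ensemble(sources, X_val, y_val, X_test):
    """sources: list of (X_src, y_src) pairs; (X_val, y_val): small
    labeled target validation set used to estimate source relevance."""
    models, weights = [], []
    chance = 1.0 / len(np.unique(y_val))   # balanced-accuracy chance level
    for X_s, y_s in sources:
        m = LogisticRegression(max_iter=1000).fit(X_s, y_s)
        # Balanced accuracy guards against imbalanced class distributions
        # distorting the relevance estimate.
        rel = balanced_accuracy_score(y_val, m.predict(X_val))
        models.append(m)
        weights.append(max(rel - chance, 0.0))  # drop at-chance sources
    weights = np.array(weights)
    if weights.sum() > 0:
        weights = weights / weights.sum()
    votes = sum(w * m.predict_proba(X_test) for w, m in zip(weights, models))
    return votes.argmax(axis=1)
```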

On the Benefits of Selectivity in Pseudo-Labeling for Unsupervised Multi-Source-Free Domain Adaptation

2022

Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models. Existing methods for such multi-source-free domain adaptation typically train a target model using supervised techniques in conjunction with pseudo-labels for the target data, which are produced by the available source models. However, we show that assigning pseudo-labels to only a subset of the target data leads to improved performance. In particular, we develop an information-theoretic bound on the generalization error of the resulting target model that demonstrates an inherent bias-variance trade-off controlled by the subset choice. Guided by this analysis, we develop a method that partitions the target data into pseudo-labeled and unlabeled subsets to balance the trade-off. In addition to exploiting the pseudo-labeled subset, our algorithm further leverages the...
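
A hedged sketch of the partitioning step described above, using ensemble confidence over the source models as the selection criterion; the paper's actual partition rule follows its information-theoretic bound, and `top_frac` here is an illustrative knob for the bias-variance trade-off it analyzes.

```python
import numpy as np

def partition_target(probs_per_source, top_frac=0.5):
    """probs_per_source: (S, N, C) softmax outputs of S source models on
    N target samples. Returns indices to pseudo-label, their labels,
    and the indices left unlabeled."""
    avg = np.mean(probs_per_source, axis=0)   # ensemble prediction
    conf = avg.max(axis=1)
    n_label = int(top_frac * len(conf))
    order = np.argsort(-conf)                 # most confident first
    labeled_idx = order[:n_label]             # low-noise pseudo-labels (bias)
    unlabeled_idx = order[n_label:]           # kept unlabeled (variance)
    pseudo = avg.argmax(axis=1)[labeled_idx]
    return labeled_idx, pseudo, unlabeled_idx
```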

A Two-Stage Weighting Framework for Multi-Source Domain Adaptation

Neural Information Processing Systems, 2011

Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often we have very few or no labeled data from the test (target) distribution but plenty of labeled data from multiple related sources with different distributions. The difference in distributions may lie in both the marginal and the conditional probabilities. Most existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources, based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among the multiple sources. We also provide a theoretical analysis of the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods on three real-world datasets demonstrate the effectiveness of the proposed approach.
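
A sketch of the two-stage shape of this weighting, using squared MMD with an RBF kernel as a standard proxy for marginal distribution difference. The first-stage weights are computed independently per source, as in the paper; the second-stage conditional scores, which the paper estimates jointly across sources, are assumed given here.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel, a common
    nonparametric measure of marginal distribution difference."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def two_stage_source_weights(sources_X, X_tgt, cond_scores):
    """sources_X: list of source feature matrices; cond_scores: per-source
    conditional-probability agreement scores (assumed given)."""
    # Stage 1: marginal similarity to the target, per source, independently.
    marg = np.array([np.exp(-rbf_mmd2(X_s, X_tgt)) for X_s in sources_X])
    # Stage 2: rescale by conditional agreement and normalize.
    w = marg * np.asarray(cond_scores)
    return w / w.sum()
```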

Improving Semi-Supervised Domain Adaptation Using Effective Target Selection and Semantics

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021

Recently, semi-supervised domain adaptation (SSDA) approaches have shown impressive performance on the domain adaptation task. They effectively utilize a few labeled target samples along with the unlabeled data to account for the distribution shift across the source and target domains. In this work, we make three-fold contributions, concentrating on the role of target samples and semantics in the SSDA task. First, we observe that choosing a few labeled samples, in equal number from each class of the target domain, requires a significant amount of manual effort. To address this, we propose an active-learning-based framework that models both sample diversity and classifier uncertainty. Using k-means-initialized cluster centers to pick a small pool of diverse unlabeled target samples, we compute a novel classifier-adaptation uncertainty term to select the most effective samples from this pool, which are queried to obtain their true labels from an oracle. Second...
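
A sketch of the selection recipe this abstract outlines: k-means supplies a diverse pool (one representative per cluster), and the most uncertain pool members are queried. Prediction entropy is used here as a stand-in for the paper's classifier-adaptation uncertainty term.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_queries(features, probs, n_clusters=20, n_queries=5):
    """features: (N, D) target features; probs: (N, C) classifier softmax
    outputs. Returns indices of samples to query from the oracle."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    pool = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        pool.append(members[d.argmin()])   # most central sample per cluster
    pool = np.array(pool)
    # Within the diverse pool, prefer the samples the classifier is
    # least sure about (highest prediction entropy).
    entropy = -(probs[pool] * np.log(probs[pool] + 1e-12)).sum(axis=1)
    return pool[np.argsort(-entropy)[:n_queries]]
```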

Domain Adaptation Without Source Data

IEEE Transactions on Artificial Intelligence, 2021

Domain adaptation assumes that samples from the source and target domains are freely accessible during the training phase. However, such an assumption is rarely plausible in the real world and can cause data-privacy issues, especially when the labels of the source domain are sensitive attributes that serve as identifiers. To avoid accessing source data that may contain sensitive information, we introduce Source data-Free Domain Adaptation (SFDA). Our key idea is to leverage a pre-trained model from the source domain and progressively update the target model in a self-learning manner. We observe that target samples with lower self-entropy, measured by the pre-trained source model, are more likely to be classified correctly. Based on this, we select reliable samples with the self-entropy criterion and define them as class prototypes. We then assign pseudo-labels to every target sample based on the similarity score with the class prototypes. Furthermore, to reduce the uncertainty of the pseudo-labeling process, we propose set-to-set distance-based filtering, which does not require any tunable hyperparameters. Finally, we train the target model with the filtered pseudo-labels, with regularization from the pre-trained source model. Surprisingly, without direct use of labeled source samples, our SFDA outperforms conventional domain adaptation methods on benchmark datasets. Our code is publicly available at https://github.com/youngryan1993/SFDA-SourceFreeDA.
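
A minimal sketch of the prototype step described above, assuming features and softmax outputs from the pre-trained source model are already extracted; the paper's set-to-set distance filtering is omitted, and cosine similarity is used as the similarity score.

```python
import numpy as np

def prototype_pseudo_labels(features, probs, per_class=5):
    """features: (N, D) target features from the source model;
    probs: (N, C) its softmax outputs. Assumes every class is
    predicted at least once. Returns pseudo-labels for all samples."""
    # Self-entropy of each prediction; low entropy == reliable sample.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    preds = probs.argmax(axis=1)
    protos = []
    for c in range(probs.shape[1]):
        idx = np.where(preds == c)[0]
        idx = idx[np.argsort(entropy[idx])[:per_class]]  # most reliable
        protos.append(features[idx].mean(axis=0))        # class prototype
    protos = np.stack(protos)                            # (C, D)
    # Pseudo-label every sample by cosine similarity to the prototypes.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return (f @ p.T).argmax(axis=1)
```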