Transductive Zero-Shot Learning by Decoupled Feature Generation (original) (raw)

Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning

ArXiv, 2020

Zero-shot learning (ZSL) addresses the unseen class recognition problem by leveraging semantic information to transfer knowledge from seen classes to unseen classes. Generative models synthesize the unseen visual features and convert ZSL into a classical supervised learning problem. These generative models are trained using the seen classes and are expected to implicitly transfer the knowledge from seen to unseen classes. However, their performance is stymied by overfitting, which leads to substandard performance on Generalized Zero-Shot learning (GZSL). To address this concern, we propose the novel LsrGAN, a generative model that Leverages the Semantic Relationship between seen and unseen categories and explicitly performs knowledge transfer by incorporating a novel Semantic Regularized Loss (SR-Loss). The SR-loss guides the LsrGAN to generate visual features that mirror the semantic relationships between seen and unseen classes. Experiments on seven benchmark datasets, including t...

Classifier Crafting: Turn Your ConvNet into a Zero-Shot Learner!

ArXiv, 2021

In Zero-shot learning (ZSL), we classify unseen categories using textual descriptions about their expected appearance when observed (class embeddings) and a disjoint pool of seen classes, for which annotated visual data are accessible. We tackle ZSL by casting a “vanilla” convolutional neural network (e.g. AlexNet [16], ResNet-101 [10], DenseNet-201 [12] or DarkNet-53 [28]) into a zero-shot learner. We do so by crafting the softmax classifier: we freeze its weights using fixed seen classification rules, either semantic (seen class embeddings) or visual (seen class prototypes). Then, we learn a data-driven and ZSL-tailored feature representation on seen classes only to match these fixed classification rules. Given that the latter seamlessly generalize towards unseen classes, while requiring not actual unseen data to be computed, we can perform ZSL inference by augmenting the pool of classification rules at test time while keeping the very same representation we learnt: nowhere re-tra...

Generative Model with Semantic Embedding and Integrated Classifier for Generalized Zero-Shot Learning

2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020

Generative models have achieved impressive performance for the generalized zero-shot learning task by learning the mapping from attributes to feature space. In this work, we propose to derive semantic inferences from images and use them for the generation, which enables us to capture the bidirectional information i.e., visual to semantic and semantic to visual spaces. Specifically, we propose a Semantic Embedding module which not only gives image specific semantic information to the generative model for generation of better features, but also makes sure that the generated features can be mapped to the correct semantic space. We also propose an Integrated Classifier, which is trained along with the generator. This module not only eliminates the requirement of additional classifier for new object categories which is required by the existing generative approaches, but also facilitates the generation of more discriminative and useful features. This approach can be used seamlessly for th...

Generative Multi-Label Zero-Shot Learning

ArXiv, 2021

Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. The test samples can additionally contain seen categories in the generalized variant. Existing approaches rely on learning either shared or labelspecific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize the class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features from GANs is still unexplored in the context of zero-shot setting. In this work, we introduce different fusion approaches at the attribute-level, featurelevel and cross-level (across attribute and feature-levels) for synthesizing multi-label features from their corresponding multi-labe...

Unified Generator-Classifier for Efficient Zero-Shot Learning

2019

Generative models have achieved state-of-the-art performance for the zero-shot learning problem, but they require re-training the classifier every time a new object category is encountered. The traditional semantic embedding approaches, though very elegant, usually do not perform at par with their generative counterparts. In this work, we propose an unified framework termed GenClass, which integrates the generator with the classifier for efficient zero-shot learning, thus combining the representative power of the generative approaches and the elegance of the embedding approaches. End-to-end training of the unified framework not only eliminates the requirement of additional classifier for new object categories as in the generative approaches, but also facilitates the generation of more discriminative and useful features. Extensive evaluation on three standard zero-shot object classification datasets, namely AWA, CUB and SUN shows the effectiveness of the proposed approach. The approa...

Learning Feature-to-Feature Translator by Alternating Back-Propagation for Generative Zero-Shot Learning

2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019

We investigate learning feature-to-feature translator networks by alternating back-propagation as a generalpurpose solution to zero-shot learning (ZSL) problems. It is a generative model-based ZSL framework. In contrast to models based on generative adversarial networks (GAN) or variational autoencoders (VAE) that require auxiliary networks to assist the training, our model consists of a single conditional generator that maps class-level semantic features and Gaussian white noise vector accounting for instance-level latent factors to visual features, and is trained by maximum likelihood estimation. The training process is a simple yet effective alternating backpropagation process that iterates the following two steps: (i) the inferential back-propagation to infer the latent factors of each observed example, and (ii) the learning backpropagation to update the model parameters. We show that, with slight modifications, our model is capable of learning from incomplete visual features for ZSL. We conduct extensive comparisons with existing generative ZSL methods on five benchmarks, demonstrating the superiority of our method in not only ZSL performance but also convergence speed and computational cost. Specifically, our model outperforms the existing state-of-the-art methods by a remarkable margin up to 3.1% and 4.0% in ZSL and generalized ZSL settings, respectively.

Synthesizing the Unseen for Zero-shot Object Detection

2020

The existing zero-shot detection approaches project visual features to the semantic domain for seen objects, hoping to map unseen objects to their corresponding semantics during inference. However, since the unseen objects are never visualized during training, the detection model is skewed towards seen content, thereby labeling unseen as background or a seen class. In this work, we propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain. Consequently, the major challenge becomes, how to accurately synthesize unseen objects merely using their class semantics? Towards this ambitious goal, we propose a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them. Further, using a unified model, we ensure the synthesized features have high diversity that represents the intra-class differences and variable localization precision in the detected bou...

Learning without Seeing nor Knowing: Towards Open Zero-Shot Learning

arXiv (Cornell University), 2021

In Generalized Zero-Shot Learning (GZSL), unseen categories (for which no visual data are available at training time) can be predicted by leveraging their class embeddings (e.g., a list of attributes describing them) together with a complementary pool of seen classes (paired with both visual data and class embeddings). Despite GZSL is arguably challenging, we posit that knowing in advance the class embeddings, especially for unseen categories, is an actual limit of the applicability of GZSL towards real-world scenarios. To relax this assumption, we propose Open Zero-Shot Learning (OZSL) to extend GZSL towards the openworld settings. We formalize OZSL as the problem of recognizing seen and unseen classes (as in GZSL) while also rejecting instances from unknown categories, for which neither visual data nor class embeddings are provided. We formalize the OZSL problem introducing evaluation protocols, error metrics and benchmark datasets. We also suggest to tackle the OZSL problem by proposing the idea of performing unknown feature generation (instead of only unseen features generation as done in GZSL). We achieve this by optimizing a generative process to sample unknown class embeddings as complementary to the seen and the unseen. We intend these results to be the ground to foster future research, extending the standard closed-world zeroshot learning (GZSL) with the novel open-world counterpart (OZSL).

Zero-Shot Learning from scratch (ZFS): leveraging local compositional representations

2020

Zero-shot classification is a generalization task where no instance from the target classes is seen during training. To allow for test-time transfer, each class is annotated with semantic information, commonly in the form of attributes or text descriptions. While classical zero-shot learning does not explicitly forbid using information from other datasets, the approaches that achieve the best absolute performance on image benchmarks rely on features extracted from encoders pretrained on Imagenet. This approach relies on hyper-optimized Imagenet-relevant parameters from the supervised classification setting, entangling important questions about the suitability of those parameters and how they were learned with more fundamental questions about representation learning and generalization. To remove these distractors, we propose a more challenging setting: Zero-Shot Learning from scratch (ZFS), which explicitly forbids the use of encoders fine-tuned on other datasets. Our analysis on thi...

GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning

IEEE Transactions on Multimedia

Generalized Zero-Shot Learning (GZSL) aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes. It is a promising solution to take the advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes. However, due to the generation shifts, the synthesized samples by most existing methods may drift from the real distribution of the unseen data. To address this issue, we propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation. Specifically, we discover and address three potential problems that trigger the generation shifts, i.e., semantic inconsistency, variance collapse, and structure disorder. First, to enhance the reflection of the semantic information in the generated samples, we explicitly embed the semantic information into the transformation in each conditional affine coupling layer. Second, to recover the intrinsic variance of the real unseen features, we introduce a boundary sample mining strategy with entropy maximization to discover more difficult visual variants of semantic prototypes and hereby adjust the decision boundary of the classifiers. Third, a relative positioning strategy is proposed to revise the attribute embeddings, guiding them to fully preserve the inter-class geometric structure and further avoid structure disorder in the semantic space. Extensive experimental results on four GZSL benchmark datasets demonstrate that GSMFlow achieves the state-of-the-art performance on GZSL.