Simple and effective localized attribute representations for zero-shot learning
Related papers
On Implicit Attribute Localization for Generalized Zero-Shot Learning
IEEE Signal Processing Letters, 2021
Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their attribute-based descriptions. Since attributes are often related to specific parts of objects, many recent works focus on discovering discriminative regions. However, these methods usually require additional complex part detection modules or attention mechanisms. In this paper, 1) we show that common ZSL backbones (without explicit attention or part detection) can implicitly localize attributes, yet this property is not exploited. 2) Exploiting it, we then propose SELAR, a simple method that further encourages attribute localization, surprisingly achieving very competitive generalized ZSL (GZSL) performance when compared with more complex state-of-the-art methods. Our findings provide useful insight for designing future GZSL methods, and SELAR provides an easy-to-implement yet strong baseline.
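The attribute-based ZSL setup these papers share can be illustrated with a minimal sketch (the class names and attribute values below are invented for illustration, not taken from any of the papers): an image is first mapped to attribute scores, and the prediction is the unseen class whose attribute signature is most compatible with those scores.

```python
# Minimal sketch of attribute-based zero-shot classification.
# An image is mapped to attribute scores; the predicted class is the
# unseen class whose attribute signature is most compatible (here,
# compatibility is a simple dot product).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical attribute signatures for unseen classes
# (attributes: has_stripes, has_hooves, is_aquatic)
class_attributes = {
    "zebra":   [1.0, 1.0, 0.0],
    "dolphin": [0.0, 0.0, 1.0],
}

def zero_shot_classify(predicted_attributes):
    """Pick the unseen class whose attribute vector best matches
    the attribute scores predicted from the image."""
    return max(class_attributes,
               key=lambda c: dot(class_attributes[c], predicted_attributes))

# An image whose attribute predictor fired on 'stripes' and 'hooves'
print(zero_shot_classify([0.9, 0.8, 0.1]))  # → zebra
```

No labeled examples of "zebra" or "dolphin" are needed at training time; only their attribute descriptions transfer knowledge from seen to unseen classes, which is the common thread of the works listed below.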
A Large-scale Attribute Dataset for Zero-shot Learning
arXiv (Cornell University), 2018
Zero-Shot Learning (ZSL) has attracted huge research attention over the past few years; it aims to learn new concepts that have never been seen before. In classical ZSL algorithms, attributes are introduced as the intermediate semantic representation to realize the knowledge transfer from seen classes to unseen classes. Previous ZSL algorithms are tested on several benchmark datasets annotated with attributes. However, these datasets are defective in terms of image distribution and attribute diversity. In addition, we argue that the "co-occurrence bias problem" of existing datasets, which is caused by the biased co-occurrence of objects, significantly hinders models from correctly learning the concept. To overcome these problems, we propose a Large-scale Attribute Dataset (LAD). Our dataset has 78,017 images across 5 super-classes and 230 classes. LAD contains more images than the four most popular attribute datasets combined. 359 attributes of visual, semantic and subjective properties are defined and annotated at the instance level. We analyze our dataset by conducting both supervised learning and zero-shot learning tasks. Seven state-of-the-art ZSL algorithms are tested on this new dataset. The experimental results reveal the challenge of implementing zero-shot learning on our dataset.
Learning where to look: Semantic-Guided Multi-Attention Localization for Zero-Shot Learning
2019
Zero-shot learning extends the conventional object classification to the unseen class recognition by introducing semantic representations of classes. Existing approaches predominantly focus on learning the proper mapping function for visual-semantic embedding, while neglecting the effect of learning discriminative visual features. In this paper, we study the significance of the discriminative region localization. We propose a semantic-guided multi-attention localization model, which automatically discovers the most discriminative parts of objects for zero-shot learning without any human annotations. Our model jointly learns cooperative global and local features from the whole object as well as the detected parts to categorize objects based on semantic descriptions. Moreover, with the joint supervision of embedding softmax loss and class-center triplet loss, the model is encouraged to learn features with high inter-class dispersion and intra-class compactness. Through comprehensive expe...
Implicit and Explicit Attention for Zero-Shot Learning
arXiv, 2021
Most of the existing Zero-Shot Learning (ZSL) methods focus on learning a compatibility function between the image representation and class attributes. Few others concentrate on learning image representations combining local and global features. However, the existing approaches still fail to address the bias issue towards the seen classes. In this paper, we propose implicit and explicit attention mechanisms to address the existing bias problem in ZSL models. We formulate the implicit attention mechanism with a self-supervised image angle rotation task, which focuses on specific image features aiding to solve the task. The explicit attention mechanism is based on the multi-headed self-attention of a Vision Transformer model, which learns to map image features to the semantic space during the training stage. We conduct comprehensive experiments on three popular benchmarks: AWA2, CUB and SUN. The performance of our proposed attention mechanisms has proved it...
Recent Advances in Zero-Shot Recognition: Toward Data-Efficient Understanding of Visual Content
IEEE Signal Processing Magazine, 2018
With the recent renaissance of deep convolutional neural networks, encouraging breakthroughs have been achieved on supervised recognition tasks, where each class has sufficient and fully annotated training data. However, scaling recognition to a large number of classes with few or no training samples per class remains an unsolved problem. One approach to scaling up recognition is to develop models capable of recognizing unseen categories without any training instances, or zero-shot recognition/learning. This article provides a comprehensive review of existing zero-shot recognition techniques covering various aspects ranging from representations and models to datasets and evaluation settings. We also overview related recognition tasks including one-shot and open set recognition, which can be used as natural extensions of zero-shot recognition when a limited number of class samples becomes available or when zero-shot recognition is implemented in a real-world setting. Importantly, we highlight the limitations of existing approaches and point out future research directions in this new research area.
Stacked Semantics-Guided Attention Model for Fine-Grained Zero-Shot Learning
2018
Zero-Shot Learning (ZSL) is generally achieved via aligning the semantic relationships between the visual features and the corresponding class semantic descriptions. However, using the global features to represent fine-grained images may lead to sub-optimal results since they neglect the discriminative differences of local regions. Besides, different regions contain distinct discriminative information. The important regions should contribute more to the prediction. To this end, we propose a novel stacked semantics-guided attention (S2GA) model to obtain semantic relevant features by using individual class semantic features to progressively guide the visual features to generate an attention map for weighting the importance of different local regions. Feeding both the integrated visual features and the class semantic features into a multi-class classification architecture, the proposed framework can be trained end-to-end. Extensive experimental results on CUB and NABird datasets show ...
Zero-Shot Recognition with Attributes
I have implemented 2 methods for zero-shot learning on the Animals with Attributes dataset. Both methods use SIFT features. The first method applies a 4-layer MLP while the second method uses episodic training with the Relation Network (Sung et al., 2018). Both methods outperform the 20% benchmark (required for full credit on the assignment), with SIFT + MLP achieving higher accuracy at 25.0% while SIFT + Relation Network only achieves 23.1%. However, SIFT + Relation Network has a higher class-balanced accuracy of 24.52% vs. SIFT + MLP's class-balanced accuracy of 24.47%.
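The gap between overall and class-balanced accuracy reported above comes from the two metrics weighting classes differently: overall accuracy weights each sample equally, while class-balanced accuracy averages per-class accuracies so rare classes count as much as common ones. A minimal sketch (with made-up labels, not the assignment's data) shows how the two can disagree:

```python
# Overall vs. class-balanced accuracy, the two metrics compared above.
from collections import defaultdict

def overall_accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def class_balanced_accuracy(y_true, y_pred):
    """Mean of per-class accuracies: each class counts equally,
    regardless of how many test samples it has."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Imbalanced test set: a classifier that always predicts the majority class
y_true = ["zebra"] * 8 + ["otter"] * 2
y_pred = ["zebra"] * 10
print(overall_accuracy(y_true, y_pred))         # → 0.8
print(class_balanced_accuracy(y_true, y_pred))  # → 0.5
```

This is why the two methods above can rank differently under the two metrics: the Relation Network spreads its errors more evenly across classes even though it makes more of them overall.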
Zero-Shot Learning with Missing Attributes using Semantic Correlations
2021 International Joint Conference on Neural Networks (IJCNN), 2021
Zero-shot learning (ZSL) aims to recognize instances belonging to unseen categories which are not available at training time. Previous ZSL models learn a projection function from the visual feature space to a semantic space which contains a description of the categories. The semantic attributes are often correlated with each other in the semantic space and it is not appropriate to learn them independently. Existing ZSL methods are designed to work on complete descriptions of the semantic attributes. However, because these attributes are human-designed values, they might be incomplete or contain noisy values which may affect the recognition performance of many existing ZSL models. This paper proposes a novel zero-shot learning approach (ZSL-MSA) to handle missing and noisy semantic attributes during the training process. Significantly, the proposed method learns a supplementary attribute matrix by exploiting the attribute correlation. The proposed method also learns the relevant feature coefficients in the projection matrix to identify the correlated attribute space. The proposed method also adopts an l1-norm regularizer to select the relevant sparse features. A constrained optimization function is formulated and solved using the accelerated proximal gradient method. Extensive experiments on three benchmark datasets using ZSL and generalized ZSL demonstrate the effectiveness of the proposed method.
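The l1-regularized proximal-gradient formulation mentioned in this abstract relies on the standard proximal operator of the l1 norm, which is elementwise soft-thresholding. A minimal sketch of that operator (this illustrates the standard building block, not the paper's implementation):

```python
# Soft-thresholding: the proximal operator of lam * ||x||_1, applied
# elementwise. Each proximal-gradient step shrinks coefficients toward
# zero, producing the sparse feature selection described above.

def soft_threshold(x, lam):
    """prox_{lam*|.|}(x) = sign(x) * max(|x| - lam, 0)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

# Shrink a whole coefficient vector: small entries are zeroed out
coeffs = [3.0, -0.5, 1.2, -2.4]
print([soft_threshold(c, 1.0) for c in coeffs])  # → [2.0, 0.0, 0.2, -1.4]
```

In an accelerated proximal gradient loop (e.g. FISTA-style), this operator is applied after each gradient step on the smooth part of the objective, which is what makes the learned projection sparse.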
Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts
Computer Vision – ACCV 2018
Current Zero-Shot Learning (ZSL) approaches are restricted to recognition of a single dominant unseen object category in a test image. We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the 'recognition' and 'localization' of an unseen category. To address this limitation, we introduce a new 'Zero-Shot Detection' (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories without any training examples. We also propose a new experimental protocol for ZSD based on the highly challenging ILSVRC dataset, adhering to practical issues, e.g., the rarity of unseen objects. To the best of our knowledge, this is the first end-to-end deep network for ZSD that jointly models the interplay between visual and semantic domain information. To overcome the noise in the automatically derived semantic descriptions, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic space clustering. Furthermore, we present a baseline approach extended from recognition to detection setting. Our extensive experiments show significant performance boost over the baseline on the imperative yet difficult ZSD problem.