Learning to detect unseen object classes by between-class attribute transfer

Attribute-centric recognition for cross-category generalization

2010

We propose an approach to find and describe objects within broad domains. We introduce a new dataset that provides annotation for sharing models of appearance and correlation across categories. We use it to learn part and category detectors. These serve as the visual basis for an integrated model of objects. We describe objects by the spatial arrangement of their attributes and the interactions between them. Using this model, our system can find animals and vehicles that it has not seen and infer attributes, such as function and pose. Our experiments demonstrate that we can more reliably locate and describe both familiar and unfamiliar objects, compared to a baseline that relies purely on basic category detectors.
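The attribute-sharing idea behind this line of work can be illustrated with a small sketch: describe an object by its attributes, then match those attributes against per-class signatures, so that a class with no training images can still be recognized. The class names, attribute list, and matching rule below are invented for illustration and are not the paper's actual model:

```python
import numpy as np

# Per-class binary attribute signatures: [has_wheels, has_legs, is_metallic].
# These toy classes and attributes are placeholders, not the paper's vocabulary.
class_signatures = {
    "car":   np.array([1, 0, 1]),
    "horse": np.array([0, 1, 0]),
    "bike":  np.array([1, 0, 1]),
}

def infer_class(predicted_attributes, signatures):
    """Pick the class whose attribute signature best matches the prediction
    (negative L1 distance as the match score)."""
    scores = {name: -np.abs(sig - predicted_attributes).sum()
              for name, sig in signatures.items()}
    return max(scores, key=scores.get)

# An image whose attribute detectors fire only for "has_legs" maps to "horse",
# even if no horse images were available at training time.
print(infer_class(np.array([0, 1, 0]), class_signatures))  # -> horse
```

The point of the sketch is that the attribute detectors, not the class labels, carry the learned appearance models, so adding a new class only requires writing down its signature.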

Objects as Attributes for Scene Classification

Lecture Notes in Computer Science, 2012

Robust low-level image features have proven to be effective representations for a variety of high-level visual recognition tasks, such as object recognition and scene classification. But as the visual recognition tasks become more challenging, the semantic gap between low-level feature representation and the meaning of the scenes increases. In this paper, we propose to use objects as attributes of scenes for scene classification. We represent images by collecting their responses to a large number of object detectors, or "object filters". Such representation carries high-level semantic information rather than low-level image feature information, making it more suitable for high-level visual recognition tasks. Using very simple, off-the-shelf classifiers such as SVM, we show that this object-level image representation can be used effectively for high-level visual tasks such as scene classification. Our results are superior to reported state-of-the-art performance on a number of standard datasets.
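The "object filter" representation can be sketched in a few lines: run many object detectors over an image and pool each detector's response map into one scalar, giving a feature vector with one semantic dimension per object. The random response maps below are placeholders standing in for real detector outputs, and max-pooling is a crude stand-in for the paper's pooling scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def object_filter_features(detector_response_maps):
    """Collapse each detector's spatial response map to its max score,
    yielding one object-level feature per detector."""
    return np.array([m.max() for m in detector_response_maps])

# One invented scene "run through" 3 toy object detectors (e.g. person,
# car, tree); the 8x8 response maps here are random, just to show shapes.
scene_maps = [rng.random((8, 8)) for _ in range(3)]
features = object_filter_features(scene_maps)
print(features.shape)  # one dimension per object detector -> (3,)
```

A scene classifier (the paper uses off-the-shelf SVMs) is then trained on these object-level vectors instead of raw low-level features.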

COCO Attributes: Attributes for People, Animals, and Objects

Computer Vision – ECCV 2016, 2016

In this paper, we discover and annotate visual attributes for the COCO dataset. With the goal of enabling deeper object understanding, we deliver the largest attribute dataset to date. Using our COCO Attributes dataset, a fine-tuned classification system can do more than recognize object categories; for example, it can render multi-label classifications such as "sleeping spotted curled-up cat" instead of simply "cat". To overcome the expense of annotating thousands of COCO object instances with hundreds of attributes, we present an Economic Labeling Algorithm (ELA) which intelligently generates crowd labeling tasks based on correlations between attributes. The ELA offers a substantial reduction in labeling cost while largely maintaining attribute density and variety. Currently, we have collected 3.5 million object-attribute pair annotations describing 180 thousand different objects. We demonstrate that our efficiently labeled training data can be used to produce classifiers of similar discriminative ability as classifiers created using exhaustively labeled ground truth. Finally, we provide baseline performance analysis for object attribute recognition.
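The abstract does not spell out the ELA's mechanics, but the core idea of exploiting attribute correlations to reduce labeling cost can be illustrated with a deliberately simplified toy: skip asking workers about an attribute whose answer is already near-certain given what has been confirmed. The conditional-probability table and threshold below are invented, and the real ELA is more sophisticated than this:

```python
import numpy as np

# cond[i][j] ~ P(attr_j = 1 | attr_i = 1), as might be estimated from a
# small seed of exhaustive annotations. Values here are made up.
# Attributes: 0 = furry, 1 = four-legged, 2 = metallic (also invented).
cond = np.array([
    [1.00, 0.95, 0.05],
    [0.90, 1.00, 0.10],
    [0.02, 0.05, 1.00],
])

def attributes_to_ask(confirmed, cond, threshold=0.9):
    """Return indices of attributes still worth crowd-labeling, i.e. those
    not already implied (above threshold) by a confirmed attribute."""
    implied = set()
    for i in confirmed:
        implied |= {j for j in range(cond.shape[1])
                    if j not in confirmed and cond[i, j] >= threshold}
    return [j for j in range(cond.shape[1])
            if j not in confirmed and j not in implied]

# With "furry" (index 0) confirmed, "four-legged" (1) is implied and skipped,
# so only "metallic" (2) still needs a crowd question.
print(attributes_to_ask({0}, cond))  # -> [2]
```

Each skipped question is labeling cost saved, at the price of occasionally inheriting a wrong implied label; the paper reports that density and variety are largely maintained.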

A Large-scale Attribute Dataset for Zero-shot Learning

arXiv (Cornell University), 2018

Zero-Shot Learning (ZSL) has attracted huge research attention over the past few years; it aims to learn new concepts that have never been seen before. In classical ZSL algorithms, attributes are introduced as the intermediate semantic representation to realize knowledge transfer from seen classes to unseen classes. Previous ZSL algorithms are tested on several benchmark datasets annotated with attributes. However, these datasets are defective in terms of image distribution and attribute diversity. In addition, we argue that the "co-occurrence bias problem" of existing datasets, which is caused by the biased co-occurrence of objects, significantly hinders models from correctly learning the concepts. To overcome these problems, we propose a Large-scale Attribute Dataset (LAD). Our dataset has 78,017 images spanning 5 super-classes and 230 classes; LAD contains more images than the four most popular attribute datasets combined. 359 attributes covering visual, semantic and subjective properties are defined and annotated at the instance level. We analyze our dataset by conducting both supervised learning and zero-shot learning tasks. Seven state-of-the-art ZSL algorithms are tested on this new dataset. The experimental results reveal the challenge of implementing zero-shot learning on our dataset.
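The "co-occurrence bias" the abstract criticizes is easy to see in a toy label matrix: when two attributes almost always appear together in the training data, a model can score well on one without ever learning it as a separate concept. The instances and attribute names below are invented for illustration:

```python
import numpy as np

# Rows: annotated instances; columns: attributes [furry, four-legged, metallic].
labels = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])

def cooccurrence(labels):
    """P(attr_j = 1 | attr_i = 1) for every attribute pair."""
    counts = labels.T @ labels            # joint presence counts
    present = labels.sum(axis=0)          # per-attribute presence counts
    return counts / np.maximum(present[:, None], 1)

P = cooccurrence(labels)
print(P[0, 1])  # "four-legged" always co-occurs with "furry" here -> 1.0
```

A dataset builder can use such a table to flag attribute pairs that never occur apart, which is the kind of bias LAD's broader class and attribute coverage is meant to reduce.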

UnseenNet: Fast Training Detector for Any Unseen Concept

ArXiv, 2022

Training object detection models with less data is currently the focus of existing N-shot learning models in computer vision. Such methods use object-level labels and take hours to train on unseen classes. In many cases a large amount of image-level labels is available for training but cannot be utilized by few-shot object detection models. There is a need for a machine learning framework that can be trained for any unseen class and remain useful in real-time situations. In this paper, we propose an "Unseen Class Detector" that can be trained within a very short time for any possible unseen class, without bounding boxes, at competitive accuracy. We build our approach on "Strong" and "Weak" baseline detectors, which we trained on existing object detection and image classification datasets, respectively. Unseen concepts are fine-tuned on the strong baseline detector using only image-level labels and further adapted by transferring the classifi...

Learning From a Small Number of Training Examples by Exploiting Object Categories

2004 Conference on Computer Vision and Pattern Recognition Workshop, 2004

In the last few years, object detection techniques have progressed immensely. Impressive detection results have been achieved for many objects such as faces and cars. The robustness of these systems emerges from a training stage utilizing thousands of positive examples. One approach to enable learning from a small set of training examples is to find an efficient set of features that accurately represent the target object. Unfortunately, automatically selecting such a feature set is a difficult task in itself.

Zero Shot Learning to Detect Object Instances from Unknown Image Sources

International Journal of Innovative Technology and Exploring Engineering (IJITEE), 2020

Inspired by this human capability, zero-shot learning research aims to detect object instances from unknown sources. The human brain can make decisions about an unknown object from a set of given attributes: it can relate an unseen object to familiar ones using only a description, and given enough attributes it can make an assessment of the object. Zero-shot learning aims to reach this capability. Conventionally, a machine detects objects using training examples; zero-shot learning performs this type of object detection where no training examples exist, so that a machine can detect object instances in images without any training examples for their classes. In this paper, we develop a dynamic system able to detect object instances in an image it has never seen before, meaning that during testing the test image is completely unknown relative to the training images. The system detects completely unseen objects within bounded regions of given images using a zero-shot learning approach. We pursue detection of object instances from unknown classes because the number of categories in the world keeps growing and new categories are always emerging; it is not possible to enumerate every object, and collecting, annotating, and training on each category is impractical. Zero-shot learning therefore reduces the complexity of detecting unknown objects.

Detect-and-describe: Joint learning framework for detection and description of objects

MATEC Web of Conferences, 2019

Traditional object detection answers two questions: "what" (what is the object?) and "where" (where is the object?). The "what" part of object detection can be fine-grained further, i.e., "what type", "what shape", "what material", etc. This shifts the object detection task toward an object description paradigm. Describing an object provides additional detail that enables us to understand its characteristics and attributes ("plastic boat" not just boat, "glass bottle" not just bottle). This additional information can implicitly be used to gain insight about unseen objects (e.g. an unknown object is "metallic", "has wheels"), which is not possible in traditional object detection. In this paper, we present a new approach to simultaneously detect objects and infer their attributes; we call it the Detect-and-Describe (DaD) framework. DaD is a deep learning-based approach that extends object detection to object attribute prediction as well. We train our model on aPascal train se...
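The joint-head idea can be sketched compactly: one shared feature vector per detected region feeds two heads, a softmax over object classes ("what") and independent sigmoids over attributes ("what type" / "what material"). This is a minimal numpy sketch of the pattern, not the paper's DaD network; the weights are random placeholders standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

n_feat, n_classes, n_attrs = 16, 4, 6
W_cls = rng.standard_normal((n_classes, n_feat))  # placeholder class head
W_att = rng.standard_normal((n_attrs, n_feat))    # placeholder attribute head

def describe(region_features):
    """Return (predicted class index, binary attribute vector) for one region."""
    cls_logits = W_cls @ region_features
    cls_probs = np.exp(cls_logits - cls_logits.max())
    cls_probs /= cls_probs.sum()                           # softmax: "what"
    att_probs = 1.0 / (1.0 + np.exp(-(W_att @ region_features)))  # sigmoids
    return int(cls_probs.argmax()), att_probs > 0.5

cls, attrs = describe(rng.standard_normal(n_feat))
print(cls, attrs.shape)
```

Because the attribute head is trained on attribute labels rather than class labels, its outputs (e.g. "metallic", "has wheels") remain meaningful even for regions whose class is unknown, which is the insight-about-unseen-objects point in the abstract.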

On the relationship between visual attributes and convolutional networks

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

One of the cornerstone principles of deep models is their abstraction capacity, i.e. their ability to learn abstract concepts from 'simpler' ones. Through extensive experiments, we characterize the nature of the relationship between abstract concepts (specifically objects in images) learned by popular and high performing convolutional networks (conv-nets) and established mid-level representations used in computer vision (specifically semantic visual attributes). We focus on attributes due to their impact on several applications, such as object description, retrieval and mining, and active (and zero-shot) learning. Among the findings we uncover, we show empirical evidence of the existence of Attribute Centric Nodes (ACNs) within a conv-net, which is trained to recognize objects (not attributes) in images. These special conv-net nodes (1) collectively encode information pertinent to visual attribute representation and discrimination, (2) are unevenly and sparsely distributed across all layers of the conv-net, and (3) play an important role in conv-net based object recognition.
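One simple way to probe for such attribute-centric units, sketched below under assumptions of my own (the paper's actual analysis is more involved), is to correlate each hidden unit's activation with a binary attribute label across images and keep the strongest units. The activations and labels here are synthetic, with one unit planted to track the attribute:

```python
import numpy as np

rng = np.random.default_rng(0)

n_images, n_units = 200, 50
activations = rng.standard_normal((n_images, n_units))  # fake hidden layer
is_furry = rng.integers(0, 2, n_images)                 # fake attribute labels
activations[:, 7] += 2.0 * is_furry   # plant one unit that tracks the attribute

def top_attribute_nodes(acts, attr, k=3):
    """Rank units by |Pearson correlation| with the attribute; return top k."""
    attr = (attr - attr.mean()) / attr.std()
    acts = (acts - acts.mean(axis=0)) / acts.std(axis=0)
    corr = np.abs(acts.T @ attr) / len(attr)
    return np.argsort(corr)[::-1][:k]

print(top_attribute_nodes(activations, is_furry))  # unit 7 should rank first
```

Applied to a real conv-net trained only on object labels, this kind of probe is how one would surface candidate ACNs layer by layer.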

Zero-Shot Recognition with Attributes

I have implemented two methods for zero-shot learning on the Animals with Attributes dataset. Both methods use SIFT features. The first method applies a 4-layer MLP, while the second uses episodic training with the Relation Network (Sung et al., 2018). Both methods outperform the 20% benchmark (required for full credit on the assignment), with SIFT + MLP achieving the higher accuracy at 25.0% while SIFT + Relation Network achieves only 23.1%. However, SIFT + Relation Network has the higher class-balanced accuracy, 24.52% vs 24.47% for SIFT + MLP.
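The two metrics compared above can diverge, as they do here, whenever classes are imbalanced: overall accuracy weights every test image equally, while class-balanced accuracy averages the per-class accuracies. A minimal sketch with invented labels:

```python
import numpy as np

# Invented predictions on an imbalanced two-class test set.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 0])

def accuracy(y_true, y_pred):
    """Fraction of all test examples classified correctly."""
    return (y_true == y_pred).mean()

def class_balanced_accuracy(y_true, y_pred):
    """Mean of per-class accuracies, so each class counts equally."""
    per_class = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(per_class))

print(accuracy(y_true, y_pred))                 # 5/6, dominated by class 0
print(class_balanced_accuracy(y_true, y_pred))  # (1.0 + 0.5) / 2 = 0.75
```

This is why a model can lead on one metric and trail on the other, exactly the pattern reported for SIFT + MLP vs SIFT + Relation Network.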