Recurrent Neural Networks Research Papers

Multilabel image tagging is one of the most important challenges in computer vision, with many real-world applications, and we therefore use deep neural networks for image annotation to boost performance. The experiments are performed on the NUS-WIDE dataset with 1K tags.

I. INTRODUCTION

Multilabel image annotation is an important and challenging problem in computer vision. Most existing work focuses on single-label classification, where each image is assumed to have only one class label. However, this assumption does not hold in many real-world applications, as an image may be associated with multiple tags. Images from Flickr, for example, typically carry several tags describing objects, activities, and scenes. Images on the Internet in general are usually associated with sentences or descriptions rather than a single class label, which is itself a form of multitagging. Accurately assigning multiple labels to one image is therefore a practical and important problem.

Single-label image classification has been studied extensively in the vision community, with the most recent advances reported on the large-scale ImageNet benchmark. Most existing work focuses on designing visual features to improve recognition accuracy; for example, sparse coding, Fisher vectors, and VLAD have been proposed to reduce the quantization error of "bag of words"-type features. Very recently, deep convolutional neural networks (CNNs) have demonstrated promising results for single-label image classification. Such work has focused on one-vs-all classification, but few have worked on the multilabel image annotation problem.

In this work, we use a highly expressive convolutional network for multilabel image annotation. We employ a network structure similar to that used for ImageNet classification, which contains several convolutional and densely connected layers as the basic architecture (a sketch of this kind of network appears at the end of this section). We study and compare several popular multilabel losses, such as a ranking loss that optimizes the area under the ROC curve (AUC) and the cross-entropy loss used in TagProp. Specifically, we propose to use a top-k ranking loss, inspired by earlier work on embeddings, to train the network (see the loss sketch below). Using NUS-WIDE, the largest publicly available multilabel dataset, we observe a significant performance boost over conventional features and report the best retrieval performance. We also experimented with a triplet loss, which gave much better results.
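The abstract describes an ImageNet-style network with several convolutional and densely connected layers and one output score per candidate tag. The following is a minimal sketch of that kind of architecture, assuming an AlexNet-like layout in PyTorch; the layer sizes, the `TaggingCNN` name, and the `NUM_TAGS` constant are illustrative assumptions, not the exact configuration used in the experiments.

```python
# A hedged sketch of an ImageNet-style convolutional network for multilabel
# tagging: a few convolutional layers followed by densely connected layers,
# ending in one raw score per tag. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

NUM_TAGS = 1000  # the NUS-WIDE experiment uses 1K candidate tags

class TaggingCNN(nn.Module):
    def __init__(self, num_tags: int = NUM_TAGS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_tags),  # one raw score per candidate tag
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# One forward pass on a batch of 224x224 RGB images yields per-tag scores.
scores = TaggingCNN()(torch.randn(2, 3, 224, 224))  # shape: (2, NUM_TAGS)
```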
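The losses compared above, a cross-entropy loss over an image's ground-truth tag set and a top-k ranking loss, can be sketched as follows. This is a minimal NumPy illustration assuming a WARP-style weighting for the top-k ranking objective; the function names are hypothetical and this is not the authors' implementation.

```python
# Minimal NumPy sketches of two multilabel losses:
# (1) softmax cross-entropy against a uniform target over the ground-truth tags,
# (2) a WARP-style weighted pairwise ranking loss that penalizes violations more
#     heavily when the positive tag falls far down the ranked list.
import numpy as np

def softmax_cross_entropy_multilabel(scores, positive_idx):
    """scores: (num_tags,) raw per-tag scores for one image.
    positive_idx: indices of the tags annotated for this image."""
    log_probs = scores - np.logaddexp.reduce(scores)  # log-softmax
    return -np.mean(log_probs[positive_idx])

def warp_style_ranking_loss(scores, positive_idx):
    """Every (positive, negative) pair where the negative tag outscores the
    positive one incurs a hinge penalty, weighted by the estimated rank of the
    positive tag, emphasizing errors near the top of the list."""
    num_tags = scores.shape[0]
    negative_idx = np.setdiff1d(np.arange(num_tags), positive_idx)
    # harmonic weights: w(r) = 1 + 1/2 + ... + 1/r
    harmonic = np.cumsum(1.0 / np.arange(1, num_tags + 1))
    loss = 0.0
    for p in positive_idx:
        margins = 1.0 - scores[p] + scores[negative_idx]  # hinge margins
        violations = margins > 0
        rank = int(violations.sum())  # estimated rank of the positive tag
        if rank > 0:
            loss += harmonic[rank - 1] * margins[violations].mean()
    return loss / len(positive_idx)

# Toy usage: 5 candidate tags, tags 0 and 3 are the ground truth.
scores = np.array([2.0, 0.5, 1.5, -0.3, 0.1])
print(softmax_cross_entropy_multilabel(scores, [0, 3]))
print(warp_style_ranking_loss(scores, [0, 3]))
```

In practice the ranking loss operates on the per-tag scores produced by the network above, so it can be dropped in wherever the cross-entropy loss would otherwise be used during training.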