Locality-constrained Linear Coding for image classification

Locality Discriminative Coding for Image Classification

Bag-of-Words (BOW) based methods are widely used in image classification. However, a large amount of visual information is inevitably lost in the quantization step of BOW. Recently, NBNN and improved variants such as Local NBNN were proposed to address this problem. Nevertheless, these methods do not outperform state-of-the-art BOW-based methods. In this paper, building on the advantages of both BOW and Local NBNN, we introduce a novel locality discriminative coding (LDC) method. We convert each low-level local feature, such as SIFT, into a code vector using the Local Feature-to-Class distance rather than k-means quantization. Extensive experimental results on 4 challenging benchmark datasets show that our LDC method outperforms 6 state-of-the-art image classification methods (3 based on NBNN, 3 based on BOW).
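The abstract does not spell out how the Local Feature-to-Class distance becomes a code vector. The sketch below is one hypothetical reading, not the authors' formulation: it takes a descriptor's squared distance to its nearest training descriptor in every class (the per-feature quantity NBNN works with) and maps those distances to a normalized code with an exponential weighting. The function name `feature_to_class_code` and the parameter `beta` are illustrative assumptions.

```python
import numpy as np

def feature_to_class_code(x, class_features, beta=1.0):
    """Hypothetical LDC-style code for one descriptor x of shape (d,).

    class_features: list of (n_c, d) arrays, the training descriptors of each class.
    Returns a vector with one entry per class that sums to 1.
    """
    # Squared distance from x to its nearest training descriptor in each class.
    d = np.array([np.min(np.sum((F - x) ** 2, axis=1)) for F in class_features])
    # Assumed mapping: closer classes get larger weights.
    # Subtracting the minimum keeps exp() numerically stable.
    w = np.exp(-beta * (d - d.min()))
    return w / w.sum()
```

Per-descriptor codes of this kind could then be pooled over an image as in a BOW pipeline; the pooling step is not shown here.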

Sparse Spatial Coding: A novel approach for efficient and accurate object recognition

2012

Successful state-of-the-art techniques for object recognition from images rely on powerful methods, such as sparse representation, to replace the widely used vector quantization (VQ) approach. Recently, sparse coding, which represents a signal in a sparse space, has raised the bar on several object recognition benchmarks. However, one serious drawback of sparse-space methods is that similar local features can be quantized into different visual words. In this paper we present a new method, called Sparse Spatial Coding (SSC), which combines sparse coding dictionary learning, a spatially constrained coding stage, and an online classification method to improve object recognition. An efficient new offline classification algorithm is also presented. We overcome the limitation of techniques that use sparse representation alone by generating the final representation with SSC and max pooling, which is then fed to an online learning classifier. Exper...
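To make the "sparse coding plus max pooling" pipeline concrete, here is a minimal sketch that uses plain ISTA (iterative soft-thresholding) to compute a lasso code for each descriptor and then max-pools absolute code values over an image. It does not model SSC's spatial constraint or its dictionary learning; the dictionary `D`, the weight `lam` and the function names are assumptions for illustration.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """ISTA for min_c 0.5*||x - D c||^2 + lam*||c||_1 (columns of D are atoms)."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the smooth term's gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = c - D.T @ (D @ c - x) / L       # gradient step on the reconstruction error
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return c

def max_pool(codes):
    # Max pooling of absolute code values over all descriptors of one image.
    return np.abs(codes).max(axis=0)

# Usage with a trivial identity dictionary.
D = np.eye(3)
c1 = sparse_code(np.array([1.0, 0.05, 0.0]), D)
c2 = sparse_code(np.array([0.0, -0.6, 0.3]), D)
pooled = max_pool(np.vstack([c1, c2]))
```

With an identity dictionary the lasso solution is just soft-thresholding, which makes the output easy to check by hand; a learned, overcomplete dictionary would replace `np.eye(3)` in practice.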

Local Coding Based Matching Kernel Method for Image Classification

PLoS ONE, 2014

This paper focuses on how to measure visual similarity effectively and efficiently for local feature based representations. Among existing methods, metrics based on Bag of Visual Words (BoV) techniques are efficient and conceptually simple, but at the expense of effectiveness. By contrast, kernel based metrics are more effective, but at the cost of greater computational complexity and increased storage requirements. We show that a unified visual matching framework can be developed to encompass both BoV and kernel based metrics, in which local kernels play an important role, applied either between feature pairs or between features and their reconstructions. Local kernels are generally defined using the Euclidean distance or its derivatives, based explicitly or implicitly on an assumption of Gaussian noise. However, local features such as SIFT and HoG often follow a heavy-tailed distribution, which undermines the motivation behind Euclidean metrics. Motivated by recent advances in feature coding techniques, we propose a novel, efficient local coding based matching kernel (LCMK) method, which exploits the manifold structures in the Hilbert space derived from local kernels. The proposed method combines the advantages of both BoV and kernel based metrics and has linear computational complexity, which enables efficient and scalable visual matching on large-scale image sets. To evaluate the effectiveness of LCMK, we conduct extensive experiments on widely used benchmark datasets, including 15-Scenes, Caltech101/256, and the PASCAL VOC 2007 and 2011 datasets. Experimental results confirm the effectiveness of the relatively efficient LCMK method.
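A general reason coding-based match kernels can run in linear time: if the local kernel between two features is the inner product of their explicit codes, the average over all feature pairs equals the inner product of the two average-pooled codes. The sketch below checks this identity numerically. It uses soft-assignment coding against a random codebook as a stand-in; it is not the LCMK kernel itself, and `soft_code` and `beta` are assumptions.

```python
import numpy as np

def soft_code(X, codebook, beta=1.0):
    # Soft-assignment coding (a stand-in for a local coding scheme; assumption).
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (n, K) squared distances
    W = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
    return W / W.sum(axis=1, keepdims=True)

def match_kernel_quadratic(X, Y, codebook):
    # Cost grows with |X| * |Y|: average code inner product over every feature pair.
    CX, CY = soft_code(X, codebook), soft_code(Y, codebook)
    return np.mean(CX @ CY.T)

def match_kernel_linear(X, Y, codebook):
    # Cost grows with |X| + |Y|: pool each set once, then take one inner product.
    return soft_code(X, codebook).mean(axis=0) @ soft_code(Y, codebook).mean(axis=0)
```

Both functions return the same value up to rounding, but only the pooled version lets each image be summarized once and compared against many others cheaply.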

From Local Similarity to Global Coding: An Application to Image Classification

Bag-of-words models for feature extraction have demonstrated excellent performance in image classification. These representations are usually accompanied by a coding method. Recently, methods that code a descriptor with respect to its nearby bases have proved effective. These methods take into account the nonlinear structure of descriptors, since local similarities are a good approximation of global similarities. However, they restrict their use of global similarities to nearby bases. In this paper, we propose a coding scheme that focuses on the manifold structure of descriptors, and devise a method to compute the global similarities of descriptors to the bases. Given a local similarity measure between bases, a global measure is computed. Then, exploiting the local similarity between a descriptor and its nearby bases, a global measure of association between the descriptor and all the bases is computed. Unlike locality-based and sparse coding methods, the proposed coding varies smoothly with respect to the underlying manifold. Experiments on benchmark image classification datasets substantiate the superiority of the proposed method over its locality- and sparsity-based rivals.
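The two-step idea (local similarities between bases turned into a global measure, then a descriptor's local similarities spread to all bases) can be illustrated with graph diffusion. The sketch below swaps in manifold-ranking style propagation, (I - alpha*S)^-1 with S the symmetrically normalized k-NN affinity of the bases; this is a named substitute technique, not necessarily the paper's measure, and all function names and parameters (`k`, `sigma`, `alpha`) are assumptions.

```python
import numpy as np

def base_affinity(bases, k=3, sigma=1.0):
    # Local similarity between bases: Gaussian weights on a symmetric k-NN graph.
    d2 = ((bases[:, None] - bases[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, k:]      # all but each base's k strongest neighbours
    np.put_along_axis(W, idx, 0.0, axis=1)
    return np.maximum(W, W.T)                # symmetrise

def diffusion_matrix(W, alpha=0.9):
    # Global similarity between bases via (I - alpha * D^-1/2 W D^-1/2)^-1.
    d = W.sum(axis=1)
    Dm = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = Dm[:, None] * W * Dm[None, :]
    return np.linalg.inv(np.eye(len(W)) - alpha * S)

def global_code(x, bases, G, k=2, sigma=1.0):
    # Local similarity of x to its k nearest bases, then spread to every base through G.
    d2 = ((bases - x) ** 2).sum(-1)
    local = np.zeros(len(bases))
    nn = np.argsort(d2)[:k]
    local[nn] = np.exp(-d2[nn] / (2 * sigma ** 2))
    code = G @ local
    return code / code.sum()

# Usage: six bases on a line and a descriptor close to the first one.
bases = np.arange(6, dtype=float).reshape(-1, 1)
W = base_affinity(bases, k=3)
G = diffusion_matrix(W)
code = global_code(np.array([0.1]), bases, G, k=2)
```

Although only two bases receive a nonzero local similarity, every base ends up with a positive weight that decreases along the graph, which is the smooth, manifold-aware behavior the abstract contrasts with purely local codes.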

Spatio-temporal Video Representation with Locality-Constrained Linear Coding

Lecture Notes in Computer Science, 2012

This paper presents a spatio-temporal coding technique for video sequences. The framework is based on a space-time extension of the scale-invariant feature transform (SIFT) combined with locality-constrained linear coding (LLC). The coding scheme projects each spatio-temporal descriptor into a local coordinate system, and the resulting codes are combined by max pooling into the final representation. The extension is evaluated on human action classification tasks. Experiments on the KTH, Weizmann, UCF Sports and Hollywood datasets indicate that the approach produces results comparable to the state of the art.
* Extension of the current LLC scheme from 2D images to spatio-temporal video signals;
* Provision of a robust scheme to represent a human action signal;
* Application of spatio-temporal LLC to human action classification, achieving state-of-the-art performance on several benchmarks.
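The LLC step itself is commonly computed with a k-nearest-neighbour approximation: solve a small regularized linear system over the descriptor's k nearest bases and normalize the weights to sum to one. A minimal sketch, assuming the spatio-temporal descriptors have already been extracted (the 3D SIFT extractor is not shown) and using illustrative values for `k` and `beta`:

```python
import numpy as np

def llc_code(x, B, k=5, beta=1e-4):
    """Approximated LLC code of one descriptor x (d,) against codebook B (M, d)."""
    nn = np.argsort(((B - x) ** 2).sum(axis=1))[:k]   # k nearest bases
    z = B[nn] - x                                      # shift the bases to the descriptor
    C = z @ z.T                                        # local covariance (k x k)
    C += np.eye(k) * beta * np.trace(C)                # regularization for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                       # enforce the sum-to-one constraint
    code = np.zeros(len(B))
    code[nn] = w
    return code

def max_pool(codes):
    # Max pooling over all descriptors falling in one spatio-temporal cell.
    return codes.max(axis=0)

# Usage: a tiny 2D codebook; x lies inside the triangle of the first three bases.
B = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
code = llc_code(np.array([0.25, 0.25]), B, k=3)
rng = np.random.default_rng(0)
pooled = max_pool(np.vstack([llc_code(p, B, k=3) for p in rng.uniform(0, 1, size=(10, 2))]))
```

Because only k entries of each code are nonzero and the system is k x k, coding cost is dominated by the nearest-neighbour search, which is what makes LLC practical for the large number of descriptors a video produces.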

Encoding Optimization Using Nearest Neighbor Descriptor

Communications in Computer and Information Science, 2014

The bag-of-words framework is probably one of the best models for image classification. In this model, coding plays a very important role in the classification process, and many coding methods have been proposed to encode images in different ways. The relationship between different codewords has been studied, but the relationship among descriptors has not been fully explored. In this work, we aim to exploit the relationship between descriptors and propose a new method that can be combined with other coding methods to improve their performance. The basic idea is to encode a descriptor not only with its own nearest codewords but also with the codewords of its nearest neighboring descriptors. Experiments on several benchmark datasets show that even this simple relationship between descriptors helps improve coding methods.
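The basic idea (also encoding a descriptor with the codewords of its nearest neighbouring descriptors) can be sketched on top of plain hard assignment. The combination rule below, adding the neighbours' one-hot codes with weight `lam` and renormalizing, is a hypothetical choice for illustration; the paper's exact weighting and base coder are not given in the abstract, and `nn_augmented_codes` is an illustrative name.

```python
import numpy as np

def nn_augmented_codes(X, codebook, n_neighbors=2, lam=0.5):
    """Hypothetical neighbour-augmented codes for descriptors X (n, d)."""
    # Base coder: hard-assign every descriptor to its nearest codeword.
    assign = np.argmin(((X[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
    base = np.eye(len(codebook))[assign]                # (n, K) one-hot codes
    # Nearest neighbouring descriptors of each descriptor, excluding itself.
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    # Assumed combination: own code plus lam times the neighbours' codes.
    codes = base + lam * base[nbrs].sum(axis=1)
    return codes / codes.sum(axis=1, keepdims=True)

# Usage: 1D descriptors near the boundary between two codewords.
codebook = np.array([[0.0], [10.0]])
X = np.array([[4.9], [5.2], [6.0]])
codes = nn_augmented_codes(X, codebook)
```

The first descriptor is hard-assigned to codeword 0, but both of its neighbours fall on codeword 1, so its augmented code splits its weight evenly; this is the kind of ambiguity near codeword boundaries that neighbour information can soften.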

Sparse coded spatial pyramid matching and multi-kernel integrated SVM for non-linear scene classification

Journal of Electrical Engineering

Support vector machine (SVM) techniques and deep learning have been prevalent in object classification for many years. However, deep learning is computationally intensive and can require long training times. SVMs are significantly faster than Convolutional Neural Networks (CNNs). However, SVM applications have been limited to mid-size datasets because SVMs require careful tuning. Recently, the parameterization of multiple kernels has shown greater flexibility in characterizing datasets. Therefore, this paper proposes a sparse-coded multi-scale approach that reduces training complexity and SVM tuning by using a non-linear fusion of kernels for natural scene classification with a large number of classes. The optimal features are obtained by parameterizing the dictionary, the Scale Invariant Feature Transform (SIFT) parameters, and the fusion of multiple kernels. Experiments were conducted on a large dataset to examine the ability of the multi-kernel space to find distinct features for better classification. The propos...
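As an illustration of why a non-linear fusion of kernels can still be used in an SVM, the sketch below combines an RBF and a polynomial Gram matrix with a weighted sum plus an element-wise (Schur) product term. Sums with non-negative weights and element-wise products of positive semidefinite kernels remain positive semidefinite, so the fused matrix is a valid kernel. The specific kernels, weights and `gamma` are assumptions, not the paper's fusion rule.

```python
import numpy as np

def rbf_gram(X, gamma):
    # Gaussian (RBF) kernel matrix.
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def poly_gram(X, degree=2, c=1.0):
    # Inhomogeneous polynomial kernel matrix.
    return (X @ X.T + c) ** degree

def fused_gram(X, weights=(0.5, 0.3, 0.2), gamma=0.5):
    # Hypothetical fusion: two linear terms plus a non-linear (element-wise product) term.
    K1, K2 = rbf_gram(X, gamma), poly_gram(X)
    w1, w2, w3 = weights
    return w1 * K1 + w2 * K2 + w3 * (K1 * K2)

# Usage on random feature vectors (e.g. pooled sparse codes).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
K = fused_gram(X)
```

A matrix like `K` can be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`; tuning then shifts to the fusion weights and kernel parameters.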

Local feature selection for efficient binary descriptor coding

2014 IEEE International Conference on Image Processing (ICIP), 2014

In a visual sensor network, a large number of camera nodes can acquire and process image data locally, collaborate with other camera nodes, and provide a description of the captured events. Typically, camera nodes have severe constraints in terms of energy, bandwidth resources and processing capabilities. Given these characteristics, coding and transmitting the pixel-level representation of the visual scene must be avoided because of the energy it requires. A promising approach is to extract compact visual features at the camera nodes and code them to meet the bandwidth and power requirements of the underlying network and devices. Since the total number of features extracted from an image may be large, this paper proposes a novel method to select the most relevant features before the actual coding process. The solution relies on a score that estimates the accuracy of each local feature. Local features are then ranked, and only the most relevant ones are coded and transmitted. The selected features must maximize the efficiency of the image analysis task while minimizing the required computational and transmission resources. Experimental results show that higher efficiency is achieved compared with the previous state of the art.
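The rank-then-transmit step can be sketched as selecting the highest-scoring features that fit a bit budget. This minimal version assumes the per-feature relevance scores are already computed (the paper's scoring model is not reproduced) and that every binary descriptor costs the same number of bits; `select_top_features`, `descriptor_bits` and `bit_budget` are hypothetical names.

```python
def select_top_features(keypoints, scores, descriptor_bits, bit_budget):
    """Keep the highest-scoring features whose descriptors fit in bit_budget."""
    # With fixed-length binary descriptors, the budget caps the feature count.
    max_features = bit_budget // descriptor_bits
    # Rank features by their (assumed precomputed) relevance score, highest first.
    order = sorted(range(len(keypoints)), key=lambda i: scores[i], reverse=True)
    return [keypoints[i] for i in order[:max_features]]

# Usage: 256-bit descriptors and a 600-bit budget leave room for two features.
selected = select_top_features(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.7], 256, 600)
```

Selecting before coding means the node never spends energy describing or transmitting features that would be discarded, which is the saving the abstract targets.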

Efficient Object Pixel-Level Categorization Using Bag of Features

Lecture Notes in Computer Science, 2009

In this paper we present a pixel-level object categorization method suitable for real-time applications. Since pixels are categorized using a bag-of-features scheme, the major bottleneck of such an approach is pooling features into local histograms of visual words. We therefore propose to bypass this time-consuming step and directly obtain the score of a linear Support Vector Machine classifier. This is achieved by creating an integral image of the components of the SVM, from which the classification score for any image sub-window can be obtained with only 10 additions and 2 products, regardless of its size. In addition, we evaluate the performance of two efficient feature quantization methods: Hierarchical K-Means and Extremely Randomized Forests. All experiments were performed on the Graz02 database and show results comparable to, or even better than, related work at a lower computational cost.
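The trick of scoring any sub-window without building its histogram follows from linearity: for an unnormalized histogram, the linear SVM score is the sum of the SVM weights of the visual words inside the window (plus the bias), so an integral image of per-pixel weights gives it in a constant number of look-ups. The sketch below assumes a per-pixel visual-word map (-1 where no feature was extracted) and omits histogram normalization, so it does not reproduce the paper's exact operation count; the function names are illustrative.

```python
import numpy as np

def integral_image(a):
    # Zero-padded cumulative sums so that window sums need no boundary checks.
    ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return ii

def window_sum(ii, top, left, bottom, right):
    # Sum of the original array over rows top:bottom and columns left:right.
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def svm_score_map(word_map, svm_w):
    # Replace each pixel's visual-word index by that word's SVM weight (0 if no feature).
    contrib = np.where(word_map >= 0, svm_w[np.clip(word_map, 0, None)], 0.0)
    return integral_image(contrib)

# Usage: the window score (without bias) equals the histogram-based dot product.
rng = np.random.default_rng(1)
words = rng.integers(-1, 5, size=(8, 10))
w = rng.normal(size=5)
ii = svm_score_map(words, w)
fast = window_sum(ii, 2, 3, 6, 9)
sub = words[2:6, 3:9]
slow = np.bincount(sub[sub >= 0], minlength=5) @ w
```

The integral image is built once per image, after which every sub-window costs the same few operations regardless of its size, which is what makes dense pixel-level scoring feasible in real time.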