Boosting Classifiers for Scene Category Recognition
Related papers
2013
The bag of visual words (BOW) model is an efficient image representation technique for image categorisation and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorisation tasks. Most approaches that use the BOW model to categorise images ignore useful information that can be obtained from image classes to build visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper we show that integrating visual vocabularies generated from each image category improves the BOW image representation and improves accuracy in natural scene image classification. We use a keypoints density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from training images of one scene image dataset can plausibly represent another scene image dataset in the same domain. This helps reduce the time and effort needed to build new visual vocabularies. The proposed approach is evaluated on three well-known scene classification datasets with 6, 8 and 15 scene categories respectively, using 10-fold cross-validation. The experimental results, using support vector machines with a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
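As a rough illustration of the histogram intersection kernel mentioned in this abstract, the sketch below computes the kernel value as the sum of bin-wise minima and builds the pairwise (Gram) matrix that a precomputed-kernel SVM would consume. The function names and the toy histograms are hypothetical and not taken from the paper.

```python
def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two equal-length histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def gram_matrix(histograms):
    """Pairwise kernel values between all histograms in a list."""
    return [[histogram_intersection(a, b) for b in histograms]
            for a in histograms]


# Toy, made-up BOW histograms (each sums to 1).
h_a = [0.5, 0.5, 0.0]
h_b = [0.25, 0.25, 0.5]
K = gram_matrix([h_a, h_b])
```

For L1-normalised histograms the kernel value lies between 0 (no shared mass) and 1 (identical histograms), which makes it a convenient similarity for BOW vectors.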
Contextual Bag-of-Words for Visual Categorization
IEEE Transactions on Circuits and Systems for Video Technology, 2011
Bag-of-Words (BoW), which represents an image by the histogram of local patches on the basis of a visual vocabulary, has attracted intensive attention in visual categorization due to its good performance and flexibility. Conventional BoW neglects the contextual relations between local patches due to its Naive Bayesian assumption. However, it is well known that contextual relations play an important role for human beings in recognizing visual categories from their local appearance. This paper proposes a novel contextual Bag-of-Words (CBoW) representation to model two kinds of typical contextual relations between local patches, i.e., a semantic conceptual relation and a spatial neighboring relation. To model the semantic conceptual relation, visual words are grouped on multiple semantic levels according to the similarity of the class distributions they induce; local patches are then encoded and images represented accordingly. To explore the spatial neighboring relation, an automatic term extraction technique is adopted to measure the confidence that neighboring visual words are relevant. Word groups with high relevance are used and their statistics are incorporated into the BoW representation. Classification is performed using the support vector machine (SVM) with an efficient kernel to incorporate the relational information. The proposed approach is extensively evaluated on two kinds of visual categorization tasks, i.e., video event and scene categorization. Experimental results demonstrate the importance of contextual relations of local patches, and the CBoW shows superior performance to conventional BoW.
Scene categorization using bag of textons on spatial hierarchy
Image Processing, 2008. …, 2008
This paper proposes a method to recognize scene categories using bags of visual words obtained by hierarchically partitioning the input images into subregions. Specifically, for each subregion, the Texton histogram and the extension of the subregion are taken into account. The bags of visual words obtained in this way are weighted and used in a similarity measure during categorization. Experimental tests using ten different scene categories show that the proposed approach performs well compared with state-of-the-art methods.
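The hierarchical partitioning described here can be sketched as follows, assuming each pixel has already been assigned a visual-word (texton) index. The function names, the 2x2 split per level, and the toy label map are assumptions made for illustration; the paper's weighting and similarity measure are not reproduced.

```python
def region_histogram(labels, n_words, r0, r1, c0, c1):
    """Count visual-word labels inside the half-open block [r0:r1, c0:c1]."""
    hist = [0] * n_words
    for r in range(r0, r1):
        for c in range(c0, c1):
            hist[labels[r][c]] += 1
    return hist


def hierarchy_histograms(labels, n_words, levels=2):
    """One histogram per subregion: level 0 is the whole image,
    level 1 a 2x2 grid, level 2 a 4x4 grid, and so on."""
    rows, cols = len(labels), len(labels[0])
    out = []
    for level in range(levels):
        cells = 2 ** level
        for i in range(cells):          # grid row
            for j in range(cells):      # grid column
                r0, r1 = i * rows // cells, (i + 1) * rows // cells
                c0, c1 = j * cols // cells, (j + 1) * cols // cells
                out.append(region_histogram(labels, n_words, r0, r1, c0, c1))
    return out


# Toy 4x4 label map with three hypothetical texton indices.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
]
hists = hierarchy_histograms(labels, n_words=3)
```

With two levels this yields five histograms: one for the full image followed by the four quadrants in row-major order.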
Image Category Recognition using Bag of Visual Words Representation
Transactions on Machine Learning and Artificial Intelligence, 2016
Image category recognition is a challenging task due to differences in image background, illumination, scale, clutter, rotation, etc. The Bag-of-Visual-Words (BoVW) model is considered the standard approach for image categorization. The performance of BoVW depends mainly on the local features extracted from images. In this paper, a novel BoVW representation approach utilizing Compressed Local Retinal Features (CLRF) for image categorization is proposed. The CLRF uses interest point regions from images and transforms them to log-polar form. Then a two-dimensional Discrete Wavelet Transform (2D DWT) is applied to compress the log-polar form, and the result is used as the features of the interest regions. These features are further used to build a visual vocabulary using the k-means clustering algorithm. This visual vocabulary is then used to form a histogram representation of each image, and the images are classified using a Support Vector Machine (SVM) classifier. The performance of the proposed BoVW framework is evaluated using the SIMPLIcity and butterflies datasets. The experimental results show that the proposed BoVW approach that uses CLRF is competitive with state-of-the-art methods.
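The vocabulary-building and histogram steps in this abstract can be sketched with a plain Lloyd's k-means, as below. This is a minimal illustration under simplifying assumptions: descriptors are short tuples of floats, the initial centers are simply the first k points, and all names are hypothetical. The CLRF feature extraction itself is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point (first one wins ties)."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, k, iters=20):
    """Lloyd's algorithm with a deterministic init (assumption: first k points)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for idx, group in enumerate(groups):
            if group:  # keep the old center if a cluster goes empty
                centers[idx] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bovw_histogram(descriptors, centers):
    """L1-normalised count of descriptors assigned to each visual word."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest(d, centers)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 2-D "descriptors" forming two obvious clusters.
train = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
vocabulary = kmeans(train, k=2)
image_hist = bovw_histogram([(0.0, 0.2), (9.0, 10.0), (1.0, 0.0)], vocabulary)
```

In practice the descriptors would be high-dimensional and the vocabulary much larger; the resulting histograms are what the SVM receives as input.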
Improving bag-of-words scheme for scene categorization
The Journal of China Universities of Posts and Telecommunications, 2012
Bag-of-words (BoW) representation has become one of the most popular methods for representing image content and has been successfully applied to object categorization. This paper uses the newly proposed statistics of word activation forces (WAFs) to reduce the redundancy in the codebook used in the BoW model. In this way, the representation of image features is improved. In addition, the authors propose a method using soft inverse document frequency (Soft-IDF) to optimize BoW-based image features. Given the visual words and the dataset, each visual word appears in a different number of images and a different number of times in each particular image. Some visual words appear rarely, in contrast to the frequent ones. The proposed method balances this disparity. Experiments show encouraging results in scene categorization by the proposed approach.
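The abstract does not define Soft-IDF, so the sketch below shows only the conventional inverse document frequency weighting that it builds on: visual words occurring in few images receive larger weights than words occurring in nearly all of them. The function names and the toy counts are hypothetical.

```python
import math


def idf_weights(histograms):
    """Classic IDF per visual word: log(N / number of images containing it)."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    weights = []
    for w in range(n_words):
        df = sum(1 for h in histograms if h[w] > 0)
        # A word that never occurs gets weight 0 rather than dividing by zero.
        weights.append(math.log(n_images / df) if df else 0.0)
    return weights


def apply_idf(hist, weights):
    """Scale each bin of a BoW histogram by its word's IDF weight."""
    return [count * w for count, w in zip(hist, weights)]


# Toy word counts for four images and three visual words.
counts = [
    [3, 1, 0],
    [2, 0, 0],
    [4, 0, 1],
    [1, 0, 0],
]
weights = idf_weights(counts)
weighted_first = apply_idf(counts[0], weights)
```

Here the first word occurs in every image, so its weight is zero and it no longer contributes to the comparison between images.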
Categorization of Similar Objects Using Bag of Visual Words and Support Vector Machines
This paper studies the problem of visual subcategorization of objects within a larger category. Such categorization seems more challenging than categorization of objects from visually distinctive categories, previously presented in the literature. The proposed methodology is based on "Bag of Visual Words" using Scale-Invariant Feature Transform (SIFT) descriptors and Support Vector Machines (SVM). We present the results of the experimental session, both for categorization of visually similar and visually distinctive objects. In addition, we attempt to empirically identify the most effective visual dictionary size and the feature vector normalization scheme.
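The abstract does not say which normalization schemes were compared. As an illustration only, the sketch below shows two common candidates for BoW feature vectors, L1 and L2 normalization; the function names are hypothetical.

```python
import math


def l1_normalize(vec):
    """Divide by the sum of absolute values so the entries sum to 1."""
    total = sum(abs(v) for v in vec)
    return [v / total for v in vec] if total else list(vec)


def l2_normalize(vec):
    """Divide by the Euclidean norm so the vector has unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


# Toy visual-word counts.
counts = [1, 3]
as_l1 = l1_normalize(counts)
as_l2 = l2_normalize([3, 4])
```

L1 normalization turns counts into relative frequencies, while L2 normalization fixes the vector length, which matters for kernels based on dot products.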
A New Bag of Words LBP (BoWL) Descriptor for Scene Image Classification
This paper explores a new Local Binary Patterns (LBP) based image descriptor that makes use of the bag-of-words model to significantly improve classification performance for scene images. Specifically, first, a novel multi-neighborhood LBP is introduced for small image patches. Second, this multi-neighborhood LBP is combined with frequency domain smoothing to extract features from an image. Third, the extracted features are used with spatial pyramid matching (SPM) and the bag-of-words representation to propose an innovative Bag of Words LBP (BoWL) descriptor. Next, the proposed BoWL descriptor is compared with the conventional LBP descriptor for scene image classification using a Support Vector Machine (SVM) classifier. Further, the classification performance of the new BoWL descriptor is compared with the performance achieved by other researchers in recent years using some popular methods. Experiments with three fairly challenging publicly available image datasets show that the proposed BoWL descriptor not only yields significantly higher classification performance than LBP, but also generates results better than or on par with some other popular image descriptors.
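For reference, the conventional LBP that the abstract uses as its baseline can be sketched as below: each pixel is compared with its eight immediate neighbors and the comparison results form an 8-bit code. This is the basic 3x3 operator, not the paper's multi-neighborhood variant; the neighbor ordering and the names are assumptions.

```python
# Clockwise neighbor offsets (row, col), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit code: bit i is set when neighbor i is >= the center pixel."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Toy 3x3 grey-level patch: only the top row is brighter than the center.
patch = [
    [9, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
code = lbp_code(patch, 1, 1)
```

For this patch only the three top neighbors pass the comparison, so the three lowest bits are set and the code is 7.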
Scene Classification Using Localized Histogram of Oriented Gradients Method
Scene classification is an important and elementary problem in image understanding. It deals with a large number of scenes in order to discover the common structure shared by all the scenes in a class. It is used in medical science (X-ray, ECG, endoscopy, etc.), criminal detection, gender classification, skin classification, facial image classification, generating weather information from satellite images, and identifying vegetation types, anthropogenic structures, mineral resources, or transient changes in any of these properties. In this paper, we first propose a feature extraction method named LHOG, or Localized HOG. We assume that an image contains important regions that help establish its similarity to other images of the same class. We generate local information from an image via our proposed LHOG method. Then, by combining all the local information, we generate the global descriptor using the Bag of Features (BoF) method, which is finally used to represent and classify an image accurately and efficiently. For classification, we use a Support Vector Machine (SVM), which analyzes data and recognizes patterns. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output. In our paper, we use six different classes of images.
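The abstract gives no details of how LHOG localizes its regions, so the sketch below shows only the generic building block of HOG-style descriptors: a magnitude-weighted histogram of unsigned gradient orientations over one image region. The central-difference gradients, the 9-bin layout, and all names are assumptions for illustration.

```python
import math


def orientation_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the interior pixels of a grey-level region."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            gx = image[r][c + 1] - image[r][c - 1]   # horizontal difference
            gy = image[r + 1][c] - image[r - 1][c]   # vertical difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Guard against rounding that lands exactly on the upper edge.
            b = min(int(angle / bin_width), n_bins - 1)
            hist[b] += magnitude
    return hist


# Toy 4x4 region containing a single vertical edge.
region = [[0, 0, 10, 10] for _ in range(4)]
hist = orientation_histogram(region)
```

A vertical edge produces purely horizontal gradients, so all of the gradient magnitude falls into the first orientation bin.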
Visual categorization with bags of keypoints
We present a novel method for generic visual categorization: the problem of identifying the object content of natural images while generalizing across variations inherent to the object class. This bag of keypoints method is based on vector quantization of affine invariant descriptors of image patches. We propose and compare two alternative implementations using different classifiers: Naïve Bayes and SVM. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We present results for simultaneously classifying seven semantic visual categories. These results clearly demonstrate that the method is robust to background clutter and produces good categorization accuracy even without exploiting geometric information.