Adaptive Bag-of-Visual Word Modelling using Stacked-Autoencoder and Particle Swarm Optimisation for the Unsupervised Categorisation of Images

Local Image Feature Extraction using Stacked-Autoencoder in the Bag-of-Visual Word modelling of Images

International Conference on Computer and Communications, 2019

The Bag-of-Visual Words model has been recognised as an effective means of representing images for image classification. However, its reliance on hand-crafted image feature extraction algorithms often results in significant computational overhead and poor classification accuracy. This paper therefore presents a Bag-of-Visual Words model in which image feature extraction is achieved using deep feature learning via a Stacked-Autoencoder. The proposed method is tested on three image collections drawn from the Caltech 101 image collection, and the results confirm the ability of deep feature learning to yield optimum image categorisation performance.

The Dual Codebook : Combining Bags of Visual Words in Image Classification

2016

In this paper, we evaluate the performance of two conventional bag of words approaches, using two basic local feature descriptors, to perform image classification. These approaches are compared to a novel design which combines two bags of visual words, using two different feature descriptors. The system extends earlier work wherein a bag of visual words approach with an L2 support vector machine classifier outperforms several alternatives. The descriptors we test are raw pixel intensities and the Histogram of Oriented Gradients. Using a novel Primal Support Vector Machine as a classifier, we perform image classification on the CIFAR-10 and MNIST datasets. Results show that the dual codebook implementation successfully utilizes the potential contributive information encapsulated by an alternative feature descriptor and increases performance, improving classification by 5-18% on CIFAR-10, and 0.22-1.03% for MNIST compared to the simple bag of words approaches.

Hybrid Classification in Bag of Visual Words Model

Circulation in Computer Science, 2018

Classification of images by means of the BOVW method is well known and has been applied in various recognition projects. The method relies on three phases: detection and extraction of features, representation of the image, and finally classification. SIFT, K-means and SVM form the most widely accepted combination. This article aims to demonstrate that this combination is not always the best choice for all types of datasets. Different training sets of images were created from scratch and used for the bag of visual words model. In the first phase, detection and extraction, SIFT is used. In the second phase, a dictionary of words is created through a clustering process using K-means, EM, or K-means in combination with EM. Finally, for classification, the SVM, Gaussian NB, KNN, Decision Tree, Random Forest, Neural Network and AdaBoost algorithms are compared in order to determine the performance and accuracy of each method.

Image Category Recognition using Bag of Visual Words Representation

Transactions on Machine Learning and Artificial Intelligence, 2016

Image category recognition is a challenging task because of differences in image background, illumination, scale, clutter, rotation, etc. The Bag-of-Visual-Words (BoVW) model is considered the standard approach for image categorization. The performance of BoVW depends mainly on the local features extracted from images. In this paper, a novel BoVW representation approach utilizing Compressed Local Retinal Features (CLRF) for image categorization is proposed. CLRF takes interest point regions from images and transforms them to log-polar form. A two-dimensional Discrete Wavelet Transformation (2D DWT) is then applied to compress the log-polar form, and the resulting coefficients are taken as features for the interest regions. These features are used to build a visual vocabulary with the k-means clustering algorithm. The visual vocabulary is then used to form a histogram representation of each image, and the images are classified using a Support Vector Machine (SVM) classifier. The performance of the proposed BoVW framework is evaluated on the SIMPLIcity and butterflies datasets. The experimental results show that the proposed BoVW approach using CLRF is very competitive with state-of-the-art methods.
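
The histogram step described above can be illustrated with a minimal sketch. It assumes a vocabulary has already been learned (for example by k-means) and uses hard assignment to the nearest visual word; the function name `bovw_histogram` and the toy two-dimensional descriptors are illustrative assumptions, not taken from the paper.

```python
from math import dist

def bovw_histogram(descriptors, vocabulary):
    """Count how many local descriptors fall nearest to each visual word."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Hard assignment: index of the closest codeword (Euclidean distance).
        nearest = min(range(len(vocabulary)), key=lambda k: dist(d, vocabulary[k]))
        counts[nearest] += 1
    total = sum(counts)
    # L1-normalise so images with different numbers of descriptors are comparable.
    return [c / total for c in counts] if total else counts

# Hypothetical toy data: three visual words and four descriptors from one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]
histogram = bovw_histogram(descriptors, vocabulary)
```

The resulting fixed-length vector is what a classifier such as an SVM would consume, regardless of how many descriptors each image produced.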

Efficient Object Pixel-Level Categorization Using Bag of Features

Lecture Notes in Computer Science, 2009

In this paper we present a pixel-level object categorization method suitable for use under real-time constraints. Since pixels are categorized using a bag of features scheme, the major bottleneck of such an approach would be the feature pooling in local histograms of visual words. Therefore, we propose to bypass this time-consuming step and directly obtain the score of a linear Support Vector Machine classifier. This is achieved by creating an integral image of the components of the SVM, which can readily provide the classification score for any image sub-window with only 10 additions and 2 products, regardless of its size. In addition, we evaluate the performance of two efficient feature quantization methods: Hierarchical K-Means and the Extremely Randomized Forest. All experiments were carried out on the Graz02 database, showing results comparable to, or even better than, related work at a lower computational cost.
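
The idea of reading a linear SVM score straight from an integral image can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes one visual word per pixel and an unnormalised histogram, so the score of a window is just the sum of the per-word SVM weights inside it plus a bias. The operation count therefore differs from the 10 additions and 2 products quoted above, and all names and data are hypothetical.

```python
def integral_image(values):
    """Summed-area table with a zero border: ii[r][c] = sum of values[:r][:c]."""
    rows, cols = len(values), len(values[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = values[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def window_score(ii, top, left, bottom, right, bias=0.0):
    """Linear score of the window [top, bottom) x [left, right) in four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left] + bias

# Hypothetical data: a visual-word index per pixel and one SVM weight per word.
word_map = [[0, 1, 1],
            [2, 0, 1],
            [2, 2, 0]]
svm_weights = [0.5, -1.0, 2.0]

# Replace each pixel by the weight of its word, then integrate once.
weight_map = [[svm_weights[w] for w in row] for row in word_map]
ii = integral_image(weight_map)
score = window_score(ii, 0, 0, 2, 2)  # top-left 2x2 window
```

Because the table is built once per image, the cost of scoring a window no longer depends on its size, which is the property the abstract relies on.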

Language modeling for bag-of-visual words image categorization

Proceedings of the 2008 international conference on Content-based image and video retrieval - CIVR '08, 2008

In this paper, we propose two ways of improving image classification based on the bag-of-words representation. Two shortcomings of this representation are the loss of the spatial information of visual words and the presence of noisy visual words due to the coarseness of the vocabulary building process. On the one hand, we propose a new representation of images that goes further in the analogy with textual data: visual sentences, which allow us to "read" visual words in a certain order, as in the case of text. We can therefore consider simple spatial relations between words. We also present a new image classification scheme that exploits these relations. It is based on the use of language models, a very popular tool from the speech and text analysis communities. On the other hand, we propose new techniques to eliminate useless words, one based on geometric properties of the keypoints, the other on the use of probabilistic Latent Semantic Analysis (pLSA). Experiments show that our techniques can significantly improve image classification, compared to a classical Support Vector Machine-based classification.
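
One way to picture classification with language models over visual sentences is the sketch below. It assumes a bigram model with add-one smoothing trained per class and picks the class with the highest log-likelihood; the abstract does not specify the model order or smoothing, so these choices, the function names and the toy sentences are all assumptions.

```python
from collections import Counter
from math import log

def train_bigram(sentences, vocab_size):
    """Return a function giving the add-one-smoothed bigram log-likelihood of a sentence."""
    pair_counts, context_counts = Counter(), Counter()
    for s in sentences:
        for prev, cur in zip(s, s[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def log_prob(sentence):
        return sum(
            log((pair_counts[(p, c)] + 1) / (context_counts[p] + vocab_size))
            for p, c in zip(sentence, sentence[1:])
        )
    return log_prob

def classify(sentence, models):
    """Pick the class whose language model assigns the highest log-likelihood."""
    return max(models, key=lambda label: models[label](sentence))

# Hypothetical visual sentences: sequences of visual-word indices read in a fixed order.
training = {
    "sky":   [[0, 1, 0, 1], [0, 1, 0, 1, 0]],
    "grass": [[2, 3, 3, 2], [3, 3, 2, 3]],
}
models = {label: train_bigram(s, vocab_size=4) for label, s in training.items()}
predicted = classify([0, 1, 0], models)
```

Unlike a plain histogram, the bigram counts depend on the order in which words are read, which is how simple spatial relations enter the decision.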

Combining bag of visual words-based features with CNN in image classification

Combining bag of visual words-based features with CNN in image classification, 2023

Although traditional image classification techniques are often used in authentic ways, they have several drawbacks, such as unsatisfactory results, poor classification accuracy, and a lack of flexibility. In this study, we introduce a combination of a convolutional neural network (CNN) and a support vector machine (SVM), along with a modified bag of visual words (BoVW)-based image classification model. BoVW uses scale-invariant feature transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB) descriptors; as a consequence, the resulting SIFT-ORB-BoVW model contains highly discriminating features, which enhance the performance of the classifier. To identify appropriate images and overcome challenges, we have also explored the possibility of utilizing a fuzzy BoVW approach. This study also discusses using a CNN/SVM combination to improve the proposed feature extractor's ability to learn a more relevant visual vocabulary from the image. The proposed technique was compared with classic BoVW. The experimental results showed a significant improvement of the proposed technique in terms of performance and accuracy over state-of-the-art BoVW models.

Improving bag-of-words scheme for scene categorization

The Journal of China Universities of Posts and Telecommunications, 2012

The bag-of-words (BoW) representation has become one of the most popular methods for representing image content and has been successfully applied to object categorization. This paper uses the newly proposed statistics of word activation forces (WAFs) to reduce the redundancy in the codebook used in the BoW model. In this way, the representation of image features is improved. In addition, the authors propose a method using soft inverse document frequency (Soft-IDF) to optimize BoW-based image features. Given the visual words and the dataset, each visual word appears in a different number of images and a different number of times in each particular image. Some visual words appear rarely, in contrast to the frequent ones. The proposed method balances these cases. Experiments show encouraging results in scene categorization with the proposed approach.
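
The balancing of rare and frequent visual words can be illustrated with standard inverse document frequency weighting. Note that this is the classic IDF from text retrieval, swapped in for illustration; the Soft-IDF variant proposed in the paper is not specified in the abstract, and the names and counts below are hypothetical.

```python
from math import log

def idf_weights(histograms):
    """Standard IDF per visual word: log(N / number of images containing the word)."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    weights = []
    for k in range(n_words):
        doc_freq = sum(1 for h in histograms if h[k] > 0)
        # Words absent from every image get weight 0 rather than dividing by zero.
        weights.append(log(n_images / doc_freq) if doc_freq else 0.0)
    return weights

def apply_idf(histogram, weights):
    """Scale raw visual-word counts by their IDF weights."""
    return [count * w for count, w in zip(histogram, weights)]

# Hypothetical counts for three images over a four-word codebook.
histograms = [[3, 0, 1, 0],
              [2, 1, 0, 0],
              [4, 0, 0, 0]]
weights = idf_weights(histograms)
weighted = apply_idf(histograms[0], weights)
```

A word that occurs in every image receives zero weight, while a word seen in only one image is emphasised, which is the kind of balancing the abstract describes.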

Contextual Bag-of-Words for Visual Categorization

IEEE Transactions on Circuits and Systems for Video Technology, 2011

Bag-of-Words (BoW), which represents an image by the histogram of local patches on the basis of a visual vocabulary, has attracted intensive attention in visual categorization due to its good performance and flexibility. Conventional BoW neglects the contextual relations between local patches because of its Naive Bayesian assumption. However, it is well known that contextual relations play an important role when human beings recognize visual categories from their local appearance. This paper proposes a novel contextual Bag-of-Words (CBoW) representation to model two typical kinds of contextual relations between local patches, i.e., a semantic conceptual relation and a spatial neighboring relation. To model the semantic conceptual relation, visual words are grouped on multiple semantic levels according to the similarity of the class distributions they induce; local patches are then encoded and images represented accordingly. To explore the spatial neighboring relation, an automatic term extraction technique is adopted to measure the confidence that neighboring visual words are relevant. Word groups with high relevance are used, and their statistics are incorporated into the BoW representation. Classification is performed using a support vector machine (SVM) with an efficient kernel that incorporates the relational information. The proposed approach is extensively evaluated on two kinds of visual categorization tasks, i.e., video event and scene categorization. Experimental results demonstrate the importance of contextual relations between local patches, and CBoW shows superior performance to conventional BoW.

Bag-of-Visual Words Codebook Development for the Semantic Content Based Annotation of Images

11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), 2015

The Bag-of-Visual Words model has been recognised as an effective means of representing images for the purpose of image classification. This paper explains that the quality and quantity of visual words in the Bag-of-Visual Words codebook generated from an image collection should correlate with the diversity of image contents. It proposes a BOVW codebook development approach that uses the elimination of spatial redundancy in image features, batch vector quantisation, and the imposition of an image feature similarity threshold function to generate a codebook that reflects the content diversity of the image collection to be classified. Using experimental image collections drawn from the Caltech-101 image set, this paper also demonstrates the importance of this codebook development approach in determining a suitable number of latent topics for image categorisation via Probabilistic Latent Semantic Analysis for the semantic content annotation of images.
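
A rough picture of how a similarity threshold can tie codebook size to content diversity is given below. It is a minimal sketch under stated assumptions: a feature becomes a new visual word only when it lies farther than a fixed Euclidean threshold from every existing word. The paper's actual threshold function and its batch vector quantisation are not described in the abstract, so `build_codebook` and its data are hypothetical.

```python
from math import dist

def build_codebook(features, threshold):
    """Add a feature as a new visual word only if no existing word is within `threshold`."""
    codebook = []
    for f in features:
        if all(dist(f, word) > threshold for word in codebook):
            codebook.append(f)
    return codebook

# Hypothetical 2-D features forming three loose groups.
features = [(0.0, 0.0), (0.1, 0.1), (3.0, 3.0), (3.2, 2.9), (9.0, 0.0)]
codebook = build_codebook(features, threshold=1.0)
coarse_codebook = build_codebook(features, threshold=5.0)
```

With this scheme the number of visual words is not fixed in advance: a more diverse collection, or a tighter threshold, yields a larger codebook.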