Adaptive Bag-of-Visual Word Modelling using Stacked-Autoencoder and Particle Swarm Optimisation for the Unsupervised Categorisation of Images (original) (raw)

Abstract

The Bag-of-Visual Words has been recognised as an effective mean of representing images for image classification. However, its reliance on a visual codebook developed using Hand Crafted image feature extraction algorithms and vector quantisation via k-means clustering often results in significant computational overhead, and poor classification accuracies. Therefore, this paper presents an adaptive Bag-of-Visual Word Modelling in which Image Feature Extraction is achieved using Deep Feature Learning and the amount of computation required for the development of Visual Codebook is minised using a batch implementation of Particle Swarm Optimisation. The proposed method is tested using Caltech 101 image dataset, and the results confirm the suitability of the proposed method in improving the categorisation performance while reducing the computational load.

Loading Preview

Sorry, preview is currently unavailable. You can download the paper by clicking the button above.

References (79)

Deljooi, H., Jassbi, S.J.: 'A multi-criteria decision-making-based approach for semantic image annotation', Int. J. Comput.-Aided Technol., 2015, 2, (1), pp. 17-30
Deljooi, H., Eskandari, A.: 'A novel semantic statistical model for automatic image annotation using the relationship between the regions based on multi- criteria decision making', Int. J. Electr. Comput. Eng., 2014, 4, (1), pp. 37-51
Sahlani, H., Hourali, M.: 'A novel semantic statistical model for automatic image annotation using ontology', Majlesi J. Multimed. Process., 2015, 4, (2), pp. 1-10
Mookdarsanit, P., Mookdarsanit, L.: 'An automatic image tagging of Thai dance's gestures'. Joint Conf. ACTIS & NCOBA, Pranakhon Si Ayutthaya, Thailand, 2018
Zhang, D., Islam, M.M., Lu, G.: 'A review on image annotation techniques', Pattern Recognit., 2012, 45, (1), pp. 345-362
Cao, J., Wu, C., Chen, L., et al.: 'An improved convolutional neural network algorithm and its application in multilabel image labeling', Comput. Intell. Neurosci., 2019, 2019, pp. 1-12
Russakovsky, O., Deng, J., Su, H., et al.: 'ImageNet large-scale visual recognition challenge', Cornell University, 2015
Olaode, A., Naghdy, G.: 'Local image feature extraction using stacked autoencoder in the bag-of-visual word modelling of images'. Fifth IEEE Int. Conf. Computer and Communication, Chengdu, People's Republic of China, 2019
Deng, J., Berg, A.C., Li, K., et al.: 'What does classifying more than 10,000 image categories tell us?'. European Conf. Computer Vision, Crete, Greece, 2010
Olaode, A., Naghdy, G., Todd, C.: 'Unsupervised classification of images: a review', Int. J. Image Process., 2014, 8, (5), pp. 325-342
Zazzaro, G., Martone, A.: 'ECF-means -ensemble clustering fuzzification means'. Eighth Int. Conf. Advances in Information Mining and Management, Barcelona, Spain, 2018
Huang, Y., Liu, Q., Lv, F., et al.: 'Unsupervised image categorization by hypergraph partition', IEEE Trans. Pattern Anal. Mach. Intell., 2011, 33, (6), pp. 1266-1273
Zhang, W., Wang, X., Zhao, D., et al.: 'Graph degree linkage: agglomerative clustering on a directed graph'. European Conf. Computer Vision, Florence, Italy, 2012
Kim, G., Faloutsos, C., Hebert, M.: 'Unsupervised modeling of object categories using link analysis techniques'. IEEE Conf. Computer Vision and Pattern Recognition, Anchorage, AL, USA, 2008
Balakrishnan, R., Kumar, K.: 'An application of genetic algorithm with iterative chromosomes for image clustering problems', Int. J. Comput. Sci., 2012, 9, (1), pp. 60-67
Datta, R., Joshi, D., Li, J., et al.: 'Image retrieval: ideas, influences, and trends of the new age', ACM Comput. Surv., 2008, 40, (2), p. Article 5, pp. 1- 60
Wang, H.H., Mohamad, D., Ismail, N.A.: 'Semantic gap in CBIR: automatic objects spatial relationships semantic extraction and representation', Int. J. Image Process. (IJIP), 2010, 4, (3), pp. 192-204
Xu, J., Li, H., Liu, P., et al.: 'A novel hyperspectral image clustering method with context-aware unsupervised discriminative extreme learning machine', IEEE Access, 2018, 6, pp. 16176-16188
Baig, F., Rashid, M., Javid, M.A., et al.: 'Boosting the performance of the BoVW model using SURF-CoHOG-based sparse features with relevance feedback for CBIR', Iran. J. Sci. Technol. Trans. Electr. Eng., 2019, 1, (4), pp. 1-20
Faheema, A., Subrata, R.: 'Feature selection using bag-of-visual words representation', Centre for AI and Robotics (CAIR), Bangalore, India, 2010
Tirilly, P., Claveau, V., Gros, P.: 'Language modelling for bag-of-visual words image categorisation'. CNRS-IRSA Image Processing and Computer Vision, Rennes, France, 2008
Wu, J., Tan, W.-C., Rehg, J.M.: 'Efficient and effective visual codebook generation using additive kernels', J. Mach. Learn. Res., 2011, 12, pp. 3097- 3118
Zhaoyin, J., Tsuham, C., Yimeng, Z.: 'Image retrieval with geometry- preserving visual phrases', School of Electrical and Computer Engineering, Cornell University, 2010
Srinivas, K., Srikanth, V.: 'A scientific approach for segmentation and clustering technique of improved K-means and neural networks', Int. J. Adv. Res. Comput. Sci. Softw. Eng., 2012, II, (7), pp. 183-189
Guo, J., Qiu, Z., Gurrin, C.: 'Exploring the optimal visual vocabulary sizes for semantic concept detection'. Int. Workshop on Content based Multimedia Indexing (CBMI), Veszprem, Hungary, 2013
Rane, R., Khadse, B.K., Suralkar, S.R.: 'A review of object recognition using visual codebook', Int. J. Comput. Sci. Mob. Comput., 2013, II, (2), pp. 74-79
Van Gemert, J.C., Snoek, C.G., Veenman, C.J., et al.: 'Comparing compact codebooks for visual categorization', Comput. Vis. Image Underst., 2010, 114, pp. 450-462
Suhail, Z., Mahmood, A., Denton, E., et al.: 'Bag-of-visual words based approach for the classification of benign and malignant masses in mammograms using voting-based feature encoding'. 14th Int. Workshop on Breast Imaging (IWBI 2018), Atlanta, GA, USA, 2018
Wang, R., Ding, K., Yang, J., et al.: 'A novel method for image classification based on bag of visual words', J. Vis. Commun. Image Represent., 2016, 40, pp. 24-33
Chanti, D., Caplier, A.: 'Improving bag-of-visual words towards effective facial expressive image classification'. 13th Int. Joint Conf. Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018), Madeira, Portugal, 2018
Mandal, R., Roy, P.P., Pal, U., et al.: 'Bag-of-visual words for signature-based multi-script document retrieval', Cornell University, 2018
Law, M., Thome, N., Cord, M.: 'Bag-of-words image representation: key ideas and further insight', in (Eds.): 'Fusion in computer vision, advances in computer vision and pattern recognition' (Springer International Publishing, Switzerland, 2014), pp. 29-52
Singhal, N., Singhal, N., Kalaichelvi, V.: 'Image classification using bag of visual words model with FAST and FREAK'. 2017 Second Int. Conf. Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, France, 2017
Jiang, J., Wu, D., Jiang, Z.: 'A correlation-based bag of visual words for image classification'. 2017 IEEE Third Information Technology and Mechatronics Engineering Conf. (ITOEC), Chongqing, People's Republic of China, 2017
Li, W., Dong, Y.: 'Scene classification based on the bag-of-visual words and Doc2Vec models for high-spatial resolution remote-sensing imagery', J. Appl. Remote Sens., 2019, 13, (2), art no. 026506
Xu, K., Yang, W., Liu, G., et al.: 'Unsupervised satellite image classification using Markov field topic model', IEEE Geosci. Remote Sens. Lett., 2013, 10, (1), pp. 130-134
Bosch, A., Zisserman, A., Munoz, A.X.: 'Scene classification via PLSA', Computer Vision and Robotics Group, University of Girona, Girona, Spain, 2006
Olukanmi, P., Nelwamondo, F., Marwala, T.: 'k-means-lite: real time clustering for large datasets'. IEEE Fifth Int. Conf. Soft Computing and Machine Intelligence, Nairobi, Kenya, 2018
Tsai, C.-F.: 'Two strategies for bag-of-visual words feature extraction'. Seventh Int. Congress on Advanced Applied Informatics (IIAI-AAI), Yonago, Japan, 2018
Wan, J., Wang, D., Hoi, S.C., et al.: 'Deep learning for content-based image retrieval: a comprehensive study'. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Orlando, FL, USA, 2015
Najafabadi, M.N., Villanustre, F., Khoshgoftaar, T.M., et al.: 'Deep learning applications and challenges in big data analytics', J. Big Data, 2015, 2, (1), pp. 1-21
Karpathy, A., Fei-Fei, L.: 'Deep visual semantic alignments for generating image descriptions', IEEE Trans. Pattern Anal. Mach. Intell., 2016, 39, (4), pp. 664-676
Bengio, Y., Courville, A., Vincent, P.: 'Representation learning: a review and new perspectives', IEEE Trans. Pattern Anal. Mach. Intell., 2014, 35, (8), pp. 1798-1828
Wu, Y., Razavi, R.: 'An introduction to deep learning: examining the advantages of hierarchical learning', Predictive Analytics, Santa Barbara, CA, USA, 2015
Zhang, J., He, Z., Zhang, J., et al.: 'Cograph regularized collective non- negative matrix factorization for multilabel image annotation', IEEE Access, 2019, 7, pp. 88338-88356
Krizhevsky, A., Sutskever, I., Hinton, G.: 'ImageNet classification with deep convolutional neural networks', Adv. Neural. Inf. Process. Syst., 2012, 25, pp. 1106-1114
Patel, T., Kapadia, M., Maisuria, J.: 'A review on content-based image retrieval', Int. J. Comput. Appl., 2015, 132, (13), pp. 22-25
Wang, S., Ding, Z., Fu, Y.: 'Feature selection guided auto-encoder'. 31st AAAI Conf. Artificial Intelligence (AAAI-17), San Francisco, CA, USA, 2017
Chu, W., Cai, D.: 'Stacked similarity-aware autoencoders'. Proc. 26th Int. Joint Conf. Artificial Intelligence (IJCAI-17), Melbourne, Australia, 2017
Hinton, G.E., Salakhutdinov, R.: 'Reducing the dimensionality of data with neural networks', Science, 2006, 313, pp. 504-507
Xu, Q., Zhang, C., Zhang, L., et al.: 'The learning effect of different hidden layers stacked autoencoder'. Eighth Int. Conf. Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, People's Republic of China, 2016
Wu, Z., Junqing, Y.: 'A multi-level descriptor using ultra-deep feature for image retrieval', J. Multimed. Tools Appl., 2019, 78, (318), pp. 1-18 [53] Wu, W., Sun, D.: 'Multiple deep CNN for image annotation'. Tenth Int. Conf. Graphics and Image Processing, Chengdu, People's Republic of China, 2018
Hsu, C.-C., Lin, C.-W.: 'Unsupervised convolutional neural networks for large-scale image clustering'. IEEE Int. Conf. Image Processing, Beijing, People's Republic of China, 2017
Gui, L.-Y., Gui, L., Wang, Y.-X., et al.: 'Factorized convolutional networks: unsupervised fine-tuning for image clustering'. IEEE Winter Conf. Applications of Computer Vision, Lake Tahoe, NV, USA, 2018
Radenovic, F., Tolias, G., Chum, O.: 'CNN image retrieval learns from BOW: unsupervised fine-tuning with hard examples'. European Conf. Computer Vision, Amsterdam, Netherlands, 2016
Bay, H., Tuytelaars, T., Gool, L.V.: 'SURF: speeded-up robust features', ETH Zurich, Zurich, Switzerland, 2005
Shinde, T.S., Tiwari, A.K.: 'Pruning SIFT & SURF for efficient clustering of near-duplicate images'. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019
Balayil, M., Kumar, G.S., Anees, V.M.: 'Automatic multilabelling of images and semantic relation extraction'. Intelligent Signal Processing Conf. (ICISP), Cherbourg, France, 2018
Li, Y., Xu, Y., Wang, J., et al.: 'MS-RMAC: multiscale regional maximum activation of convolutions for image retrieval', IEEE Signal Process. Lett., 2017, 24, (5), pp. 609-613
EL Agha, M., Ashour, W.: 'Efficient and fast initialisation algorithm for k- means clustering', Int. J. Intell. Syst. Appl., 2012, I, pp. 21-31
Tsai, C.-F.: 'Bag-of-words representation in image annotation: a review', ISRN Artif. Intell., 2012, 2012, pp. 1-19
Jurie, F., Triggs, B.: 'Creating efficient codebooks for visual recognition'. Tenth IEEE Int. Conf. Computer Vision, Beijing, 2005
Hastie, T., Tibshirani, R., Friedman, J.: 'The elements of statistical learning- data mining, inference and prediction', vol. II (Springer, Stanford, 2008, 2nd edn.), pp. 465-576
Salvador, S., Chan, P.: 'Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms'. 16th IEEE Int. Conf. Tools with Artificial Intelligence, Boca Raton, FL, USA, 2004
Yan, M.: 'Methods of determining the number of clusters in a data set and a new clustering criterion', Virginia Polytechnic Institute and State University, Blacksburg, VI, USA, 2005
Jun, Y., Chong-Wah, N., Alexander, G.H., et al.: 'Representations of keypoint-based semantic concept detection: a comprehensive study', City University of Hong Kong, Hong Kong, People's Republic of China, 2008
Battiato, S., Farinella, G.M., Meccio, T., et al.: 'Bags of phrases with codebooks alignment for near-duplicate image detection'. Multimedia in Forensics, Security and Intelligence, Florence, Italy, 2010
Olaode, A., Naghdy, G., Todd, C.: 'Bag-of-visual words codebook development for the semantic content-based annotation of images'. Signal Image Technology and Internet based System, Bangkok, Thailand, 2015
Olaode, A.A., Naghdy, G., Todd, C.A.: 'Efficient region of interest detection using blind image division'. Signal Processing Symp., Debe, Poland, 2015
Pelleg, D., Moore, A.: 'X-means: extending k-means with efficient estimation of the number of clusters'. 17th Int. Conf. Machine Learning, Stanford, CA, USA, 2000
Kersorn, K., Chimlek, S., Poslad, S., et al.: 'Visual content representation using semantically similar visual words', Expert Syst. Appl., 2011, 38, pp. 11472-11481
Tsou, D., MacNish, C.: 'Adaptive particle swarm optimisation for high- dimensional highly convex search spaces'. The 2003 Congress on Evolutionary Computation, Canberra, Australia, 2003
Ballardini, A.L.: 'A tutorial on particle swarm optimization clustering', Cornell University, New York, NY, USA, 2016
Kamel, N., Ouchen, I., Baali, K.: 'A sampling-PSO-K-means algorithm for document clustering'. Seventh Int. Conf. Genetic and Evolutionary Computing (ICGEC), Prague, Czech Republic, 2013
Gong, C., Chen, H., He, W., et al.: 'Improved multi-objective clustering algorithm using particle swarm optimization', PLoS One, 2017, 12, (12), p. e0188815, 5 December 2017. Available at https://journals.plos.org/plosone/ article?id=10.1371/journal.pone.0188815, accessed 2 July 2019
Lazebnik, S., Schmid, C., Ponce, J.: 'Beyond bags of features: spatial pyramid matching for recognizing natural scene categories'. 2006 IEEE Computer Society Conf. Computer Vision and Pattern Recognition, New York, NY, USA, 2006
Hofmann, T.: 'Unsupervised learning by probabilistic latent semantic analysis', Machine learning, 2001 42, pp. 177-196
Olaode, A., Naghdy, G.: 'Elimination of spatial incoherency in bag-of-visual words image representation using visual sentence modelling'. Int. Conf. Image and Vision Computing New Zealand (IVCNZ), Auckland, New Zealand, 2018
Gao, Z., Wang, L., Zhou, L.: 'A probabilistic approach to cross-region matching-based image retrieval', IEEE Trans. Image Process., 2019, 28, (3), pp. 1191-1204