Adaptive Bag-of-Visual Word Modelling using Stacked-Autoencoder and Particle Swarm Optimisation for the Unsupervised Categorisation of Images
Abstract
The Bag-of-Visual Words model has been recognised as an effective means of representing images for image classification. However, its reliance on a visual codebook developed using hand-crafted image feature extraction algorithms and vector quantisation via k-means clustering often results in significant computational overhead and poor classification accuracy. Therefore, this paper presents an adaptive Bag-of-Visual Words modelling approach in which image feature extraction is achieved using deep feature learning, and the amount of computation required to develop the visual codebook is minimised using a batch implementation of Particle Swarm Optimisation. The proposed method is tested on the Caltech 101 image dataset, and the results confirm its suitability for improving categorisation performance while reducing the computational load.
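The codebook step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`quantisation_error`, `pso_codebook`, `bovw_histogram`), the PSO hyperparameters, and the mini-batch scoring scheme are all assumptions made for illustration, and the input `feats` is assumed to be an array of local feature vectors such as those a stacked autoencoder might produce. Each particle holds a whole candidate codebook, and its fitness is the mean distance from a random batch of features to their nearest visual word.

```python
import numpy as np


def quantisation_error(codebook, feats):
    """Mean distance from each feature vector to its nearest visual word."""
    dists = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()


def pso_codebook(feats, k, n_particles=10, n_iter=30, batch_size=256,
                 w=0.72, c1=1.49, c2=1.49, seed=0):
    """Search for a k-word codebook with PSO, scoring particles on batches.

    Hyperparameter defaults are common textbook values (an assumption),
    not values taken from the paper.
    """
    feats = np.asarray(feats, dtype=float)
    rng = np.random.default_rng(seed)
    n = feats.shape[0]
    # Each particle is a candidate codebook of shape (k, dim),
    # initialised from randomly chosen feature vectors.
    pos = feats[rng.integers(0, n, size=(n_particles, k))]
    vel = np.zeros_like(pos)
    pbest, pbest_cost = pos.copy(), np.full(n_particles, np.inf)
    gbest, gbest_cost = pos[0].copy(), np.inf

    for _ in range(n_iter):
        # Score every particle on a random batch, not on all features.
        batch = feats[rng.choice(n, size=min(batch_size, n), replace=False)]
        cost = np.array([quantisation_error(p, batch) for p in pos])

        # Update personal bests and the global best.
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        best = int(pbest_cost.argmin())
        if pbest_cost[best] < gbest_cost:
            gbest, gbest_cost = pbest[best].copy(), pbest_cost[best]

        # Standard PSO velocity and position update.
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
    return gbest


def bovw_histogram(codebook, feats):
    """Normalised visual-word histogram for one image's feature vectors."""
    dists = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()
```

Because costs from different batches are compared directly, the stored best costs are noisy estimates. A more careful variant would re-score the personal bests on each new batch, at the price of extra computation.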