Veena Thenkanidiyoor | National Institute of Technology Goa, India

Papers by Veena Thenkanidiyoor

Lecture Notes in Computer Science, Dec 31, 2022

Lecture Notes in Computer Science, 2019

This paper addresses the problem of handling images of varying size in convolutional neural networks (CNNs). When images of different sizes are given as input to a CNN, its convolutional layer produces sets of activation maps of varying size. We explore two approaches to handling such varying-size sets of activation maps for the classification task. In the first approach, we use the deep spatial pyramid match kernel (DSPMK) to compute a matching score between two varying-size sets of activation maps, and we explore different pooling and normalization techniques for computing the DSPMK. In the second approach, we use a spatial pyramid pooling (SPP) layer in CNN architectures to remove the fixed-length constraint, so that images of their original, varying size can be used as input to train and fine-tune the CNN on different datasets. Experimental results show that the proposed DSPMK-based SVM and SPP-layer-based CNN frameworks achieve state-of-the-art results on scene image classification and fine-grained bird species classification tasks.
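As a rough illustration of how spatial pyramid pooling can remove the fixed-length constraint described above, the sketch below pools a single 2-D activation map of arbitrary size into a vector whose length depends only on the pyramid levels. The function name `spp_max_pool`, the levels (1, 2, 4), and the choice of max pooling are assumptions made for this example, not details taken from the paper.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D activation map into a fixed-length vector.

    feature_map: list of rows (H x W); H and W may differ between images.
    levels: hypothetical pyramid grid sizes; level n contributes n * n bins.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the bin start, ceiling for the bin end, so that
            # every bin is non-empty and the bins together cover the map.
            r0, r1 = (i * h) // n, -(-((i + 1) * h) // n)
            for j in range(n):
                c0, c1 = (j * w) // n, -(-((j + 1) * w) // n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two activation maps of different sizes yield vectors of the same length.
small = [[r * 7 + c for c in range(7)] for r in range(5)]   # 5 x 7
tall = [[(r * c) % 5 for c in range(4)] for r in range(9)]  # 9 x 4
print(len(spp_max_pool(small)), len(spp_max_pool(tall)))
```

With levels (1, 2, 4) each map contributes 1 + 4 + 16 = 21 values regardless of its height and width, which is what allows a fixed-size layer to follow.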

In this paper, we propose to use the sparse representation classifier (SRC) for text classification. The sparse representation of an example is obtained using an overcomplete dictionary made up of the term frequency (TF) vectors of all the training documents. We propose to seed the dictionary using the principal components of the TF vector representations of the training documents. We also propose a two-level hierarchical SRC (HSRC) that exploits the similarity among classes. In the second level of the HSRC, we use weighted decomposition principal component analysis (WDPCA) to seed the dictionary so as to discriminate between similar classes. The effectiveness of the proposed approach to building an HSRC for text classification is demonstrated on the 20 Newsgroups corpus.

Communications in computer and information science, 2018

For challenging visual recognition tasks such as scene classification and object detection, there is a need to bridge the semantic gap between low-level features and semantic concept descriptors. This requires mapping a scene image onto a semantic representation. The semantic multinomial (SMN) representation is a semantic representation of an image that corresponds to a vector of posterior probabilities of concepts. In this work, we propose to build a concept neural network (CoNN) to obtain the SMN representation of a scene image. An important issue in building a CoNN is that it requires ground truth concept labels. We propose to use pseudo-concepts obtained from the feature maps of the higher-level layers of a convolutional neural network. The effectiveness of the proposed approaches is studied using standard datasets.
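To make the idea of a vector of posterior probabilities of concepts concrete, here is a minimal sketch that turns one raw score per concept into a probability vector with a softmax. The assumption that the concept model emits raw scores, the softmax itself, and the name `semantic_multinomial` are illustrative choices, not a description of the CoNN in the paper.

```python
import math


def semantic_multinomial(concept_scores):
    """Map raw per-concept scores to a probability vector (softmax).

    concept_scores: hypothetical raw outputs of a concept model,
    one score per concept.
    """
    peak = max(concept_scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - peak) for s in concept_scores]
    total = sum(exps)
    return [e / total for e in exps]


smn = semantic_multinomial([2.0, 1.0, 0.1])
print(smn)
```

The resulting entries are non-negative and sum to one, so the vector can be read as a multinomial distribution over concepts.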

Lecture Notes in Computer Science, 2015

In this paper, we propose the example-specific density based matching kernel (ESDMK) for the classification of varying length patterns of long duration speech represented as sets of feature vectors. The proposed kernel is computed between a pair of examples, each represented as a set of feature vectors, by matching the estimates of the example-specific densities computed at every feature vector in the two examples. The number of feature vectors of an example among the K nearest neighbors of a feature vector is taken as the estimate of that example's density at that feature vector. The minimum of the two example-specific density estimates, one for each example, at a feature vector is taken as the matching score. The ESDMK is then computed as the sum of the matching scores computed at every feature vector in the pair of examples. We study the performance of support vector machine (SVM) based classifiers using the proposed ESDMK for speech emotion recognition and speaker identification tasks, and compare it with that of SVM-based classifiers using state-of-the-art kernels for varying length patterns.
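The computation described in this abstract can be sketched in a few lines. The sketch below assumes that the K nearest neighbors are searched in the pooled set of feature vectors from both examples, that a vector counts as its own neighbor, and that squared Euclidean distance is used; none of these choices is stated in the abstract, and the names `esdmk` and `squared_distance` are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def esdmk(example_x, example_y, k=3):
    """Sum, over all feature vectors, of the min of the two density estimates."""
    # Tag each vector with the example it came from (0 for X, 1 for Y).
    pooled = [(vec, 0) for vec in example_x] + [(vec, 1) for vec in example_y]
    kernel = 0
    for query, _ in pooled:
        # K nearest neighbors of the query among all pooled vectors
        # (assumption: the query itself is included, at distance zero).
        nearest = sorted(
            pooled, key=lambda item: squared_distance(item[0], query)
        )[:k]
        # Density estimate for each example: how many neighbors it owns.
        density_x = sum(1 for _, owner in nearest if owner == 0)
        density_y = len(nearest) - density_x
        kernel += min(density_x, density_y)
    return kernel


interleaved = esdmk([(0.0,), (2.0,)], [(1.0,), (3.0,)], k=2)
separated = esdmk([(0.0,), (1.0,)], [(100.0,), (101.0,)], k=2)
print(interleaved, separated)
```

Examples whose feature vectors interleave score higher than examples whose vectors lie far apart, because in the latter case every neighborhood is owned by a single example and the minimum is zero.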

Classification of long duration speech, represented as varying length sets of feature vectors, using a support vector machine (SVM) requires a suitable kernel. In this paper, we propose a novel segment-level pyramid match kernel (SLPMK) for the classification of varying length patterns of long duration speech represented as sets of feature vectors. The kernel is designed by partitioning the speech signal into increasingly finer segments and matching the corresponding segments. We study the performance of SVM-based classifiers using the proposed SLPMKs for speech emotion recognition and speaker identification, and compare it with that of SVM-based classifiers using other dynamic kernels.
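The idea of partitioning a signal into increasingly finer segments and matching corresponding segments can be sketched as follows. This example assumes each frame has already been quantized to a codeword index, matches segments by histogram intersection, splits into 2^level segments, and weights all levels equally; these are illustrative assumptions rather than the formulation in the paper, and the function names are hypothetical.

```python
from collections import Counter


def split(sequence, parts):
    """Split a sequence into `parts` contiguous, nearly equal segments."""
    n = len(sequence)
    return [sequence[(i * n) // parts:((i + 1) * n) // parts]
            for i in range(parts)]


def histogram_intersection(seg_a, seg_b):
    """Count the codeword occurrences shared by two segments."""
    count_a, count_b = Counter(seg_a), Counter(seg_b)
    return sum(min(count_a[key], count_b[key]) for key in count_a)


def segment_pyramid_match(seq_a, seq_b, levels=3):
    """Sum segment-wise matches over pyramid levels 0 .. levels - 1."""
    score = 0
    for level in range(levels):
        parts = 2 ** level  # assumption: the segment count doubles per level
        score += sum(histogram_intersection(sa, sb)
                     for sa, sb in zip(split(seq_a, parts), split(seq_b, parts)))
    return score


# Same codewords overall, but in a different temporal arrangement.
print(segment_pyramid_match([0, 0, 1, 1], [0, 1, 0, 1], levels=2))
print(segment_pyramid_match([0, 0, 1, 1], [0, 0, 1, 1], levels=2))
```

The two sequences in the usage example have identical overall histograms, so they match fully at the coarsest level; the finer level is what separates them, which is the point of using a pyramid of segments.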

In this paper, we propose the example-specific density based matching kernel (ESDMK) for the classification of scene images represented as sets of local feature vectors. The proposed kernel is computed between a pair of examples, each represented as a set of local feature vectors, by matching the estimates of the example-specific densities computed at every local feature vector in the two examples. The number of local feature vectors of an example among the K nearest neighbors of a local feature vector is taken as the estimate of that example's density at that feature vector. The minimum of the two example-specific densities, one for each example, at a local feature vector is taken as the matching score. The ESDMK is then computed as the sum of the matching scores computed at every local feature vector in the pair of examples. We also propose the spatial ESDMK (SESDMK), which includes the spatial information present in scene images when matching a pair of images. Each scene image is divided spatially into a fixed number of regions, and the SESDMK is computed as a combination of region-specific ESDMKs that match the corresponding regions. We study the performance of support vector machine (SVM) based classifiers using the proposed ESDMKs for scene classification, and compare it with that of SVM-based classifiers using state-of-the-art kernels for sets of local feature vectors.

Video activity recognition involves automatically assigning an activity label to a video. This is a challenging task due to the complex nature of video data: a video contains many sub-activities whose temporal order is important. Building an SVM-based activity recognizer requires a suitable kernel that handles the varying length temporal data corresponding to videos. In (Mario Rodriguez and Makris, 2016), a time flexible kernel (TFK) is proposed for matching a pair of videos by encoding each video into a sequence of bag of visual words (BOVW) vectors. The TFK matches every pair of BOVW vectors from a pair of videos using a linear kernel. In this paper, we propose a modified TFK (MTFK) that explores better approaches to matching a pair of BOVW vectors, in particular the use of frequency-based kernels. We also propose an approach for encoding videos using a soft clustering technique based on Gaussian mixture models. The effectiveness of the proposed approaches is studied using benchmark datasets.

International Journal of Speech Technology, Feb 5, 2019

In this work, we address some issues in the classification of varying length patterns of speech, represented as sets of continuous-valued feature vectors, using kernel methods. Kernels designed for varying length patterns are called dynamic kernels. We propose two dynamic kernels, the segment-level pyramid match kernel (SLPMK) and the segment-level probabilistic sequence kernel (SLPSK), for the classification of long duration speech, represented as varying length sets of feature vectors, using an extreme learning machine (ELM). The SLPMK and SLPSK are designed by partitioning the speech signal into increasingly finer segments and matching the corresponding segments. The SLPSK is built upon a set of Gaussian basis functions, where half of the basis functions contain class-specific information while the other half capture the characteristics common to the speech utterances of all classes. SVM training is computationally intensive, with complexity at least quadratic in the number of training examples, which makes it difficult to handle very large amounts of data with traditional SVMs. To reduce the training time of the classifier, we propose to use a simpler algorithm, the ELM. ELM refers to a broad class of generalized single-hidden-layer feedforward networks (SLFNs) whose hidden layer need not be tuned. We explore the kernel-based ELM in order to exploit dynamic kernels. We study the performance of ELM-based classifiers using the proposed SLPSK and SLPMK for speech emotion recognition and speaker identification tasks, and compare them with other kernels for varying length patterns. Experimental studies show that the proposed ELM-based approach offers a 10-12% relative improvement over the baseline approach, and a 3-9% relative improvement over ELMs/SVMs using other state-of-the-art dynamic kernels.

Lecture Notes in Computer Science, 2016

In this work, we propose the segment-level probabilistic sequence kernel (SLPSK) as a dynamic kernel to be used in a support vector machine (SVM) for the classification of varying length patterns of long duration speech represented as sets of feature vectors. The SLPSK is built upon a set of Gaussian basis functions, where half of the basis functions contain class-specific information while the other half capture the characteristics common to the speech utterances of all classes. The proposed kernel is computed between a pair of examples by partitioning the speech signal into a fixed number of segments and then matching the corresponding segments. We study the performance of SVM-based classifiers using the proposed SLPSK with different pooling techniques for speech emotion recognition and speaker identification, and compare it with that of SVM-based classifiers using other kernels for varying length patterns.

Lecture Notes in Computer Science, 2019

This paper addresses issues in performing video activity recognition using support vector machines (SVMs). A video comprises a sequence of sub-activities, where each sub-activity corresponds to a segment of the video. To build an activity recognizer, each segment is encoded into a feature vector, so that a video is represented as a sequence of feature vectors. In this work, we explore a GMM-based encoding scheme to encode a video segment into a bag-of-visual-words vector representation. We also propose to use the Fisher score vector as an encoded representation of a video segment. Building an SVM-based activity recognizer requires a suitable kernel that matches sequences of feature vectors. Such kernels are called sequence kernels. We propose different sequence kernels, namely the modified time flexible kernel, the segment-level pyramid match kernel, the segment-level probabilistic sequence kernel, and the segment-level Fisher kernel, for matching videos whose segments are represented using an encoded feature vector representation. The effectiveness of the proposed sequence kernels in SVM-based activity recognition is studied using benchmark datasets.

Biomedical Signal Processing and Control

Biocybernetics and Biomedical Engineering, 2022

Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Natural language generation (NLG) is an important component of spoken dialog systems (SDSs). A model for NLG involves sequence-to-sequence learning. State-of-the-art NLG models are built using recurrent neural network (RNN) based sequence-to-sequence models (Dušek and Jurcicek, 2016a). Convolutional sequence-to-sequence models have been used in machine translation, but their application as natural language generators in dialogue systems is still unexplored. In this work, we propose a novel approach to NLG using convolutional neural network (CNN) based sequence-to-sequence learning. The CNN-based approach makes it possible to build a hierarchical model that captures dependencies between words via shorter paths than in RNNs. In contrast to recurrent models, the convolutional approach makes efficient use of computational resources by parallelizing computations over all elements, and eases the learning process by applying a constant number of nonlinearities. We also propose to use a CNN-based reranker to obtain responses that correspond semantically to the input dialogue acts. The proposed model is capable of entrainment. Studies using a standard dataset show the effectiveness of the proposed CNN-based approach to NLG.

Machine Vision and Applications, 2021

Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 2018

Although recent convolutional neural network (CNN) based methods for scene classification show impressive results, they fall short in capturing the complex semantic content of scene images. To reduce the semantic gap, a semantic multinomial (SMN) representation is introduced. The SMN representation corresponds to a vector of posterior probabilities of concepts. The core part of SMN generation is building the concept model, which requires ground truth (true) concept labels for every image in the database. In this work, we propose a novel deep CNN based SMN representation that exploits the responses of convolutional layer filters as pseudo-concepts to build the concept model in the absence of true concept labels. The effectiveness of the proposed approach is studied for scene classification tasks on standard datasets such as MIT67 and SUN397.

Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods, 2018

2020 International Conference on Signal Processing and Communications (SPCOM), 2020

The query-by-example spoken term detection (QbE-STD) approach to audio search involves matching an audio query against reference utterances to find the relevant ones. QbE-STD involves computing a matching matrix between a query and a reference utterance using a suitable metric. In this work, we propose to use kernel-based matching, with the histogram intersection kernel (HIK) as the matching metric. A CNN-based approach to QbE-STD involves first converting a matching matrix into a corresponding size-normalized image and then classifying the image as relevant or not [6]. We propose to train a CNN-based classifier on the size-normalized images themselves instead of splitting them into subimages as in [6]. The proposed training approach is expected to be more effective, since there is less chance of the CNN-based classifier being confused. The effectiveness of the proposed kernel-based matching and the novel training approach is studied using the TIMIT dataset.
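A matching matrix built with the histogram intersection kernel can be sketched as below. The sketch assumes that each frame of the query and of the reference utterance is a non-negative vector (for example, a posterior-like vector); the frame representation and the names `histogram_intersection` and `matching_matrix` are assumptions for illustration and are not taken from the paper.

```python
def histogram_intersection(u, v):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(u, v))


def matching_matrix(query_frames, reference_frames):
    """Entry [i][j] is the HIK value between query frame i and reference frame j."""
    return [[histogram_intersection(q, r) for r in reference_frames]
            for q in query_frames]


query = [[0.5, 0.5], [1.0, 0.0]]
reference = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for row in matching_matrix(query, reference):
    print(row)
```

The matrix has one row per query frame and one column per reference frame, so its size varies with the utterance lengths; this is why a size normalization step is needed before the matrix is passed to a classifier as an image.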

2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 2017
