Gnana Praveen Rajasekar - Academia.edu

Papers by Gnana Praveen Rajasekar

A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Multimodal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities, such as audio, visual, and biosignals. Most state-of-the-art methods for audio-visual (A-V) fusion rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of A-V modalities. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos. Specifically, we propose a joint cross-attention model that relies on complementary relationships to extract salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal. The proposed fusion model efficiently leverages the inter-modal relationships while reducing the heterogeneity between features. In particular, it computes cross-attention weights based on the correlation between joint feature representations and those of the individual modalities. By deploying a joint A-V feature representation in the cross-attention module, the performance of our fusion module improves significantly over the vanilla cross-attention module. Experimental results on the AffWild2 dataset highlight the robustness of our proposed A-V fusion model: it achieves a concordance correlation coefficient (CCC) of 0.374 (0.663) for valence and 0.363 (0.584) for arousal on the test set (validation set). This is a significant improvement over the baseline of the third Affective Behavior Analysis in-the-Wild (ABAW3) challenge, with CCCs of 0.180 (0.310) and 0.170 (0.170).
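
The core idea of the fusion module — attending to each modality using its correlation with a joint (concatenated) A-V representation — can be illustrated with a simplified sketch. This is a minimal PyTorch illustration, not the paper's exact architecture: the tensor shapes, layer names, scaling, and residual connections here are assumptions, and the paper's backbone feature extractors and normalization details are omitted.

```python
import torch
import torch.nn as nn

class JointCrossAttention(nn.Module):
    """Simplified joint cross-attention fusion for audio-visual features.

    Attention weights for each modality are computed from the correlation
    between that modality's features and a joint (concatenated) A-V
    representation, so each modality attends to cues in both streams.
    """

    def __init__(self, d_audio: int, d_visual: int):
        super().__init__()
        d_joint = d_audio + d_visual
        # Learned projections used to correlate each modality with the joint representation.
        self.W_ja = nn.Linear(d_joint, d_audio, bias=False)
        self.W_jv = nn.Linear(d_joint, d_visual, bias=False)
        # Projections that map attended correlations back to feature space.
        self.W_a = nn.Linear(d_audio, d_audio, bias=False)
        self.W_v = nn.Linear(d_visual, d_visual, bias=False)

    def forward(self, x_a: torch.Tensor, x_v: torch.Tensor) -> torch.Tensor:
        # x_a: (B, T, d_audio), x_v: (B, T, d_visual) -- per-clip temporal features.
        j = torch.cat([x_a, x_v], dim=-1)  # joint A-V representation (B, T, d_joint)
        # Correlation of each modality with the joint representation (B, T, T).
        c_a = torch.tanh(x_a @ self.W_ja(j).transpose(1, 2) / x_a.shape[-1] ** 0.5)
        c_v = torch.tanh(x_v @ self.W_jv(j).transpose(1, 2) / x_v.shape[-1] ** 0.5)
        # Cross-attention weights derived from the correlation maps.
        a_a = torch.softmax(c_a, dim=-1)
        a_v = torch.softmax(c_v, dim=-1)
        # Attended features with residual connections back to the inputs.
        att_a = a_a @ self.W_a(x_a) + x_a
        att_v = a_v @ self.W_v(x_v) + x_v
        return torch.cat([att_a, att_v], dim=-1)  # fused A-V representation

# Usage sketch with assumed feature sizes:
fusion = JointCrossAttention(d_audio=128, d_visual=512)
fused = fusion(torch.randn(4, 16, 128), torch.randn(4, 16, 512))  # (4, 16, 640)
```

Because the joint representation enters both correlation maps, each modality's attention weights are informed by the other modality as well as itself, which is what distinguishes this from vanilla cross-attention between the two streams.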

A code and domain independent traitor tracing system based on the eigen-decomposition of fingerprinted images

2011 International Conference on Image Information Processing, 2011

Conventional traitor tracing approaches focus on creating associations between multimedia fingerprints through anti-collusion codes. In this paper, we explore the possibility of creating a code- and embedding-domain-independent fingerprint detection strategy by ...
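
The abstract is truncated, so the following is only a generic illustration of the underlying idea the title names: eigen-decomposing a set of fingerprinted copies of an image so that a leaked copy can be correlated against each distributed copy in the fingerprint subspace, without knowing the fingerprint code or its embedding domain. Everything here — function names, shapes, and the scoring rule — is a hypothetical sketch, not the paper's method.

```python
import numpy as np

def eigen_fingerprint_scores(copies: np.ndarray, suspect: np.ndarray, k: int = 8) -> np.ndarray:
    """Correlate a suspect copy with each distributed copy inside an
    eigen-subspace of the fingerprint residuals.

    copies:  (N, P) fingerprinted copies of one image, flattened to vectors.
    suspect: (P,)   the leaked copy to be traced.
    Returns an (N,) score vector; the highest scores point to likely colluders.
    """
    mean = copies.mean(axis=0)
    r = copies - mean  # host content is common, so residuals are mostly fingerprint energy
    # Eigen-decompose via the small (N, N) Gram matrix; P (pixels) is usually huge.
    gram = r @ r.T
    w, v = np.linalg.eigh(gram)
    idx = np.argsort(w)[::-1][:k]  # indices of the k largest eigenvalues
    # Map Gram eigenvectors to a (P, k) basis of the fingerprint subspace.
    basis = (r.T @ v[:, idx]) / np.sqrt(np.maximum(w[idx], 1e-12))
    # Project residuals and the suspect into the subspace, then correlate.
    r_sub = r @ basis                 # (N, k)
    s_sub = (suspect - mean) @ basis  # (k,)
    return (r_sub @ s_sub) / (
        np.linalg.norm(r_sub, axis=1) * np.linalg.norm(s_sub) + 1e-12
    )
```

Projecting into the leading eigen-subspace discards most of the residual noise while keeping the directions along which the fingerprints differ, which is what makes the comparison independent of how the fingerprints were coded or embedded.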

Compressed domain human action recognition in H.264/AVC video streams

Multimedia Tools and Applications, 2014

This paper discusses a novel high-speed approach for human action recognition in the H.264/AVC compressed domain. The proposed algorithm uses cues from quantization parameters and motion vectors extracted from the compressed video sequence for feature extraction, followed by classification using Support Vector Machines (SVM). The ultimate goal of the proposed work is an algorithm much faster than pixel-domain counterparts, with comparable accuracy, using only the sparse information available in compressed video. Partial decoding avoids the complexity of full decoding and minimizes computational load and memory usage, which can result in reduced hardware utilization and faster recognition. The proposed approach can handle illumination changes and scale and appearance variations, and is robust in both outdoor and indoor testing scenarios. We evaluated the performance of the proposed method on two benchmark action datasets and achieved more than 85% accuracy. The proposed algorithm classifies actions at a speed (>2,000 fps) approximately 100 times faster than existing state-of-the-art pixel-domain algorithms.

Keywords: H.264/AVC; human action recognition; compressed-domain video analysis; motion vectors; quantization parameters

1 Introduction

Human action recognition has been an area of keen interest to computer vision researchers for the past few decades. The scope of the problem keeps expanding, and researchers have proposed solutions to handle scale, appearance, illumination, and orientation variations, as well as occlusions. Action recognition spans a wide spectrum of applications such as autonomous video surveillance, detection of abnormal events, analysis of human behavior, video retrieval, and human-computer interaction. The aim of our work is to perform real-time action recognition for a large-scale surveillance system. The algorithm primarily targets real-time fixed-camera surveillance and closed-circuit TV applications. A lot of research has been reported to date on recognizing human actions in the pixel domain; however, most of those algorithms report recognition speeds of up to 25 fps.
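
The pipeline the abstract describes — motion-vector and QP cues pooled into a clip-level feature, then an SVM — can be sketched as follows. Parsing motion vectors and quantization parameters out of an H.264 bitstream requires an external parser, which is assumed here; the descriptor design (a magnitude-weighted orientation histogram plus QP statistics) is one plausible reading of the abstract, not the paper's exact feature.

```python
import numpy as np
from sklearn.svm import SVC

def mv_histogram_features(mvs: np.ndarray, qps: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Build a clip-level feature from motion vectors and quantization parameters.

    mvs: (M, 2) motion vectors (dx, dy) pooled over the clip's macroblocks.
    qps: (M,)   per-macroblock quantization parameters.
    """
    mag = np.hypot(mvs[:, 0], mvs[:, 1])
    ang = np.arctan2(mvs[:, 1], mvs[:, 0])  # motion direction in [-pi, pi]
    # Magnitude-weighted orientation histogram of the motion field.
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi), weights=mag)
    hist /= hist.sum() + 1e-12              # normalize for scale invariance
    return np.concatenate([hist, [qps.mean(), qps.std()]])

def train_action_svm(clips):
    """Train an SVM on compressed-domain clip features.

    `clips` is assumed to yield (mvs, qps, label) triples produced by an
    external H.264 bitstream parser (not shown here).
    """
    X = np.stack([mv_histogram_features(mvs, qps) for mvs, qps, _ in clips])
    y = np.array([label for _, _, label in clips])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # RBF-kernel SVM classifier
    clf.fit(X, y)
    return clf
```

Because the features come only from bitstream syntax elements that partial decoding already exposes, no pixel reconstruction is needed, which is where the reported speed advantage over pixel-domain methods comes from.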
