Bhaskar Chakraborty - Academia.edu

Papers by Bhaskar Chakraborty

Selective spatio-temporal interest points

Computer Vision and Image Understanding

Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video words (BoV) model of local N-jet features to build a vocabulary of visual words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action-class-specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH), validates our approach and shows state-of-the-art performance. Due to the unavailability of ground truth action annotation data for the Multi-KTH dataset, we introduce an actor-specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This documents the robustness of our proposed approach in a realistic scenario, using separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques.
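
As a rough illustration of the bag-of-words representation mentioned in the abstract above, the sketch below quantizes local descriptors against a fixed vocabulary and returns a normalized histogram. It is a generic sketch, not the authors' implementation: the function name `bov_histogram`, the Euclidean nearest-codeword rule, and the toy 2-D descriptors are all assumptions, and the spatial pyramid and vocabulary compression steps are not shown.

```python
from math import dist


def bov_histogram(descriptors, vocabulary):
    """Return an L1-normalized histogram of nearest-codeword assignments.

    descriptors: list of equal-length numeric tuples (hypothetical local features)
    vocabulary:  list of codewords with the same dimensionality
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Assign the descriptor to its closest codeword (Euclidean distance).
        nearest = min(range(len(vocabulary)), key=lambda k: dist(d, vocabulary[k]))
        counts[nearest] += 1
    total = sum(counts)
    # An empty clip yields an all-zero histogram rather than a division error.
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


if __name__ == "__main__":
    vocab = [(0.0, 0.0), (10.0, 10.0)]
    feats = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (11.0, 10.0)]
    print(bov_histogram(feats, vocab))
```

A histogram of this kind would then serve as the per-clip feature vector passed to a classifier such as an SVM.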

View-Invariant Human Action Detection Using Component-Wise HMM of Body Parts

This paper presents a framework for view-invariant action recognition in image sequences. Feature-based human detection becomes extremely challenging when the agent is being observed from different viewpoints. Besides, similar actions, such as walking and jogging, are hardly distinguishable by considering the human body as a whole. In this work, we have developed a system which detects human body parts under different views and recognizes similar actions by learning temporal changes of detected body part components. Firstly, human body part detection is performed to find separately three components of the human body, namely the head, legs and arms. We incorporate a number of sub-classifiers, each for a specific range of viewpoints, to detect those body parts. Subsequently, we have extended this approach to distinguish and recognize actions like walking and jogging based on component-wise HMM learning.

Towards Real-Time Human Action Recognition

This work presents a novel approach to human-detection-based action recognition in real time. To realize this goal our method first detects humans in different poses using a correlation-based approach. Recognition of actions is done afterward based on the change of the angular values subtended by various body parts. Real-time human detection and action recognition are very challenging, and most state-of-the-art approaches employ complex feature extraction and classification techniques, which ultimately become a handicap for real-time recognition. Our correlation-based method, on the other hand, is computationally efficient and uses very simple gradient-based features. For action recognition, angular features of body parts are extracted using a skeleton technique. Results for action recognition are comparable with the present state of the art.
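
To make the idea of an angular feature concrete, the sketch below computes the angle subtended at a joint by two skeleton points. It is only an illustration under assumed conventions: the function name `joint_angle`, the 2-D image coordinates, and the choice of an unsigned angle in degrees are hypothetical, and the abstract does not specify how the skeleton or the angles are actually obtained.

```python
import math


def joint_angle(joint, end_a, end_b):
    """Unsigned angle in degrees (0..180) between joint->end_a and joint->end_b."""
    ax, ay = end_a[0] - joint[0], end_a[1] - joint[1]
    bx, by = end_b[0] - joint[0], end_b[1] - joint[1]
    cross = ax * by - ay * bx  # proportional to the sine of the angle
    dot = ax * bx + ay * by    # proportional to the cosine of the angle
    # atan2 is numerically more stable than acos for near-parallel limbs.
    return math.degrees(math.atan2(abs(cross), dot))


if __name__ == "__main__":
    # Hypothetical hip joint with two leg end points.
    print(joint_angle((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
```

Tracking how such an angle changes from frame to frame gives a simple time series that a recognizer could compare across actions.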

Component-based Human Detection

In this paper, we present a general framework for human detection in a video sequence by components. The technique is demonstrated by developing a system that locates people in cluttered scenes where they are performing certain actions like walking, running, etc. The system is structured with three main distinct example-based detectors that are trained to find separately the three components of the human body: head, legs and arms. Some geometric constraints are applied over those detected components to ensure that they are present in the proper geometric configuration. In this way the system ultimately detects a person. Here we have developed example-based detectors which are view invariant. To achieve this we have designed four sub-classifiers for the head and arms, taking into account the different positions those body parts can have while a human is performing some action. Experimental results shown here can be compared with those of a similar full-body detector. The algorithm is also very robust in that it is capable of locating partially occluded views of people and people whose body parts have little contrast with the background.

Enhancing Real-time Human Detection based on Histograms of Oriented Gradients

In this paper we propose a human detection framework based on an enhanced version of Histogram of Oriented Gradients (HOG) features. These feature descriptors are computed with the help of a precalculated histogram of square-blocks. This novel method outperforms the integral of oriented histograms, allowing the calculation of a single feature four times faster. Using Adaboost for HOG feature selection and a Support Vector Machine as the weak classifier, we build up a real-time human classifier with an excellent detection rate.
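
For context, the sketch below illustrates the baseline that the abstract compares against, an integral histogram of oriented gradients: one summed-area table per orientation bin, so the histogram of any rectangle costs four lookups per bin. It does not reproduce the authors' precalculated square-block scheme, whose details are not given here. The function names and the input layout (a per-pixel orientation bin index plus gradient magnitude) are assumptions.

```python
def integral_histogram(bin_index, magnitude, n_bins):
    """Build one summed-area table per orientation bin.

    bin_index[y][x] is the orientation bin of pixel (x, y), in range(n_bins);
    magnitude[y][x] is its gradient magnitude.
    tables[b][y][x] holds the bin-b magnitude summed over rows < y and columns < x.
    """
    h, w = len(bin_index), len(bin_index[0])
    tables = [[[0.0] * (w + 1) for _ in range(h + 1)] for _ in range(n_bins)]
    for y in range(h):
        for x in range(w):
            for b in range(n_bins):
                v = magnitude[y][x] if bin_index[y][x] == b else 0.0
                t = tables[b]
                t[y + 1][x + 1] = v + t[y][x + 1] + t[y + 1][x] - t[y][x]
    return tables


def region_histogram(tables, top, left, bottom, right):
    """Histogram over rows top..bottom-1 and columns left..right-1."""
    return [
        t[bottom][right] - t[top][right] - t[bottom][left] + t[top][left]
        for t in tables
    ]


if __name__ == "__main__":
    bins = [[0, 1], [1, 0]]
    mags = [[1.0, 2.0], [3.0, 4.0]]
    tables = integral_histogram(bins, mags, 2)
    print(region_histogram(tables, 0, 0, 2, 2))
```

The tables are built once per image; every subsequent block histogram is then independent of the block's area.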

Enhancing Real-Time Human Detection Based on Histograms of Oriented Gradients

In this paper we propose a human detection framework based on an enhanced version of Histogram of Oriented Gradients (HOG) features. These feature descriptors are computed with the help of a precalculated histogram of square-blocks. This novel method outperforms the integral of oriented histograms, allowing the calculation of a single feature four times faster. Using Adaboost for HOG feature selection and a Support Vector Machine as the weak classifier, we build up a real-time human classifier with an excellent detection rate.

Boosting Histograms of Oriented Gradients for Human Detection

In this paper we propose a human detection framework based on an enhanced version of Histogram of Oriented Gradients (HOG) features. These feature descriptors are computed with the help of a precalculated histogram of square-blocks. This novel method outperforms the integral of oriented histograms, allowing the calculation of a single feature four times faster. Using Adaboost for HOG feature selection and a Support Vector Machine as the weak classifier, we build up a fast human classifier with an excellent detection rate.

A selective spatio-temporal interest point detector for human action recognition in complex scenes

Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper we present a new approach for STIP detection by applying surround suppression combined with local and temporal constraints. Our method is significantly different from existing STIP detectors and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-visual words (BoV) model of local N-jet features to build a vocabulary of visual words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action-class-specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on existing benchmark datasets, and more challenging datasets of complex scenes, validates our approach and shows state-of-the-art performance.
