A Discriminative Framework for Action Recognition Using f-HOL Features

Action recognition via local descriptors and holistic features

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2009

In this paper we propose a unified action recognition framework that fuses local descriptors and holistic features. The motivation is that local descriptors and holistic features emphasize different aspects of actions and are suited to different types of action databases. The proposed framework is based on frame differencing, bag-of-words, and feature fusion. We extract two kinds of local descriptors, 2D and 3D SIFT feature descriptors, both based on 2D SIFT interest points. We apply Zernike moments to extract two kinds of holistic features, one based on single frames and the other on the motion energy image. We perform action recognition experiments on the KTH and Weizmann databases using Support Vector Machines. We apply leave-one-out and pseudo leave-N-out setups and compare our approach with state-of-the-art results. Experiments show that the proposed approach is effective; compared with other approaches it is more robust, more versatile, easier to compute, and simpler to understand.
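The motion energy image used for the holistic Zernike features can be sketched with simple frame differencing. This is an illustrative reading of the abstract, not the authors' code; the difference threshold is an assumption.

```python
import numpy as np

def motion_energy_image(frames, threshold=30):
    """Accumulate thresholded frame differences into a binary motion
    energy image (MEI). `threshold` is a hypothetical tuning value."""
    mei = np.zeros_like(frames[0], dtype=bool)
    for prev, curr in zip(frames, frames[1:]):
        diff = np.abs(curr.astype(int) - prev.astype(int))
        mei |= diff > threshold  # mark pixels that moved in any frame pair
    return mei.astype(np.uint8)
```

Zernike moments would then be computed on this binary image rather than on raw frames, which makes the holistic feature insensitive to appearance details.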

Histogram of Oriented Gradient-Based Fusion of Features for Human Action Recognition in Action Video Sequences

Sensors

Human Action Recognition (HAR) is the classification of an action performed by a human. The goal of this study was to recognize human actions in action video sequences. We present a novel feature descriptor for HAR that combines multiple features using a fusion technique. The major focus of the descriptor is to exploit action dissimilarities. The key contribution of the proposed approach is a robust feature descriptor that works across the underlying video sequences and various classification models. To achieve this objective, HAR is performed as follows. First, the moving object is detected and segmented from the background. Features are then calculated using the histogram of oriented gradients (HOG) of the segmented moving object. To reduce the descriptor size, the HOG features are averaged across non-overlapping video frames. For the frequency domain information we have calculated...
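The descriptor-shrinking step (averaging HOG over non-overlapping frames) can be sketched as follows, assuming per-frame HOG vectors have already been computed; the group size is hypothetical.

```python
import numpy as np

def average_hog(hog_per_frame, group):
    """Average per-frame HOG vectors over non-overlapping groups of
    `group` frames, shrinking the descriptor. Frames that do not fill
    a final group are dropped."""
    h = np.asarray(hog_per_frame, dtype=float)
    n = (len(h) // group) * group                       # trim incomplete tail
    return h[:n].reshape(-1, group, h.shape[1]).mean(axis=1)
```

For a 60-frame clip with `group=5`, this reduces 60 HOG vectors to 12, cutting descriptor size while smoothing frame-level noise.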

Human action recognition with line and flow histograms

2008 19th International Conference on Pattern Recognition, 2008

We present a compact representation for human action recognition in videos using line and optical flow histograms. We introduce a new shape descriptor based on the distribution of lines which are fitted to boundaries of human figures. By using an entropy-based approach, we apply feature selection to densify our feature representation, thus minimizing classification time without degrading accuracy. We also use a compact representation of optical flow for motion information. Using line and flow histograms together with global velocity information, we show that high-accuracy action recognition is possible, even in challenging recording conditions. This research is partially supported by TUBITAK Career grant 104E065 and grants 104E077 and 105E065.
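The line-based shape descriptor can be approximated as a length-weighted histogram of fitted-line orientations. The 8-bin quantization over 0–180° is an assumed choice, not necessarily the paper's exact scheme.

```python
import numpy as np

def orientation_histogram(angles_deg, lengths, bins=8):
    """Length-weighted histogram of fitted-line orientations.
    Angles are folded into [0, 180) since a line has no direction."""
    hist, _ = np.histogram(np.mod(angles_deg, 180.0), bins=bins,
                           range=(0, 180), weights=lengths)
    s = hist.sum()
    return hist / s if s else hist  # L1-normalize for scale invariance
```

Weighting by line length makes long boundary segments dominate the shape signature, which is robust to short noisy fits.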

Spatial-temporal Histograms of Gradients and HOD-VLAD Encoding for Human Action Recognition

Automatic human action recognition is a core functionality of systems for video surveillance and human-object interaction. Within a recognition system, feature description and encoding are two crucial steps, and both must perform reliably for the framework to be powerful. In this paper, we propose a new human action feature descriptor called spatial-temporal histograms of gradients (SPHOG). SPHOG is based on spatial and temporal derivative signals, capturing the gradient changes between consecutive frames. Compared to traditional histogram-of-optical-flow descriptors, the proposed SPHOG requires less computation. Vector of Locally Aggregated Descriptors (VLAD) is a popular encoding approach for Bag-of-Features representations, but it has a main drawback: it only considers the difference between local descriptors and their centroids. To address this weakness, we propose an improved VLAD method called HOD-VLAD, which complements the encoding with the distribution information of local descriptors by computing a weighted histogram of distances. We validated the proposed algorithm for human action recognition on three publicly available datasets: KTH, UCF Sports and HMDB51. The experimental results indicate that the proposed descriptor and encoding method improve both the efficiency and the accuracy of human action recognition.
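A minimal sketch of the HOD-VLAD idea, assuming hard assignment to precomputed centroids: the residual part is standard VLAD, and the distance-histogram binning here is hypothetical rather than the paper's exact formulation.

```python
import numpy as np

def vlad_with_distance_hist(desc, centroids, dist_bins=4):
    """Standard VLAD residual encoding concatenated with a per-centroid
    histogram of descriptor-to-centroid distances, capturing the
    distribution information that plain VLAD discards."""
    d = np.linalg.norm(desc[:, None, :] - centroids[None], axis=2)
    assign = d.argmin(axis=1)                      # hard assignment
    k, dim = centroids.shape
    vlad = np.zeros((k, dim))
    hod = np.zeros((k, dist_bins))
    edges = np.linspace(0.0, d.max() + 1e-9, dist_bins + 1)
    for i, c in enumerate(assign):
        vlad[c] += desc[i] - centroids[c]          # residual accumulation
        b = np.searchsorted(edges, d[i, c], side='right') - 1
        hod[c, min(b, dist_bins - 1)] += 1         # distance distribution
    out = np.concatenate([vlad.ravel(), hod.ravel()])
    n = np.linalg.norm(out)
    return out / n if n else out
```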

A Novel Approach for Fast Action Recognition using Simple Features

2008

We propose a new method for human action recognition from video streams that is fast and robust to random noise, partial occlusions and large changes in camera views. We extract features in the Fourier domain using the bounding boxes containing the silhouettes of a human for a number of frames representing an action. After preprocessing, we divide each space-time volume into space-time sub-volumes and compute their corresponding mean-power spectra as our feature vectors. Our features result in high classification performance using a weighted variant of the Euclidean distance. We require no camera calibration or synchronization and make use of multiple cameras to enrich the training data towards view-invariance. We test the robustness of our method using a variety of experiments including synthetic data generated in a virtual environment and real-world data used by other researchers. We also provide an experimental comparison, using the same data, between our method and two recent alternatives.
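The Fourier-domain features can be sketched as follows, assuming a grayscale space-time volume; the 2×2×2 sub-volume grid and the scalar mean-power summary per sub-volume are illustrative assumptions.

```python
import numpy as np

def subvolume_power_features(volume, grid=(2, 2, 2)):
    """Split a space-time volume (T, H, W) into sub-volumes and return
    each one's mean power-spectrum value as a feature."""
    t, h, w = volume.shape
    gt, gh, gw = grid
    feats = []
    for i in range(gt):
        for j in range(gh):
            for k in range(gw):
                sub = volume[i*t//gt:(i+1)*t//gt,
                             j*h//gh:(j+1)*h//gh,
                             k*w//gw:(k+1)*w//gw]
                power = np.abs(np.fft.fftn(sub)) ** 2  # 3D power spectrum
                feats.append(power.mean())
    return np.array(feats)
```

Because the power spectrum discards phase, such features tolerate small spatial shifts of the silhouette inside the bounding box, which is consistent with the robustness claims above.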

Human Action Recognition based on motion and appearance

This paper presents a method to recognize the action being performed by a human in a video. Applications like video surveillance, highlight extraction and video summarization require the recognition of the activities occurring in the video. The analysis of human activities in video is an area with increasingly important consequences, from security and surveillance to entertainment and personal archiving. We propose an action recognition scheme based on motion and appearance. First, we define an Accumulated Frame Difference (AFD) from which intensity histograms are built and normalized for feature extraction; we then compute the DFT of the intensity histograms and take its mean and variance as features. Second, we compute gradient direction and magnitude from a key frame of the video, and again extract mean and variance from the histogram, yielding a few more feature vectors. Finally, with all the extracted features, we train the system using Dynamic Time Warping (DTW) to recognize the various actions. A public dataset is used for evaluation.
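The first feature chain (AFD → normalized intensity histogram → DFT statistics) can be sketched as follows; the bin count is an assumed parameter.

```python
import numpy as np

def afd_features(frames, bins=16):
    """Accumulated Frame Difference (AFD), its normalized intensity
    histogram, and the mean/variance of the histogram's DFT magnitudes."""
    afd = np.zeros(frames[0].shape, dtype=float)
    for prev, curr in zip(frames, frames[1:]):
        afd += np.abs(curr.astype(float) - prev.astype(float))
    hist, _ = np.histogram(afd, bins=bins)
    hist = hist / hist.sum()                 # normalized intensity histogram
    spec = np.abs(np.fft.fft(hist))          # DFT of the histogram
    return afd, np.array([spec.mean(), spec.var()])
```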

Combining gradient histograms using orientation tensors for human action recognition

Proceedings of the 21st International Conference on Pattern Recognition (ICPR), 2012

We present a method for human action recognition based on the combination of Histograms of Gradients into orientation tensors. It uses only information from HOG3D: no features or interest points are extracted. The raw histograms obtained per frame are combined into an orientation tensor, making it a simple, fast-to-compute and effective global descriptor. Adding new videos and/or new action categories does not require any recomputation or changes to previously computed descriptors. Our method reaches a recognition rate of 92.01% on KTH, comparable to the best local approaches. On the Hollywood2 dataset, our recognition rate is lower than that of local approaches but fairly competitive, making the method suitable when the dataset is frequently updated or response time is a major application concern.
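The tensor accumulation can be sketched as a sum of outer products of per-frame gradient histograms; the final L2 normalization is an assumed detail, not necessarily the paper's exact choice.

```python
import numpy as np

def orientation_tensor(frame_histograms):
    """Accumulate per-frame gradient histograms h_t into an orientation
    tensor sum_t h_t h_t^T, then L2-normalize. Each video contributes
    only its own tensor, so adding videos needs no recomputation of
    existing descriptors."""
    dim = len(frame_histograms[0])
    T = np.zeros((dim, dim))
    for h in frame_histograms:
        h = np.asarray(h, dtype=float)
        T += np.outer(h, h)                  # rank-1 update per frame
    n = np.linalg.norm(T)
    return T / n if n else T
```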

Robust Incremental Hidden Conditional Random Fields for Human Action Recognition

2018

Hidden conditional random fields (HCRFs) are a powerful supervised classification model, able to capture the intrinsic motion patterns of a human action. However, finding the optimal number of hidden states remains a severe limitation of this model. This paper addresses this limitation by proposing a new model, called robust incremental hidden conditional random field (RI-HCRF). A hidden Markov model (HMM) is created for each observation paired with an action label, and its parameters are defined by the potentials of the original HCRF graph. Starting from an initial number of hidden states and increasing their number incrementally, the Viterbi path is computed for each HMM. The method seeks a sequence of hidden states in which each variable participates in a maximum number of optimal paths; variables with low participation in optimal paths are rejected. In addition, a robust mixture of Student's t-distributions is imposed as a regularizer on the parameters of th...
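The per-HMM Viterbi computation at the heart of this procedure is the standard log-space dynamic program. This sketch shows only the path decoding, not the state-rejection or the Student's-t regularization steps.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most-likely hidden-state path for an HMM (log-space Viterbi).
    log_emit has shape (T, S): per-time-step log emission scores."""
    T, S = log_emit.shape
    dp = log_start + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = dp[:, None] + log_trans   # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        dp = scores.max(axis=0) + log_emit[t]
    path = [int(dp.argmax())]
    for t in range(T - 1, 0, -1):          # trace back the optimal path
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

RI-HCRF, as described above, would run this per observation sequence and count how often each hidden state appears on the optimal paths.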

Human Action Recognition using Ensemble of Shape, Texture and Motion features

2018

Even though many approaches have been proposed for Human Action Recognition, challenges like illumination variation, occlusion, camera view and background clutter keep this topic open for further research. Devising a robust descriptor for representing an action that gives good classification accuracy is a demanding task. In this work, a new feature descriptor is introduced, named 'Spatio Temporal Shape-Texture-Motion' (STSTM). The STSTM descriptor uses a hybrid approach, combining local and global features. Salient points are extracted using the Spatio Temporal Interest Points (STIP) algorithm and are further encoded using the Discrete Wavelet Transform (DWT). The DWT coefficients thus extracted represent the local motion information of the object. Shape and texture features are extracted using the Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) algorithms, respectively. To achieve dimensionality reduction, Principal Component Analysis is applied separately to t...
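The per-modality reduction and fusion can be sketched as follows; the SVD-based PCA, the component count, and the concatenation order are illustrative assumptions (the abstract is truncated before the exact details).

```python
import numpy as np

def pca_reduce(X, k):
    """PCA via SVD on the centered data, keeping k components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def ststm(shape_f, texture_f, motion_f, k=2):
    """Hypothetical fusion: reduce each modality (shape/HOG,
    texture/LBP, motion/DWT) separately, then concatenate."""
    return np.hstack([pca_reduce(f, k) for f in (shape_f, texture_f, motion_f)])
```

Reducing each modality separately keeps a weak modality from being drowned out by a higher-dimensional one before fusion.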

An efficient human action recognition framework with pose-based spatiotemporal features

Engineering Science and Technology, an International Journal, 2020

In the past two decades, human action recognition has been among the most challenging tasks in the field of computer vision. Recently, extracting accurate and cost-efficient skeleton information became available thanks to cutting-edge deep learning algorithms and low-cost depth sensors. In this paper, we propose a novel framework to recognize human actions using 3D skeleton information. The main components of the framework are pose representation and encoding. Assuming that human actions can be represented by spatiotemporal poses, we define a pose descriptor consisting of three elements. The first element contains the normalized coordinates of the raw skeleton joints. The second element contains the temporal displacement information relative to a predefined temporal offset, and the third element keeps the displacement information pertinent to the previous timestamp in the temporal resolution. The final descriptor of the whole sequence is the concatenation of frame-wise descriptors. To avoid problems regarding high dimensionality, Principal Component Analysis (PCA) is applied to the descriptors. The resulting descriptors are encoded with a Fisher Vector (FV) representation before being used to train an Extreme Learning Machine (ELM). The performance of the proposed framework is evaluated on three public benchmark datasets. The proposed method achieved competitive results compared to the other methods in the literature.
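The three-element pose descriptor can be sketched as follows, assuming (T, J, 3) joint coordinates; the root-joint normalization and the zero padding at sequence boundaries are assumptions, not details given in the abstract.

```python
import numpy as np

def pose_descriptor(joints, offset=5):
    """Per-frame descriptor from (T, J, 3) skeleton joints:
    (1) root-centered coordinates, (2) displacement w.r.t. a frame
    `offset` steps back, (3) displacement w.r.t. the previous frame,
    concatenated frame-wise."""
    x = joints - joints[:, :1, :]            # center on first (root) joint
    d_off = x - np.roll(x, offset, axis=0)   # displacement vs. temporal offset
    d_prev = x - np.roll(x, 1, axis=0)       # displacement vs. previous frame
    d_off[:offset] = 0                       # no valid history: zero-pad
    d_prev[:1] = 0
    per_frame = np.concatenate([x, d_off, d_prev], axis=1)
    return per_frame.reshape(len(joints), -1)
```

PCA, FV encoding and the ELM classifier described above would then operate on these frame-wise vectors (or their concatenation over the sequence).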