Human Action Recognition Using Distribution of Oriented Rectangular Patches

Motion history histograms for human action recognition

Embedded Computer Vision, 2009

In this chapter, a compact human action recognition system is presented with a view to applications in security systems, human-computer interaction, and intelligent environments. There are three main contributions: Firstly, the framework of an embedded human action recognition system based on a support vector machine (SVM) classifier and some compact motion features has been presented. Secondly, the limitations of the well-known motion history image (MHI) are addressed and a new motion history histograms (MHH) feature is introduced to represent the motion information in the video. MHH not only provides rich motion information, but also remains computationally inexpensive. We combine MHI and MHH into a low-dimensional feature vector for the system and achieve improved performance in human action recognition over comparable methods that use tracking-free temporal template motion representations. Finally, a simple system based on SVM and MHI has been implemented on a reconfigurable embedded computer vision architecture for real-time gesture recognition.
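For readers unfamiliar with the motion history image (MHI) mentioned above, its commonly used per-pixel update rule can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the chapter's implementation: the function name `update_mhi`, the difference threshold, and the duration `tau` are hypothetical choices made for the example, and the MHH feature is not covered.

```python
def update_mhi(mhi, prev_frame, frame, tau=5, threshold=30):
    """Return the next motion history image from two consecutive grayscale frames.

    Pixels that moved (absolute difference > threshold) are set to tau;
    all other pixels decay by 1, never going below 0.
    tau and threshold are illustrative values, not taken from the chapter.
    """
    next_mhi = []
    for mhi_row, prev_row, cur_row in zip(mhi, prev_frame, frame):
        next_row = []
        for h, p, c in zip(mhi_row, prev_row, cur_row):
            moved = abs(c - p) > threshold
            next_row.append(tau if moved else max(0, h - 1))
        next_mhi.append(next_row)
    return next_mhi


# Tiny 1x3 "video": a bright pixel moves from column 0 to column 1, then stays.
frames = [
    [[255, 0, 0]],
    [[0, 255, 0]],
    [[0, 255, 0]],
]
mhi = [[0, 0, 0]]
for prev_frame, frame in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev_frame, frame)
```

Recent motion keeps a high value while older motion fades, so a single image summarizes where and how recently movement occurred. A practical system would operate on array-based images rather than nested lists.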

Human Action Representation and Recognition: An Approach to a Histogram of Spatiotemporal Templates

2015

Each human action's motion sequence has its own discriminating profile, which can be represented as a spatiotemporal template such as the Motion History Image (MHI). A histogram is a popular statistic for summarizing the underlying information in a template. In this paper, a histogram-oriented action recognition method is presented. In the proposed method, we use the Directional Motion History Images (DMHI), their corresponding Local Binary Pattern (LBP) images, and the Motion Energy Image (MEI) as spatiotemporal templates. Intensity histograms are then extracted from those images and concatenated to form the feature vector for action representation. A linear combination of the histograms taken from the DMHIs and LBP images is used in the experiment. We evaluated the performance of the proposed method, along with some variants of it, on the well-known KTH action dataset and found higher accuracies. The obtained results justify the superiority of the proposed method compared to other a...
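The histogram-concatenation step described in this abstract can be sketched roughly as below. This is only an illustration under assumptions: the helper names `intensity_histogram` and `template_feature`, the bin count, and the equal default weights are hypothetical, and the LBP computation and the paper's actual weighting are not reproduced.

```python
def intensity_histogram(image, bins=8, max_value=255):
    """Normalized intensity histogram of a grayscale image given as nested lists."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            # Map an intensity in [0, max_value] to a bin index in [0, bins - 1].
            index = min(value * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            total += 1
    return [count / total for count in counts] if total else counts


def template_feature(templates, weights=None, bins=8):
    """Concatenate weighted histograms of several templates into one feature vector."""
    weights = weights or [1.0] * len(templates)
    feature = []
    for image, weight in zip(templates, weights):
        feature.extend(weight * h for h in intensity_histogram(image, bins))
    return feature


# Two toy 2x2 templates standing in for a DMHI and an MEI (made-up values).
dmhi = [[0, 64], [128, 255]]
mei = [[0, 0], [255, 255]]
feature = template_feature([dmhi, mei], bins=4)
```

The resulting vector has one block of `bins` entries per template, so its length stays small and fixed regardless of image resolution.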

Human Action Recognition based on motion and appearance

This paper presents a method to recognize the action being performed by a human in a video. Applications such as video surveillance, highlight extraction and video summarization require the recognition of the activities occurring in the video. The analysis of human activities in video is an area with increasingly important consequences, from security and surveillance to entertainment and personal archiving. We propose an action recognition scheme based on motion and appearance. Firstly, we define an Accumulated Frame Difference (AFD), from which intensity histograms are built and normalized for feature extraction. We then compute the DFT of the intensity histograms to obtain features such as mean and variance. Secondly, we compute the gradient direction and magnitude from a key frame of the video, and again extract the mean and variance from the resulting histogram, yielding a few more features. Finally, with all the extracted features, we train the system using Dynamic Time Warping (DTW) to recognize the various actions. A public dataset is used for evaluation.

Exploiting the Motion Learning Paradigm for Recognizing Human Actions

Identifying human actions in unconstrained videos is a difficult problem in several applications, and human action recognition is an active area of research. Many previously proposed methods ignore motion patterns; the system proposed here instead exploits the motion relationships that were previously discarded. The video event representation used to recognize human activities is based on motion modeling. Descriptors computed along dense local patch trajectories, such as the Histogram of Optical Flow (HOF), Histogram of Oriented Gradients (HOG) and Motion Boundary Histograms (MBH), are used in this approach, which does not require background-foreground separation. Dimensionality is reduced by Principal Component Analysis (PCA), and a Support Vector Machine (SVM) is used for classification. The proposed video representation model is applied to the UT-Interaction dataset. The experimental results show that the proposed representation achieves very competitive performance and higher accuracy when compared with state-of-the-art methods.

Recognizing human actions based on motion information and SVM

2006

In this paper, we propose a new system for human action recognition with a view to applications in security systems, man-machine communications and intelligent environments. Our system is based on very simple features in order to achieve high-speed recognition in real-world applications. We have chosen three main techniques to build a system that can work in real time. Firstly, we choose Motion History Images and related features. Secondly, we use template matching methods instead of state-space methods, which need expensive modelling processes. Finally, we use a linear support vector machine (SVM) classifier for fast classification. Experimental results show that this system can achieve good performance in human action recognition in real-time embedded applications, such as intelligent environments.

Action recognition using bag of features extracted from a beam of trajectories

2013 IEEE International Conference on Image Processing, 2013

A new spatio-temporal descriptor is proposed for action recognition. The action is modelled from a beam of trajectories obtained using semi-dense point tracking on the video sequence. We detect the dominant points of these trajectories as points of local extremum curvature and extract their corresponding feature vectors to form a dictionary of atomic action elements. The high density of these informative and invariant elements allows effective statistical action description. Human action recognition is then performed using a bag-of-features model with an SVM classifier. Experiments show promising results on several well-known datasets.

Recognition of Human Actions Based on Temporal Motion Templates

British Journal of Applied Science & Technology, 2017

Despite their attractive properties of invariance, robustness and reliability, statistical motion descriptions from temporal templates have apparently not received the attention they might deserve in the human action recognition literature. In this paper, we propose an innovative approach for action recognition, where a novel fuzzy representation based on temporal motion templates is developed to model human actions as time series of low-dimensional descriptors. A Naïve Bayes (NB) classifier is trained on these features for action classification. When tested on a realistic action dataset incorporating a large collection of video data, the approach achieves a recognition rate as high as 93.7%, while remaining tractable for real-time operation.

Evaluating a bag-of-visual features approach using spatio-temporal features for action recognition

Computers & Electrical Engineering, 2018

Human action recognition (HAR) has emerged as a core research domain for video understanding and analysis, thus attracting many researchers. Although significant results have been achieved in simple scenarios, HAR is still a challenging task due to issues associated with view independence, occlusion and inter-class variation observed in realistic scenarios. In previous research efforts, the classical bag of visual words approach along with its variations has been widely used. In this paper, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach. Expressions are formed based on the density of a spatio-temporal cube of a visual word. To handle inter-class variation, we use class-specific visual word representation for visual expression generation. In contrast to the Bag of Expressions (BoE) model, the formation of visual expressions is based on the density of spatio-temporal cubes built around each visual word, as constructing neighborhoods with a fixed number of neighbors could include non-relevant information, making a visual expression less discriminative in scenarios with occlusion and changing viewpoints. Thus, the proposed approach makes the model more robust to the occlusion and changing viewpoint challenges present in realistic scenarios. Furthermore, we train a multi-class Support Vector Machine (SVM) for classifying bags of expressions into action classes. Comprehensive experiments on four publicly available datasets (KTH, UCF Sports, UCF11 and UCF50) show that the proposed model outperforms existing state-of-the-art human action recognition methods in terms of accuracy, reaching 99.21%, 98.60%, 96.94% and 94.10%, respectively.

Spatial-temporal Histograms of Gradients and HOD-VLAD Encoding for Human Action Recognition

Automatic human action recognition is a core functionality of systems for video surveillance and human-object interaction. In the whole recognition system, feature description and encoding are two crucial steps, and both must perform reliably to construct a powerful action recognition framework. In this paper, we propose a new human action feature descriptor called spatial-temporal histograms of gradients (SPHOG). SPHOG is based on the spatial and temporal derivative signal, which captures the gradient changes between consecutive frames. Compared to the traditional histograms of optical flow descriptor, the proposed SPHOG requires fewer computational resources. The Vector of Locally Aggregated Descriptors (VLAD) is a popular encoding approach for Bag-of-Features representation. Its main drawback is that it only considers the differences between local descriptors and their centroids. To address this weakness, we propose an improved VLAD method called HOD-VLAD, which complements it with the distribution information of local descriptors by computing a weighted histogram of distances. We validated the proposed algorithm for human action recognition on three publicly available datasets: KTH, UCF Sports and HMDB51. The experimental results indicate that the proposed descriptor and encoding method improve both the efficiency and the accuracy of human action recognition.
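To make the drawback mentioned above concrete, baseline VLAD encoding can be sketched as follows: each local descriptor contributes only its residual to the nearest centroid. This is a minimal sketch of standard VLAD, not of HOD-VLAD, whose distance-histogram weighting is not specified in the abstract; the function name `vlad_encode` and the toy data are assumptions, and the centroids are taken as given rather than learned by clustering.

```python
import math


def vlad_encode(descriptors, centroids):
    """Standard VLAD: sum residuals to the nearest centroid, then L2-normalize."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for descriptor in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean distance).
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, centroids[k])),
        )
        for i in range(dim):
            residual_sums[nearest][i] += descriptor[i] - centroids[nearest][i]
    # Concatenate the per-centroid residual sums into a single vector.
    flat = [value for row in residual_sums for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat


# Toy 2-D example with two precomputed centroids (made-up values).
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [2.0, 0.0], [10.0, 14.0]]
encoding = vlad_encode(descriptors, centroids)
```

Because only summed residuals are kept, two descriptor sets with different spreads around a centroid can produce the same encoding, which is the kind of distribution information the abstract says HOD-VLAD adds back.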

Human action recognition using Dynamic Time Warping

… and Informatics (ICEEI), …, 2011

Human action recognition is gaining interest from many computer vision researchers because of its wide variety of potential applications, for instance surveillance, advanced human-computer interaction, content-based video retrieval, and athletic performance analysis. In this research, we focus on recognizing human actions such as waving, punching, and clapping. We choose an exemplar-based sequential single-layered approach using Dynamic Time Warping (DTW) because of its robustness against variations in the speed or style of performing an action. To improve the recognition rate, we perform body part tracking using a depth camera to recover human joint information in a 3D real-world coordinate system. We build our feature vector from joint orientations along the time series, which are invariant to human body size. Dynamic Time Warping is then applied to the resulting feature vector. We test our approach on several actions, and our experiments confirm that the method works well. Further experiments to benchmark the results will be conducted in the near future.
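The robustness of DTW to speed variation can be illustrated with a short sketch of the classic algorithm combined with nearest-exemplar classification. The names `dtw_distance` and `classify`, the Euclidean local cost, and the one-dimensional toy sequences (standing in for joint-orientation vectors) are assumptions for illustration, not details taken from the paper.

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames' feature vectors.
            local = sum((a - b) ** 2 for a, b in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(query, exemplars):
    """Return the label of the exemplar sequence with the smallest DTW distance."""
    return min(exemplars, key=lambda item: dtw_distance(query, item[1]))[0]


# Hypothetical exemplars; each frame is a 1-D feature vector.
exemplars = [
    ("wave", [[0.0], [1.0], [0.0], [1.0]]),
    ("punch", [[0.0], [2.0], [2.0], [0.0]]),
]
# The same waving pattern performed more slowly (some frames repeated).
slow_wave = [[0.0], [0.0], [1.0], [1.0], [0.0], [1.0]]
label = classify(slow_wave, exemplars)
```

Because the warping path may repeat frames of either sequence, the slowed-down query still aligns with the "wave" exemplar at zero cost, which a frame-by-frame comparison would not allow.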