The Kinetics Human Action Video Dataset
Related papers
EduNet: A New Video Dataset for Understanding Human Activity in the Classroom Environment
Sensors, 2021
Human action recognition in videos has become a popular research area in artificial intelligence (AI). In the past few years, research has accelerated in areas such as sports, daily activities, and kitchen activities, driven by the benchmark datasets proposed for human action recognition in these domains. However, there are few benchmark datasets for human activity recognition in educational environments. We therefore developed a dataset of teacher and student activities to expand research in the education domain. This paper proposes a new dataset, called EduNet, as a novel approach towards developing human action recognition datasets for classroom environments. EduNet has 20 action classes and contains around 7851 manually annotated clips extracted from YouTube videos and recorded in actual classroom environments. Each action category has a minimum of 200 clips, and the total duration is approximately 12 h. To the best of ou...
Human Action Recognition Using Deep Learning
IRJET, 2022
The goals of video analysis tasks have changed significantly over time, shifting from inferring the current state to forecasting the future state. Recent advancements in computer vision and machine learning have made this possible. Vision-based action recognition infers different human activities from the full motion of those activities; extrapolating from a person's current actions also helps predict their future actions. Because it directly addresses real-world problems such as visual surveillance, autonomous cars, and entertainment, it has been a prominent research topic in recent years. A great deal of work has been done to build effective human action recognizers, and more work is still anticipated. Human action recognition has a wide range of uses, including patient monitoring, video surveillance, and many more. This article proposes two models, a CNN and an LRCN. The findings show that the proposed approach is at least 8% more accurate than the traditional two-stream CNN method and also offers better recognition accuracy on the temporal and spatial streams.
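The abstract gives no implementation detail, but the general shape of an LRCN-style recognizer (a per-frame CNN feeding a recurrent layer) can be sketched as below. This is a minimal PyTorch sketch, not the paper's implementation; it assumes LRCN refers to the standard Long-term Recurrent Convolutional Network design, and the layer sizes, frame count, and class count are illustrative assumptions.

```python
# Minimal LRCN-style sketch (not the paper's code): a small CNN encodes each
# frame independently, and an LSTM aggregates the per-frame features over time.
# All layer sizes and the class count below are illustrative assumptions.
import torch
import torch.nn as nn

class LRCN(nn.Module):
    def __init__(self, num_classes: int = 10, feat_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        # Per-frame CNN encoder.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # LSTM aggregates frame features across the temporal dimension.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.lstm(feats)      # (batch, time, hidden_dim)
        return self.head(out[:, -1])   # classify from the last time step

if __name__ == "__main__":
    model = LRCN()
    dummy = torch.randn(2, 16, 3, 112, 112)  # 2 clips of 16 RGB frames each
    print(model(dummy).shape)                # torch.Size([2, 10])
```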
Video benchmarks of human action datasets: a review
Vision-based human activity recognition is becoming a popular research area due to its wide applications, such as security and surveillance, human-computer interaction, patient monitoring systems, and robotics. Over the past two decades, several publicly available human action and activity datasets have been reported, varying in modalities, views, actors, actions, and applications. The objective of this survey paper is to outline the different types of video datasets and highlight their merits and demerits under practical considerations. Based on the information available in each dataset, these datasets can be categorised into RGB (Red, Green, and Blue) and RGB-D (depth). The most prominent challenges in these datasets are occlusion, illumination variation, view variation, annotation, and the fusion of modalities. Key specifications of these datasets are discussed, such as resolution, frame rate, actions/actors, background, and application domain. We also present, in tabular form, the state-of-the-art algorithms that give the best performance on these datasets. In comparison with earlier surveys, our work gives a better presentation of datasets through a well-organised comparison, their challenges, and the latest evaluation techniques on existing datasets.
Deep Learning based Human Action Recognition
ITM Web of Conferences, 2021
Human action recognition has become an important research area in computer vision, image processing, and human-machine or human-object interaction due to its large number of real-time applications. Action recognition is the identification of different actions from video clips (sequences of 2D frames) in which an action may be performed. It is a natural extension of image classification to multiple frames, followed by aggregating the predictions from each frame. Different approaches have been proposed in the literature to improve recognition accuracy. In this paper we propose a deep learning based model for recognition, with the main focus on a CNN model for image classification. The action videos are converted into frames and pre-processed before being sent to our model, which recognizes the different actions accurately.
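As a rough illustration of the frame-extraction and pre-processing step this abstract describes, the sketch below samples a fixed number of frames from a clip with OpenCV, resizes them, and scales pixel values to [0, 1]. It is not the authors' pipeline; the frame count and target size are assumptions.

```python
# Minimal frame-extraction sketch (not the authors' code): sample evenly
# spaced frames from a video, convert to RGB, resize, and normalize so the
# frames can be fed to a frame-level CNN. num_frames and size are assumptions.
import cv2
import numpy as np

def extract_frames(video_path: str, num_frames: int = 16, size: int = 112) -> np.ndarray:
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, (size, size))
        frames.append(frame.astype(np.float32) / 255.0)  # scale to [0, 1]
    cap.release()
    return np.stack(frames) if frames else np.empty((0, size, size, 3), dtype=np.float32)

# Example: frames = extract_frames("clip.mp4")  # shape (16, 112, 112, 3)
```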
CoReHAR: A Hybrid Deep Network for Video Action Recognition
2020
Automating the processing of videos in applications such as surveillance, sports commentary, activity detection, human-machine interaction, and health/disability care is crucial to their correct functioning. In such video processing tasks, recognizing various human actions is a pivotal component of correctly understanding videos and making decisions based on them. Accurately recognizing human actions is a complex process that demands high computing capability and intelligent algorithms. Several factors, such as object occlusion, camera movement, and background clutter, further challenge the task and its accuracy, essentially leaving deep learning approaches as the only viable option for properly detecting human actions in videos. In this study, we propose CoReHAR, a novel human action recognition method that employs both deep convolutional and recurrent neural networks on raw video frames. Using the pre-trained ResNet152 CNN, deep features are initially extracted from video frames...
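The first stage this abstract describes, per-frame deep feature extraction with a pre-trained ResNet152, can be sketched as follows. This is a hedged illustration using torchvision (a recent version, 0.13 or later, is assumed for the weights API), not the CoReHAR code; the recurrent stage that would consume these features is only indicated in a comment.

```python
# Sketch of frame-level feature extraction with a pre-trained ResNet152,
# in the spirit of the first stage described above (not the authors' code).
# Stripping the classification head leaves a 2048-d feature per frame.
import torch
import torch.nn as nn
from torchvision import models

# Requires torchvision >= 0.13 for the weights enum.
resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(resnet.children())[:-1]).eval()

@torch.no_grad()
def frame_features(frames: torch.Tensor) -> torch.Tensor:
    """frames: (time, 3, 224, 224) normalized RGB -> (time, 2048) features."""
    return feature_extractor(frames).flatten(1)

# A recurrent network over the per-frame features would then model the
# temporal dynamics, e.g.: gru = nn.GRU(2048, 512, batch_first=True)
```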