Unsupervised learning of human perspective context using ME-DT for efficient human detection in surveillance

A joint estimation of head and body orientation cues in surveillance video

2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011

The automatic analysis and understanding of behavior and interactions is a crucial task in the design of socially intelligent video surveillance systems. Such an analysis often relies on the extraction of people's behavioral cues, amongst which body pose and head pose are probably the most important. In this paper, we propose an approach that jointly estimates these two cues from surveillance video. Given a human track, our algorithm works in two steps. First, a per-frame analysis is conducted, in which the head is localized, head and body features are extracted, and their likelihoods under different poses are evaluated. These likelihoods are then fused within a temporal filtering framework that jointly estimates the body position, body pose and head pose by taking advantage of the soft couplings between body position (movement direction), body pose and head pose. Quantitative as well as qualitative experiments show the benefit of several aspects of our approach, in particular the benefit of the joint estimation framework for tracking the behavioral cues. Further analysis of behavior and interaction can then be conducted based on the output of our system.
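The fusion step described above can be illustrated with a discrete Bayes (forward) filter over joint body/head orientation states. This is a minimal sketch under our own assumptions, not the authors' filter: orientations are quantised into 8 bins, body position is omitted, the random-walk dynamics and the exponential coupling potential (`step_prob`, `coupling`, `stay`, `kappa`) are hypothetical choices, and the per-frame likelihoods are taken as given.

```python
import math

N_BINS = 8  # assumption: orientations quantised into 8 bins of 45 degrees


def circ_dist(a, b, n=N_BINS):
    """Circular distance between two orientation bins."""
    d = abs(a - b) % n
    return min(d, n - d)


def step_prob(prev, cur, stay=0.8):
    """Hypothetical dynamics: stay in the same bin or move to a neighbouring one."""
    d = circ_dist(prev, cur)
    if d == 0:
        return stay
    if d == 1:
        return (1.0 - stay) / 2.0
    return 0.0


def coupling(body, head, kappa=1.0):
    """Hypothetical soft coupling: a head far from the body orientation is less likely."""
    return math.exp(-kappa * circ_dist(body, head))


STATES = [(b, h) for b in range(N_BINS) for h in range(N_BINS)]

# Joint transition p(b, h | pb, ph) proportional to
# step_prob(pb, b) * step_prob(ph, h) * coupling(b, h), normalised per previous state.
TRANS = {}
for pb, ph in STATES:
    w = {(b, h): step_prob(pb, b) * step_prob(ph, h) * coupling(b, h) for b, h in STATES}
    z = sum(w.values())
    TRANS[(pb, ph)] = {s: v / z for s, v in w.items()}


def joint_filter(body_liks, head_liks):
    """Forward filtering over joint (body, head) states.

    body_liks, head_liks: one list of N_BINS likelihoods per frame.
    Returns the most probable (body_bin, head_bin) after each frame.
    """
    belief = {s: 1.0 / len(STATES) for s in STATES}
    estimates = []
    for lb, lh in zip(body_liks, head_liks):
        post = {}
        for b, h in STATES:
            # Predict: propagate the previous belief through the joint transition.
            pred = sum(belief[p] * TRANS[p][(b, h)] for p in STATES)
            # Update: weight by the per-frame body and head likelihoods.
            post[(b, h)] = pred * lb[b] * lh[h]
        z = sum(post.values())
        belief = {s: v / z for s, v in post.items()}
        estimates.append(max(belief, key=belief.get))
    return estimates


def peaked(k, high=0.93, low=0.01):
    """Toy likelihood vector peaked at bin k (illustration only)."""
    return [high if i == k else low for i in range(N_BINS)]


# Five consistent frames, then one frame whose head likelihood points the opposite way.
print(joint_filter([peaked(2)] * 6, [peaked(2)] * 5 + [peaked(6)])[-1])
```

In this toy setting the single outlier head measurement is absorbed by the temporal prior and the coupling to the body estimate, which is the kind of benefit the joint filtering is meant to provide.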

Perspective and appearance context for people surveillance in open areas

2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, 2010

Contextual information can be used both to reduce computation and to increase accuracy. This paper shows how it can be exploited for people surveillance in terms of perspective (i.e. weak scene calibration) and appearance of the objects of interest (i.e. relevance feedback on the training of a classifier). These techniques are applied to a pedestrian detector that exploits covariance descriptors through a LogitBoost classifier on Riemannian manifolds. The approach has been tested on a construction site where scene complexity and dynamics are very high, making human detection a real challenge. The experimental results demonstrate the improvements achieved by the proposed approach.
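Weak perspective calibration of this kind is often approximated by a linear relation between the image row of a person's feet and their height in pixels. The sketch below is our illustration rather than the paper's calibration procedure: it fits height = a * foot_y + b by least squares from (foot row, height) pairs and uses the fit to reject candidate windows whose size is implausible at their position. The function names and the 25% tolerance are hypothetical.

```python
def fit_height_model(samples):
    """Least-squares fit of height = a * foot_y + b from (foot_y, height) pairs."""
    n = len(samples)
    sy = sum(y for y, _ in samples)
    sh = sum(h for _, h in samples)
    syy = sum(y * y for y, _ in samples)
    syh = sum(y * h for y, h in samples)
    denom = n * syy - sy * sy
    if denom == 0:
        raise ValueError("need samples from at least two distinct image rows")
    a = (n * syh - sy * sh) / denom
    b = (sh - a * sy) / n
    return a, b


def is_plausible(foot_y, height, model, tol=0.25):
    """Hypothetical check: keep a window only if its height is within tol of the expected one."""
    a, b = model
    expected = a * foot_y + b
    return expected > 0 and abs(height - expected) <= tol * expected


# Toy samples lying exactly on height = 0.5 * foot_y - 20.
model = fit_height_model([(100, 30), (200, 80), (300, 130)])
print(model)  # → (0.5, -20.0)
```

Windows rejected by `is_plausible` never reach the classifier, which is how perspective context can both save computation and remove false positives.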

Perspective Multiscale Detection and Tracking of Persons

Lecture Notes in Computer Science, 2014

The efficient detection and tracking of persons in videos has widespread applications, especially in CCTV systems for surveillance or forensics. In this paper we present a new method for people detection and tracking based on knowledge of the perspective information of the scene. It alleviates two main drawbacks of existing methods: (i) the high or even excessive computational cost associated with multiscale detection-by-classification methods; and (ii) the inherent difficulty of CCTV footage, in which partial and full occlusions as well as very high intra-class variability predominate. During the detection stage, we propose to use the homography of the dominant plane to compute the expected sizes of persons at different positions of the image, and thus dramatically reduce the number of evaluations of the multiscale sliding window detection scheme. To achieve robustness against false positives and negatives, we use a combination of full-body and upper-body detectors, as well as a Data Association Filter (DAF) inspired by the well-known Rao-Blackwellized particle filters (RBPF). Our experiments demonstrate the benefit of the proposed perspective multiscale approach compared to conventional sliding window approaches, and also show that this perspective information can lead to useful combinations of full-body and upper-body detectors.
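To illustrate how perspective prunes the multiscale search, the sketch below assumes a 3x3 homography `M` that maps the image position of a person's feet on the dominant plane to the image position of their head (the image of a parallel plane at an assumed average person height). How `M` is derived from the dominant-plane homography is not shown, and the function names, stride and aspect ratio are hypothetical. Each foot position then gets a single window of the expected size instead of one window per pyramid scale.

```python
def apply_homography(M, x, y):
    """Map an image point through a 3x3 homography given as nested lists."""
    u = M[0][0] * x + M[0][1] * y + M[0][2]
    v = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return u / w, v / w


def expected_height(M, x, y):
    """Expected person height in pixels for feet at (x, y), assuming M maps feet to head."""
    _, head_y = apply_homography(M, x, y)
    return y - head_y


def perspective_windows(M, img_w, img_h, stride=8, aspect=0.5, min_h=16):
    """Return (left, top, w, h) windows: one of the expected size per foot position."""
    windows = []
    for foot_y in range(stride, img_h + 1, stride):
        for foot_x in range(0, img_w + 1, stride):
            h = expected_height(M, foot_x, foot_y)
            if h < min_h:
                continue  # person would be too small at this row
            w = aspect * h
            left, top = foot_x - w / 2, foot_y - h
            if left < 0 or top < 0 or left + w > img_w:
                continue  # window would leave the image
            windows.append((left, top, w, h))
    return windows


def sliding_window_count(img_w, img_h, heights, stride=8, aspect=0.5):
    """Number of windows a conventional multiscale scan evaluates over the given heights."""
    count = 0
    for h in heights:
        w = aspect * h
        if w > img_w or h > img_h:
            continue
        count += (int((img_w - w) // stride) + 1) * (int((img_h - h) // stride) + 1)
    return count


M = [[1, 0, 0], [0, 0.8, 0], [0, 0, 1]]  # toy homography: height = 0.2 * foot_y
print(len(perspective_windows(M, 640, 480)),
      sliding_window_count(640, 480, [32, 64, 96, 128, 160]))
```

The saving grows with the number of scales in the conventional pyramid, since the perspective scan evaluates at most one scale per foot position.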

Fuzzy qualitative human model for viewpoint identification

Neural Computing and Applications, 2015

The integration of advanced human motion analysis techniques into low-cost video cameras has emerged for consumer applications, particularly in video surveillance systems. These smart and cheap devices provide practical solutions for improving public safety and homeland security through the capability of understanding human behaviour automatically. In this sense, an intelligent video surveillance system should not be constrained to a particular viewpoint, since in natural settings a person is not restricted to performing an action from a fixed camera viewpoint. To achieve this objective, many state-of-the-art approaches require information from multiple cameras. This is impractical in terms of feasibility and computational complexity. First, it is very difficult to find an open space in a real environment with perfect overlap for multi-camera calibration. Second, processing information from multiple cameras is a computational burden. This has sparked a surge of interest in single-camera approaches, with notable work on the concept of view-specific action recognition. However, in that work the viewpoints are assumed a priori. In this paper, we extend it by proposing a viewpoint estimation framework in which a novel human contour descriptor, the fuzzy qualitative human contour, is extracted from the fuzzy qualitative Poisson human model for viewpoint analysis. Clustering algorithms are used to learn and classify the viewpoints. In addition, our system can also classify front and rear views. Experimental results on the challenging IXMAS human action dataset show the reliability and effectiveness of the proposed viewpoint estimation framework.
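The viewpoint learning step can be sketched with k-means, one common clustering algorithm. The abstract does not say which clustering algorithms are used, and the plain numeric tuples below merely stand in for fuzzy qualitative human contour descriptors. Initialising from the first k points is an assumption made to keep the example reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid (learnt viewpoint cluster) closest to descriptor p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, iters=50):
    """Plain k-means; initialised with the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each descriptor goes to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Toy 2-D descriptors forming two well-separated groups (two hypothetical viewpoints).
pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
cents, labs = kmeans(pts, 2)
print(labs, nearest((9, 9), cents))
```

Once the clusters are learnt, classifying the viewpoint of a new frame reduces to a nearest-centroid lookup with `nearest`.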

Component-based Human Detection

In this paper, we present a general framework for human detection in a video sequence by components. The technique is demonstrated by developing a system that locates people in cluttered scenes where they are performing actions such as walking or running. The system is structured around three main, distinct example-based detectors that are trained to find the three components of the human body separately: head, legs and arms. Geometric constraints are applied to the detected components to ensure that they are present in the proper geometric configuration; in this way the system ultimately detects a person. The example-based detectors developed here are view invariant. To achieve this, we designed four sub-classifiers for the head and arms, taking into account the different positions those body parts can take while a human is performing an action. The experimental results shown here can be compared with those of a similar full-body detector. The algorithm is also very robust in that it is capable of locating partially occluded views of people and people whose body parts have little contrast with the background.
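The geometric check on detected components might look like the following. The specific rules (head above the legs and roughly centred over them, arm centre between the bottom of the head and the middle of the legs and horizontally close to the body) and their thresholds are hypothetical illustrations, not the constraints used in the paper. Boxes are given as top-left corner plus width and height, with image y growing downwards.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    x: float  # left
    y: float  # top (image y grows downwards)
    w: float
    h: float

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2

    @property
    def bottom(self) -> float:
        return self.y + self.h


def plausible_person(head: Box, legs: Box, arm: Box, max_offset: float = 0.5) -> bool:
    """Hypothetical geometric constraints on one head, one legs and one arm detection."""
    if head.bottom > legs.y:  # the head must lie above the legs
        return False
    if abs(head.cx - legs.cx) > max_offset * legs.w:  # and be roughly centred over them
        return False
    if not (head.bottom <= arm.cy <= legs.cy):  # arm centre between head and mid-legs
        return False
    return abs(arm.cx - legs.cx) <= 2 * legs.w  # arm horizontally close to the body


def detect_people(heads: List[Box], legs_list: List[Box],
                  arms: List[Box]) -> List[Tuple[Box, Box, Box]]:
    """Keep every (head, legs, arm) triple that satisfies the constraints."""
    return [(hd, lg, ar) for hd in heads for lg in legs_list for ar in arms
            if plausible_person(hd, lg, ar)]
```

Because each component is detected separately, a person can still be accepted when a component detector responds weakly elsewhere in the image, which is one reason component-based schemes tolerate partial occlusion.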

Human Detection Framework for Automated Surveillance Systems

International Journal of Electrical and Computer Engineering (IJECE), 2016

Vision-based systems for surveillance applications have been widely used and have gained increasing research attention. Detecting people in an image stream is challenging because of their intra-class variability, the diversity of the backgrounds, and the conditions under which the images are acquired. Existing human detection solutions suffer in both effectiveness and efficiency. In particular, existing detectors are characterized by high false positive and false negative rates. In addition, existing detectors are too slow for online surveillance, which leads to large delays that are unsuitable for real-time monitoring. In this paper, a holistic framework is proposed for enhancing the performance of human detection in surveillance systems. In general, the framework includes the following stages: environment modeling, moving object detection, and human object recognition. In environment modeling, a modal algorithm has been suggested for background initiali...
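The abstract is cut off before describing the modal background initialisation, so the following is only an assumed reading of it: each pixel's background value is taken as its most frequent (quantised) intensity over a set of frames, and pixels that differ from it by more than a threshold are marked as moving. The quantisation step `quant` and the threshold `thresh` are hypothetical parameters.

```python
from collections import Counter


def modal_background(frames, quant=8):
    """Per-pixel most frequent quantised intensity over grayscale frames (lists of rows)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    bg = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            counts = Counter(f[r][c] // quant for f in frames)
            mode_bin = counts.most_common(1)[0][0]
            bg[r][c] = mode_bin * quant + quant // 2  # centre of the winning bin
    return bg


def foreground_mask(frame, bg, thresh=25):
    """Mark pixels that differ from the background by more than thresh."""
    return [[abs(p - b) > thresh for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, bg)]


# Toy 2x2 sequence: a bright object crosses pixel (0, 0) in one frame only.
frames = [[[100, 100], [100, 100]] for _ in range(5)]
frames[2][0][0] = 250
bg = modal_background(frames)
print(foreground_mask([[100, 100], [100, 200]], bg))
```

Using the mode rather than the mean keeps a briefly passing object from contaminating the background estimate.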

People recognition and pose estimation in image sequences

Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, 2000

This paper presents a system that learns from examples to automatically recognize people and estimate their poses in image sequences, with potential application to daily surveillance in indoor environments. The person in the image is represented by a set of features based on color and shape information. Recognition is carried out through a hierarchy of two-class SVM classifiers that are separately trained to recognize people and estimate their poses. The system shows very high accuracy in people recognition and a performance level of about 85% in pose estimation, outperforming k-Nearest Neighbors classifiers in both cases. The system works in real time.

A Survey on Human Motion Detection and Surveillance

Over the years, detecting human beings in the video scene of a surveillance system has been one of the most active research topics in computer vision. This interest is driven by wide applications in many areas such as virtual reality, smart surveillance and perceptual interfaces, human gait characterization, person counting in dense crowds, person identification, gender classification, and fall detection for elderly people. A video surveillance system mainly deals with the tracking and classification of moving objects. The general processing steps of human motion detection for video surveillance include environment modeling, motion detection, object detection and classification, human detection, activity recognition, and behavior understanding. The aim of this paper is to review recent developments and analyze open future directions in visual surveillance systems.

Scene specific people detection by simple human interaction

Proceedings of the IEEE International Conference on Computer Vision, 2011

This paper proposes a generic procedure for training a scene-specific people detector by exploiting simple human interaction. The technique works for any scene imaged by a static camera and considerably increases the performance of an appearance-based people detector. The user is asked to validate the results of a basic detector relying on background subtraction and proportion constraints. From this simple supervision, new scene-specific examples can be selected and used to retrain the people detector used in the testing phase. These new examples adapt the classifier to the particular scene imaged by the camera, improving detection for that particular viewpoint, background, and image resolution. At the same time, the positions and scales at which people can be found are learnt, which considerably reduces the number of windows that have to be scanned in the detection phase. Experimental results on three different scenarios show improved detection accuracy and fewer false positives, even when the ground plane assumption does not hold.
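Learning where and at what scale people appear can be done without a ground plane model by recording, for each horizontal band of the image, the range of heights of the user-validated detections. This is a minimal, hypothetical sketch (band size, margin and function names are our assumptions), not the paper's procedure: windows whose height falls outside the learnt range for their band, or that lie in bands with no validated examples, are simply skipped.

```python
def learn_scale_ranges(validated, band=40, margin=0.2):
    """Map each horizontal band of foot rows to an allowed (min_h, max_h) interval.

    validated: (foot_y, height) pairs confirmed by the user.
    """
    ranges = {}
    for foot_y, h in validated:
        k = int(foot_y // band)
        lo, hi = ranges.get(k, (h, h))
        ranges[k] = (min(lo, h), max(hi, h))
    # Widen each interval by a safety margin.
    return {k: (lo * (1 - margin), hi * (1 + margin)) for k, (lo, hi) in ranges.items()}


def should_scan(foot_y, h, ranges, band=40):
    """Scan a window only if its band has validated examples of a similar height."""
    r = ranges.get(int(foot_y // band))
    return r is not None and r[0] <= h <= r[1]


rng = learn_scale_ranges([(100, 40), (110, 50), (300, 120)])
print(should_scan(105, 45, rng), should_scan(200, 50, rng))
```

Because the intervals are stored per band rather than derived from a fitted plane, this kind of lookup still works when the ground is not flat.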