Learning Pedestrian Detection from Virtual Worlds

Virtual to Real Adaptation of Pedestrian Detectors

Sensors, 2020

Pedestrian detection through computer vision is a building block for a multitude of applications. Recently, there has been increasing interest in convolutional neural network-based architectures for performing such a task. One critical goal of these supervised networks is to generalize the knowledge learned during the training phase to new scenarios with different characteristics, and a suitably labeled dataset is essential to achieve this purpose. The main problem is that manually annotating a dataset usually requires a lot of human effort and is costly. To this end, we introduce ViPeD (Virtual Pedestrian Dataset), a new synthetically generated set of images collected with the highly photo-realistic graphics engine of the video game GTA V (Grand Theft Auto V), where annotations are automatically acquired. However, when training solely on the synthetic dataset, the model experiences a Synthetic2Real domain shift, leading to a performance drop when applied to real-world images. To m...
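
The selling point of ViPeD is that labels come from the game engine rather than from human annotators, so exporting them is pure bookkeeping. As a rough illustration only (the paper does not publish its export code; the COCO-style record layout and the `boxes_to_coco` helper below are my assumptions), the conversion amounts to serializing engine-reported boxes:

```python
import json

def boxes_to_coco(engine_boxes, image_id, width, height, start_ann_id=1):
    """Serialize engine-reported pedestrian boxes (x, y, w, h, in pixels)
    as COCO-style annotation dicts, clipping them to the image bounds."""
    annotations = []
    for ann_id, (x, y, w, h) in enumerate(engine_boxes, start=start_ann_id):
        # Clip to the frame, since the engine may report partially
        # off-screen pedestrians.
        x0, y0 = max(0.0, x), max(0.0, y)
        x1, y1 = min(float(width), x + w), min(float(height), y + h)
        if x1 <= x0 or y1 <= y0:
            continue  # fully off-screen, nothing to annotate
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": 1,  # single 'pedestrian' class
            "bbox": [x0, y0, x1 - x0, y1 - y0],
            "area": (x1 - x0) * (y1 - y0),
            "iscrowd": 0,
        })
    return annotations

# Two hypothetical pedestrians reported by the engine for one frame.
anns = boxes_to_coco([(100, 80, 40, 120), (1900, 90, 60, 110)],
                     image_id=0, width=1920, height=1080)
print(json.dumps(anns, indent=2))
```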

Virtual to Real adaptation of Pedestrian Detectors for Smart Cities

arXiv (Cornell University), 2020

Pedestrian detection through computer vision is a building block for a multitude of applications in the context of smart cities, such as surveillance of sensitive areas, personal safety, and monitoring and control of pedestrian flow, to mention only a few. Recently, there has been increasing interest in deep learning architectures for performing such a task. One of the critical objectives of these algorithms is to generalize the knowledge gained during the training phase to new scenarios with various characteristics, and a suitably labeled dataset is fundamental to achieving this goal. The main problem is that manually annotating a dataset usually requires a lot of human effort and is a time-consuming operation. For this reason, in this work we introduced ViPeD (Virtual Pedestrian Dataset), a new synthetically generated set of images collected from a realistic 3D video game, where the labels can be automatically generated by exploiting 2D pedestrian positions extracted from the graphics engine. We used this new synthetic dataset to train a state-of-the-art, computationally efficient Convolutional Neural Network (CNN) that is ready to be deployed on low-power smart devices, such as smart cameras. We addressed the problem of domain adaptation from the virtual world to the real one by fine-tuning the CNN using the synthetic data and also exploiting a mixed-batch supervised training approach. Extensive experimentation carried out on different real-world datasets shows very competitive results compared to other methods presented in the literature, in which the algo…
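
The mixed-batch approach mentioned above composes every training batch partly from synthetic and partly from real images, so the few real samples steer the mostly-synthetic gradient. A minimal PyTorch sketch of the batch construction, assuming a 50/50 split (the paper's actual ratio and datasets may differ; the tensors here are stand-ins):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the synthetic (ViPeD) and real datasets; in practice these
# would yield (image, target) pairs for a detector.
synthetic = TensorDataset(torch.randn(1000, 3, 64, 64), torch.zeros(1000))
real      = TensorDataset(torch.randn(200, 3, 64, 64), torch.ones(200))

BATCH = 16
HALF = BATCH // 2  # assumed 50/50 mix; the real ratio is a tunable choice

syn_loader  = DataLoader(synthetic, batch_size=HALF, shuffle=True)
real_loader = DataLoader(real, batch_size=HALF, shuffle=True)

def mixed_batches(syn_loader, real_loader):
    """Yield batches whose first half is synthetic and second half real.
    The smaller (real) loader is cycled so an epoch covers all synthetic data."""
    real_iter = iter(real_loader)
    for syn_x, syn_y in syn_loader:
        try:
            real_x, real_y = next(real_iter)
        except StopIteration:
            real_iter = iter(real_loader)
            real_x, real_y = next(real_iter)
        yield torch.cat([syn_x, real_x]), torch.cat([syn_y, real_y])

for x, y in mixed_batches(syn_loader, real_loader):
    pass  # the detector's forward/backward pass would go here
```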

STD-PD: Generating Synthetic Training Data for Pedestrian Detection in Unannotated Videos

arXiv, 2017

We present a new method for training pedestrian detectors on an unannotated image set, which is captured by a moving camera with a fixed height and angle from the ground. Our approach is general and robust, and makes no other assumptions about the image dataset or the number of pedestrians. We automatically extract the vanishing point and the pedestrians' scale to calibrate the virtual camera and generate a probability map for the pedestrians to spawn. Using these features, we overlay synthetic human-like agents in proper locations on the images from the unannotated dataset. We also present novel techniques to increase the realism of these synthetic agents and use the augmented images to train a Faster R-CNN detector. Our approach improves the accuracy by 12–13% over prior methods for unannotated image datasets.
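
For a camera with fixed height and angle, the apparent height of a pedestrian standing on the ground plane is, to first order, linear in the image row of their feet and vanishes at the horizon row. A small sketch of how such a scale model could be recovered from a few seed detections (the linear least-squares fit and the numbers are my illustration, not the paper's exact procedure):

```python
import numpy as np

# (foot_row, pixel_height) pairs from a few seed pedestrian detections.
feet_rows = np.array([400.0, 480.0, 560.0, 640.0])
heights   = np.array([ 60.0,  84.0, 108.0, 132.0])

# Model: height = a * (foot_row - v_y). Rewrite as height = a*foot_row + b,
# so v_y = -b / a, and fit with ordinary least squares.
A = np.stack([feet_rows, np.ones_like(feet_rows)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, heights, rcond=None)
v_y = -b / a

print(f"scale a = {a:.3f} px/row, horizon row v_y = {v_y:.1f}")

def expected_height(foot_row):
    """Predicted pixel height for a synthetic agent placed at this row."""
    return a * (foot_row - v_y)

print(expected_height(600.0))
```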

MixedPeds: Pedestrian Detection in Unannotated Videos Using Synthetically Generated Human-Agents for Training

2018

We present a new method for training pedestrian detectors on an unannotated set of images. We produce a mixed reality dataset that is composed of real-world background images and synthetically generated static human-agents. Our approach is general, robust, and makes no other assumptions about the unannotated dataset regarding the number or location of pedestrians. We automatically extract from the dataset: i) the vanishing point, to calibrate the virtual camera, and ii) the pedestrians' scales, to generate a Spawn Probability Map, a novel concept that guides our algorithm to place the pedestrians at appropriate locations. After placing synthetic human-agents in the unannotated images, we use these augmented images to train a pedestrian detector, with the annotations generated along with the synthetic agents. We conducted our experiments using Faster R-CNN, comparing the detection results on the unannotated dataset obtained by the detector trained using our approach and...
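
The Spawn Probability Map is, in effect, a per-pixel distribution over plausible foot locations. A toy sketch of sampling placement points from such a map (the Gaussian walkable band below is fabricated for illustration; MixedPeds derives the map from the data itself):

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 480, 640
# Toy spawn probability map: a walkable band in the lower half of the
# image. MixedPeds builds this from pedestrian scales and the vanishing
# point; the Gaussian band here is only illustrative.
rows = np.arange(H)[:, None]
spawn_map = np.exp(-((rows - 380) ** 2) / (2 * 40.0 ** 2)) * np.ones((H, W))
spawn_map /= spawn_map.sum()

def sample_spawn_points(spawn_map, n):
    """Draw n foot locations (row, col) with probability proportional
    to the spawn map."""
    flat = spawn_map.ravel()
    idx = rng.choice(flat.size, size=n, p=flat)
    return np.unravel_index(idx, spawn_map.shape)

rows_s, cols_s = sample_spawn_points(spawn_map, 5)
print(list(zip(rows_s.tolist(), cols_s.tolist())))
```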

Virtual and Real World Adaptation for Pedestrian Detection

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014


Learning Appearance in Virtual Scenarios for Pedestrian Detection

2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010

Detecting pedestrians in images is a key functionality to avoid vehicle-to-pedestrian collisions. The most promising detectors rely on appearance-based pedestrian classifiers trained with labelled samples. This paper addresses the following question: can a pedestrian appearance model learnt in virtual scenarios work successfully for pedestrian detection in real images? Our experiments suggest a positive answer, which is a new and relevant conclusion for research in pedestrian detection. More specifically, we record training sequences in virtual scenarios, and then appearance-based pedestrian classifiers are learnt using HOG and linear SVM. We test such classifiers on a publicly available dataset provided by Daimler AG for pedestrian detection benchmarking. This dataset contains real-world images acquired from a moving car. The obtained result is compared with the one given by a classifier learnt using samples coming from real images. The comparison reveals that, although the virtual samples were not specially selected, both virtual- and real-based training give rise to classifiers of similar performance.
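
The training recipe in these two papers is the classic HOG + linear SVM pipeline. A compact sketch using scikit-image and scikit-learn, with Dalal-Triggs-style window and descriptor parameters assumed rather than taken from the papers:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(window):
    """HOG descriptor for a 128x64 grayscale pedestrian window
    (Dalal-Triggs-style defaults)."""
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Stand-in data: random 128x64 windows in place of cropped virtual-world
# pedestrians (label 1) and background patches (label 0).
rng = np.random.default_rng(0)
windows = rng.random((200, 128, 64))
labels = rng.integers(0, 2, size=200)

X = np.stack([hog_features(w) for w in windows])
clf = LinearSVC(C=0.01).fit(X, labels)

# At test time, the same descriptor is scored over sliding windows.
print(clf.decision_function(X[:1]))
```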

Pedestrian detection system based on deep learning

International Journal of Advances in Applied Sciences (IJAAS), 2022

Pedestrian detection is a rapidly growing field of computer vision with applications in smart cars, surveillance, automotive safety, and advanced robotics. Much of the success of the last few years has been driven by the rapid growth of deep learning, which has produced more efficient tools capable of learning semantic, high-level, deeper image features. In this article, we investigated the task of pedestrian detection on roads using models based on convolutional neural networks. We compared the performance of standard state-of-the-art object detectors: Faster region-based convolutional network (R-CNN), single shot detector (SSD), and you only look once, version 3 (YOLOv3). Results show that YOLOv3 outperforms the other models for pedestrian detection in terms of both detection quality and prediction time.
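
The comparison in this article comes down to running each detector on the same images and measuring detections and latency. A sketch using the two of the three models that ship with torchvision (YOLOv3 is distributed through separate repositories, so it is omitted here; the confidence threshold is an assumption):

```python
import time
import torch
from torchvision.models import detection

# Pretrained COCO detectors; in torchvision's label map, class 1 is 'person'.
models = {
    "faster_rcnn": detection.fasterrcnn_resnet50_fpn(weights="DEFAULT"),
    "ssd300": detection.ssd300_vgg16(weights="DEFAULT"),
}

image = [torch.rand(3, 480, 640)]  # stand-in for a real road-scene image

for name, model in models.items():
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        out = model(image)[0]
        elapsed = time.perf_counter() - start
    persons = (out["labels"] == 1) & (out["scores"] > 0.5)
    print(f"{name}: {int(persons.sum())} pedestrians, {elapsed * 1000:.0f} ms")
```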

Deep Convolutional Neural Networks for pedestrian detection

Signal Processing: Image Communication, 2016

Pedestrian detection is a popular research topic due to its paramount importance for a number of applications, especially in the fields of automotive, surveillance and robotics. Despite significant improvements, pedestrian detection is still an open challenge that calls for more and more accurate algorithms. In the last few years, deep learning, and in particular convolutional neural networks, emerged as the state of the art in terms of accuracy for a number of computer vision tasks such as image classification, object detection and segmentation, often outperforming the previous gold standards by a large margin. In this paper, we propose a pedestrian detection system based on deep learning, adapting a general-purpose convolutional network to the task at hand. By thoroughly analyzing and optimizing each step of the detection pipeline, we propose an architecture that outperforms traditional methods, achieving accuracy close to that of state-of-the-art approaches while requiring low computational time. Finally, we tested the system on an NVIDIA Jetson TK1, a 192-core platform envisioned as a forerunner of the computational brain of future self-driving cars.
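
"Adapting a general-purpose convolutional network" typically means keeping a pretrained backbone and retraining a small task-specific head to score proposals as pedestrian versus background. A generic sketch of that adaptation (the ResNet-18 backbone, frozen first layer, and hyperparameters are my choices, not necessarily the paper's):

```python
import torch
import torch.nn as nn
from torchvision import models

# General-purpose backbone pretrained on ImageNet.
backbone = models.resnet18(weights="DEFAULT")

# Replace the 1000-class ImageNet head with a 2-class
# pedestrian / background head; only this layer starts from scratch.
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

# Optionally freeze the earliest layer and fine-tune the rest at a low LR.
for p in backbone.conv1.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in backbone.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9)

# Each (proposal crop, label) pair then trains the network as usual.
crops = torch.randn(8, 3, 224, 224)   # stand-in proposal crops
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(backbone(crops), labels)
loss.backward()
optimizer.step()
```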

Unsupervised domain adaptation of virtual and real worlds for pedestrian detection

Vision-based object detectors are crucial for many applications. They rely on learnt object models. Ideally, we would like to deploy our vision system in the scenario where it must operate, and the system should then self-learn how to distinguish the objects of interest, i.e., without human intervention. However, learning each object model requires labelled samples collected through a tiresome manual process. For instance, we are interested in exploring the self-training of a pedestrian detector for driver assistance systems. Our first approach to avoiding manual labelling consisted in the use of samples coming from realistic computer graphics, so that their labels are automatically available. This would make possible the desired self-training of our pedestrian detector. However, as we have shown previously, there may be a dataset shift between the virtual and real worlds. In order to overcome it, we propose the use of unsupervised domain adaptation techniques that avoid human intervention during the adaptation process. In particular, this paper explores the use of the transductive SVM (T-SVM) learning algorithm in order to adapt the virtual and real worlds for pedestrian detection.
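
A transductive SVM optimizes jointly over the labelled (virtual) and unlabelled (real) samples. scikit-learn ships no T-SVM, so the sketch below substitutes a plain self-training loop that captures the same intuition: train on virtual labels, pseudo-label the confidently scored real samples, and retrain (features are random stand-ins; the margin threshold is an assumption):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-ins: HOG-like features for labelled virtual samples and
# unlabelled real-world samples.
X_virtual = rng.random((300, 64))
y_virtual = rng.integers(0, 2, 300)
X_real = rng.random((200, 64))

clf = LinearSVC().fit(X_virtual, y_virtual)

# Self-training approximation of the transductive objective:
# iteratively absorb confidently scored real samples as pseudo-labels.
for _ in range(3):
    scores = clf.decision_function(X_real)
    confident = np.abs(scores) > 1.0          # margin threshold (assumed)
    X = np.vstack([X_virtual, X_real[confident]])
    y = np.concatenate([y_virtual, (scores[confident] > 0).astype(int)])
    clf = LinearSVC().fit(X, y)

print(clf.decision_function(X_real[:3]))
```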