aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception
Related papers
A*3D Dataset: Towards Autonomous Driving in Challenging Environments
2020 IEEE International Conference on Robotics and Automation (ICRA)
With the increasing global popularity of self-driving cars, there is an immediate need for challenging real-world datasets for benchmarking and training various computer vision tasks such as 3D object detection. Existing datasets either represent simple scenarios or provide only daytime data. In this paper, we introduce the new, challenging A*3D dataset, which consists of RGB images and LiDAR data with significant diversity in scene, time, and weather. The dataset consists of high-density images (≈ 10 times more than the pioneering KITTI dataset), heavy occlusions, and a large number of nighttime frames (≈ 3 times the nuScenes dataset), addressing gaps in existing datasets and pushing autonomous driving research toward more challenging, highly diverse environments. The dataset contains 39K frames, 7 classes, and 230K 3D object annotations. An extensive 3D object detection benchmark evaluation on the A*3D dataset across attributes such as high density and daytime/nighttime gives interesting insights into the advantages and limitations of training and testing 3D object detection in real-world settings.
One Million Scenes for Autonomous Driving: ONCE Dataset
2021
Current perception models in autonomous driving have become notorious for relying on a mass of annotated data to cover unseen cases and address the long-tail problem. On the other hand, learning from large-scale unlabeled data and incrementally self-training powerful recognition models have received increasing attention and may become the path to next-generation, industry-level robust perception models in autonomous driving. However, the research community has generally suffered from a shortage of such essential real-world scene data, which hampers future exploration of fully/semi/self-supervised methods for 3D perception. In this paper, we introduce the ONCE (One millioN sCenEs) dataset for 3D object detection in the autonomous driving scenario. The ONCE dataset consists of 1 million LiDAR scenes and 7 million corresponding camera images. The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving da...
Autonomous Vehicles Perception (AVP) Using Deep Learning: Modeling, Assessment, and Challenges
IEEE Access
Perception is the fundamental task of any autonomous driving system: it gathers all the necessary information about the environment surrounding the moving vehicle. The decision-making system takes the perception data as input and makes the optimal decision for that scenario, maximizing the safety of the passengers. This paper surveys recent literature on autonomous vehicle perception (AVP), focusing on two primary tasks: semantic segmentation and object detection. Both tasks are vital components of the vehicle's navigation system. A comprehensive overview of deep learning for perception and its decision-making process based on images and LiDAR point clouds is presented. We discuss the sensors, benchmark datasets, and simulation tools widely used in semantic segmentation and object detection tasks, especially for autonomous driving. This paper acts as a road map for current and future research in AVP, focusing on models, assessment, and challenges in the field.
A3CarScene: An audio-visual dataset for driving scene understanding
Data in Brief, 2023
Accurate perception and awareness of the environment surrounding the automobile is a challenge in automotive research. This article presents A3CarScene, a dataset recorded while driving a research vehicle equipped with audio and video sensors on public roads in the Marche Region, Italy. The sensor suite includes eight microphones installed inside and outside the passenger compartment and two dashcams mounted on the front and rear windows. Approximately 31 h of data for each device were collected during October and November 2022 by driving about 1500 km along diverse roads and landscapes, in variable weather conditions, during daytime and nighttime hours. All key information for the scene understanding process of automated vehicles has been accurately annotated. For each route, annotations with beginning and end timestamps report the type of road traveled (motorway, trunk, primary, secondary, tertiary, residential, and service roads), the degree of urbanization of the area (city, town, suburban area, village, exurban and rural areas), the weather conditions (clear, cloudy, overcast, and rainy), the level of lighting (daytime, evening, night, and tunnel), the type (asphalt or cobblestones) and moisture status (dry or wet) of the road pavement, and the state of the windows (open or closed). This large-scale dataset is valuable for developing new driving assistance technologies based on audio or video data alone or in a multimodal manner, and for improving the performance of systems currently in use. Because data were acquired from sensors in multiple locations, the dataset also allows the best installation placement for a given task to be assessed. Deep learning engineers can use this dataset to build new baselines, as a comparative benchmark, and to extend existing databases for autonomous driving.
Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset
arXiv (Cornell University), 2023
We introduce the UT Campus Object Dataset (CODa), a mobile robot egocentric perception dataset collected on the University of Texas at Austin campus. Our dataset contains 8.5 hours of multimodal sensor data: synchronized 3D point clouds and stereo RGB video from a 128-channel 3D LiDAR and two 1.25MP RGB cameras at 10 fps; RGB-D video from an additional 0.5MP sensor at 7 fps; and a 9-DOF IMU sensor at 40 Hz. We provide 58 minutes of ground-truth annotations containing 1.3 million 3D bounding boxes with instance IDs for 53 semantic classes, 5000 frames of 3D semantic annotations for urban terrain, and pseudo-ground-truth localization. We repeatedly traverse identical geographic locations across a wide range of indoor and outdoor areas, weather conditions, and times of day. Using CODa, we empirically demonstrate that: 1) 3D object detection performance in urban settings is significantly higher when trained on CODa than on existing datasets, even when employing state-of-the-art domain adaptation approaches; 2) sensor-specific fine-tuning improves 3D object detection accuracy; and 3) pretraining on CODa improves cross-dataset 3D object detection performance in urban settings compared to pretraining on AV datasets. Using our dataset and annotations, we release benchmarks for 3D object detection and 3D semantic segmentation using established metrics. In the future, the CODa benchmark will include additional tasks such as unsupervised object discovery and re-identification. We publicly release CODa on the Texas Data Repository [1], along with pre-trained models, a dataset development package, and an interactive dataset viewer. We expect CODa to be a valuable dataset for research in egocentric 3D perception and planning for autonomous navigation in urban environments.
OLIMP: A Heterogeneous Multimodal Dataset for Advanced Environment Perception
Electronics
Reliable environment perception is a crucial task for autonomous driving, especially in dense traffic areas. Recent improvements and breakthroughs in scene understanding for intelligent transportation systems are mainly based on deep learning and the fusion of different modalities. In this context, we introduce OLIMP: A heterOgeneous Multimodal Dataset for Advanced EnvIronMent Perception. This is the first public, multimodal, and synchronized dataset that includes UWB radar data, acoustic data, narrow-band radar data, and images. OLIMP comprises 407 scenes and 47,354 synchronized frames covering four categories: pedestrian, cyclist, car, and tram. The dataset includes various challenges related to dense urban traffic, such as cluttered environments and different weather conditions. To demonstrate the usefulness of the introduced dataset, we propose a fusion framework that combines the four modalities for multi-object detection. The obtained results are promising and spur for future ...
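The abstract does not detail how the four modalities are combined; purely as an illustration, the sketch below shows a generic score-level fusion of per-class confidences from four modality-specific detectors. The class list matches OLIMP's categories, but the weights, scores, and fusion rule are invented placeholders, not the paper's method.

```python
# Hedged sketch: generic weighted score-level fusion of per-class
# confidences from four modality-specific detectors. All numbers are
# invented toy values; this is not OLIMP's actual fusion framework.
import numpy as np

CLASSES = ["pedestrian", "cyclist", "car", "tram"]

def fuse_scores(per_modality_scores, weights):
    """Weighted average of per-class scores, then argmax over classes."""
    total = sum(weights.values())
    fused = sum(weights[m] * s for m, s in per_modality_scores.items()) / total
    return CLASSES[int(np.argmax(fused))]

scores = {
    "camera":   np.array([0.10, 0.05, 0.80, 0.05]),
    "uwb":      np.array([0.20, 0.10, 0.60, 0.10]),
    "radar":    np.array([0.15, 0.05, 0.70, 0.10]),
    "acoustic": np.array([0.25, 0.15, 0.40, 0.20]),
}
weights = {"camera": 0.4, "uwb": 0.2, "radar": 0.3, "acoustic": 0.1}
print(fuse_scores(scores, weights))  # -> "car"
```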
ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Autonomous driving has attracted remarkable attention from both industry and academia. An important task is to estimate the 3D properties (e.g., translation, rotation, and shape) of a moving or parked vehicle on the road. This task, while critical, is still under-researched in the computer vision community, partially owing to the lack of a large-scale, fully annotated 3D car database suitable for autonomous driving research. In this paper, we contribute the first large-scale database suitable for 3D car instance understanding: ApolloCar3D. The dataset contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is over 20× larger than PASCAL3D+ [65] and KITTI [21], the current state of the art. To enable efficient labelling in 3D, we build a pipeline that exploits 2D-3D keypoint correspondences for a single instance and 3D relationships among multiple instances. Equipped with this dataset, we build various baseline algorithms using state-of-the-art deep convolutional neural networks. Specifically, we first segment each car with a pre-trained Mask R-CNN [22], and then regress its 3D pose and shape based on a deformable 3D car model, with or without semantic keypoints. We show that using keypoints significantly improves fitting performance. Finally, we develop a new 3D metric that jointly considers 3D pose and 3D shape, allowing for comprehensive evaluation and ablation study. By comparing with human performance, we suggest several future directions for further improvements.
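As a toy illustration of the 2D-3D keypoint fitting idea mentioned above, the sketch below recovers a single car's 6-DoF pose with OpenCV's solvePnP. The CAD keypoints, camera intrinsics, and simulated pose are made-up placeholders rather than values from ApolloCar3D, and the deformable-shape part of the baseline is omitted.

```python
# Hedged sketch: estimate one car's pose from 2D-3D keypoint
# correspondences, loosely mirroring a keypoint-based fitting step.
# All coordinates and intrinsics below are invented placeholders.
import numpy as np
import cv2

# Six keypoints on a hypothetical car CAD model (metres, car frame).
model_kpts = np.array([
    [ 0.0, 0.0,  2.2],   # front bumper centre
    [ 0.0, 0.0, -2.2],   # rear bumper centre
    [-0.9, 0.3,  1.4],   # left front wheel
    [ 0.9, 0.3,  1.4],   # right front wheel
    [-0.9, 0.3, -1.4],   # left rear wheel
    [ 0.8, 1.4,  0.0],   # roof corner
], dtype=np.float32)

# Assumed pinhole intrinsics (fx = fy = 1000 px, 1280x720 image).
K = np.array([[1000., 0., 640.],
              [0., 1000., 360.],
              [0.,    0.,   1.]], dtype=np.float32)

# Simulate "detected" 2D keypoints by projecting a known pose.
rvec_true = np.array([0.05, 0.6, 0.0], dtype=np.float32)   # axis-angle
tvec_true = np.array([2.0, 1.2, 14.0], dtype=np.float32)   # metres
image_kpts, _ = cv2.projectPoints(model_kpts, rvec_true, tvec_true, K, None)

# Recover the pose from the correspondences (the core fitting step).
ok, rvec, tvec = cv2.solvePnP(model_kpts, image_kpts, K, None,
                              flags=cv2.SOLVEPNP_EPNP)
if ok:
    print("estimated translation (m):", tvec.ravel())
    print("true translation      (m):", tvec_true)
```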
Multimodal Detection of Unknown Objects on Roads for Autonomous Driving
2022
Tremendous progress in deep learning over the last years has led toward a future with autonomous vehicles on our roads. Nevertheless, the performance of their perception systems is strongly dependent on the quality of the training data. As these data usually cover only a fraction of all object classes an autonomous driving system will face, such systems struggle with handling the unexpected. In order to operate safely on public roads, identifying objects from unknown classes remains a crucial task. In this paper, we propose a novel pipeline to detect unknown objects. Instead of focusing on a single sensor modality, we make use of lidar and camera data by combining state-of-the-art detection models in a sequential manner. We evaluate our approach on the Waymo Open Perception Dataset and point out current research gaps in anomaly detection.
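The abstract does not spell out the sequential combination, so the snippet below is only one plausible reading of such a camera-lidar cascade: lidar proposals whose image projections overlap no known-class camera detection are flagged as potential unknowns. The boxes, IoU threshold, and the assumption that lidar boxes are already projected to the image plane are all hypothetical.

```python
# Hedged sketch of one possible sequential camera-lidar pipeline for
# unknown-object detection: keep lidar proposals that no known-class
# camera detection explains. All boxes and thresholds are toy values.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 2D boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def flag_unknown(lidar_boxes_2d: List[Box],
                 camera_boxes: List[Box],
                 iou_thr: float = 0.3) -> List[Box]:
    """Lidar proposals (already projected into the image) that match no
    known-class camera detection are treated as potential unknowns."""
    return [lb for lb in lidar_boxes_2d
            if all(iou(lb, cb) < iou_thr for cb in camera_boxes)]

# Toy example: the second lidar proposal overlaps no camera detection
# and is therefore flagged as a potential unknown object.
camera_dets = [(100, 120, 220, 260)]              # known classes only
lidar_props = [(110, 130, 230, 270), (400, 300, 480, 380)]
print(flag_unknown(lidar_props, camera_dets))     # -> [(400, 300, 480, 380)]
```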
Autonomous Vehicles and Machines Conference, at IS&T Electronic Imaging
2020
The performance of autonomous agents in both commercial and consumer applications increases along with their situational awareness. Tasks such as obstacle avoidance, agent-to-agent interaction, and path planning depend directly on the ability to convert sensor readings into scene understanding. Central to this is the ability to detect and recognize objects. Many object detection methodologies operate on a single modality such as vision or LiDAR. Camera-based object detection models benefit from an abundance of feature-rich information for classifying different types of objects. LiDAR-based object detection models use sparse point clouds, where each point contains the accurate 3D position of an object surface. Camera-based methods lack accurate object-to-lens distance measurements, while LiDAR-based methods lack dense, feature-rich details. By utilizing information from both camera and LiDAR sensors, advanced object detection and identification is possible. In this work, we introduce a deep learning framework for fusing these modalities to produce a robust, real-time 3D bounding box object detection network. We demonstrate qualitative and quantitative analysis of the proposed fusion model on the popular KITTI dataset.
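A generic late-fusion head along those lines might look like the PyTorch sketch below: per-object camera and LiDAR feature vectors are encoded, concatenated, and regressed to a 7-parameter 3D box (centre, size, yaw). The layer sizes and the assumption of pre-pooled per-object features are illustrative and not taken from the paper.

```python
# Hedged sketch of a generic camera-LiDAR late-fusion head regressing
# 7-parameter 3D boxes (x, y, z, w, l, h, yaw). Feature dimensions and
# the use of pre-pooled per-object features are illustrative assumptions.
import torch
import torch.nn as nn

class CameraLidarFusionHead(nn.Module):
    def __init__(self, cam_dim: int = 256, lidar_dim: int = 128):
        super().__init__()
        self.cam_enc = nn.Sequential(nn.Linear(cam_dim, 128), nn.ReLU())
        self.lidar_enc = nn.Sequential(nn.Linear(lidar_dim, 128), nn.ReLU())
        self.box_head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 7),          # x, y, z, w, l, h, yaw
        )

    def forward(self, cam_feat: torch.Tensor, lidar_feat: torch.Tensor):
        # Concatenate the two modality embeddings and regress box parameters.
        fused = torch.cat([self.cam_enc(cam_feat),
                           self.lidar_enc(lidar_feat)], dim=-1)
        return self.box_head(fused)

# Toy forward pass with random per-object features for a batch of 4 objects.
head = CameraLidarFusionHead()
boxes = head(torch.randn(4, 256), torch.randn(4, 128))
print(boxes.shape)  # torch.Size([4, 7])
```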
Technologies
The perception of the surrounding environment is a key requirement for autonomous driving systems, yet computing an accurate semantic representation of the scene from RGB information alone is very challenging. In particular, the lack of geometric information and the strong dependence on weather and illumination conditions introduce critical challenges for approaches tackling this task. For this reason, most autonomous cars exploit a variety of sensors, including color, depth or thermal cameras, LiDARs, and RADARs. How to efficiently combine all these sources of information to compute an accurate semantic description of the scene is still an unsolved problem, making this an active research field. In this survey, we start by presenting the most commonly employed acquisition setups and datasets. Then we review several different deep learning architectures for multimodal semantic segmentation. We will discuss the various techniques to combine color, depth, LiDAR, and other...