Rethinking of Radar’s Role: A Camera-Radar Dataset and Systematic Annotator via Coordinate Alignment

CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection

arXiv (Cornell University), 2022

Robust 3D object detection is critical for safe autonomous driving. Camera and radar sensors are synergistic as they capture complementary information and work well under different environmental conditions. Fusing camera and radar data is challenging, however, as each sensor lacks information along a perpendicular axis: depth is unknown to the camera and elevation is unknown to the radar. We propose the camera-radar matching network CramNet, an efficient approach to fuse the sensor readings from camera and radar in a joint 3D space. To leverage radar range measurements for better camera depth predictions, we propose a novel ray-constrained cross-attention mechanism that resolves the ambiguity in the geometric correspondences between camera features and radar features. Our method supports training with sensor modality dropout, which leads to robust 3D object detection even when a camera or radar sensor suddenly malfunctions on a vehicle. We demonstrate the effectiveness of our fusion approach through extensive experiments on the RADIATE dataset, one of the few large-scale datasets that provide radar radio-frequency imagery. A camera-only variant of our method achieves competitive performance in monocular 3D object detection on the Waymo Open Dataset.
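The ray-constrained cross-attention described above can be pictured as attending over radar features sampled along a camera pixel's viewing ray, so that the radar range measurement disambiguates the pixel's depth. The sketch below is a minimal, hedged illustration of that idea in PyTorch; the tensor names, shapes, and the `grid_sample` lookup are assumptions for illustration, not CramNet's actual implementation.

```python
# Minimal sketch of a ray-constrained cross-attention step, assuming a camera
# feature vector, a radar bird's-eye-view feature map, and known calibration.
import torch
import torch.nn.functional as F

def ray_constrained_cross_attention(cam_feat, radar_bev, ray_xy, num_depth_bins=32):
    """
    cam_feat:  (C,) feature vector of one camera pixel, used as the query.
    radar_bev: (C, H, W) radar feature map in bird's-eye view.
    ray_xy:    (num_depth_bins, 2) BEV (x, y) locations sampled along the pixel's
               viewing ray, normalized to [-1, 1] for grid_sample.
    """
    C, H, W = radar_bev.shape
    # Gather radar features at candidate depths along the ray (keys/values).
    grid = ray_xy.view(1, 1, num_depth_bins, 2)                        # (1, 1, K, 2)
    kv = F.grid_sample(radar_bev.unsqueeze(0), grid,
                       align_corners=False).reshape(C, num_depth_bins)  # (C, K)
    # Attention over depth bins: the camera feature asks "which depth matches?"
    scores = (cam_feat.unsqueeze(0) @ kv) / C ** 0.5                    # (1, K)
    weights = scores.softmax(dim=-1)
    fused = (kv * weights).sum(dim=-1)                                  # (C,)
    return fused, weights
```

In a full detector this would run in parallel over all foreground pixels, with learned query/key/value projections rather than raw features.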

Radar+RGB Attentive Fusion for Robust Object Detection in Autonomous Vehicles

arXiv (Cornell University), 2020

This paper presents two architecture variants, referred to as RANet and BIRANet. The proposed architectures aim to use radar signal data along with RGB camera images to form a robust detection network that works efficiently even in variable lighting and weather conditions such as rain, dust, and fog. First, radar information is fused into the feature extractor network. Second, radar points are used to generate guided anchors. Third, a method is proposed to improve region proposal network [1] targets. BIRANet yields 72.3/75.3% average AP/AR on the nuScenes [2] dataset, which is better than the performance of our base network, Faster R-CNN with Feature Pyramid Network (FFPN) [3]. RANet gives 69.6/71.9% average AP/AR on the same dataset, which is reasonably acceptable performance. Both BIRANet and RANet are also evaluated to be robust to noise.
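The first ingredient listed above, radar information fused into the feature extractor, can be sketched as a rasterized radar map concatenated with backbone image features and mixed by a 1x1 convolution. Channel counts, names, and the fusion point in the snippet below are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of feature-level camera-radar fusion inside a backbone.
import torch
import torch.nn as nn

class RadarImageFusionBlock(nn.Module):
    def __init__(self, img_channels, radar_channels=1):
        super().__init__()
        # 1x1 convolution mixes the concatenated camera and radar channels.
        self.fuse = nn.Conv2d(img_channels + radar_channels, img_channels, kernel_size=1)

    def forward(self, img_feat, radar_map):
        # img_feat:  (B, C, H, W) backbone features
        # radar_map: (B, 1, h, w) rasterized radar returns (e.g. RCS or depth per pixel)
        radar_map = nn.functional.interpolate(radar_map, size=img_feat.shape[-2:])
        return self.fuse(torch.cat([img_feat, radar_map], dim=1))
```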

Scene-aware Learning Network for Radar Object Detection

Proceedings of the 2021 International Conference on Multimedia Retrieval, 2021

Object detection is essential to safe autonomous or assisted driving. Previous works usually utilize RGB images or LiDAR point clouds to identify and localize multiple objects in self-driving. However, cameras tend to fail in bad driving conditions, e.g., bad weather or weak lighting, while LiDAR scanners are too expensive to be widely deployed in commercial applications. Radar has been drawing more and more attention due to its robustness and low cost. In this paper, we propose a scene-aware radar learning framework for accurate and robust object detection. First, the learning framework contains branches conditioned on the scene category of the radar sequence, with each branch optimized for a specific type of scene. Second, three different 3D autoencoder-based architectures are proposed for radar object detection, and ensemble learning is performed over the different architectures to further boost the final performance. Third, we propose a novel scene-aware sequence mix augmentation (SceneMix) and scene-specific post-processing to generate more robust detection results. In the ROD2021 Challenge, we achieved a final result of 75.0% average precision and 81.0% average recall. Moreover, in the parking lot scene, our framework ranks first with an average precision of 97.8% and an average recall of 98.6%, which demonstrates the effectiveness of our framework.
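The scene-conditioned branching can be pictured as a simple routing layer: a scene label selects which detection branch processes the radar sequence. The sketch below is only an assumed, minimal illustration of that routing; the branch construction, scene categories, and ensembling details are not taken from the paper.

```python
# Illustrative scene-conditioned routing: one branch per scene category.
import torch
import torch.nn as nn

class SceneAwareDetector(nn.Module):
    def __init__(self, branch_fn, num_scenes=3):
        super().__init__()
        # One detection branch per scene category (e.g. parking lot, street, highway).
        self.branches = nn.ModuleList([branch_fn() for _ in range(num_scenes)])

    def forward(self, radar_seq, scene_id):
        # Route the radar sequence to the branch optimized for its scene type.
        return self.branches[scene_id](radar_seq)

# Example usage with a placeholder branch constructor:
# det = SceneAwareDetector(lambda: nn.Conv3d(1, 8, 3, padding=1), num_scenes=3)
# out = det(torch.randn(2, 1, 16, 64, 64), scene_id=1)
```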

Semi-automatic annotation of 3D Radar and Camera for Smart Infrastructure-based perception

IEEE access, 2024

Environment perception using camera, radar, and/or lidar sensors has significantly improved in the last few years because of deep learning-based methods. However, a large group of these methods falls into the category of supervised learning, which requires a considerable amount of annotated data. Due to uncertainties in multi-sensor data, automating the data labeling process is extremely challenging; hence, it is performed manually to a large extent. Even though full automation of such a process is difficult, semi-automation can be a significant step toward easing it. However, the available work in this regard is still very limited; hence, in this paper, a novel semi-automatic annotation methodology is developed for labeling RGB camera images and 3D automotive radar point cloud data using a smart infrastructure-based sensor setup. This paper also describes a new method for 3D radar background subtraction to remove clutter and a new object category, GROUP, for radar-based detection of closely located vulnerable road users. To validate the work, a dataset named INFRA-3DRC is created using this methodology, where 75% of the labels are automatically generated. In addition, a radar cluster classifier and an image classifier are developed, trained, and tested on this dataset, achieving accuracies of 98.26% and 94.86%, respectively. The dataset and Python scripts are available at https://fraunhoferivi.github.io/INFRA-3DRC-Dataset/.
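For a stationary infrastructure sensor, radar background subtraction can be approximated with occupancy statistics: grid cells that return detections in almost every frame are treated as clutter and filtered out. The sketch below illustrates that general idea under assumed grid resolution and thresholds; it is not the paper's actual algorithm.

```python
# Hedged sketch of static-background subtraction for an infrastructure radar.
import numpy as np

def build_background_grid(frames, cell=0.5, extent=100.0, occupancy=0.8):
    """frames: list of (N_i, 2) arrays of radar (x, y) points recorded over time."""
    bins = int(2 * extent / cell)
    occ = np.zeros((bins, bins), dtype=np.int32)
    for pts in frames:
        ij = np.clip(((pts + extent) / cell).astype(int), 0, bins - 1)
        occ[ij[:, 0], ij[:, 1]] += 1   # duplicates within a frame count once (NumPy fancy indexing)
    # Cells occupied in at least `occupancy` of all frames are static clutter.
    return occ >= occupancy * len(frames)

def subtract_background(pts, bg_mask, cell=0.5, extent=100.0):
    """Keep only points that do not fall into background (clutter) cells."""
    ij = np.clip(((pts + extent) / cell).astype(int), 0, bg_mask.shape[0] - 1)
    return pts[~bg_mask[ij[:, 0], ij[:, 1]]]
```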

Towards Deep Radar Perception for Autonomous Driving: Datasets, Methods, and Challenges

Sensors

With recent developments, the performance of automotive radar has improved significantly. The next generation of 4D radar can achieve imaging capability in the form of high-resolution point clouds. In this context, we believe that the era of deep learning for radar perception has arrived. However, studies on radar deep learning are spread across different tasks, and a holistic overview is lacking. This review paper attempts to provide a big picture of the deep radar perception stack, including signal processing, datasets, labelling, data augmentation, and downstream tasks such as depth and velocity estimation, object detection, and sensor fusion. For these tasks, we focus on explaining how the network structure is adapted to radar domain knowledge. In particular, we summarise three overlooked challenges in deep radar perception, including multi-path effects, uncertainty problems, and adverse weather effects, and present some attempts to solve them.

Raw High-Definition Radar for Multi-Task Learning

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Fig. 1: Overview of our RADIal dataset. RADIal includes a set of 3 sensors (camera, laser scanner, high-definition radar) and comes with GPS and the vehicle's CAN traces; 25K synchronized samples are recorded in raw format. (a) Camera image with the projected laser point cloud in red and radar point cloud in indigo, vehicle annotation in orange and free driving space annotation in green; (b) Radar power spectrum with bounding box annotations; (c) Free driving space annotation in bird's-eye view, with annotated vehicle bounding boxes in orange, radar point cloud in indigo and laser point cloud in red; (d) Range-azimuth map in Cartesian coordinates overlaid with the radar and laser point clouds; (e) GPS trace in red and odometry trajectory reconstruction in green.
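Panel (a) relies on projecting the laser and radar point clouds into the camera image. A generic, hedged version of that projection is sketched below; the extrinsic and intrinsic matrices are placeholders and do not correspond to RADIal's actual calibration files.

```python
# Generic sketch of projecting a sensor point cloud onto a camera image.
import numpy as np

def project_points(points_xyz, T_cam_from_sensor, K):
    """points_xyz: (N, 3) in the sensor frame; T: 4x4 extrinsics; K: 3x3 intrinsics."""
    pts_h = np.c_[points_xyz, np.ones(len(points_xyz))]      # homogeneous coordinates (N, 4)
    pts_cam = (T_cam_from_sensor @ pts_h.T).T[:, :3]         # transform into the camera frame
    in_front = pts_cam[:, 2] > 0                             # keep only points ahead of the camera
    uvw = (K @ pts_cam[in_front].T).T
    return uvw[:, :2] / uvw[:, 2:3]                          # pixel coordinates (u, v)
```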

RRPN: Radar Region Proposal Network for Object Detection in Autonomous Vehicles

2019 IEEE International Conference on Image Processing (ICIP), 2019

Region proposal algorithms play an important role in most state-of-the-art two-stage object detection networks by hypothesizing object locations in the image. Nonetheless, region proposal algorithms are known to be the bottleneck in most two-stage object detection networks, increasing the processing time for each image and resulting in slow networks not suitable for real-time applications such as autonomous driving vehicles. In this paper, we introduce RRPN, a Radar-based real-time region proposal algorithm for object detection in autonomous driving vehicles. RRPN generates object proposals by mapping Radar detections to the image coordinate system and generating pre-defined anchor boxes for each mapped Radar detection point. These anchor boxes are then transformed and scaled based on the object's distance from the vehicle, to provide more accurate proposals for the detected objects. We evaluate our method on the newly released nuScenes dataset [1] using the Fast R-CNN object detection network [2]. Compared to the Selective Search object proposal algorithm [3], our model operates more than 100× faster while at the same time achieving higher detection precision and recall. Code has been made publicly available at https://github.com/mrnabati/RRPN.
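The proposal mechanism described above, anchors seeded at mapped radar points and scaled by range, can be sketched as follows. The base sizes, aspect ratios, and scaling rule are illustrative assumptions rather than RRPN's published parameters.

```python
# Sketch of distance-scaled proposals seeded at radar detections in the image.
import numpy as np

def radar_seeded_proposals(radar_dets, base_sizes=(128, 256), ratios=(0.5, 1.0, 2.0),
                           ref_range=10.0):
    """radar_dets: iterable of (u, v, range_m) for radar points mapped into the image."""
    boxes = []
    for u, v, rng in radar_dets:
        scale = ref_range / max(rng, 1e-3)        # closer objects -> larger boxes
        for s in base_sizes:
            for r in ratios:
                w = s * scale / np.sqrt(r)        # r is interpreted as height/width
                h = s * scale * np.sqrt(r)
                boxes.append([u - w / 2, v - h / 2, u + w / 2, v + h / 2])
    return np.array(boxes)                        # (N_boxes, 4) in (x1, y1, x2, y2)
```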

CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object Tracking

2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops), 2021

3D multi-object tracking is a crucial component in the perception system of autonomous driving vehicles. Tracking all dynamic objects around the vehicle is essential for tasks such as obstacle avoidance and path planning. Autonomous vehicles are usually equipped with different sensor modalities to improve accuracy and reliability. While sensor fusion has been widely used in object detection networks in recent years, most existing multi-object tracking algorithms either rely on a single input modality or do not fully exploit the information provided by multiple sensing modalities. In this work, we propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion. Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association. The proposed greedy algorithm uses the depth, velocity and 2D displacement of the detected objects to associate them through time. This makes our tracking algorithm very robust to occluded and overlapping objects, as the depth and velocity information can help the network distinguish them. We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark, as well as the baseline LiDAR-based method. Our method runs online at 35 ms per image, making it well suited for autonomous driving applications.
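The greedy association step can be illustrated with a cost matrix over 2D displacement, depth, and velocity, matched lowest-cost-first. The weights, threshold, and data layout below are assumptions for illustration, not the paper's tuned settings.

```python
# Hedged sketch of greedy track-detection association on depth, velocity, and displacement.
import numpy as np

def greedy_associate(tracks, dets, w_pos=1.0, w_depth=0.5, w_vel=0.5, max_cost=5.0):
    """tracks, dets: lists of dicts with keys 'xy' (2,), 'depth' (float), 'vel' (2,)."""
    cost = np.full((len(tracks), len(dets)), np.inf)
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            cost[i, j] = (w_pos * np.linalg.norm(t['xy'] - d['xy'])
                          + w_depth * abs(t['depth'] - d['depth'])
                          + w_vel * np.linalg.norm(t['vel'] - d['vel']))
    matches = []
    # Repeatedly take the cheapest remaining pair until costs exceed the gate.
    while cost.size and cost.min() < max_cost:
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        matches.append((i, j))
        cost[i, :] = np.inf   # each track and each detection is matched at most once
        cost[:, j] = np.inf
    return matches
```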

Deep Radar Detector

2019 IEEE Radar Conference (RadarConf)

While camera and LiDAR processing have been revolutionized since the introduction of deep learning, radar processing still relies on classical tools. In this paper, we introduce a deep learning approach for radar processing, working directly with the complex radar data. To overcome the lack of labeled radar data, we rely only on the radar calibration data for training and introduce new radar augmentation techniques. We evaluate our method on the radar 4D detection task and demonstrate superior performance compared to the classical approaches while keeping real-time performance. Applying deep learning to radar data has several advantages, such as eliminating the need for an expensive radar calibration process each time and enabling classification of the detected objects with almost zero overhead.
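As a flavour of what augmenting complex radar data can look like, the snippet below applies a global phase rotation (which leaves range-Doppler magnitudes unchanged) and additive complex Gaussian noise. These are generic I/Q augmentations offered as assumptions; the paper's specific augmentation techniques may differ.

```python
# Illustrative augmentations on raw complex (I/Q) radar samples.
import numpy as np

def random_phase_rotation(iq_frame, rng=None):
    """iq_frame: complex ndarray of raw I/Q samples, any shape."""
    if rng is None:
        rng = np.random.default_rng()
    phase = rng.uniform(0.0, 2.0 * np.pi)
    return iq_frame * np.exp(1j * phase)   # magnitude spectra are unaffected

def add_complex_noise(iq_frame, snr_db=30.0, rng=None):
    """Additive circular Gaussian noise at an assumed target SNR."""
    if rng is None:
        rng = np.random.default_rng()
    power = np.mean(np.abs(iq_frame) ** 2)
    sigma = np.sqrt(power / (2.0 * 10 ** (snr_db / 10.0)))
    noise = rng.normal(0, sigma, iq_frame.shape) + 1j * rng.normal(0, sigma, iq_frame.shape)
    return iq_frame + noise
```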

Augmented Radar Points Connectivity based on Image Processing Techniques for Object Detection and Classification

Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2022

Perception and scene understanding are complex modules that require data from multiple types of sensors to construct a weather-resilient system that can operate in almost all conditions. This is mainly due to the drawbacks of each sensor on its own. The only sensor that is able to work in a variety of conditions is the radar. However, the sparseness of radar point clouds in open-source datasets makes the radar under-perform in object classification tasks compared to the LiDAR, which, after constraints and filtration, produces an average of 22,000 points per frame within a grid-map image representation covering 120 x 120 meters in the real world. Therefore, in this paper, a preprocessing module is proposed to enable the radar to partially reconnect objects in the scene from a sparse point cloud. This adapts the radar to object classification tasks rather than its conventional automotive uses, such as Adaptive Cruise Control or object tracking. The proposed module is used as a preprocessing step in a deep learning pipeline for a classification task. The evaluation was carried out on the nuScenes dataset, as it contains both radar and LiDAR data, which enables a comparison between the performance of both modules. After applying the preprocessing module, this work brings radar-based classification significantly closer to the performance of the LiDAR.
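One common way to partially reconnect sparse radar returns is to rasterize them into a bird's-eye-view grid image and apply morphological closing so nearby points merge into connected blobs. The sketch below illustrates that generic image-processing approach under assumed grid and kernel parameters; it is not necessarily the paper's exact module.

```python
# Minimal sketch: rasterize sparse radar points and close gaps morphologically.
import numpy as np
from scipy import ndimage

def connect_radar_points(pts_xy, extent=60.0, cell=0.25, closing_iters=2):
    """pts_xy: (N, 2) radar points in metres, with the ego vehicle at the grid centre."""
    bins = int(2 * extent / cell)
    grid = np.zeros((bins, bins), dtype=np.uint8)
    ij = np.clip(((pts_xy + extent) / cell).astype(int), 0, bins - 1)
    grid[ij[:, 0], ij[:, 1]] = 1
    # Closing (dilation followed by erosion) bridges small gaps between nearby returns.
    struct = np.ones((3, 3), dtype=np.uint8)
    closed = ndimage.binary_closing(grid, structure=struct, iterations=closing_iters)
    return closed.astype(np.uint8)   # binary BEV image with partially reconnected objects
```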