Context-Aware Multi-Task Learning for Traffic Scene Recognition in Autonomous Vehicles
Related papers
Scene recognition under special traffic conditions based on deep multi‐task learning
The Journal of Engineering, 2020
Traffic scene recognition under special conditions is one of the most promising yet challenging tasks for autonomous driving systems. This study presents a deep multi-task classification framework for scene recognition involving special traffic conditions. The framework incorporates four learning tasks: recognition of special traffic scenes is the chief task, while the time of occurrence (daytime or night-time), the weather type, and the road attribute serve as three auxiliary tasks that improve recognition performance. The four tasks share the feature map generated by a convolutional neural network, followed by task-specific sub-networks that are merged at the end via a joint loss function. Moreover, a small dataset of typical special traffic conditions was built for training and testing the recognition model. Experimental results demonstrate that the proposed framework significantly improves the accuracy of scene recognition under special traffic conditions.
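The architecture described here is the classic hard-parameter-sharing pattern: one shared backbone, several task heads, one weighted joint loss. A minimal sketch follows; the module sizes, class counts, and the auxiliary-loss weight are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of hard parameter sharing with one chief task and three
# auxiliary tasks, as described in the abstract. All dimensions, class
# counts, and the aux_weight value are hypothetical.
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    def __init__(self, out_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.features(x)

class MultiTaskSceneNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = SharedBackbone()
        self.scene_head = nn.Linear(256, 6)    # chief task: special traffic scenes
        self.time_head = nn.Linear(256, 2)     # auxiliary: daytime / night-time
        self.weather_head = nn.Linear(256, 4)  # auxiliary: weather type
        self.road_head = nn.Linear(256, 3)     # auxiliary: road attribute

    def forward(self, x):
        f = self.backbone(x)
        return (self.scene_head(f), self.time_head(f),
                self.weather_head(f), self.road_head(f))

def joint_loss(outputs, targets, aux_weight=0.3):
    # Chief task at full weight; auxiliary tasks down-weighted.
    ce = nn.CrossEntropyLoss()
    scene, time, weather, road = outputs
    return (ce(scene, targets["scene"])
            + aux_weight * (ce(time, targets["time"])
                            + ce(weather, targets["weather"])
                            + ce(road, targets["road"])))
```

Down-weighting the auxiliary terms keeps the gradient signal dominated by the chief task while still letting the shared backbone benefit from the extra supervision.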
Learning Task Relatedness in Multi-Task Learning for Images in Context
Proceedings of the 2019 on International Conference on Multimedia Retrieval - ICMR '19, 2019
Multimedia applications often require concurrent solutions to multiple tasks. These tasks hold clues to each other's solutions, but because these relations can be complex, this property remains rarely exploited. When task relations are explicitly defined based on domain knowledge, multi-task learning (MTL) offers such concurrent solutions while exploiting relatedness between multiple tasks performed over the same dataset. In most cases, however, this relatedness is not explicitly defined, and the domain expert knowledge that defines it is not available. To address this issue, we introduce Selective Sharing, a method that learns inter-task relatedness from secondary latent features while the model trains. Using this insight, we can automatically group tasks and allow them to share knowledge in a mutually beneficial way. We support our method with experiments on five datasets covering classification, regression, and ranking tasks, and compare against strong baselines and state-of-the-art approaches, showing a consistent improvement in terms of accuracy and parameter counts. In addition, we perform an activation region analysis showing how Selective Sharing affects the learned representation.
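Selective Sharing's exact grouping mechanism is specific to the paper, but the underlying idea of learning how much each task should borrow from the others can be sketched with a simpler, well-known analogue: a learnable soft-sharing (cross-stitch-style) matrix. The sketch below is that analogue, not the authors' method; all names and shapes are assumptions.

```python
# Loose analogue of learned task relatedness: a learnable matrix mixes
# task-specific features, so related tasks end up borrowing from each other.
# This is a cross-stitch-style simplification, not Selective Sharing itself.
import torch
import torch.nn as nn

class SoftSharingLayer(nn.Module):
    def __init__(self, num_tasks, dim):
        super().__init__()
        # relatedness[i, j]: how much task i borrows from task j's features.
        # Initialized to the identity (no sharing); learned during training.
        self.relatedness = nn.Parameter(torch.eye(num_tasks))
        self.task_encoders = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_tasks)]
        )

    def forward(self, x):
        # x: shared input features of shape (B, dim).
        feats = torch.stack([enc(x) for enc in self.task_encoders])  # (T, B, D)
        weights = torch.softmax(self.relatedness, dim=1)             # rows sum to 1
        # Each task's feature becomes a learned mixture over all tasks' features.
        return torch.einsum("ij,jbd->ibd", weights, feats)
```

Inspecting the learned `relatedness` matrix after training gives a rough picture of which tasks the model found mutually beneficial.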
SpotNet: Self-Attention Multi-Task Network for Object Detection
2020 17th Conference on Computer and Robot Vision (CRV)
Humans are very good at directing their visual attention toward relevant areas when they search for different types of objects. For instance, when we search for cars, we will look at the streets, not at the top of buildings. The motivation of this paper is to train a network to do the same via a multi-task learning approach. To train visual attention, we produce foreground/background segmentation labels in a semi-supervised way, using background subtraction or optical flow. Using these labels, we train an object detection model to produce foreground/background segmentation maps as well as bounding boxes while sharing most model parameters. We use those segmentation maps inside the network as a self-attention mechanism to weight the feature map used to produce the bounding boxes, decreasing the signal of non-relevant areas. We show that by using this method, we obtain a significant mAP improvement on two traffic surveillance datasets, with state-of-the-art results on both UA-DETRAC and UAVDT.
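The attention mechanism this abstract describes reduces to a simple operation: a segmentation head predicts a foreground probability map, which multiplicatively re-weights the shared feature map before the detection head consumes it. A minimal sketch under assumed layer sizes:

```python
# Sketch of segmentation-driven self-attention: a foreground/background map
# gates the shared feature map so background activations are suppressed.
# Channel counts are illustrative assumptions.
import torch
import torch.nn as nn

class SegmentationAttention(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.seg_head = nn.Conv2d(channels, 1, kernel_size=1)  # foreground logit

    def forward(self, feat):
        seg_logits = self.seg_head(feat)        # (B, 1, H, W)
        attention = torch.sigmoid(seg_logits)   # foreground probability in [0, 1]
        attended = feat * attention             # suppress non-relevant regions
        # seg_logits is also supervised with the semi-automatic FG/BG labels;
        # attended features feed the bounding-box head.
        return attended, seg_logits
```

Because the segmentation labels come cheaply from background subtraction or optical flow, the attention signal is learned with essentially no extra annotation cost.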
Multi-Task Mutual Learning for Vehicle Re-Identification
2019
Vehicle re-identification (Re-ID) aims to search for a specific vehicle instance across non-overlapping camera views. The main challenge of vehicle Re-ID is that the visual appearance of vehicles may change drastically with viewpoint and illumination. Most existing vehicle Re-ID models cannot make full use of various complementary vehicle information, e.g. vehicle type and orientation. In this paper, we propose a novel Multi-Task Mutual Learning (MTML) deep model to learn discriminative features simultaneously from multiple branches. Specifically, we design a consensus learning loss function by fusing features from the final convolutional feature maps of all branches. Extensive comparative evaluations demonstrate the effectiveness of our proposed MTML method in comparison to state-of-the-art vehicle Re-ID techniques on a large-scale benchmark dataset, VeRi-776. We also yield competitive performance on the NVIDIA 2019 AI City Challenge Track 2.
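The consensus idea can be sketched compactly: each branch produces its own features and logits, the fused (here, averaged) features feed a consensus classifier, and the consensus loss is added to the per-branch losses. Branch count, feature dimension, and fusion by averaging are assumptions for illustration.

```python
# Sketch of multi-branch consensus learning: per-branch ID losses plus a
# consensus loss on fused features. Architectural details are hypothetical.
import torch
import torch.nn as nn

class ConsensusReID(nn.Module):
    def __init__(self, num_branches=3, dim=512, num_ids=776):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(3, dim, 3, padding=1),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten())
             for _ in range(num_branches)]
        )
        self.branch_cls = nn.ModuleList(
            [nn.Linear(dim, num_ids) for _ in range(num_branches)]
        )
        self.consensus_cls = nn.Linear(dim, num_ids)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        branch_logits = [cls(f) for cls, f in zip(self.branch_cls, feats)]
        fused = torch.stack(feats).mean(dim=0)  # consensus feature
        return branch_logits, self.consensus_cls(fused)

def mutual_loss(branch_logits, consensus_logits, labels):
    ce = nn.CrossEntropyLoss()
    return ce(consensus_logits, labels) + sum(ce(l, labels) for l in branch_logits)
```

The consensus term pulls the branches toward agreement while the per-branch terms keep each branch independently discriminative.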
Deep Integration: A Multi-Label Architecture for Road Scene Recognition
IEEE Transactions on Image Processing, 2019
Deep convolutional neural networks have been applied by automobile industries, internet giants, and academic institutes to boost autonomous driving technologies. While progress has been witnessed in environmental perception tasks such as object detection and driver state recognition, scene-centric understanding and identification remain largely unexplored. This mainly encompasses two key issues: 1) the lack of shared large datasets with comprehensively annotated road scene information, and 2) the difficulty of finding effective ways to train networks given the bias of category samples, image resolutions, scene dynamics, capturing conditions, etc. In this work, we make two contributions. i) We introduce a large-scale dataset with over 110k images, dubbed DrivingScene, covering traffic scenarios under different weather conditions, road structures, environmental instances, and driving places; it is the first large-scale dataset for multi-class traffic scene classification. ii) We propose a multi-label neural network for road scene recognition, which incorporates both single- and multi-class classification modes into a multi-level cost function for training with imbalanced categories and utilizes a deep data integration strategy to improve classification on hard samples. The experimental results on DrivingScene and PASCAL VOC demonstrate the effectiveness of the proposed approach in handling the challenge of data imbalance.
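One plausible reading of the multi-level cost is a softmax cross-entropy term for the single-class (dominant scene) mode combined with a class-weighted binary cross-entropy term for the multi-label mode, with weights countering category imbalance. The paper's exact formulation may differ; the weighting scheme and mixing factor below are assumptions.

```python
# Sketch of a multi-level cost mixing single-class and multi-label modes,
# with per-class weights to counter imbalance. Formulation is an assumption.
import torch
import torch.nn.functional as F

def multi_level_loss(logits, single_label, multi_labels, class_freq, alpha=0.5):
    # logits: (B, C); single_label: (B,) long; multi_labels: (B, C) in {0, 1};
    # class_freq: (C,) per-class sample counts from the training set.
    pos_weight = class_freq.sum() / (class_freq + 1e-6)  # rarer class -> larger weight
    single_term = F.cross_entropy(logits, single_label)
    multi_term = F.binary_cross_entropy_with_logits(
        logits, multi_labels.float(), pos_weight=pos_weight)
    return alpha * single_term + (1 - alpha) * multi_term
```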
End-to-End Real-Time Obstacle Detection Network for Safe Self-Driving via Multi-Task Learning
IEEE Transactions on Intelligent Transportation Systems, 2022
Semantic segmentation and depth estimation lie at the heart of scene understanding and play crucial roles especially for autonomous driving. In particular, it is desirable for an intelligent self-driving agent to discern unexpected obstacles on the road ahead reliably in real-time. While existing semantic segmentation studies for small road hazard detection have incorporated fusion of multiple modalities, they require additional sensor inputs and are often limited by a heavyweight network for real-time processing. In this light, we propose an end-to-end Real-time Obstacle Detection via Simultaneous refinement, coined RODSNet (https://github.com/SAMMiCA/RODSNet), which jointly learns semantic segmentation and disparity maps from a stereo RGB pair and refines them simultaneously in a single module. RODSNet exploits two efficient single-task network architectures and a simple refinement module in a multi-task learning scheme to recognize unexpected small obstacles on the road. We validate our method by fusing Cityscapes and Lost and Found datasets and show that our method outperforms previous approaches on the obstacle detection task, even recognizing the unannotated obstacles at 14.5 FPS on our fused dataset (2048 × 1024 resolution) using RODSNet-2×. In addition, extensive ablation studies demonstrate that our simultaneous refinement effectively facilitates contextual learning between semantic and depth information.
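The simultaneous-refinement idea can be illustrated with a single residual module that takes both initial predictions and refines them together, letting each task correct the other. This sketch assumes channel counts and a residual design; the actual RODSNet modules differ.

```python
# Sketch of joint refinement: concatenate initial semantic logits and the
# disparity map, refine both through one small residual module. Shapes and
# the residual design are illustrative assumptions.
import torch
import torch.nn as nn

class JointRefinement(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(num_classes + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes + 1, 3, padding=1),
        )

    def forward(self, seg_logits, disparity):
        # seg_logits: (B, C, H, W); disparity: (B, 1, H, W)
        joint = torch.cat([seg_logits, disparity], dim=1)
        refined = joint + self.refine(joint)  # residual refinement of both outputs
        return refined[:, :-1], refined[:, -1:]  # refined seg, refined disparity
```

Because the refinement operates on both outputs at once, depth discontinuities can sharpen segmentation boundaries and vice versa, which is the contextual learning the ablations point to.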
IRJET- MULTI-LABEL ROAD SCENE PREDICTION FOR AUTONOMOUS VEHICLES USING DEEP NEURAL NETWORKS
IRJET, 2020
Autonomous driving systems are becoming popular nowadays, and the detection of road conditions is vital for the efficient working of autonomous vehicles. Internet giants and academic institutes have worked to advance autonomous driving technologies; while progress has been witnessed in environmental perception tasks like object detection and driver state recognition, scene-centric understanding and identification remain largely unexplored. This mainly encompasses two key issues: the shortage of shared large datasets with comprehensively annotated road scene information, and the difficulty of finding effective ways to train networks given the bias of category samples, image resolutions, scene dynamics, and capturing conditions. We propose a multi-label neural network for road scene recognition, which includes both single- and multi-class classification modes in a multi-level cost function for training with imbalanced categories and utilizes a deep data integration strategy to enhance classification on hard samples. Alongside road conditions, the location of the vehicle and a lane departure warning system are also included for an efficient autonomous system.
Semi-supervised Multi-task Learning for Semantics and Depth
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022
Multi-Task Learning (MTL) aims to enhance model generalization by sharing representations between related tasks for better performance. Typical MTL methods are trained jointly with complete ground-truths for all tasks simultaneously. However, a single dataset may not contain the annotations for every task of interest. To address this issue, we propose the Semi-supervised Multi-Task Learning (SemiMTL) method to leverage the available supervisory signals from different datasets, particularly for semantic segmentation and depth estimation tasks. To this end, we design an adversarial learning scheme in our semi-supervised training, leveraging unlabeled data to optimize all task branches simultaneously and accomplish all tasks across datasets with partial annotations. We further present a domain-aware discriminator structure with various alignment formulations to mitigate the domain discrepancy among datasets. Finally, we demonstrate the effectiveness of the proposed method in learning across datasets on challenging street-view and remote-sensing benchmarks.
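The adversarial piece follows the standard domain-adversarial recipe: a discriminator tries to tell which dataset a task prediction came from, and the task network is trained to fool it, aligning predictions on the dataset that lacks labels for that task. The sketch below shows that recipe under assumed architecture and loss choices, not the paper's exact domain-aware discriminator.

```python
# Sketch of adversarial output alignment across datasets: the discriminator
# separates source vs. target predictions; the task network learns to make
# target predictions indistinguishable. Design details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainDiscriminator(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, stride=2, padding=1),  # per-patch domain logit
        )

    def forward(self, pred):
        return self.net(pred)

def adversarial_step(disc, pred_src, pred_tgt):
    bce = F.binary_cross_entropy_with_logits
    d_src = disc(pred_src.detach())
    d_tgt = disc(pred_tgt.detach())
    # Discriminator: label source predictions 1, target predictions 0.
    d_loss = bce(d_src, torch.ones_like(d_src)) + bce(d_tgt, torch.zeros_like(d_tgt))
    # Task network: push target predictions toward the source distribution.
    g_out = disc(pred_tgt)
    g_loss = bce(g_out, torch.ones_like(g_out))
    return d_loss, g_loss
```

In training, `d_loss` updates only the discriminator and `g_loss` only the task branches, alternating as in a standard GAN setup.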
Cross-task Attention Mechanism for Dense Multi-task Learning
arXiv (Cornell University), 2022
Multi-task learning has recently become a promising solution for comprehensive understanding of complex scenes. Besides being memory-efficient, multi-task models with an appropriate design can favor the exchange of complementary signals across tasks. In this work, we jointly address 2D semantic segmentation and several geometry-related tasks, namely dense depth estimation, surface normal estimation, and edge estimation, showing their benefit on indoor and outdoor datasets. We propose a novel multi-task learning architecture that exploits pairwise cross-task exchange through correlation-guided attention and self-attention to enhance the average representation learning for all tasks. We conduct extensive experiments considering three multi-task setups, showing the benefit of our proposal in comparison to competitive baselines on both synthetic and real benchmarks. We also extend our method to the novel multi-task unsupervised domain adaptation setting.
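Pairwise cross-task exchange can be sketched with ordinary cross-attention: one task's features form the queries and another task's features provide the keys and values, so correlated regions in the source task refine the target task's representation. This uses standard multi-head attention as a stand-in for the paper's correlation-guided variant; dimensions are assumptions.

```python
# Sketch of pairwise cross-task attention: target-task features query
# source-task features, with a residual connection. Standard multi-head
# attention is used here as a simplified stand-in.
import torch
import torch.nn as nn

class CrossTaskAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target_feat, source_feat):
        # Feature maps (B, C, H, W) flattened to token sequences (B, H*W, C).
        B, C, H, W = target_feat.shape
        q = target_feat.flatten(2).transpose(1, 2)
        kv = source_feat.flatten(2).transpose(1, 2)
        attended, _ = self.attn(q, kv, kv)  # target queries the source task
        out = self.norm(q + attended)       # residual cross-task exchange
        return out.transpose(1, 2).reshape(B, C, H, W)
```

Applied over all task pairs, one such block per direction gives every task head access to complementary signals from the others, which is the exchange pattern the abstract describes.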