Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Robotics Institute, Carnegie Mellon University
The ROADWork dataset contains annotated images and sequences captured by driving through nearly 5,000 different work zones in 18 U.S. cities.
Abstract
Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for developing new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn how to recognize, observe, analyze, and drive through work zones. We find that state-of-the-art foundation models perform poorly on work zones. With our dataset,
- We improve work zone object detection (+26.2 AP) compared to open-vocabulary models.
- We discover work zones with higher precision (+32.5%) and at a much higher discovery rate (12.8x) than open-vocabulary models.
- We significantly improve detection (+23.9 AP) and reading (+14.2% 1 - NED; see the sketch after this list) of work zone signs compared to state-of-the-art methods.
- We improve upon poorly performing foundation models for describing work zones (+36.7 SPICE).
- We also compute drivable paths from work zone navigation videos and show that it is possible to predict 53.6% of navigational goals (+9.9% over baseline) and 75.3% of pathways (+8.1%) with angular error (AE) < 0.5 degrees.
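For concreteness, here is a minimal sketch of how the 1 - NED (normalized edit distance) sign-reading score above can be computed; the helper names are illustrative and not part of any released ROADWork toolkit.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def one_minus_ned(pred: str, gt: str) -> float:
    """1 - NED: 1.0 is an exact match, 0.0 means no character overlap."""
    if not pred and not gt:
        return 1.0
    return 1.0 - edit_distance(pred, gt) / max(len(pred), len(gt))

print(one_minus_ned("ROAD WORK AHEAD", "ROAD WORK AHEAD"))  # 1.0
print(one_minus_ned("ROAD W0RK AHEAD", "ROAD WORK AHEAD"))  # ~0.93
```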
Understanding and Navigating Work Zones is Difficult
Why are work zones so hard for self-driving cars? No two work zones are truly alike, and objects like barriers and work vehicles vary widely with the type, status, location, and geography of the work zone. Navigational aids (like signs) are customized to the particular work zone and require fine-grained understanding. Spatial configurations of work zone objects do not conform to standard lane, road, and sidewalk layouts. Often, normal traffic rules are suspended and new rules are enforced that may change over time. All these reasons make work zone understanding and navigation difficult.
Work zones are dynamic. For example, in this situation, a self-driving car is expected to read the STOP sign held by the workers, wait until the car coming from the opposite direction passes, observe that the workers have flipped the sign to SLOW, and then proceed. This requires the car to combine global scene context, fine-grained observations like signs, and the larger context of the work zone and its workers, which makes navigation challenging.
ROADWork Dataset Overview
Work zone images and sequences from 18 U.S. cities. We provide instance segmentations for 15 object categories such as workers, vehicles, and barriers. We provide scene-level and object-level attributes (for signs and arrow boards) to enable fine-grained understanding. Work zone scene descriptions support analyzing the scene globally, and a passable trajectory, automatically estimated from each video, teaches how to drive through work zones.
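As a rough illustration of how such annotations might be consumed, here is a minimal sketch assuming a COCO-style instance-segmentation JSON; the file name and fields are hypothetical placeholders, so consult the released dataset for the actual schema.

```python
import json

# Hypothetical annotation file; the actual release may use a different
# name and schema.
with open("roadwork_annotations.json") as f:
    data = json.load(f)

# COCO-style lookup from category id to human-readable name.
categories = {c["id"]: c["name"] for c in data["categories"]}

# Print a few instance annotations: category name and bounding box.
for ann in data["annotations"][:10]:
    name = categories[ann["category_id"]]
    x, y, w, h = ann["bbox"]
    print(f"{name:>15s}  bbox=({x:.0f}, {y:.0f}, {w:.0f}, {h:.0f})")
```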
Recognize Work Zones
Using detectors trained on the ROADWork dataset, we discovered work zones around the world in the Mapillary and BDD datasets.
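A common way to implement this kind of discovery, sketched below under the assumption of a generic detection interface (not the authors' exact pipeline), is to run a ROADWork-trained detector over every image and flag those with enough confident work-zone detections.

```python
from pathlib import Path

CONF_THRESH = 0.5
MIN_DETECTIONS = 2   # require a couple of confident work-zone objects

def run_detector(image_path):
    """Placeholder: return a list of (label, score) detections."""
    raise NotImplementedError("plug in your ROADWork-trained detector here")

def discover_work_zones(image_dir):
    """Return images that likely contain a work zone."""
    hits = []
    for img in sorted(Path(image_dir).glob("*.jpg")):
        confident = [d for d in run_detector(img) if d[1] >= CONF_THRESH]
        if len(confident) >= MIN_DETECTIONS:
            hits.append(img)
    return hits
```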
Observe Work Zones
The ROADWork dataset improves fine-grained understanding of work zone signs, arrow boards, and other rare objects, which are poorly detected by pre-trained foundation models like Detic and OpenSeeD.
Our sign attributes contain 62 types of graphics and 360 different text annotations.
We annotate rare and diverse object instances like police cars, tubular markers, barriers and work vehicles.
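To make the fine-grained sign task concrete, below is a hedged two-stage sketch of sign reading: detect sign regions, crop them, and run OCR on the crops. The paper trains its own detection and reading models; this sketch substitutes a placeholder detector and off-the-shelf Tesseract OCR (via pytesseract) purely to illustrate the shape of the pipeline.

```python
from PIL import Image
import pytesseract  # pip install pytesseract (requires the tesseract binary)

def detect_signs(image):
    """Placeholder: return a list of (x, y, w, h) sign boxes."""
    raise NotImplementedError("plug in a ROADWork-trained sign detector")

def read_signs(image_path):
    """Detect sign regions, crop them, and OCR the text in each crop."""
    image = Image.open(image_path)
    texts = []
    for (x, y, w, h) in detect_signs(image):
        crop = image.crop((x, y, x + w, y + h))
        texts.append(pytesseract.image_to_string(crop).strip())
    return texts
```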
Analyze Work Zones
Pre-trained foundation models like LLaVA understand work zones poorly. Using the ROADWork dataset, we improved their performance with ground-truth descriptions and work zone objects as context.
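One way to realize the "objects as context" idea is sketched below: list the detected work-zone objects in the prompt before asking a vision-language model for a description. The `query_vlm` wrapper and the prompt wording are illustrative placeholders, not LLaVA's actual API or the paper's exact recipe.

```python
def build_prompt(detected_objects):
    """Prepend detected work-zone objects as textual context for the VLM."""
    obj_list = ", ".join(detected_objects)
    return (
        f"The following work zone objects were detected in the image: {obj_list}. "
        "Describe the work zone scene and how a driver should navigate it."
    )

def query_vlm(image_path, prompt):
    """Placeholder: call your vision-language model (e.g., a LLaVA wrapper)."""
    raise NotImplementedError

prompt = build_prompt(["worker", "traffic cone", "arrow board", "work vehicle"])
# description = query_vlm("work_zone.jpg", prompt)
```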
Drive Through Work Zones
Using drivable paths from the ROADWork dataset, we can learn navigational goals and pathways for driving through work zones.
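As a concrete reading of the angular-error criterion reported in the abstract, the sketch below measures the angle between the predicted and ground-truth directions from the ego position and counts a prediction as correct when AE < 0.5 degrees; the paper's exact evaluation protocol may differ.

```python
import math

def angular_error_deg(pred_xy, gt_xy, ego_xy=(0.0, 0.0)):
    """Angle (degrees) between the ego->pred and ego->gt direction vectors."""
    vp = (pred_xy[0] - ego_xy[0], pred_xy[1] - ego_xy[1])
    vg = (gt_xy[0] - ego_xy[0], gt_xy[1] - ego_xy[1])
    norm = math.hypot(*vp) * math.hypot(*vg)
    if norm == 0.0:
        return 0.0  # degenerate case: a point coincides with the ego position
    cos = max(-1.0, min(1.0, (vp[0] * vg[0] + vp[1] * vg[1]) / norm))
    return math.degrees(math.acos(cos))

ae = angular_error_deg(pred_xy=(1.00, 20.0), gt_xy=(1.05, 20.0))
print(f"AE = {ae:.3f} deg, correct = {ae < 0.5}")
```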
Acknowledgments
This work was supported by a research contract from General Motors Research-Israel, NSF Grant CNS-2038612, US DOT grant 69A3551747111 through the Mobility21 UTC, and US DOT grants 69A3552344811 and 69A3552348316 through the Safety21 UTC. We thank N. Dinesh Reddy, Khiem Vuong, Shefali Srivastava, Neha Boloor, and Tiffany Ma for insightful discussions.