Automatic generation of video navigation from Google Street View data with car detection and inpainting

Although numerous navigation tools and systems exist, Google Street View is still sometimes preferred because it shows a realistic scene, even though it presents only a single static image at a time. For navigation, however, what is desired is a video composed of Google Street View images along the route between given starting and ending locations. Several papers have partially addressed this problem, but there is still much room for improvement. First, the generated navigation video is not very smooth: the transition from one frame to the next is not properly controlled, which can result in an abrupt change from one scene to another. Second, the generated video often contains many undesired vehicles and people, and removing these distracting objects would greatly enhance the quality of the navigation video. In this paper, we first use HOG and/or Haar features to detect vehicles and people, and we also report preliminary trials using Faster R-CNN and Caffe to speed up this detection. Results are presented to demonstrate the effectiveness of our approaches and, where applicable, are compared with similar approaches to show our improvement. In addition, a post-processing tool is developed to interactively refine the results when the automatic object detection is not perfect.
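The abstract does not say how frame-to-frame transitions are controlled. As an illustration of the smoothness problem only, the sketch below inserts linearly blended frames between two consecutive Street View images. Linear cross-fading is an assumed stand-in, not the method of the paper, and the name `cross_fade` and the list-of-rows grayscale frame representation are hypothetical choices made for this example.

```python
from typing import List

# Hypothetical representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[float]]


def cross_fade(frame_a: Frame, frame_b: Frame, steps: int) -> List[Frame]:
    """Return `steps` intermediate frames that blend frame_a into frame_b.

    Assumption: a plain linear blend; the paper's actual transition
    control is not described in the abstract.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    same_shape = len(frame_a) == len(frame_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    )
    if not same_shape:
        raise ValueError("frames must have the same dimensions")

    blended: List[Frame] = []
    for k in range(1, steps + 1):
        # Blend weight strictly between 0 and 1, evenly spaced.
        t = k / (steps + 1)
        blended.append(
            [
                [(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(frame_a, frame_b)
            ]
        )
    return blended
```

Inserting a few such frames between each pair of consecutive images softens an abrupt scene change, at the cost of ghosting when the two views differ strongly.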
