Extreme Rotation Estimation using Dense Correlation Volumes

image

image

How can we estimate relative rotation between images in extreme non-overlapping cases? (Hover over the images to reveal some implicit cues!)

Above we show two non-overlapping image pairs capturing an urban street scene (left) and a church (right). Cues revealing their relative geometric relationship include the direction of sunlight and shadows in outdoor scenes, and parallel lines and vanishing points in man-made scenes.

In this work, we present an approach for reasoning about such "hidden" cues for estimating the relative rotation between a pair of (possibly) non-overlapping images.


overview

Given a pair of images, a shared-weight Siamese encoder extracts feature maps. We compute a 4D correlation volume using the inner product of features, from which our model predicts the relative rotation (here, as distributions over Euler angles).
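As a concrete illustration of the final decoding step, the sketch below turns three predicted per-angle distributions into a single rotation matrix by taking the most likely bin for each Euler angle. The bin count, angle range, and Z-Y-X composition order are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import numpy as np

def euler_to_matrix(yaw, pitch, roll):
    """Compose a rotation matrix from Euler angles.

    Z-Y-X composition order; the exact convention is an
    assumption made for this illustration.
    """
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def decode_prediction(angle_logits, n_bins=360):
    """Decode three per-angle distributions into a rotation matrix.

    angle_logits: three arrays of length n_bins, each a distribution
    over bins covering [-pi, pi). We take the most likely bin center
    for each angle (argmax decoding; a soft expectation would also work).
    """
    centers = -np.pi + (np.arange(n_bins) + 0.5) * (2 * np.pi / n_bins)
    yaw, pitch, roll = (centers[np.argmax(l)] for l in angle_logits)
    return euler_to_matrix(yaw, pitch, roll)
```

Predicting distributions rather than a single continuous value lets the model express multimodal uncertainty, which is useful when the cues are ambiguous.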

4dcrv

A 4D correlation volume is calculated from a pair of image feature maps. Given a feature vector from Image 1, we compute the dot product with all feature vectors in Image 2, and build up a 2D slice of size H x W. Combining all 2D slices over all feature vectors in Image 1, we obtain a 4D correlation volume of size H x W x H x W.
Our correlation volumes are implicitly assigned a dual role which emerges through training on both overlapping and non-overlapping pairs. When the input image pair contains significant overlap, pointwise correspondence can be computed and transferred onward to the rotation prediction module. When the input image pair contains little to no overlap, the correlation volume can assume the novel role of detecting implicit cues.
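The correlation-volume construction described above can be sketched in a few lines. The feature-map sizes and the use of NumPy here are illustrative assumptions; in practice this would be computed on learned GPU feature maps inside the network.

```python
import numpy as np

def correlation_volume(f1, f2):
    """Build a 4D correlation volume from two feature maps.

    f1, f2: arrays of shape (H, W, C) holding a C-dimensional
    feature vector per pixel. Returns an (H, W, H, W) volume whose
    entry [i, j, k, l] is the dot product of f1[i, j] with f2[k, l].
    """
    # einsum computes all pairwise inner products at once: each
    # (i, j) location in Image 1 yields an H x W slice of
    # similarities against every location in Image 2.
    return np.einsum('ijc,klc->ijkl', f1, f2)

# Toy feature maps (shapes chosen for illustration).
H, W, C = 8, 8, 32
f1 = np.random.randn(H, W, C).astype(np.float32)
f2 = np.random.randn(H, W, C).astype(np.float32)
vol = correlation_volume(f1, f2)
print(vol.shape)  # (8, 8, 8, 8)
```

Note that the volume grows as (HW)^2, which is why it is computed on downsampled feature maps rather than full-resolution images.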


Predicted Rotation Results on Indoor Scenes

Hover over the images to see the full panoramas with the ground-truth perspective images marked in red. We show our predicted viewpoints (in yellow) alongside the result of a baseline regression model that directly predicts a continuous 6D rotation representation (in blue).
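For context, the 6D baseline uses the continuous rotation representation of Zhou et al., which maps two 3D vectors to a rotation matrix via Gram-Schmidt orthogonalization. A minimal sketch of that mapping (function name and shapes are our own):

```python
import numpy as np

def rotmat_from_6d(x):
    """Map a 6D vector to a rotation matrix via Gram-Schmidt.

    x: array of shape (6,), interpreted as two stacked 3D vectors.
    Returns a 3x3 rotation matrix whose first two columns are the
    orthonormalized input vectors.
    """
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1      # remove the component along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)              # complete a right-handed frame
    return np.stack([b1, b2, b3], axis=1)
```

This representation is continuous over SO(3), avoiding the discontinuities of Euler angles or quaternions, which is why it is a common choice for direct regression baselines.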

image

pers

pano

pers

pano

Predicted Rotation Results on Outdoor Scenes and Generalization to New Cities

We show results on images from unseen panoramas in Manhattan, Pittsburgh, and London, all obtained from a model trained only on images from Manhattan.

pers

pano

pers

pano

pers

pano