3D model search and pose estimation from single images using VIP features (original) (raw)

Abstract

Abstract This paper describes a method to efficiently search for 3D models in a city-scale database and to compute the camera poses from single query images. The proposed method matches SIFT features (from a single image) to viewpoint invariant patches (VIP) from a 3D model by warping the SIFT features approximately into the orthographic frame of the VIP features. This significantly increases the number of feature correspondences which results in a reliable and robust pose estimation.

Loading Preview

Sorry, preview is currently unavailable. You can download the paper by clicking the button above.

References (11)

A. Akbarzadeh, J. Frahm, P. Mordohai, B. Clipp, C. En- gels, D. Gallup, P. Merrell, M. Phelps, S. Sinha, B. Tal- ton, L. Wang, Q. Yang, H. Stewenius, R. Yang, G. Welch, H. Towles, D. Nister, and M. Pollefeys. Towards urban 3d reconstruction from video. In 3D Data Processing, Visual- ization and Transmission, pages 1-8, 2006.
O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative
D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1-2):43-72, 2005.
D. Nistér and H. Stewénius. Scalable recognition with a vo- cabulary tree. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, New York City, New York, pages 2161-2168, 2006.
D. Robertsone and R. Cipolla. An image-based system for urban navigation. In Proc. 14th British Machine Vision Con- ference, London, UK, pages 1-10, 2004.
G. Schindler, M. Brown, and R. Szeliski. City-scale location recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota, pages 1- 7, 2007.
H. Shao, T. Svoboda, T. Tuytelaars, , and L. J. V. Gool. Hpat indexing for fast object/scene recognition based on local ap- pearance. In Conference on Image and video retrieval, pages 71-80, 2003.
C. Wu, B. Clipp, X. Li, J.-M. Frahm, and M. Pollefeys. 3d model matching with viewpoint invariant patches (vips). In To appear in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2008.
G. Yang, J. Becker, and C. Stewart. Estimating the location of a camera with respect to a 3d model. In 3DIM07, pages 159-166, 2007.
W. Zhang and J. Kosecka. Image based localization in ur- ban environments. In 3D Data Processing, Visualization and Transmission, pages 33-40, 2006.