3D model search and pose estimation from single images using VIP features

3D model matching with viewpoint-invariant patches (VIP)

2008

The robust alignment of images and scenes seen from widely different viewpoints is an important challenge for camera and scene reconstruction. This paper introduces a novel class of viewpoint-independent local features for robust registration, together with novel algorithms that use the rich information of the new features for 3D scene alignment and large-scale scene reconstruction. The key idea of our approach is to leverage local shape information to extract an invariant feature descriptor.

Relative Pose from SIFT Features

ArXiv, 2022

This paper establishes the geometric relationship between epipolar geometry and orientation- and scale-covariant features such as SIFT. We derive a new linear constraint relating the unknown elements of the fundamental matrix to the feature orientation and scale. Used together with the well-known epipolar constraint, this equation makes it possible, e.g., to estimate the fundamental matrix from four SIFT correspondences, to estimate the essential matrix from three, and to solve the semi-calibrated case from three. Because these solvers require fewer correspondences than the well-known point-based approaches (e.g., the 5PT, 6PT and 7PT solvers), RANSAC-like randomized robust estimation becomes significantly faster. The proposed constraint is tested on a number of problems in a synthetic environment and on publicly available real-world datasets comprising more than 80000 image pairs. It is superior to the state of the art in terms of processing time while often leading to more accurate results.
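
The speed-up from smaller minimal samples can be illustrated with the standard RANSAC sample-count formula, N = log(1 - p) / log(1 - w^s), where w is the inlier ratio, s the sample size and p the desired confidence. The sketch below is not taken from the paper: the function name `ransac_iterations`, the 50% inlier ratio and the 99% confidence are assumptions chosen only for illustration.

```python
import math


def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Samples needed so that, with probability `confidence`,
    at least one drawn minimal sample contains only inliers."""
    # Probability that a single random minimal sample is all-inlier.
    p_good = inlier_ratio ** sample_size
    if p_good <= 0.0:
        return math.inf
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


if __name__ == "__main__":
    # Compare sample sizes of 3, 4, 5 and 7 correspondences
    # at an assumed inlier ratio of 0.5.
    for s in (3, 4, 5, 7):
        print(f"sample size {s}: {ransac_iterations(0.5, s)} iterations")
```

Under these assumed settings the required number of iterations grows quickly with the sample size, which is why a solver that needs three or four correspondences instead of five or seven shortens the robust estimation loop.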

Object Recognition and Modeling Using SIFT Features

Lecture Notes in Computer Science, 2013

In this paper we present a technique for object recognition and modeling based on local image feature matching. Given a complete set of views of an object, the goal of our technique is to recognize the same object in an image of a cluttered environment containing it and to estimate its pose. The method is based on visual modeling of objects from a multi-view representation of the object to be recognized. The first step consists of creating an object model by selecting a subset of the available views, using SIFT descriptors to evaluate image similarity and relevance. The selected views are then taken as the model of the object, and we show that they can effectively be used to visually represent the main aspects of the object.

3D Object Partial Matching Using Panoramic Views

Lecture Notes in Computer Science, 2013

In this paper, a methodology for 3D object partial matching and retrieval based on range image queries is presented. The proposed methodology addresses the retrieval of complete 3D objects based on artificially created range image queries which represent partial views. The core methodology relies upon Dense SIFT descriptors computed on panoramic views. Performance evaluation builds upon standard measures and a challenging 3D pottery dataset originating from the Hampson Archeological Museum collection.

3D model retrieval using accurate pose estimation and view-based similarity

Proceedings of the 1st ACM International Conference on Multimedia Retrieval - ICMR '11, 2011

In this paper, a novel framework for 3D object retrieval is presented. The paper focuses on the investigation of an accurate 3D model alignment method, which is achieved by combining two intuitive criteria: plane reflection symmetry and rectilinearity. After proper positioning in a coordinate system, a set of 2D images (multi-views) is automatically generated from the 3D object by taking views from uniformly distributed viewpoints. For each image, a set of flip-invariant shape descriptors is extracted. Taking advantage of both the pose estimation of the 3D objects and the flip-invariance property of the extracted descriptors, a new matching scheme for fast computation of 3D object dissimilarity is introduced. Experiments conducted on the SHREC 2009 benchmark show the superiority of the pose estimation method over similar approaches, as well as the efficiency of the new matching scheme.

Graph matching using SIFT descriptors: an application to pose recovery of a mobile robot

Image-feature matching based on Local Invariant Feature Extraction (LIFE) methods has proven to be successful, and SIFT is one of the most effective. SIFT matching uses only local texture information to compute the correspondences. A number of approaches have been presented that aim to enhance the image-feature matches computed using only local information such as SIFT. What most of these approaches have in common is that they use higher-level information, such as the spatial arrangement of the feature points, to reject a subset of outliers. The main limitation of these outlier rejectors is that they cannot enhance the configuration of matches by adding new useful ones. In the present work we propose a graph matching algorithm aimed not only at rejecting erroneous matches but also at selecting additional useful ones. We use both the graph structure, to encode the geometrical information, and the SIFT descriptors in the nodes' attributes, to provide local texture information. This algorithm is an ensemble of successful ideas previously reported by other researchers. We demonstrate the effectiveness of our algorithm in a pose recovery application.

Location Field Descriptors: Single Image 3D Model Retrieval in the Wild

2019 International Conference on 3D Vision (3DV), 2019

We present Location Field Descriptors, a novel approach for single image 3D model retrieval in the wild. In contrast to previous methods that directly map 3D models and RGB images to an embedding space, we establish a common low-level representation in the form of location fields from which we compute pose-invariant 3D shape descriptors. Location fields encode correspondences between 2D pixels and 3D surface coordinates and, thus, explicitly capture 3D shape and 3D pose information without appearance variations, which are irrelevant for the task. This early fusion of 3D models and RGB images results in three main advantages: First, the bottleneck location field prediction acts as a regularizer during training. Second, major parts of the system benefit from training on a virtually infinite amount of synthetic data. Finally, the predicted location fields are visually interpretable, which makes the system less of a black box. We evaluate our proposed approach on three challenging real-world datasets (Pix3D, Comp, and Stanford) with different object categories and significantly outperform the state of the art by up to 20% absolute in multiple 3D retrieval metrics.

SoftPOSIT: Simultaneous Pose and Correspondence Determination

2002

The problem of pose estimation arises in many areas of computer vision, including object recognition, object tracking, site inspection and updating, and autonomous navigation using scene models. We present a new algorithm, called SoftPOSIT, for determining the pose of a 3D object from a single 2D image in the case that correspondences between model points and image points are unknown. The algorithm combines Gold’s iterative SoftAssign algorithm [19, 20] for computing correspondences and DeMenthon’s iterative POSIT algorithm [13] for computing object pose under a full-perspective camera model. Our algorithm, unlike most previous algorithms for this problem, does not have to hypothesize small sets of matches and then verify the remaining image points. Instead, all possible matches are treated identically throughout the search for an optimal pose. The performance of the algorithm is extensively evaluated in Monte Carlo simulations on synthetic data under a variety of levels of clutter, occlusion, and image noise. These tests show that the algorithm performs well in a variety of difficult scenarios, and empirical evidence suggests that the algorithm has a run-time complexity that is better than previous methods by a factor equal to the number of image points. The algorithm is being applied to the practical problem of autonomous vehicle navigation in a city through registration of 3D architectural models of buildings to images obtained from an on-board camera.
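
The idea of treating all possible matches identically can be illustrated with the alternating row and column normalization (Sinkhorn balancing) that soft-assignment schemes build on. The following is a simplified sketch, not the published SoftPOSIT implementation: the function name `soft_assignments`, the `exp(-beta * d)` initialization, the fixed `beta`, and the restriction to equal numbers of model and image points (so no handling of clutter or occlusion) are all assumptions made for illustration.

```python
import math


def soft_assignments(sq_dists, beta, iterations=50):
    """Turn a square matrix of squared distances into soft match weights
    whose rows and columns each sum (approximately) to one."""
    n = len(sq_dists)
    # Smaller distance -> larger initial weight; beta controls sharpness.
    m = [[math.exp(-beta * d) for d in row] for row in sq_dists]
    for _ in range(iterations):
        # Normalize every row to sum to one.
        for row in m:
            total = sum(row)
            for j in range(n):
                row[j] /= total
        # Normalize every column to sum to one.
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= total
    return m


if __name__ == "__main__":
    # Toy distances: point 0 is close to point 0, point 1 to point 1.
    weights = soft_assignments([[0.1, 2.0], [2.0, 0.1]], beta=5.0)
    for row in weights:
        print([round(w, 4) for w in row])
```

In this sketch every model-image pair keeps a nonzero weight, so no small set of matches has to be hypothesized up front; a larger `beta` pushes the weights toward a hard one-to-one assignment.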

Partial matching of 3D cultural heritage objects using panoramic views

Multimedia Tools and Applications, 2014

In this paper, we present a method for partial matching and retrieval of 3D objects based on range image queries. The proposed methodology addresses the retrieval of complete 3D objects using range image queries that represent partial views. The core methodology relies upon Bag-of-Visual-Words modelling and an enhanced Dense SIFT descriptor computed on panoramic views and range image queries. Performance evaluation builds upon standard measures and a challenging 3D pottery dataset originating from the Hampson Archaeological Museum collection.
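
As a rough illustration of Bag-of-Visual-Words modelling in general (not of this paper's specific pipeline), each local descriptor is assigned to its nearest codeword and the image is represented by the normalized histogram of codeword counts. In the sketch below the function name `bovw_histogram` and the tiny 2-D codebook are hypothetical; in practice the codebook would typically be learned by clustering many descriptors, such as 128-dimensional Dense SIFT vectors.

```python
def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword (squared Euclidean
    distance) and return the normalized histogram of assignments."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1.0
    total = sum(counts)
    # An image with no descriptors yields an all-zero histogram.
    return [c / total for c in counts] if total else counts


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [10.0, 10.0]]            # toy visual words
    descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
    print(bovw_histogram(descriptors, codebook))
```

Two views can then be compared by any histogram distance, which keeps the comparison independent of how many descriptors each view produced.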