Mathematical flaws in the essential matrix theory
In 1981, Longuet-Higgins represented a world point by two vectors, one in each camera reference frame, and from this developed the essential matrix. This matrix relates the corresponding image points in the two views of a world point on a rigid scene, and it is independent of the position and orientation of the cameras used to capture the views. Calculating the essential matrix requires at least five accurate pairs of corresponding points. The lack of a procedure that fulfills this requirement led researchers to focus on estimation methods for the essential matrix without questioning the mathematical correctness of its derivation. In this paper, we identify and expose flaws in Longuet-Higgins' derivation of the essential matrix. These flaws result from conflating the scalar product of vectors in a single reference frame with the transformation of vectors from one reference frame to another.
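The relation the abstract refers to is the epipolar constraint x2^T E x1 = 0 with E = [t]x R. A minimal numerical sketch (synthetic pose and world point; all values below are hypothetical) that verifies the constraint for one correspondence:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Illustrative relative pose: camera 2 sees the scene after a small
# rotation R about the y-axis and a translation t.
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.0])

E = skew(t) @ R  # the essential matrix

# One world point, expressed in each camera's reference frame.
X1 = np.array([0.5, -0.3, 4.0])   # coordinates in camera 1
X2 = R @ X1 + t                   # same point in camera 2
x1 = X1 / X1[2]                   # normalized image point, view 1
x2 = X2 / X2[2]                   # normalized image point, view 2

# The epipolar constraint relating the two image points: x2^T E x1 = 0.
residual = x2 @ E @ x1
```

Since the image points are scalar multiples of the position vectors, the residual vanishes up to floating-point rounding for any exact correspondence.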
Shortcomings of the Fundamental Matrix Equation to Reconstruct 3D Scenes
In stereo vision, epipolar geometry is the intrinsic projective geometry between two views. The essential and fundamental matrices relate corresponding points in stereo images: the essential matrix describes the geometry when the cameras are calibrated, and the fundamental matrix expresses it when they are uncalibrated. Since the nineties, researchers have devoted considerable effort to estimating the fundamental matrix. Although the epipolar constraint is a landmark of computer vision, the current work revisits three derivations of the essential and fundamental matrices. In Longuet-Higgins' derivation of the essential matrix, the author draws a mapping between the position vectors of a 3D point; however, the one-to-one character of that mapping is lost when it is converted into a relation between the image points. In the two other derivations, we demonstrate that the authors established a mapping between the image points through a misuse of mathematics.
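The calibrated/uncalibrated distinction can be made concrete: with intrinsic matrices K1 and K2, the textbook relation F = K2^{-T} E K1^{-1} turns the essential matrix, which acts on normalized coordinates, into the fundamental matrix, which acts on raw pixel coordinates. A sketch under assumed, hypothetical intrinsics and pose:

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose and camera intrinsics (illustrative values only).
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.0])
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 310.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])

E = skew(t) @ R                                   # calibrated case
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)   # uncalibrated case

# Project one world point into homogeneous pixel coordinates in both views.
X1 = np.array([0.5, -0.3, 4.0])
X2 = R @ X1 + t
u1 = K1 @ (X1 / X1[2])   # pixel coordinates, view 1
u2 = K2 @ (X2 / X2[2])   # pixel coordinates, view 2

# u2^T F u1 equals x2^T E x1 and vanishes for a true correspondence.
pixel_residual = u2 @ F @ u1
```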
Flaws in the Computer Algorithm for Reconstructing a Scene from Two Projections
International Journal of Machine Learning and Computing, 2012
Algebraic Aspects of Reconstruction of 3D Scenes from One or More Views
Proceedings of the British Machine Vision Conference, 2001
This paper considers the problem of 3D reconstruction from 2D points in one or more images when auxiliary information about the corresponding 3D features is known: alignments, coplanarities, ratios of lengths, or symmetries. Our first contribution is a necessary and sufficient criterion that indicates whether a dataset, or a subset thereof, defines a rigid reconstruction up to scale and translation. Another contribution is a reconstruction method for one or more images. We show that the observations impose linear constraints on the reconstruction. All the input data, possibly coming from many images, is summarized in a single linear system whose solution yields the reconstruction. The criterion that indicates whether the solution is unique up to scale and translation is the rank of another linear system, called the "twin" system. Multiple objects whose relative scale can be chosen arbitrarily are identified. The reconstruction is obtained up to an affine transformation or, if calibration is available, up to a Euclidean transformation.
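The core computational idea, restated generically (this is a sketch of the linear-system pattern, not the authors' exact construction): stack the linear constraints into one homogeneous system A x = 0, take the reconstruction from the SVD nullspace, and read uniqueness up to scale off the rank.

```python
import numpy as np

def solve_homogeneous(A, tol=1e-10):
    """Unit-norm least-squares solution of A @ x = 0 and the nullspace dimension."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s[0]))
    return Vt[-1], A.shape[1] - rank

# Toy system (hypothetical): three unknown coordinates constrained to the
# ratios 1 : 2 : 3, mimicking "ratio of lengths" style linear constraints.
A = np.array([[2.0, -1.0, 0.0],    # 2*x0 - x1 = 0
              [3.0, 0.0, -1.0]])   # 3*x0 - x2 = 0
x, nullity = solve_homogeneous(A)
x = x / x[0]   # fix the one free overall scale

# nullity == 1 means the solution is unique up to that single scale factor,
# which is the role the rank test on the "twin" system plays in the paper.
```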
Three-view camera calibration using geometric algebra
Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429)
In a former work of ours [1], we proposed a new way to express and interpret the epipolar constraint using Geometric Algebra, and we derived from it a novel and efficient two-view camera calibration technique. In this paper we extend this GA approach to the three-view case. After expressing the trifocal constraint in terms of bivectors and trivectors, we provide an alternative geometric interpretation of the coefficients of the trifocal tensor. On that basis, we propose a novel solution for the simultaneous determination of the focal lengths of the cameras and the rigid motion between three views.
Stereoscopic projections and 3D scene reconstruction
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing: Technological Challenges of the 1990's, 1992
The problem of determining the geometry of a scene occurs in photographic surveying and binocular vision, where non-visual information is not as accurate as the optical information obtained by the human eyes [Ogle 1964]. This problem involves determining the relative orientation of the two image planes and the depth relation of spatial points with respect to the image planes. There are several approaches to this problem. The camera information can be derived analytically from knowledge of the camera geometry, or empirically from a small number of stereo pairs [Strat 1989]. Thompson [1959] developed a technique using five stereo pairs of points to calculate the structure of the scene; Thompson uses a numerical technique to solve five simultaneous non-linear equations of third degree. Longuet-Higgins gives an elegant technique which uses eight stereoscopic pairs of points. This method requires solving eight simultaneous linear equations. However, it requires calculating the inverse of an 8-by-8 matrix, which can be time-consuming and prone to error due to computer arithmetic. This paper presents an efficient, heuristic approach which requires at most six stereoscopic pairs of points and is based on solving a linear system of at most three equations. In this paper, the unique solution is determined directly from the forward coordinates of spatial points instead of applying an iterative procedure [Longuet-Higgins 1981]. The new approach is intuitive, non-iterative, and enables a clear understanding of the operations performed and inferences drawn.
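Longuet-Higgins' eight-point scheme mentioned above can be sketched as follows: each correspondence (x1, x2) contributes one linear equation kron(x2, x1) . vec(E) = 0 in the nine entries of E. All values below are synthetic; the nullspace is taken with an SVD rather than the 8-by-8 inverse the abstract criticizes, but the linear system is the same.

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

rng = np.random.default_rng(0)

# Synthetic ground-truth pose (hypothetical values).
theta = 0.2
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.1, 0.3])
E_true = skew(t) @ R

# Each of the eight correspondences gives one row kron(x2, x1) of the
# linear system in the nine entries of E (row-major vectorization).
rows = []
for _ in range(8):
    X1 = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 6.0])  # point in front of cam 1
    X2 = R @ X1 + t
    x1, x2 = X1 / X1[2], X2 / X2[2]
    rows.append(np.kron(x2, x1))
A = np.array(rows)   # 8 x 9 coefficient matrix

# Recover E (up to scale and sign) as the nullspace of A.
_, _, Vt = np.linalg.svd(A)
E_est = Vt[-1].reshape(3, 3)

# Compare with ground truth after removing the scale/sign ambiguity.
E_est /= np.linalg.norm(E_est)
E_ref = E_true / np.linalg.norm(E_true)
err = min(np.linalg.norm(E_est - E_ref), np.linalg.norm(E_est + E_ref))
```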
Reshetov LA, Is the Fundamental Matrix Really Independent of the Scene Structure?
In stereo vision, two images of a 3D scene are acquired from two viewpoints. One of the objectives of stereo vision work is to recover the 3D structure of the scene. Epipolar geometry describes the relationship between the images, and the essential and fundamental matrices are the algebraic representations of this geometry. The most important feature of these matrices emphasized in the literature is that they are independent of the scene structure. This article illustrates, empirically and theoretically, that the fundamental matrix depends on the scene structure, and demonstrates that the matrix F in the relation m_r^T F m_l = 0 not only represents a relationship between corresponding points of the two views but also represents a relationship between other, non-corresponding points. Furthermore, we show empirically that the equation m_r^T F m_l = 0 does not hold for every pair of corresponding points. In scenes with objects of different depths, the value of m_r^T F m_l depends on the depths of the 3D points and increases proportionally with an increasing baseline.
Interdisciplinary Applied Mathematics, 2004
Contents (excerpt):
2.7 Exercises
2.A Quaternions and Euler angles for rotations
3 Image Formation
3.1 Representation of images
3.2 Lenses, light, and basic photometry
3.2.1 Imaging through lenses
3.2.2 Imaging through a pinhole
3.3 A geometric model of image formation
3.3.1 An ideal perspective camera
3.3.2 Camera with intrinsic parameters
3.3.3 Radial distortion
3.3.4 Image, preimage, and coimage of points and lines
3.4 Summary
3.5 Exercises
3.A Basic photometry with light sources and surfaces
3.B Image formation in the language of projective geometry
4 Image Primitives and Correspondence
4.1 Correspondence of geometric features
4.1.1 From photometric features to geometric primitives
4.1.2 Local vs. global image deformations
4.2 Local deformation models
4.2.1 Transformations of the image domain
4.2.2 Transformations of the intensity value
4.3 Matching point features
4.3.1 Small baseline: feature tracking and optical flow
4.3.2 Large baseline: affine model and normalized cross-correlation
4.3.3 Point feature selection
4.4 Tracking line features
4.4.1 Edge features and edge detection
4.4.2 Composition of edge elements: line fitting
4.4.3 Tracking and matching line segments
4.5 Summary
4.6 Exercises
4.A Computing image gradients
II Geometry of Two Views
5 Reconstruction from Two Calibrated Views
5.1 Epipolar geometry
5.1.1 The epipolar constraint and the essential matrix
5.1.2 Elementary properties of the essential matrix
A Comparison of Projective Reconstruction Methods for Pairs of Views
1995
Recently, different approaches for uncalibrated stereo have been suggested which permit projective reconstructions from multiple views. These use weak calibration, which is represented by the epipolar geometry, so no knowledge of the intrinsic or extrinsic camera parameters is required. In this paper we consider projective reconstructions from pairs of views and compare a number of the available methods.