Ivan Eichhardt | Eötvös Loránd University
Papers by Ivan Eichhardt
Nowadays, multi-view stereo reconstruction algorithms can achieve impressive results using many views of the scene. Our primary objective is to robustly extract more information about the underlying surface from fewer images. We present a method for point-wise surface normal and tangent plane estimation in the stereo case to reconstruct real-world scenes. The proposed algorithm works for a general camera model; however, we choose the pinhole camera to demonstrate its efficiency. The presented method uses particle swarm optimization under geometric and epipolar constraints to achieve suitable speed and quality. An oriented point cloud is generated using a single point correspondence for each oriented 3D point and a cost function based on photo-consistency. The method can straightforwardly be extended to multi-view reconstruction. It is validated in both synthetic and real tests, and compared to one of the state-of-the-art patch-based multi-view reconstruction algorithms.
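The core loop described above, particle swarm optimization over candidate surface normals scored by photo-consistency, can be sketched as follows. This is a minimal, hypothetical illustration rather than the paper's implementation: the real cost compares warped image patches under geometric and epipolar constraints, while the `cost` passed in here is a stand-in so the example runs end-to-end.

```python
import numpy as np

def spherical_to_normal(theta, phi):
    """Unit normal from spherical angles."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def pso_normal(cost, n_particles=30, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost(normal) over the unit hemisphere with a basic PSO."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform([0, -np.pi], [np.pi / 2, np.pi], size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([cost(spherical_to_normal(*p)) for p in pos])
    g = pbest[np.argmin(pbest_cost)].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos += vel
        costs = np.array([cost(spherical_to_normal(*p)) for p in pos])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], costs[improved]
        g = pbest[np.argmin(pbest_cost)].copy()
    return spherical_to_normal(*g)

# Stand-in cost: the paper scores photo-consistency of warped patches;
# here we just pull the normal towards a fixed direction so the demo runs.
target = np.array([0.0, 0.0, 1.0])
print(pso_normal(lambda n: 1.0 - n @ target))
```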
A novel surface normal estimator is introduced using affine-invariant features extracted and tracked across multiple views. Normal estimation is robustified and integrated into our reconstruction pipeline, which has increased accuracy compared to the state of the art. Parameters of the views and the obtained spatial model, including surface normals, are refined by a novel bundle adjustment-like numerical optimization. The refinement alternates with a novel, robust, view-dependent consistency check that removes normals inconsistent with the multiple-view track. Our algorithms are quantitatively validated on the reverse engineering of geometric elements such as planes, spheres, and cylinders. It is shown that the accuracy of the estimated surface properties is appropriate for object detection. The pipeline is also tested on the reconstruction of man-made and free-form objects.
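The view-dependent consistency check for normals can be illustrated with a deliberately simplified stand-in: reject a normal that is back-facing or grazing for any camera that observes the point. The paper's check is more elaborate (robust and tied to the multiple-view track); the function and threshold below are assumptions for illustration.

```python
import numpy as np

def consistent_with_views(normal, point, cam_centers, max_angle_deg=80.0):
    """Reject a surface normal that faces away from any camera observing
    the point -- a simplified stand-in for a view-dependent check."""
    n = normal / np.linalg.norm(normal)
    cos_limit = np.cos(np.deg2rad(max_angle_deg))
    for c in cam_centers:
        view_dir = c - point
        view_dir /= np.linalg.norm(view_dir)
        if n @ view_dir < cos_limit:  # grazing or back-facing for this view
            return False
    return True

point = np.zeros(3)
cams = [np.array([0.0, 0.0, 5.0]), np.array([3.0, 0.0, 4.0])]
print(consistent_with_views(np.array([0.0, 0.0, 1.0]), point, cams))   # True
print(consistent_with_views(np.array([0.0, 0.0, -1.0]), point, cams))  # False
```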
IEEE Transactions on Image Processing, Jul 1, 2019
A method that is optimal in the least squares sense is proposed to estimate surface normals in both the stereo and multi-view cases. The proposed algorithm exploits exclusively photometric information via affine correspondences and estimates the normal for each correspondence independently. The normal is obtained as a root of a quartic polynomial; therefore, the processing time is negligible. To eliminate outliers, we propose a robust extension of the algorithm that combines maximum likelihood estimation and iteratively re-weighted least squares. The method has been validated on both synthetic and publicly available real-world datasets, and it is superior to the state of the art in terms of accuracy and processing time. Besides, we demonstrate two possible applications: 1) using our algorithm as the seed-point generation step of a patch-based multi-view stereo method, the obtained reconstruction is more accurate, and the error of the 3D points is reduced by 30% on average; and 2) multi-plane fitting becomes more accurate when applied to the resulting oriented point cloud.
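Two ingredients mentioned above translate directly into short code: solving a quartic polynomial (a one-line call in NumPy) and an iteratively re-weighted least squares (IRLS) loop. The plane-fitting example below is a generic IRLS sketch with Huber-style weights, not the paper's normal solver; the coefficients and data are made up for the demonstration.

```python
import numpy as np

# A quartic with coefficients a4..a0 (highest degree first) is solved in one call:
coeffs = [1.0, -3.0, 0.5, 2.0, -1.0]  # hypothetical coefficients
real_roots = [r.real for r in np.roots(coeffs) if abs(r.imag) < 1e-9]

def irls_plane(points, n_iters=20, delta=0.05):
    """Robustly fit a plane to noisy 3D points with IRLS and Huber weights,
    a generic stand-in for the paper's robust refinement."""
    X = np.hstack([points, np.ones((len(points), 1))])  # rows [x y z 1]
    w = np.ones(len(points))
    for _ in range(n_iters):
        # smallest singular vector of the weighted design matrix
        _, _, Vt = np.linalg.svd(X * w[:, None], full_matrices=False)
        plane = Vt[-1]
        r = np.abs(X @ plane) / np.linalg.norm(plane[:3])  # point-plane distances
        w = delta / np.maximum(r, delta)                   # Huber-style weights
    return plane[:3] / np.linalg.norm(plane[:3])

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (100, 3)); pts[:, 2] = 0.01 * rng.standard_normal(100)
pts[:10] += 1.0                       # gross outliers
print(real_roots, irls_plane(pts))    # normal close to +-[0, 0, 1]
```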
arXiv (Cornell University), Mar 25, 2021
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. The problem is formalized as finding dominant model instances progressively, without forming crisp point-to-model assignments. Dominant instances are found via RANSAC-like sampling and a consolidation process driven by a model quality function that considers previously proposed instances. New instances are found by clustering in the consensus space. This formulation leads to a simple iterative algorithm with state-of-the-art accuracy that runs in real time on a number of vision problems: at least two orders of magnitude faster than the competitors on two-view motion estimation. Also, we propose a deterministic sampler reflecting the fact that real-world data tend to form spatially coherent structures; the sampler returns connected components in a progressively densified neighborhood graph. We present a number of applications where the use of multiple geometric models improves accuracy, including pose estimation from multiple generalized homographies and trajectory estimation of fast-moving objects, and we also propose a way of using multiple homographies in global SfM algorithms. Source code: https://github.com/danini/clustering-in-consensus-space.
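The progressive "find the next dominant instance" idea can be sketched on a toy problem of discovering multiple 2D lines: each new model is scored only on points not yet explained by previously accepted ones. This crude hard-masking skeleton replaces the paper's soft quality function and consensus-space clustering; it only conveys the iteration structure.

```python
import numpy as np

def fit_line(p, q):
    """Homogeneous line through two 2D points, normalized to unit normal."""
    l = np.cross([*p, 1.0], [*q, 1.0])
    return l / np.linalg.norm(l[:2])

def find_models(points, n_models=2, iters=500, thr=0.05, seed=0):
    """Progressively extract dominant line instances; each new instance is
    scored on points not yet explained by the previous ones."""
    rng = np.random.default_rng(seed)
    explained = np.zeros(len(points), dtype=bool)
    X = np.hstack([points, np.ones((len(points), 1))])
    models = []
    for _ in range(n_models):
        best, best_score = None, -1
        for _ in range(iters):
            i, j = rng.choice(len(points), 2, replace=False)
            l = fit_line(points[i], points[j])
            inl = (np.abs(X @ l) < thr) & ~explained
            if inl.sum() > best_score:
                best, best_score, best_inl = l, inl.sum(), inl
        models.append(best)
        explained |= best_inl
    return models

rng = np.random.default_rng(1)
t = rng.uniform(-1, 1, 100)
pts = np.vstack([np.c_[t[:50], 0.5 * t[:50]],    # points on line 1
                 np.c_[t[50:], -t[50:] + 0.2]])  # points on line 2
pts += 0.01 * rng.standard_normal(pts.shape)
print(len(find_models(pts, n_models=2)))
```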
arXiv (Cornell University), May 1, 2019
The technique requires the epipolar geometry to be pre-estimated between each image pair. It exploits the constraints implied by the camera movement to apply a closed-form correction to the parameters of the input affinities. It is also shown that the rotations and scales obtained by partially affine-covariant detectors, e.g., AKAZE or SIFT, can be completed to full affine frames by the proposed algorithm. It is validated, both in synthetic experiments and on publicly available real-world datasets, that the method always improves the output of the evaluated affine-covariant feature detectors. As a by-product, these detectors are compared, and the ones producing the most accurate affine frames are reported. To demonstrate applicability, we show that the proposed technique, used as a pre-processing step, improves the accuracy of pose estimation for a camera rig, surface normal estimation, and homography estimation.
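For background, a known two-view consistency constraint links an affine correspondence (x1, x2, A) to the fundamental matrix F: with epipolar lines l1 = F^T x2 and l2 = F x1, consistency requires A^T (l2)_{1:2} = -(l1)_{1:2}. The sketch below projects a noisy A onto this constraint column by column in the least-squares sense; it is a simplified two-view stand-in, not the paper's closed-form multi-view correction.

```python
import numpy as np

def correct_affine(A, F, x1, x2):
    """Project a 2x2 local affinity A (image 1 -> image 2) onto the epipolar
    consistency constraint A^T n2 = -n1, where n1, n2 are the normals of
    the epipolar lines l1 = F^T x2 and l2 = F x1. The two constraints
    decouple per column of A, so each column gets a closed-form update."""
    n1 = (F.T @ x2)[:2]
    n2 = (F @ x1)[:2]
    A = A.astype(float).copy()
    for col in (0, 1):
        b, y = -n1[col], A[:, col]
        A[:, col] = y + n2 * (b - n2 @ y) / (n2 @ n2)
    return A

# Toy check with a pure x-translation essential matrix.
F = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
x1, x2 = np.array([0., 0., 1.]), np.array([0.1, 0., 1.])
A_noisy = np.array([[1.02, 0.05], [0.03, 0.96]])
print(correct_affine(A_noisy, F, x1, x2))  # -> [[1.02, 0.05], [0., 1.]]
```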
Computer Vision – ECCV 2020 (Lecture Notes in Computer Science), 2020
We propose a new approach for combining deep-learned non-metric monocular depth with affine correspondences (ACs) to estimate the relative pose of two calibrated cameras from a single correspondence. Considering the depth information and affine features, two new constraints on the camera pose are derived. The proposed solver is usable within 1-point RANSAC approaches; thus, the processing time of the robust estimation is linear in the number of correspondences and, therefore, orders of magnitude faster than with traditional approaches. The proposed 1AC+D solver is tested both on synthetic data and on 110,395 publicly available real image pairs, where we used an off-the-shelf monocular depth network to provide up-to-scale depth per pixel. The proposed 1AC+D leads to similar accuracy as traditional approaches while being significantly faster. When solving large-scale problems, e.g., pose-graph initialization for Structure-from-Motion (SfM) pipelines, the overhead of obtaining ACs and monocular depth is negligible compared to the speed-up gained in the pairwise geometric verification, i.e., relative pose estimation. This is demonstrated on scenes from the 1DSfM dataset using a state-of-the-art global SfM algorithm.
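The reason robust estimation becomes linear in the number of correspondences is that a 1-point solver lets RANSAC enumerate every hypothesis directly. The skeleton below shows this with a toy minimal solver (2D translation from one point pair); the paper's 1AC+D solver instead recovers a full relative pose from one affine correspondence plus monocular depth.

```python
import numpy as np

def one_point_ransac(src, dst, thr=0.05):
    """1-point RANSAC skeleton: each hypothesis comes from a single
    correspondence, so exhaustively trying all of them is already linear
    in the number of correspondences. Toy model: 2D translation."""
    best_inl, best_model = None, None
    for i in range(len(src)):
        t = dst[i] - src[i]                         # minimal solver: 1 point
        inl = np.linalg.norm(src + t - dst, axis=1) < thr
        if best_inl is None or inl.sum() > best_inl.sum():
            best_inl, best_model = inl, t
    return best_model, best_inl

rng = np.random.default_rng(0)
src = rng.uniform(-1, 1, (200, 2))
dst = src + np.array([0.3, -0.1]) + 0.005 * rng.standard_normal((200, 2))
dst[:40] = rng.uniform(-1, 1, (40, 2))              # 20% outliers
t, inliers = one_point_ransac(src, dst)
print(t, inliers.sum())
```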
Sensors, Jul 3, 2018
As autonomous driving attracts more and more attention these days, the algorithms and sensors used for machine perception have become popular in research as well. This paper investigates the extrinsic calibration of two frequently applied sensors: the camera and Light Detection and Ranging (LiDAR). The calibration can be done with the help of ordinary boxes. The method contains an iterative refinement step, which is proven to converge to the box in the LiDAR point cloud, and it can be used to calibrate systems containing multiple LiDARs and cameras. For that purpose, a bundle adjustment-like minimization is also presented. The accuracy of the method is evaluated on both synthetic and real-world data, outperforming state-of-the-art techniques. The method is general in the sense that it is independent of both the LiDAR and camera type; only the intrinsic camera parameters have to be known. Finally, a method for determining the 2D bounding box of the car chassis from LiDAR point clouds is also presented, in order to determine the car body border with respect to the calibrated sensors.
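At its core, extrinsic calibration reduces to rigidly aligning structures, such as box corners, observed in both sensor frames. The Kabsch algorithm below computes that least-squares rotation and translation; the box detection, iterative refinement, and multi-sensor bundle adjustment described in the abstract are not reproduced, and the "corner" data is synthetic.

```python
import numpy as np

def rigid_align(P, Q):
    """Kabsch: least-squares R, t such that Q ~ R P + t for matched 3D points."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # fix reflection
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP

# Toy check: recover a known rotation about z and a translation.
ang = np.deg2rad(30)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0],
                   [np.sin(ang),  np.cos(ang), 0],
                   [0,            0,           1]])
t_true = np.array([0.5, -0.2, 1.0])
P = np.random.default_rng(0).uniform(-1, 1, (8, 3))  # synthetic "box corners"
R, t = rigid_align(P, P @ R_true.T + t_true)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```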
We propose ways to speed up the initial pose-graph generation for global Structure-from-Motion algorithms. To avoid forming tentative point correspondences by FLANN and geometric verification by RANSAC, which are the most time-consuming steps of pose-graph creation, we propose two new methods built on the fact that image pairs are usually matched consecutively. Thus, candidate relative poses can be recovered from paths in the partly built pose-graph. We propose a heuristic for the A* traversal, considering the global similarity of images and the quality of the pose-graph edges. Given a relative pose from a path, descriptor-based feature matching is made "light-weight" by exploiting the known epipolar geometry. To speed up PROSAC-based sampling when RANSAC is applied, we propose a third method that orders the correspondences by their inlier probabilities from previous estimations. The algorithms are tested on 402,130 image pairs from the 1DSfM dataset; they speed up feature matching 17 times and pose estimation 5 times.
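The third method, ordering correspondences by inlier probabilities for PROSAC-style sampling, is simple to illustrate. The sketch below sorts matches by a supplied probability and draws minimal samples from a progressively growing prefix; the growth schedule is a simplification of the actual PROSAC schedule, and the probabilities are made up.

```python
import numpy as np

def prosac_order(matches, inlier_prob):
    """Order correspondences by estimated inlier probability, so progressive
    sampling tries the most promising candidates first. The probabilities
    would come from previous estimations along the pose-graph path."""
    return [matches[i] for i in np.argsort(-np.asarray(inlier_prob))]

def prosac_samples(ordered, sample_size, n_samples, seed=0):
    """Draw minimal samples from a progressively growing prefix of the
    ordered list -- a simplified PROSAC growth schedule."""
    rng = np.random.default_rng(seed)
    for k in range(n_samples):
        pool = min(len(ordered), sample_size + k)  # grow the pool gradually
        yield [ordered[i] for i in rng.choice(pool, sample_size, replace=False)]

matches = list(range(10))
probs = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 0.5, 0.05]
ordered = prosac_order(matches, probs)
print(ordered[:3])                               # highest-probability first
print(next(iter(prosac_samples(ordered, 2, 5))))
```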
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
arXiv (Cornell University), Mar 25, 2021
Recently, there has been remarkable growth of interest in the development and applications of Time-of-Flight (ToF) depth cameras. However, despite continuous improvement of their characteristics, the practical applicability of ToF cameras is still limited by the low resolution and quality of the depth measurements. This has motivated many researchers to combine ToF cameras with other sensors in order to enhance and upsample depth images. In this paper, we compare ToF cameras to three image-based techniques for depth recovery, discuss the upsampling problem, and survey the approaches that couple ToF depth images with high-resolution optical images. Other classes of upsampling methods are also mentioned.
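A representative of the surveyed depth-plus-optical-image fusion methods is joint bilateral upsampling: each high-resolution depth value is a weighted average of nearby low-resolution depth samples, with weights combining spatial distance and similarity in the high-resolution guidance image. The implementation below is a slow reference sketch with assumed parameters, not any specific surveyed method.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, guide_hi, factor, sigma_s=1.0, sigma_r=0.1, r=2):
    """Upsample a low-res depth map using a high-res guidance image."""
    H, W = guide_hi.shape
    h, w = depth_lo.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            yl, xl = y / factor, x / factor          # position on low-res grid
            acc, wsum = 0.0, 0.0
            for j in range(int(yl) - r, int(yl) + r + 1):
                for i in range(int(xl) - r, int(xl) + r + 1):
                    if 0 <= j < h and 0 <= i < w:
                        gj = min(int(j * factor), H - 1)  # guide pixel under sample
                        gi = min(int(i * factor), W - 1)
                        ws = np.exp(-((yl - j) ** 2 + (xl - i) ** 2) / (2 * sigma_s ** 2))
                        wr = np.exp(-(guide_hi[y, x] - guide_hi[gj, gi]) ** 2
                                    / (2 * sigma_r ** 2))
                        acc += ws * wr * depth_lo[j, i]
                        wsum += ws * wr
            out[y, x] = acc / wsum
    return out

# Tiny synthetic check: a depth edge aligned with an intensity edge stays sharp.
guide = np.zeros((16, 16)); guide[:, 8:] = 1.0
depth = np.zeros((4, 4));  depth[:, 2:] = 1.0
up = joint_bilateral_upsample(depth, guide, factor=4)
print(up[8, 6], up[8, 10])   # ~0.0 and ~1.0
```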
Abstract: Classical camera calibration algorithms use only correspondences between points, yet calibration images contain more information than a few pairs of 2D coordinates. Based on this observation, we present a procedure that can exploit local affine correspondences between image projections, as well as the surface normals of the calibration object, to solve the calibration problem in closed form. The initially estimated parameters are then refined numerically. The accuracy of our algorithms is validated on synthetic data, and the real-world applicability of our novel approach is demonstrated through the calibration of a 3D structured-light scanner. (Translated from Hungarian.)
In this demo, we present a system for the creation and visualization of mixed reality by combining the spatio-temporal model of a real outdoor environment with models of people acting in a studio. We use a LIDAR sensor to measure a scene with walking pedestrians, detect and track them, then reconstruct the static part of the scene. The scene is then modified and populated by human avatars created in a 4D reconstruction studio.
In this paper, we introduce a complex approach to the 4D reconstruction of dynamic scenarios containing multiple walking pedestrians. The input of the process is a point cloud sequence recorded by a rotating multi-beam Lidar sensor, which monitors the scene from a fixed position. The output is a geometrically reconstructed and textured scene containing moving 4D people models, which can follow in real time the trajectories of the walking pedestrians observed in the Lidar data flow. Our implemented system consists of four main steps. First, we separate foreground and background regions in each point cloud frame of the sequence with a robust probabilistic approach. Second, we perform moving pedestrian detection and tracking: among the point cloud regions classified as foreground, we separate the different objects and associate the corresponding people positions over the consecutive frames of the Lidar measurement sequence. Third, we geometrically reconstruct the ground, walls, and further objects of the background scene, and texture the obtained models with photos taken of the scene. Fourth, we insert into the scene textured 4D models of moving pedestrians created beforehand in a special 4D reconstruction studio. Finally, we integrate the system elements into a joint dynamic scene model and visualize the 4D scenario.
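The first step, foreground/background separation from a fixed Lidar, can be approximated with a per-cell median background model over the range-image sequence; anything markedly closer than the learned background is foreground. This is a crude stand-in for the paper's robust probabilistic approach, with a made-up margin parameter.

```python
import numpy as np

def background_model(range_frames):
    """Per-cell median range over a sequence of range images from a fixed
    Lidar -- a crude stand-in for a robust probabilistic background model."""
    return np.median(np.stack(range_frames), axis=0)

def foreground_mask(range_frame, bg, margin=0.5):
    """A cell is foreground when its return is markedly closer than the
    learned background (e.g., a pedestrian in front of a wall)."""
    return range_frame < bg - margin

rng = np.random.default_rng(0)
frames = [10.0 + 0.05 * rng.standard_normal((8, 8)) for _ in range(20)]
frames[10][3:5, 3:5] = 4.0                    # a "pedestrian" in one frame
bg = background_model(frames)
print(foreground_mask(frames[10], bg).sum())  # 4 foreground cells
```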
Multi-device systems of cameras and various depth sensors are widely used in industry these days. Some sensors can operate well in conditions where others cannot (e.g., active laser sensors compared to cameras). Various views and sensors of different modalities reveal valuable information about the environment, which is crucial for the robust operation of, e.g., detection or decision-making algorithms. The fusion of sensor data, calibration, and multiple-view geometry are the critical topics discussed in this dissertation. Sensors are usually placed in a common frame of reference determined by intrinsic and other parameters describing sensor alignment. During the process of calibration, all parameters can be estimated and tuned based on correspondences between sensor views. Mainstream computer vision methods solving geometric tasks use corresponding points across views as input. Based on the correspondences, it is possible to estimate the underlying geometry of the views (i.e., epipolar geometry)…
2016 23rd International Conference on Pattern Recognition (ICPR), 2016
A new camera calibration approach is proposed that can utilize the affine transformations and surface normals of small spatial patches. While classical calibration algorithms use only point locations, images contain more information than simple 2D point coordinates. New methods are presented in this paper for the calibration problem, with their closed-form solutions; the estimated parameters are then numerically refined. The accuracy of our novel methods is validated on synthesized test data, and the real-world applicability is demonstrated on the calibration of a 3D structured-light scanner.