Marc Pollefeys | Swiss Federal Institute of Technology (ETH) & University of Zurich
Papers by Marc Pollefeys
From September 12th to 17th, 2010, the Dagstuhl Seminar 10371 "Dynamic Maps" was held in Schloss Dagstuhl – Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided where available.
In recent years, the advent of car navigation systems has laid the ground for an entirely new industry sector, consisting of map producers, car, personal, and smartphone navigation manufacturers, and service providers. It has probably gone unnoticed that navigation systems mark a major change in the way we use maps. In part, they are still just a replacement for traditional maps, providing a means to store and visualize a representation of the environment. In contrast to the traditional use of maps, however, navigation systems perform computations on the map's data structures, such as shortest route, map matching, and route guidance. That is, from an abstract point of view, part of the map is made for machine use only – the user has no direct access to it but rather is only presented with the outcome of the computations.
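As a concrete example of a computation performed on the machine-facing part of the map, a shortest-route query over a road graph stored as an adjacency list can be sketched as follows. This is an illustrative textbook Dijkstra implementation, not the code of any particular navigation system; the graph representation and function name are assumptions.

```python
import heapq

def shortest_route(adj, source, target):
    """Dijkstra's algorithm on an adjacency-list road graph.
    adj: {node: [(neighbor, travel_cost), ...]}.  Returns (cost, path)."""
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip it
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(queue, (nd, v))
    if target not in dist:
        return float("inf"), []           # target unreachable
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return dist[target], path[::-1]
```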
2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021
2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021
This paper introduces a multi-view stereo matcher that generates depth in real time from a monocular video stream of a static scene. A key feature of our processing pipeline is that it estimates global camera gain changes in the feature tracking stage and efficiently compensates for them in the stereo stage without impacting real-time performance. This is very important for outdoor applications, where the brightness range often far exceeds the dynamic range of the camera. Real-time performance is achieved by leveraging the processing power of the graphics processing unit (GPU) in addition to the CPU. We demonstrate the effectiveness of our approach on videos of urban scenes recorded by a vehicle-mounted camera with auto-gain enabled.
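The gain-compensation idea can be illustrated with a minimal sketch (not the paper's implementation): assume each new frame differs from the previous one by a single multiplicative gain, estimate that gain from tracked feature patches by least squares, and divide it out before computing the matching cost. The function names and the simple SAD cost below are illustrative assumptions.

```python
import numpy as np

def estimate_gain_ratio(patches_prev, patches_cur):
    """Estimate a single multiplicative gain change between two frames
    from corresponding feature patches (arrays of equal shape)."""
    prev = np.concatenate([p.ravel() for p in patches_prev]).astype(np.float64)
    cur = np.concatenate([p.ravel() for p in patches_cur]).astype(np.float64)
    # Least-squares solution of cur ≈ g * prev (one global gain per frame).
    return float(cur.dot(prev) / prev.dot(prev))

def gain_compensated_sad(ref_patch, other_patch, gain):
    """Sum of absolute differences after undoing the gain of other_patch
    relative to ref_patch."""
    return float(np.abs(ref_patch.astype(np.float64) -
                        other_patch.astype(np.float64) / gain).sum())
```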
Recovering motion information from input camera image sequences is a classic problem of computer vision. Conventional approaches estimate motion from either dense optical flow or sparse feature correspondences identified across successive image frames. Among other things, performance depends on the accuracy of the feature detection, which can be problematic in scenes that exhibit view-dependent geometric or photometric behaviors such as occlusion, semitransparency, specularity, and curved reflections. Beyond feature measurements, researchers have also developed approaches that directly utilize appearance (intensity) measurements. Such appearance-based approaches eliminate the need for feature extraction and avoid the difficulty of identifying correspondences. However, the simplicity of on-line processing of image features is usually traded for complexity in off-line modeling of the appearance function. Because the appearance function is typically very nonlinear, learning it usually re...
Procedings of the British Machine Vision Conference 2017, 2017
The 1D radial camera maps all points on a plane containing the principal axis onto the radial line that is the intersection of that plane with the image plane. It is a sufficiently general model to express both central and non-central cameras, since the only assumption it makes is a known center of distortion. In this paper, we study the multi-focal tensors arising out of 1D radial cameras. There exist no two-view constraints (like the fundamental matrix) for 1D radial cameras. However, the 3-view and 4-view cases are interesting. For the 4-view case we have the radial quadrifocal tensor, which has 15 d.o.f. and 2 internal constraints. For the 3-view case, we have the radial trifocal tensor, which has 7 d.o.f. and no internal constraints. Under the assumption of a purely rotating central camera, this can be used to do a non-parametric estimation of the radial distortion of a 1D camera. Even in the case of a non-rotating camera it can be used to do parametric estimation, assuming a ...
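The degree-of-freedom counts quoted above follow from simple tensor-entry counting; the short derivation below is the standard argument, stated here for convenience rather than reproduced from the paper.

```latex
% Each 1D radial view maps a scene point to a member of the pencil of
% radial lines, i.e. to a point of a projective line, so each view
% contributes a 2-vector of homogeneous coordinates.
\[
  T \in \mathbb{R}^{2\times2\times2}:\quad 2^{3}-1 = 7 \ \text{d.o.f. (up to scale)},
  \qquad
  Q \in \mathbb{R}^{2\times2\times2\times2}:\quad 2^{4}-1 = 15 \ \text{d.o.f. (up to scale)}.
\]
% In addition, the abstract notes that a radial quadrifocal tensor must
% satisfy 2 internal constraints to correspond to an actual camera
% configuration, whereas the radial trifocal tensor has none.
```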
Autonomous microhelicopters will soon play a major role in tasks like search and rescue, environment monitoring, security surveillance, and inspection. If they are further realized at small scale, they can also be used in narrow outdoor and indoor environments and represent only a limited risk for people. However, for such operations, navigating based only on global positioning system (GPS) information is not sufficient. Fully autonomous operation in cities or other dense environments requires microhelicopters to fly at low altitudes, where GPS signals are often shadowed, or indoors, and to actively explore unknown environments while avoiding collisions and creating maps. This involves a number of challenges on all levels of helicopter design, perception, actuation, control, and navigation, which still have to be solved. The Swarm of Micro Flying Robots (SFLY) project was a European Union–funded project with the goal of creating a swarm of vision-controlled microaerial vehicles (MAVs)...
Given the growth of Internet photo collections, we now have a visual index of all major cities and tourist sites in the world. However, it is still a difficult task to capture that perfect shot with your own camera when visiting these places, especially when your camera itself has limitations, such as a limited field of view. In this paper, we propose a framework to overcome the imperfections of personal photos of tourist sites using the rich information provided by large-scale Internet photo collections. Our method deploys state-of-the-art techniques for constructing initial 3D models from photo collections. The same techniques are then used to register personal photos to these models, allowing us to augment personal 2D images with 3D information. This strong available scene prior allows us to address a number of traditionally challenging image enhancement techniques, and achieve high-quality results using simple and robust algorithms. Specifically, we demonstrate automatic...
The research on tracking templates or image patches in a sequence of images has been largely dominated by energy-minimization-based methods. However, since its introduction in [22], the learning-based approach called Linear Predictors has proven to be an efficient and reliable alternative for template tracking, demonstrating superior tracking speed and robustness. But their time-intensive learning procedure prevented their use in applications where online learning is essential. Indeed, [18] presented an iterative method to learn Linear Predictors; but it starts with a small template, which makes it unstable at the beginning. Therefore, we propose three methods for highly efficient learning of full-sized Linear Predictors – the first one is based on dimensionality reduction using the Discrete Cosine Transform; the second is based on an efficient reformulation of the learning equations; and the third is a combination of both. They show different characteristics with respect...
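For readers unfamiliar with Linear Predictors, the core of the offline learning step can be sketched as a linear least-squares problem: apply random small warps to the template, collect the resulting intensity differences, and solve for a linear map from intensity differences to warp-parameter updates. The NumPy sketch below shows this basic scheme for pure translations, not the accelerated variants proposed in the paper; sample_template is a hypothetical user-supplied patch-sampling function and the sizes are illustrative.

```python
import numpy as np

def learn_linear_predictor(sample_template, n_train=500, max_shift=3.0, rng=None):
    """Learn a linear map A such that  delta_p ≈ A @ delta_i,
    where delta_i is the intensity difference caused by a small
    translation delta_p of the template."""
    rng = np.random.default_rng(rng)
    h, w = 15, 15                       # template size (illustrative)
    ref = sample_template(0.0, 0.0, h, w).ravel()

    dP = np.empty((2, n_train))         # parameter perturbations (dx, dy)
    dI = np.empty((ref.size, n_train))  # corresponding intensity differences
    for k in range(n_train):
        dx, dy = rng.uniform(-max_shift, max_shift, size=2)
        warped = sample_template(dx, dy, h, w).ravel()
        dP[:, k] = (dx, dy)
        dI[:, k] = warped - ref

    # Least-squares solution: A = dP dI^T (dI dI^T)^{-1}
    A = dP @ dI.T @ np.linalg.pinv(dI @ dI.T)
    return A   # at runtime: delta_p_estimate = A @ (current_patch - ref)
```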
Given a picture taken somewhere in the world, automatic geo-localization of such an image is an extremely useful task, especially for historical and forensic sciences, documentation purposes, organization of the world's photographs, and intelligence applications. While tremendous progress has been made over the last years in visual location recognition within a single city, localization in natural environments is much more difficult, since vegetation, illumination, and seasonal changes make appearance-only approaches impractical. In this work, we target mountainous terrain and use digital elevation models to extract representations for fast visual database lookup. We propose an automated approach for very large scale visual localization that can efficiently exploit visual information (contours) and geometric constraints (consistent orientation) at the same time. We validate the system at the scale of Switzerland (40,000 km²) using over 1000 landscape query images with ground truth GPS pos...
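To illustrate the general flavor of contour-based lookup against a terrain database (this is a simplified illustration, not the paper's descriptor or matching scheme): the visible skyline can be resampled to a fixed-length descriptor over azimuth, and a query is matched against DEM-rendered database descriptors while searching over the unknown camera heading as a circular shift.

```python
import numpy as np

def skyline_descriptor(elevation_angles, n_bins=360):
    """Resample a skyline (elevation angle as a function of azimuth,
    given over a full 360-degree sweep) to a fixed-length, zero-mean vector."""
    az = np.linspace(0.0, 2.0 * np.pi, n_bins, endpoint=False)
    src = np.linspace(0.0, 2.0 * np.pi, len(elevation_angles), endpoint=False)
    d = np.interp(az, src, elevation_angles)
    return d - d.mean()

def best_match(query_desc, database_descs):
    """Return (index, heading_shift) of the database descriptor that best
    matches the query under an unknown camera heading (circular shift)."""
    best = (None, None, np.inf)
    for i, d in enumerate(database_descs):
        for s in range(len(d)):              # brute-force heading search
            err = np.sum((query_desc - np.roll(d, s)) ** 2)
            if err < best[2]:
                best = (i, s, err)
    return best[0], best[1]
```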
We present a supervised-learning-based method to estimate a per-pixel confidence for optical flow vectors. Regions of low texture and pixels close to occlusion boundaries are known to be difficult for optical flow algorithms. Using a spatiotemporal feature vector, we estimate whether a flow algorithm is likely to fail in a given region. Our method is not restricted to any specific class of flow algorithm and does not make any scene-specific assumptions. Additionally, we can combine the output of several computed flow fields from different algorithms and automatically select the best-performing algorithm at each location. Our optical flow confidence measure allows one to achieve better overall results by discarding the most troublesome pixels. We illustrate the effectiveness of our method on four different optical flow algorithms over a variety of real and synthetic sequences. For algorithm selection, we achieve the top overall results on a large test set, and at times surpasse...
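A minimal stand-in for such a confidence estimator, assuming per-pixel features have already been computed (the feature design, classifier choice, and threshold here are illustrative assumptions, not the paper's): label training pixels by whether the flow endpoint error exceeds a threshold, train a classifier, and use its predicted probability as the confidence.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_flow_confidence(features, flow, flow_gt, err_thresh=1.0):
    """features: (N, D) per-pixel feature vectors.
    flow, flow_gt: (N, 2) estimated and ground-truth flow vectors.
    Returns a classifier whose predicted probability acts as a confidence."""
    epe = np.linalg.norm(flow - flow_gt, axis=1)      # endpoint error
    labels = (epe <= err_thresh).astype(int)          # 1 = flow trusted
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(features, labels)
    return clf

def flow_confidence(clf, features):
    """Per-pixel probability that the flow error is within the threshold
    (assumes both classes were present during training)."""
    return clf.predict_proba(features)[:, 1]
```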
In this work we propose a dynamic scene model that provides information about the presence of salient motion in the scene and that could be used for focusing the attention of a pan/tilt/zoom camera or for background modeling purposes. Rather than proposing a set of saliency detectors, we define what we mean by salient motion and propose a precise model for it. Detecting salient motion becomes equivalent to detecting a model change. We derive optimal online procedures to solve this problem, which enable a very fast implementation. Promising results show that our model can effectively detect salient motion even in severely cluttered scenes, and while a camera is panning and tilting. Keywords: dynamic scene modeling, PTZ camera, focus-of-attention, background modeling, sequential generalized likelihood ratio, model change detection, linear dynamical systems, dynamic textures.
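The abstract frames salient-motion detection as online model-change detection via a sequential generalized likelihood ratio test. The sketch below illustrates that idea in its simplest form, a change in the mean of i.i.d. Gaussian observations, rather than the linear-dynamical-system / dynamic-texture model used in the paper; the function name and threshold convention are assumptions.

```python
import numpy as np

def glr_change_scores(x, sigma=1.0):
    """Sequential GLR statistic for a mean change in i.i.d. Gaussian data.
    For each time n, the most likely change point k maximizes
        S(n, k) = (sum_{i=k..n} x_i)^2 / (2 * sigma^2 * (n - k + 1)),
    assuming zero mean before the change.  Returns the running maxima."""
    x = np.asarray(x, dtype=float)
    scores = np.empty(len(x))
    csum = np.concatenate(([0.0], np.cumsum(x)))
    for n in range(len(x)):
        k = np.arange(n + 1)
        seg_sums = csum[n + 1] - csum[k]             # sums over x[k..n]
        seg_lens = n + 1 - k
        scores[n] = np.max(seg_sums ** 2 / (2.0 * sigma ** 2 * seg_lens))
    return scores  # flag a model change when the score exceeds a threshold
```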
We present a novel multi-baseline, multi-resolution stereo method, which varies the baseline and resolution proportionally to depth to obtain a reconstruction in which the depth error is constant. This is in contrast to traditional stereo, in which the error grows quadratically with depth, which means that the accuracy in the near range far exceeds that of the far range. This accuracy in the near range is unnecessarily high and comes at significant computational cost. It is, however, non-trivial to reduce this without also reducing the accuracy in the far range. Many datasets, such as video captured from a moving camera, allow the baseline to be selected with significant flexibility. By selecting an appropriate baseline and resolution (realized using an image pyramid), our algorithm computes a depthmap which has these properties: 1) the depth accuracy is constant over the reconstructed volume, 2) the computational effort is spread evenly over the volume, 3) the angle of triangulation...
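The quadratic growth of depth error, and why scaling baseline and resolution with depth flattens it, follows from the standard stereo error model; the relation below is a textbook derivation restated for convenience, not taken from the paper.

```latex
% For a rectified stereo pair with baseline b, focal length f (in pixels),
% and disparity d = f b / Z, a disparity uncertainty \delta d propagates to
\[
  \delta Z \;\approx\; \left|\frac{\partial Z}{\partial d}\right|\,\delta d
          \;=\; \frac{Z^{2}}{f\,b}\,\delta d ,
\]
% so the depth error grows quadratically with depth Z for fixed b and f.
% If both the baseline b and the effective resolution (hence f, measured in
% pixels of the chosen pyramid level) grow proportionally with Z, then
% f b \propto Z^{2} and \delta Z stays approximately constant over the volume.
```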
In this work, we present a technique for robust estimation which, by explicitly incorporating the inherent uncertainty of the estimation procedure, results in a more efficient robust estimation algorithm. In addition, we build on recent work in randomized model verification, and use this to characterize the 'non-randomness' of a solution. The combination of these two strategies results in a robust estimation procedure that provides a significant speed-up over existing RANSAC techniques, while requiring no prior information to guide the sampling process. In particular, our algorithm requires, on average, 3-10 times fewer samples than standard RANSAC, which is in close agreement with theoretical predictions. The efficiency of the algorithm is demonstrated on a selection of geometric estimation problems.
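For context on the quoted 3-10 times reduction, the baseline it is measured against is the standard RANSAC sample count, given by the classical formula below (not the paper's adaptive scheme).

```latex
% Standard RANSAC baseline: to draw at least one all-inlier minimal sample
% of size m with confidence p when the inlier ratio is w, one needs
\[
  N \;\ge\; \frac{\log(1-p)}{\log\!\left(1-w^{m}\right)}
\]
% iterations.  For example, for homography estimation (m = 4) with
% w = 0.5 and p = 0.99, N >= log(0.01)/log(1 - 0.5^4), i.e. about 72
% samples; the proposed method reports needing, on average, 3-10 times
% fewer samples than this kind of baseline.
```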
Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005