Jean Ponce - Academia.edu
Papers by Jean Ponce
2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
This paper addresses the problem of copying an unknown assembly of primitives with known shape and appearance using information extracted from a single photograph by an off-the-shelf procedure for object detection and pose estimation. The proposed algorithm uses a simple combination of physical stability constraints, convex optimization and Monte Carlo tree search to plan assemblies as sequences of pick-and-place operations represented by STRIPS operators. It is efficient and, most importantly, robust to the errors in object detection and pose estimation unavoidable in any real robotic system. The proposed approach is demonstrated with thorough experiments on a UR5 manipulator.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Figure 1: ×4 super-resolution results obtained from a burst of 30 raw images acquired with a handheld Panasonic Lumix GX9 camera at 12800 ISO for the top image and 25600 for the bottom image. Panels: output of Dcraw; camera's ISP (high-quality JPEG); joint demosaicking + single-image SR; proposed method. Dcraw performs basic demosaicking.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021
We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models. Contrary to recent approaches that model image layers with autoencoder networks, we represent them as explicit transformations of a small set of prototypical images. Our model has three main components: (i) a set of object prototypes in the form of learnable images with a transparency channel, which we refer to as sprites; (ii) differentiable parametric functions predicting occlusions and transformation parameters necessary to instantiate the sprites in a given image; (iii) a layered image formation model with occlusion for compositing these instances into complete images including background. By jointly learning the sprites and occlusion/transformation predictors to reconstruct images, our approach not only yields accurate layered image decompositions, but also identifies object categories and instance parameters. We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks (Tetrominoes, Multi-dSprites, CLEVR6). We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images. To the best of our knowledge, our approach is the first layered image decomposition algorithm that learns an explicit and shared concept of object type, and is robust enough to be applied to real images.
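As a rough illustration of component (iii), the sketch below composites sprite instances over a background with the standard back-to-front "over" operator. It is a minimal sketch only: the function name `composite`, the layer representation (a colour image plus a transparency map in [0, 1]) and the fixed back-to-front ordering are assumptions made for this example, not the image formation model of the paper, which also predicts occlusions and transformations.

```python
import numpy as np

def composite(background, layers):
    """Alpha-composite layers over a background, back to front.

    background: H x W x 3 array.
    layers: list of (color, alpha) pairs, with color H x W x 3 and
            alpha H x W in [0, 1]; later layers occlude earlier ones.
    (Hypothetical interface, chosen for illustration.)
    """
    image = background.astype(float).copy()
    for color, alpha in layers:
        a = alpha[..., None]  # broadcast transparency over the colour channels
        image = a * color + (1.0 - a) * image
    return image

# Example: a half-transparent white sprite over a black background.
background = np.zeros((2, 2, 3))
sprite_color = np.ones((2, 2, 3))
sprite_alpha = np.full((2, 2), 0.5)
result = composite(background, [(sprite_color, sprite_alpha)])
```

Because every step is a simple arithmetic blend, the same operation remains differentiable with respect to the sprite colours and transparencies when written in an automatic-differentiation framework.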
ArXiv, 2021
We address the problem of non-blind deblurring and demosaicking of noisy raw images. We adapt an existing learning-based approach to RGB image deblurring to handle raw images by introducing a new interpretable module that jointly demosaicks and deblurs them. We train this model on RGB images converted into raw ones following a realistic invertible camera pipeline. We demonstrate the effectiveness of this model over two-stage approaches stacking demosaicking and deblurring modules on quantitative benchmarks. We also apply our approach to remove a camera's inherent blur (its color-dependent point-spread function) from real images, in essence deblurring sharp images.
ArXiv, 2019
We propose a differentiable algorithm for image restoration inspired by the success of sparse models and self-similarity priors for natural images. Our approach builds upon the concept of joint sparsity between groups of similar image patches, and we show how this simple idea can be implemented in a differentiable architecture, allowing end-to-end training. The algorithm has the advantage of being interpretable, performing sparse decompositions of image patches, while being more parameter efficient than recent deep learning methods. We evaluate our algorithm on grayscale and color denoising, where we achieve competitive results, and on demosaicking, where we outperform the most recent state-of-the-art deep learning model with 47 times fewer parameters and a much shallower architecture.
ArXiv, 2020
We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning. We optimize a penalized energy function regularized by a sum of terms measuring the distance between patches to be restored and clean patches from an external database gathered beforehand. The resulting estimator comes with strong statistical guarantees leveraging local dependency properties of overlapping patches. We derive the corresponding algorithms for energies based on the mean-squared and Euclidean norm errors. Finally, we demonstrate the practical effectiveness of our model on different image restoration problems using standard benchmarks.
Computer Vision – ECCV 2020, 2020
Non-local self-similarity and sparsity principles have proven to be powerful priors for natural image modeling. We propose a novel differentiable relaxation of joint sparsity that exploits both principles and leads to a general framework for image restoration which is (1) trainable end to end, (2) fully interpretable, and (3) much more compact than competing deep learning architectures. We apply this approach to denoising, blind denoising, JPEG deblocking, and demosaicking, and show that, with as few as 100K parameters, its performance on several standard benchmarks is on par with or better than state-of-the-art methods that may have an order of magnitude or more parameters.
2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
In this paper, we propose a novel approach to two-view minimal-case relative pose problems based on homography with a common reference direction. We explore the rank-1 constraint on the difference between the Euclidean homography matrix and the corresponding rotation, and propose an efficient two-step solution for solving both the calibrated and partially calibrated (unknown focal length) problems. We derive new 3.5-point, 3.5-point, and 4-point solvers for two cameras such that the two focal lengths are unknown but equal, one of them is unknown, and both are unknown and possibly different, respectively. We present detailed analyses and comparisons with existing 6- and 7-point solvers, including results with smartphone images.
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
We propose minimal solutions to the relative pose estimation problem from two views sharing a common direction with unknown focal length. This is relevant for cameras equipped with an IMU (inertial measurement unit), e.g., smartphones and tablets. Similar to the 6-point algorithm for two cameras with unknown but equal focal lengths and the 7-point algorithm for two cameras with different and unknown focal lengths, we derive new 4- and 5-point algorithms for these two cases, respectively. The proposed algorithms can cope with coplanar points, which is a degenerate configuration for these 6- and 7-point counterparts. We present a detailed analysis and comparisons with the state of the art. Experimental results on both synthetic data and real images from a smartphone demonstrate the usefulness of the proposed algorithms.
Computer Vision – ECCV 2020, 2020
Non-blind image deblurring is typically formulated as a linear least-squares problem regularized by natural priors on the corresponding sharp picture's gradients, which can be solved, for example, using a half-quadratic splitting method with Richardson fixed-point iterations for its least-squares updates and a proximal operator for the auxiliary variable updates. We propose to precondition the Richardson solver using approximate inverse filters of the (known) blur and natural image prior kernels. Using convolutions instead of a generic linear preconditioner allows extremely efficient parameter sharing across the image, and leads to significant gains in accuracy and/or speed compared to classical FFT and conjugate-gradient methods. More importantly, the proposed architecture is easily adapted to learning both the preconditioner and the proximal operator using CNN embeddings. This yields a simple and efficient algorithm for non-blind image deblurring which is fully interpretable, can be learned end to end, and whose accuracy matches or exceeds the state of the art, quite significantly, in the non-uniform case.
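For readers unfamiliar with Richardson iterations, the following is a minimal sketch of the generic preconditioned fixed-point scheme x <- x + C (b - A x) on a small dense linear system. It assumes a square, symmetric positive-definite matrix `A` and uses a diagonal (Jacobi) approximate inverse as the preconditioner `C`; the function name `richardson` and these choices are illustrative assumptions, whereas the paper uses convolutional approximate inverse filters inside a half-quadratic splitting loop.

```python
import numpy as np

def richardson(A, b, C=None, num_iters=200):
    """Preconditioned Richardson iteration x <- x + C (b - A x).

    C is an approximate inverse of A. When C is None, a scalar step
    1 / ||A||_2 is used, which converges for symmetric positive-definite A.
    (Hypothetical helper, written for illustration.)
    """
    n = A.shape[0]
    if C is None:
        C = np.eye(n) / np.linalg.norm(A, 2)
    x = np.zeros(n)
    for _ in range(num_iters):
        x = x + C @ (b - A @ x)  # correct x using the preconditioned residual
    return x

# Small symmetric positive-definite example.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
C = np.diag(1.0 / np.diag(A))  # Jacobi (diagonal) approximate inverse
x = richardson(A, b, C)
```

The closer `C` is to the true inverse of `A`, the fewer iterations are needed, which is the motivation for investing in a good approximate inverse.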
Lecture Notes in Computer Science, 2018
A set of fundamental matrices relating pairs of cameras in some configuration can be represented as edges of a "viewing graph". Whether or not these fundamental matrices are generically sufficient to recover the global camera configuration depends on the structure of this graph. We study characterizations of "solvable" viewing graphs, and present several new results that can be applied to determine which pairs of views may be used to recover all camera parameters. We also discuss strategies for verifying the solvability of a graph computationally.
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
In this paper, we address the problem of estimating and removing non-uniform motion blur from a single blurry image. We propose a deep learning approach to predicting the probabilistic distribution of motion blur at the patch level using a convolutional neural network (CNN). We further extend the candidate set of motion kernels predicted by the CNN using carefully designed image rotations. A Markov random field model is then used to infer a dense non-uniform motion blur field enforcing motion smoothness. Finally, motion blur is removed by a non-uniform deblurring model using a patch-level image prior. Experimental evaluations show that our approach can effectively estimate and remove complex non-uniform motion blur that is not handled well by previous approaches.
Motion Deblurring
In this chapter we discuss modeling and removing spatially-variant blur from photographs. We describe a compact global parameterization of camera shake blur, based on the 3D rotation of the camera during the exposure. Our model uses three-parameter homographies to connect camera motion to image motion and, by assigning weights to a set of these homographies, can be seen as a generalization of the standard, spatially-invariant convolutional model of image blur. As such we show how existing algorithms, designed for spatially-invariant deblurring, can be "upgraded" in a straightforward manner to handle spatially-variant blur instead. We demonstrate this with algorithms working on real images, showing results for blind estimation of blur parameters from single images, followed by non-blind image restoration using these parameters. Finally, we introduce an efficient approximation to the global model, which significantly reduces the computational cost of modeling the spatially-variant blur. By approximating the blur as locally-uniform, we can take advantage of fast Fourier-domain convolution and deconvolution, reducing the time required for blind deblurring by an order of magnitude.
Lecture Notes in Computer Science, 1994
We present a new approach to relative stereo and motion reconstruction from a discrete set of point correspondences in completely uncalibrated pairs of images. This approach also yields new projective invariants, and we present some applications to object recognition. Finally, we introduce a new approach to camera self-calibration from two images which allows full metric reconstruction up to some unknown scale factor. We have implemented the proposed methods and present examples using real images.
2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012
Bottom-up, fully unsupervised segmentation remains a daunting challenge for computer vision. In the cosegmentation context, on the other hand, the availability of multiple images assumed to contain instances of the same object classes provides a weak form of supervision that can be exploited by discriminative approaches. Unfortunately, most existing algorithms are limited to a very small number of images and/or object classes (typically two of each). This paper proposes a novel energy-minimization approach to cosegmentation that can handle multiple classes and a significantly larger number of images. The proposed cost function combines spectral- and discriminative-clustering terms, and it admits a probabilistic interpretation. It is optimized using an efficient EM method, initialized using a convex quadratic approximation of the energy. Comparative experiments show that the proposed approach matches or improves the state of the art on several standard datasets.
Lecture Notes in Computer Science, 2006
2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009
This paper addresses the problem of characterizing a general class of cameras under reasonable, "linear" assumptions. Concretely, we use the formalism and terminology of classical projective geometry to model cameras by two-parameter linear families of straight lines, that is, degenerate reguli (rank-3 families) and non-degenerate linear congruences (rank-4 families). This model captures both the general linear cameras of Yu and McMillan [16] and the linear oblique cameras of Pajdla [8]. From a geometric perspective, it affords a simple classification of all possible camera configurations. From an analytical viewpoint, it also provides a simple and unified methodology for deriving general formulas for projection and inverse projection, triangulation, and binocular and trinocular geometry.
Appropriate datasets are required at all stages of object recognition research, including learning visual models of object and scene categories, detecting and localizing instances of these models in images, and evaluating the performance of recognition algorithms. Current datasets are lacking in several respects, and this paper discusses some of the lessons learned from existing efforts, as well as innovative ways to obtain very large and diverse annotated datasets. It also suggests a few criteria for gathering future datasets.
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
This paper presents a complete analytical characterization of a large class of central and non-central imaging devices dubbed linear cameras by Ponce [9]. Pajdla [7] has shown that a subset of these, the oblique cameras, can be modelled by a certain type of linear map. We give here a full tabulation of all admissible maps that induce cameras in the general sense of Grossberg and Nayar [4], and show that these cameras are exactly the linear ones. Combining these two models with a new notion of intrinsic parameters and normalized coordinates for linear cameras allows us to give simple analytical formulas for direct and inverse projections. We also show that the epipolar geometry of any two linear cameras can be characterized by a fundamental matrix whose size is at most 6 × 6 when the cameras are uncalibrated, or by an essential matrix of size at most 4 × 4 when their internal parameters are known. Similar results hold for trinocular constraints.