Satoshi Ikehata - Academia.edu
Papers by Satoshi Ikehata
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
360° cameras have gained popularity over the last few years. In this paper, we propose two fundamental techniques for object detection in 360° images: Field-of-View IoU (FoV-IoU) and 360Augmentation. Although most object detection neural networks designed for perspective images are applicable to 360° images in equirectangular projection (ERP) format, their performance deteriorates owing to the distortion in ERP images. Our method can be readily integrated with existing perspective object detectors and significantly improves their performance. FoV-IoU computes the intersection-over-union of two Field-of-View bounding boxes in a spherical image and can be used for training, inference, and evaluation, while 360Augmentation is a data augmentation technique specific to the 360° object detection task that randomly rotates a spherical image and addresses the bias caused by the sphere-to-plane projection. We conduct extensive experiments on the 360° indoor dataset with different types of pers...
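Rotating a spherical image, as 360Augmentation is described as doing, comes down to mapping each ERP pixel to a direction on the sphere, rotating that direction, and projecting it back. The sketch below shows only that coordinate mapping. The ERP convention (longitude across the width, latitude down the height), the yaw/pitch parameterization, and every function name are our assumptions, not the paper's code; a full augmentation would also resample the image with the inverse rotation and transform the box annotations.

```python
import math


def erp_to_sphere(u, v, width, height):
    """Map an ERP pixel coordinate to a unit direction vector.

    Assumed convention: u in [0, width) spans longitude [-pi, pi),
    v in [0, height) spans latitude from +pi/2 (top) to -pi/2 (bottom).
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))


def sphere_to_erp(vec, width, height):
    """Inverse of erp_to_sphere; longitude wraps around horizontally."""
    x, y, z = vec
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y)))  # clamp against rounding error
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return u % width, v


def rotation(yaw, pitch):
    """R = R_yaw (about the vertical axis) @ R_pitch (about the x axis)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    r_yaw = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    r_pitch = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return [[sum(r_yaw[i][k] * r_pitch[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]


def transpose(m):
    """Transpose of a 3x3 matrix (the inverse of a rotation)."""
    return [list(row) for row in zip(*m)]


def rotate_pixel(u, v, rot, width, height):
    """Where ERP pixel (u, v) lands after rotating the sphere by rot."""
    d = erp_to_sphere(u, v, width, height)
    r = tuple(sum(rot[i][k] * d[k] for k in range(3)) for i in range(3))
    return sphere_to_erp(r, width, height)
```

A pure yaw rotation is just a horizontal circular shift of the ERP image (a 90° yaw moves pixel (100, 50) of a 400x200 image to (200, 50)); a nonzero pitch is what actually moves content between the distorted polar rows and the equator.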
Constitution of a database of vection scenes in Japanese movies and animations and experimental assessments of them.
Lecture Notes in Computer Science, 2018
Most conventional photometric stereo algorithms inversely solve a BRDF-based image formation model. However, the actual imaging process is often far more complex due to global light transport on non-convex surfaces. This paper presents a photometric stereo network that directly learns the relationship between the photometric stereo input and the surface normals of a scene. To handle an unordered, arbitrary number of input images, we merge all the input data into an intermediate representation called the observation map, which has a fixed shape and can therefore be fed into a CNN. To improve both training and prediction, we take into account the rotational pseudo-invariance of the observation map, which is derived from the isotropic constraint. To train the network, we create a synthetic photometric stereo dataset generated by a physics-based renderer, so that global light transport is taken into account. Our experimental results on both synthetic and real datasets show that our method outperforms conventional BRDF-based photometric stereo algorithms, especially when scenes are highly non-convex.
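The key property of the observation map is that any number of (intensity, light direction) pairs for one pixel becomes a grid of fixed size. The sketch below projects each light's x/y components onto a size x size grid and writes the intensity normalized by the pixel's brightest observation. The grid size of 32, the rounding rule, and the overwrite-on-collision behaviour are assumptions for illustration and may differ from the paper.

```python
def observation_map(intensities, light_dirs, size=32):
    """Project one pixel's measurements onto a fixed size x size grid.

    intensities: one value per input image (any number of images).
    light_dirs:  matching unit light vectors (lx, ly, lz).
    """
    grid = [[0.0] * size for _ in range(size)]
    peak = max(intensities, default=0.0)
    if peak <= 0.0:
        return grid  # all-dark pixel: nothing to normalize
    for value, (lx, ly, _lz) in zip(intensities, light_dirs):
        # Map lx, ly from [-1, 1] to integer grid indices [0, size - 1].
        col = int(round((size - 1) * (lx + 1.0) / 2.0))
        row = int(round((size - 1) * (ly + 1.0) / 2.0))
        grid[row][col] = value / peak  # later images overwrite collisions
    return grid
```

Whether two or two hundred images are supplied, the output is always a 32x32 array, which is what allows a standard CNN to consume it; lighting order no longer matters because position on the grid encodes the light direction.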
Multimedia Tools and Applications
Cost-volume filtering (CVF) is one of the most widely used techniques for solving general multi-labeling problems based on a Markov random field (MRF). However, it is inefficient when the label space is large (i.e., when there are many labels). This paper presents a coarse-to-fine strategy for cost-volume filtering that efficiently and accurately addresses multi-labeling problems with a large label space. Based on the observation that true labels at the same coordinates in images of different scales are highly correlated, we truncate unimportant labels for cost-volume filtering by leveraging the labeling output of lower scales. Experimental results show that our algorithm is much more efficient than the original CVF method while maintaining comparable accuracy. Although our experiments cover only stereo matching and optical flow estimation, the proposed method can be employed in many other applications, because CVF applies to general discrete pixel-labeling problems based on an MRF.
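The label-truncation idea can be sketched as follows: at the finer scale, each pixel only evaluates labels near the (rescaled) label chosen at the coarser scale. Here the edge-aware filtering of each cost slice, which is the core of CVF, is abstracted into a `cost(y, x, label)` callback, and the window `radius` and the disparity-doubling rule are illustrative assumptions rather than the paper's settings.

```python
def candidate_labels(coarse_label, num_labels, radius=2, scale=2):
    """Labels worth evaluating at fine scale, given the coarse-scale winner."""
    center = coarse_label * scale  # e.g. disparity doubles with resolution
    lo = max(0, center - radius)
    hi = min(num_labels - 1, center + radius)
    return list(range(lo, hi + 1))


def coarse_to_fine_wta(cost, coarse, num_labels, radius=2, scale=2):
    """Winner-take-all over a truncated label set.

    cost(y, x, label) -> float stands in for the filtered cost volume.
    coarse is the label map computed at the lower scale.
    Returns the fine label map and the number of cost evaluations used.
    """
    height, width = len(coarse) * scale, len(coarse[0]) * scale
    labels = [[0] * width for _ in range(height)]
    evaluations = 0
    for y in range(height):
        for x in range(width):
            prior = coarse[y // scale][x // scale]
            candidates = candidate_labels(prior, num_labels, radius, scale)
            evaluations += len(candidates)
            labels[y][x] = min(candidates, key=lambda lab: cost(y, x, lab))
    return labels, evaluations
```

On a 4x4 image with 64 labels, this evaluates 5 labels per pixel (80 evaluations instead of 1024); the saving grows with the label count, at the risk of missing the true label if the coarse estimate is off by more than `radius / scale`.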
The Journal of the Institute of Image Information and Television Engineers
Proceedings of the 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Dec 1, 2012
This paper presents a practical depth-map refinement system designed for multiple, highly corrupted depth maps. We define a pixel-wise confidence measure for depth values and apply a three-step depth-map refinement scheme (i.e., confidence-based depth-map fusion, confidence-weighted bundle optimization, and superpixel-based planar propagation) to maximize the overall reliability of the depth maps. Our experimental results show that our refinement algorithm can dramatically improve highly corrupted depth maps acquired by previous approaches.
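The abstract does not define the confidence measure or the exact fusion rule, so the following is only a minimal sketch of what a confidence-based fusion step could look like: a per-pixel weighted average over aligned depth maps that ignores missing or low-confidence samples. The threshold `min_confidence` and the averaging rule are our assumptions.

```python
def fuse_depths(depth_maps, confidence_maps, min_confidence=0.1):
    """Confidence-weighted per-pixel fusion of several aligned depth maps.

    Missing depths are None; samples below min_confidence are ignored.
    Pixels with no usable sample stay None (left for later refinement).
    """
    height, width = len(depth_maps[0]), len(depth_maps[0][0])
    fused = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            num = den = 0.0
            for depth, conf in zip(depth_maps, confidence_maps):
                d, c = depth[y][x], conf[y][x]
                if d is None or c < min_confidence:
                    continue
                num += c * d
                den += c
            if den > 0.0:
                fused[y][x] = num / den
    return fused
```

Pixels left as None are exactly the ones a later stage, such as the planar propagation step the abstract mentions, would need to fill.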
2015 IEEE International Conference on Computer Vision (ICCV), 2015
Depth maps captured by multiple sensors often suffer from poor resolution and missing pixels caused by low reflectivity and occlusions in the scene. To address these problems, we propose a combined framework of patch-based inpainting and super-resolution. Unlike previous works, which relied solely on depth information, we explicitly take advantage of the internal statistics of a depth map and a registered high-resolution texture image that capture the same scene. We use these statistics to locate non-local patches for hole filling and to constrain the sparse-coding-based super-resolution problem. Extensive evaluations on real-world datasets show state-of-the-art performance.
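To make the non-local, texture-guided hole filling concrete, here is a deliberately simplified sketch: each missing depth pixel copies the depth of the valid pixel whose surrounding texture patch is most similar (smallest sum of squared differences). It uses only the texture image, single-pixel copying, and an exhaustive search; the paper's use of depth statistics and its sparse-coding super-resolution are not modeled, and the function names are hypothetical.

```python
def patch_ssd(img, y0, x0, y1, x1, r):
    """Sum of squared differences between two (2r+1)^2 patches of img."""
    return sum((img[y0 + dy][x0 + dx] - img[y1 + dy][x1 + dx]) ** 2
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))


def fill_holes(depth, texture, r=1):
    """Fill None depths by copying from the best-matching texture patch."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    # Only pixels whose full patch fits inside the image are considered.
    interior = [(y, x) for y in range(r, h - r) for x in range(r, w - r)]
    sources = [(y, x) for (y, x) in interior if depth[y][x] is not None]
    for (y, x) in interior:
        if depth[y][x] is None and sources:
            by, bx = min(sources,
                         key=lambda s: patch_ssd(texture, y, x, s[0], s[1], r))
            out[y][x] = depth[by][bx]
    return out
```

Because the match is searched over the whole image rather than among immediate neighbours, a hole on one side of a texture edge takes its depth from the same side, instead of being averaged across the boundary.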
This paper presents a photometric stereo method that is purely pixelwise and handles general isotropic surfaces in a stable manner. Following the recently proposed sum-of-lobes representation of the isotropic reflectance function, we construct a constrained bivariate regression problem in which the regression function is approximated by smooth, bivariate Bernstein polynomials. The unknown normal vector is separated from the unknown reflectance function by considering the inverse representation of the image formation process, so the unknown surface normals can be computed accurately by solving a simple and efficient quadratic programming problem. Extensive evaluations using both synthetic and real-world images show state-of-the-art performance.
2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014
This paper presents a photometric stereo method that is purely pixelwise and handles general isotropic surfaces in a stable manner. Following the recently proposed sum-of-lobes representation of the isotropic reflectance function, we construct a constrained bivariate regression problem in which the regression function is approximated by smooth, bivariate Bernstein polynomials. The unknown normal vector is separated from the unknown reflectance function by considering the inverse representation of the image formation process, so the unknown surface normals can be computed accurately by solving a simple and efficient quadratic programming problem. Extensive evaluations using both synthetic and real-world images show state-of-the-art performance.
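For readers unfamiliar with the function class, a bivariate (tensor-product) Bernstein polynomial on [0, 1]² can be evaluated as below. Which physical quantities the two arguments correspond to in the paper is not stated in the abstract, so they are kept generic here; this shows only the model, not the constrained regression or the quadratic program.

```python
from math import comb


def bernstein(i, n, t):
    """i-th Bernstein basis polynomial of degree n at t in [0, 1]."""
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)


def bivariate_bernstein(coeffs, u, v):
    """Evaluate sum_ij coeffs[i][j] * B_i^n(u) * B_j^m(v) (tensor product)."""
    n, m = len(coeffs) - 1, len(coeffs[0]) - 1
    return sum(coeffs[i][j] * bernstein(i, n, u) * bernstein(j, m, v)
               for i in range(n + 1) for j in range(m + 1))
```

The value is linear in the coefficients, so fitting it by least squares is a quadratic objective, and shape constraints are easy to express: the basis sums to one, and coefficients that are nondecreasing in `i` give a function nondecreasing in `u`. This is one plausible reason such a basis pairs well with a constrained quadratic program.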
2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012
This paper presents a robust photometric stereo method that effectively compensates for various non-Lambertian corruptions such as specularities, shadows, and image noise. We construct a constrained sparse regression problem that enforces both Lambertian, rank-3 structure and sparse, additive corruptions. A solution method is derived using a hierarchical Bayesian approximation to accurately estimate the surface normals while simultaneously separating the non-Lambertian corruptions. Extensive evaluations using both synthetic and real-world images show state-of-the-art performance.
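The "Lambertian, rank-3 structure" refers to the model I = ρ (l · n) per light, which is linear in the scaled normal g = ρn. The sketch below is the classic least-squares baseline that this structure permits, not the authors' sparse Bayesian method: it solves the 3x3 normal equations with Cramer's rule and splits g into albedo and unit normal. It assumes at least three non-coplanar lights and that no sample is shadowed or specular.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a @ x = b for a 3x3 system via Cramer's rule."""
    d = det3(a)
    x = []
    for k in range(3):
        mk = [row[:] for row in a]
        for r in range(3):
            mk[r][k] = b[r]
        x.append(det3(mk) / d)
    return x


def lambertian_normal(lights, intensities):
    """Least-squares g = albedo * normal from I_k = l_k . g."""
    ata = [[sum(l[r] * l[c] for l in lights) for c in range(3)]
           for r in range(3)]
    atb = [sum(l[r] * i for l, i in zip(lights, intensities))
           for r in range(3)]
    g = solve3(ata, atb)
    albedo = math.sqrt(sum(v * v for v in g))
    return [v / albedo for v in g], albedo
```

With clean data this recovers the normal exactly, but replacing one intensity with a bright specular value pulls the estimate away from the true normal. That sensitivity to sparse outliers is the failure mode the constrained sparse regression described above is designed to handle.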
2014 IEEE International Conference on Image Processing (ICIP), 2014
IVMSP 2013, 2013
Depth maps captured by active sensors (e.g., ToF cameras and Kinect) typically suffer from poor spatial resolution, a considerable amount of noise, and missing data. To overcome these problems, we propose a novel depth-map upsampling method that increases the resolution of the original depth map while effectively suppressing aliasing artifacts. Assuming that a registered high-resolution texture image is available, we apply the cost-volume filtering framework to this problem. Our experiments show that cost-volume filtering can generate the high-resolution depth map accurately and efficiently while preserving discontinuous object boundaries, which are often a challenge for various state-of-the-art algorithms.
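A one-dimensional sketch of cost-volume filtering for depth upsampling: for every candidate depth label, a truncated cost is computed against the known low-resolution samples, each cost slice is aggregated with weights driven by texture similarity, and the lowest filtered cost wins. The Gaussian texture weight, window radius, and truncation value are assumptions; CVF is often run with a guided filter instead, so treat this weighting as a swapped-in stand-in.

```python
import math


def cvf_upsample_1d(sparse_depth, texture, labels, radius=2,
                    sigma=10.0, trunc=5.0):
    """Texture-guided cost-volume filtering on a 1D signal.

    sparse_depth: known depth at some positions, None elsewhere.
    texture:      high-resolution guidance values, same length.
    """
    n = len(texture)
    out = []
    for x in range(n):
        best_label, best_cost = None, math.inf
        for lab in labels:
            num = den = 0.0
            for k in range(max(0, x - radius), min(n, x + radius + 1)):
                if sparse_depth[k] is None:
                    continue
                # Edge-aware weight: similar texture -> large weight.
                w = math.exp(-((texture[x] - texture[k]) ** 2)
                             / (2.0 * sigma ** 2))
                num += w * min(abs(lab - sparse_depth[k]), trunc)
                den += w
            if den == 0.0:
                continue
            cost = num / den  # filtered cost for this label at x
            if cost < best_cost:
                best_label, best_cost = lab, cost
        out.append(best_label)
    return out
```

For a step edge in the texture (values 0 then 100) with depths 1 and 5 known at alternate pixels, every pixel snaps to the depth on its own side of the edge, whereas simply averaging the two known neighbours of the last dark pixel would give 3 and blur the boundary.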
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Most conventional algorithms for non-Lambertian photometric stereo can be partitioned into two categories. The first category is built upon stable outlier rejection techniques while assuming a dense Lambertian structure for the inliers, and thus performance degrades when general diffuse regions are present. The second utilizes complex reflectance representations and non-linear optimization over pixels to handle non-Lambertian surfaces, but does not explicitly account for shadows or other forms of corrupting outliers. In this paper, we present a purely pixel-wise photometric stereo method that stably and efficiently handles various non-Lambertian effects by assuming that appearances can be decomposed into a sparse, non-diffuse component (e.g., shadows, specularities, etc.) and a diffuse component represented by a monotonic function of the dot product between the surface normal and the lighting direction. This function is constructed using a piecewise linear approximation to the inverse diffuse model, leading to closed-form estimates of the surface normals and model parameters in the absence of non-diffuse corruptions. The latter are modeled as latent variables embedded within a hierarchical Bayesian model such that we may accurately compute the unknown surface normals while simultaneously separating diffuse from non-diffuse components. Extensive evaluations using both synthetic and real-world images show state-of-the-art performance.
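A piecewise linear approximation to the inverse diffuse model maps an observed intensity back to an estimate of n · l by interpolating between knots. The sketch below only evaluates such a function; the knot placement and clamping behaviour are assumptions, and the example knots come from a hypothetical diffuse model f(s) = s² rather than anything in the paper.

```python
from bisect import bisect_right


def piecewise_linear_inverse(knot_intensities, knot_dots):
    """Return g with g(I) ~= n.l, linear between (intensity, dot) knots.

    knot_intensities must be strictly increasing (monotonic diffuse model).
    Values outside the knot range are clamped to the end knots.
    """
    def g(intensity):
        if intensity <= knot_intensities[0]:
            return knot_dots[0]
        if intensity >= knot_intensities[-1]:
            return knot_dots[-1]
        k = bisect_right(knot_intensities, intensity) - 1
        t = ((intensity - knot_intensities[k])
             / (knot_intensities[k + 1] - knot_intensities[k]))
        return knot_dots[k] + t * (knot_dots[k + 1] - knot_dots[k])
    return g
```

For a fixed intensity, g(I) is a fixed linear combination of the knot values `knot_dots`. If the knot intensities are fixed and the knot values treated as unknowns, each observation gives an equation that is linear in both those values and the normal, which is plausibly why the abstract can speak of closed-form estimates when no non-diffuse corruption is present.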