Zhou Xue - Profile on Academia.edu
Papers by Zhou Xue
Lecture Notes in Computer Science, 2022
Significant geometric structures can be compactly described by global wireframes in the estimation of 3D room layout from a single panoramic image. Based on this observation, we present an alternative approach that estimates the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block. We transform the image feature from a cubemap tile to the Hough space of a Manhattan world and directly map the feature to the geometric output. The convolutional layers not only learn local gradient-like line features, but also use global information to predict occluded walls with a simple network structure. Unlike most previous work, predictions are made individually on each cubemap tile and then assembled to obtain the layout estimate. Experimental results show that our results are comparable with the recent state of the art in prediction accuracy and performance. Code is available at .
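As a rough illustration of the Hough space mentioned in the abstract above, the following is a minimal sketch of a classical, fixed (non-learnable) Hough transform for straight lines. It is not the paper's learnable block and does not model cubemap tiles or the Manhattan-world parameterization; the function name `hough_lines`, the `(rho, theta)` parameterization, and the toy edge map are assumptions made for this example.

```python
import numpy as np

def hough_lines(edge_map, n_theta=180):
    """Vote in (rho, theta) space for every nonzero pixel of a binary edge map.

    Hypothetical helper for illustration: a line is parameterized as
    rho = x * cos(theta) + y * sin(theta), with theta sampled on [0, pi).
    """
    h, w = edge_map.shape
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))
    diag = int(np.ceil(np.hypot(h, w)))          # bound on |rho|
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edge_map)
    for t_idx, theta in enumerate(thetas):
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        # Shift rho by diag so negative values index valid rows.
        np.add.at(acc, (rho + diag, t_idx), 1)
    return acc, thetas, diag

# Toy input (assumed): a single vertical line at x = 10 in a 32 x 32 image.
edge = np.zeros((32, 32), dtype=np.uint8)
edge[:, 10] = 1
acc, thetas, diag = hough_lines(edge)
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
```

Every pixel on a line votes for the same accumulator cell, so a long line produces a strong peak even when parts of it are missing, which is the kind of long-range evidence the abstract refers to.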
2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Multi-view image denoising based on graphical model of surface patch
2010 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video, 2010
ABSTRACT The paper targets denoising of multi-view images, exploiting both intra-view and inter-view redundancy under the guidance of 3-D geometry constraints. A graphical model of surface patches from each view of the multi-view image sequence is proposed to model the redundancy more effectively and efficiently. Patches are clustered according to their mutual similarity, measured by the geodesic distance on the graph. Noise is attenuated via Wiener filtering on the sparse representations of these patches obtained with the DCT. The graphical model is first used in image denoising and outperforms the state-of-the-art denoising methods on the multi-view image sequence because the model fits the features of the two kinds of redundancy very well. Furthermore, the 3-D model reconstructed from multi-view images denoised by our method is more accurate and complete than those reconstructed from images denoised by other methods.
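The Wiener filtering step on DCT coefficients mentioned in the abstract above can be sketched as follows. This is a minimal single-patch illustration under stated assumptions, not the paper's method: it omits the graphical model and the geodesic clustering entirely, assumes a square patch and a known noise standard deviation `sigma`, and uses an empirical Wiener gain. The names `dct_matrix` and `wiener_dct_denoise` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def wiener_dct_denoise(patch, sigma, eps=1e-12):
    """Shrink the 2-D DCT coefficients of a square patch with a Wiener gain.

    Assumption: the signal power of each coefficient is estimated as
    max(coeff^2 - sigma^2, 0), giving gain = S / (S + sigma^2).
    """
    c = dct_matrix(patch.shape[0])
    coeffs = c @ patch @ c.T                       # forward 2-D DCT
    signal_power = np.maximum(coeffs ** 2 - sigma ** 2, 0.0)
    gain = signal_power / (signal_power + sigma ** 2 + eps)
    return c.T @ (gain * coeffs) @ c               # inverse 2-D DCT

# Toy data (assumed): a flat 8 x 8 patch with additive Gaussian noise.
rng = np.random.default_rng(0)
clean = np.full((8, 8), 0.5)
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)
denoised = wiener_dct_denoise(noisy, sigma=0.1)
```

Because the gain never exceeds one, coefficients dominated by noise are suppressed while strong coefficients pass almost unchanged; in a patch-based scheme the same shrinkage would be applied to groups of similar patches rather than to one patch in isolation.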
The consumer-grade plenoptic camera Lytro has drawn considerable interest from both the academic and industrial worlds. However, its low resolution in both the spatial and angular domains prevents it from being used for fine and detailed light field acquisition. This paper proposes to use a plenoptic camera as an image scanner and to perform light field stitching to increase the size of the acquired light field data. We consider a simplified plenoptic camera model comprising a pinhole camera moving behind a thin lens. Based on this model, we describe how to perform light field acquisition and stitching under two different scenarios: camera translation alone, or camera translation combined with rotation. In both cases, we assume the camera motion to be known. In the case of camera translation, we show how the acquired light fields should be resampled to increase the spatial range and ultimately obtain a wider field of view. In the case of camera translation and rotation, the camera motion is calculated such that the light fields can be directly stitched and extended in the angular domain. Simulation results verify our approach and demonstrate the potential of the motion model for further light field applications such as registration and super-resolution.