Gi-Mun Um - Academia.edu
Papers by Gi-Mun Um
ITC-CSCC :International Technical Conference on Circuits Systems, Computers and Communications, 1997
Proceedings of SPIE, Jun 16, 2003
In stereoscopic television, there is a trade-off between visual comfort and 3D impact with respect to the baseline stretch of the 3D camera. It has been reported that an optimal condition is reached when the baseline stretch is set to approximately the human interpupillary distance [1]. However, this distance cannot be achieved when the lens and CCD modules are large. To overcome this limitation, we control the baseline stretch of a stereoscopic camera by synthesizing virtual views at the desired position between the two cameras. The proposed technique is based on stereo matching and view synthesis. We first obtain a dense disparity map using hierarchical stereo matching with an edge-adaptive shifted window, and then synthesize virtual views using the disparity map. Simulation results on various stereoscopic images demonstrate the effectiveness of the proposed technique.
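The view synthesis step described in the abstract above can be illustrated with a minimal sketch. It assumes a rectified pair, non-negative left-to-right disparities, and simple forward warping of one scanline; the function name `synthesize_row` and the parameter `alpha` (0 for the left camera position, 1 for the right) are hypothetical, and the paper's hierarchical matching and hole handling are not reproduced here.

```python
def synthesize_row(left_row, disparity_row, alpha):
    """Forward-warp one scanline of the left image to a virtual viewpoint.

    alpha is the virtual camera position between the left (0.0) and
    right (1.0) cameras. Pixels nobody maps to stay None (holes).
    """
    width = len(left_row)
    out = [None] * width
    # Track the disparity of the pixel currently written at each target,
    # so that nearer surfaces (larger disparity) overwrite farther ones.
    best_disp = [float("-inf")] * width
    for x in range(width):
        d = disparity_row[x]
        tx = int(round(x - alpha * d))  # shift by a fraction of the disparity
        if 0 <= tx < width and d > best_disp[tx]:
            out[tx] = left_row[x]
            best_disp[tx] = d
    return out


if __name__ == "__main__":
    row = [10, 20, 30, 40]
    disp = [0, 0, 2, 2]
    print(synthesize_row(row, disp, 0.5))
```

With `alpha = 0.0` the row is returned unchanged; intermediate values shorten the effective baseline, which is the control the abstract refers to.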
Proceedings of SPIE, Mar 22, 2005
This paper presents a novel multi-depth-map fusion approach for 3D scene reconstruction. Traditional stereo matching techniques that estimate disparities between two images often produce inaccurate depth maps because of occlusions and homogeneous areas. On the other hand, a depth map obtained from a depth camera is globally accurate but noisy and covers a limited depth range. To combine the strengths and offset the weaknesses of these two methods, we propose a method that fuses the depth maps from stereo matching and from the depth camera. Using a 3-view camera system that includes a depth camera for the center view, we first obtain 3-view images and a depth map from the center-view depth camera. We then calculate the camera parameters by camera calibration. Using the camera parameters, we rectify the left-view and right-view images with respect to the center-view image to satisfy the epipolar constraint. Using the center-view image as a reference, we obtain two depth maps by stereo matching between the center-left image pair and the center-right image pair. After preprocessing each depth map, we pick an appropriate depth value for each pixel from the processed depth maps based on depth reliability. Simulation results obtained with the proposed method showed improvements in some background regions.
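The final selection step in the abstract above (picking one depth value per pixel based on reliability) might look like the following sketch. How reliability is computed is not stated in the abstract, so the per-pixel `reliability_maps` are simply assumed to be given; the function name `fuse_depth_maps` is hypothetical.

```python
def fuse_depth_maps(depth_maps, reliability_maps):
    """For each pixel, keep the depth from the candidate map whose
    reliability score is highest at that pixel.

    depth_maps and reliability_maps are lists of equally sized 2D lists,
    one pair per source (e.g. center-left stereo, center-right stereo,
    depth camera).
    """
    height = len(depth_maps[0])
    width = len(depth_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Index of the most reliable source at this pixel
            # (ties resolve to the earliest source in the list).
            best = max(range(len(depth_maps)),
                       key=lambda k: reliability_maps[k][y][x])
            fused[y][x] = depth_maps[best][y][x]
    return fused


if __name__ == "__main__":
    stereo = [[1.0, 2.0]]
    sensor = [[5.0, 6.0]]
    print(fuse_depth_maps([stereo, sensor], [[[0.9, 0.1]], [[0.2, 0.8]]]))
```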
ITC-CSCC :International Technical Conference on Circuits Systems, Computers and Communications, Jul 1, 2004
Journal of Visual Communication and Image Representation, Jun 1, 2023
arXiv (Cornell University), Apr 19, 2023
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image. Recently, the non-local interactions among all mesh vertices have been effectively estimated with transformers, while the relationships between body parts have begun to be handled via graph models. Even though these approaches have shown remarkable progress in 3D human mesh reconstruction, it is still difficult to directly infer the relationship between the features encoded from the 2D input image and the 3D coordinates of each vertex. To resolve this problem, we propose a simple feature sampling scheme. The key idea is to sample features in the embedded space under the guidance of points that are estimated as projections of the 3D mesh vertices (i.e., ground truth). This helps the model concentrate on vertex-relevant features in the 2D space, leading to the reconstruction of natural human poses. Furthermore, we apply progressive attention masking to precisely estimate local interactions between vertices even under severe occlusions. Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
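One common way to sample a feature map at projected vertex locations is bilinear interpolation. The sketch below illustrates that idea only; the abstract does not say which interpolation is used, so bilinear sampling, the function names, and the `feature_map[row][col][channel]` layout are all assumptions.

```python
def bilinear_sample(feature_map, x, y):
    """Sample a feature vector at continuous pixel coordinates (x, y).

    feature_map[row][col] is a list of channel values. Coordinates
    outside the map are clamped to the border.
    """
    height, width = len(feature_map), len(feature_map[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0  # fractional offsets inside the cell
    channels = len(feature_map[0][0])
    return [
        (1 - wy) * ((1 - wx) * feature_map[y0][x0][c] + wx * feature_map[y0][x1][c])
        + wy * ((1 - wx) * feature_map[y1][x0][c] + wx * feature_map[y1][x1][c])
        for c in range(channels)
    ]


def sample_vertex_features(feature_map, points_2d):
    """Return one sampled feature vector per projected vertex (x, y)."""
    return [bilinear_sample(feature_map, x, y) for x, y in points_2d]


if __name__ == "__main__":
    fmap = [[[0.0], [1.0]], [[2.0], [3.0]]]
    print(sample_vertex_features(fmap, [(0.5, 0.5), (1.0, 0.0)]))
```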
IEEE Transactions on Multimedia, 2022
Lecture Notes in Computer Science, 2022
Recently, there has been growing attention on end-to-end deep learning-based stitching models. However, the most challenging point in deep learning-based stitching is obtaining pairs of input images with a narrow field of view and ground truth images with a wide field of view captured from real-world scenes. To overcome this difficulty, we develop a weakly-supervised learning mechanism to train the stitching model without requiring genuine ground truth images. In addition, we propose a stitching model that takes multiple real-world fisheye images as inputs and creates a 360° output image in an equirectangular projection format. In particular, our model consists of color consistency correction, warping, and blending, and is trained with perceptual and SSIM losses. The effectiveness of the proposed algorithm is verified on two real-world stitching datasets.
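As a rough illustration of an SSIM-based training loss like the one named in the abstract above, the sketch below computes the standard SSIM index over a single window (the whole patch) and uses `1 - SSIM` as the loss. The windowing, weighting, and combination with the perceptual loss used in the paper are not given in the abstract; the function names and the single-window simplification are assumptions, while the constants follow the usual SSIM definition (K1 = 0.01, K2 = 0.03).

```python
def global_ssim(x, y, dynamic_range=255.0):
    """SSIM between two equally sized patches given as flat lists of
    pixel intensities, computed over one window covering the patch."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y, dynamic_range=255.0):
    """Loss that is 0 for identical patches and grows as structure differs."""
    return 1.0 - global_ssim(x, y, dynamic_range)


if __name__ == "__main__":
    patch = [0.0, 255.0, 0.0, 255.0]
    print(ssim_loss(patch, patch))
    print(ssim_loss(patch, [255.0, 0.0, 255.0, 0.0]))
```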
Conference on Electronics, Information and Communication (CEIC) 2019, 2019
Conference on Electronics, Information and Communication (CEIC) 2020, 2020