Samer Jammal - Academia.edu

Papers by Samer Jammal

Multiview video view synthesis and quality enhancement using convolutional neural networks

Multiview videos, which are recorded from different viewpoints by multiple synchronized cameras, provide an immersive 3D scene perception and a more realistic 3D viewing experience. However, this imposes an enormous load on the acquisition, storage, compression, and transmission of multiview video data. Consequently, new and advanced 3D video technologies for the efficient representation and transmission of multiview data are important for the success of multiview applications. Various methods aiming at improving multiview video coding efficiency are developed in this thesis, with convolutional neural networks as their core engine. The thesis includes two novel methods for accurate disparity estimation from stereo images. It proposes the use of convolutional neural networks with multi-scale correlation for disparity estimation. This method exploits the dependency between two feature maps by combining the benefits of a small correlation scale for fine details and a larger scale for bigger areas. Nevertheless, rendering accurate disparity maps for foreground and background objects with fine details in real scenes is a challenging task. Thus, a framework with a three-stage strategy for generating high-quality disparity maps for both near and far objects is proposed. Furthermore, current techniques for multiview data representation, even when they exploit inter-view correlation, require storage or transmission bandwidth that grows almost linearly with the number of transmitted views. To address this problem, we propose a novel view synthesis method for multiview video systems, in which the intermediate views are represented solely by their edges while their texture content is dropped. The texture content is then synthesized by a convolutional neural network that matches and exploits the edges and other information in the central view. Experimental results verify the effectiveness of the proposed framework. Finally, highly compressed multiview videos suffer severe quality degradation, so it is necessary to enhance the visual quality of highly compressed views at the decoder side. Consequently, a novel method for multiview quality enhancement is proposed that directly learns an end-to-end mapping between the low-quality and high-quality views and recovers the details of the low-quality view.
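The edge-only representation of intermediate views described above can be illustrated with a minimal sketch of extracting a sparse binary edge map from an intensity image. The `edge_map` helper, the gradient-threshold detector, and the `thresh` value are assumptions for illustration, not the detector used in the thesis:

```python
import numpy as np

def edge_map(image, thresh=0.2):
    # Gradient magnitude via first differences in x and y;
    # pixels whose combined gradient exceeds `thresh` count as edges.
    gx = np.abs(np.diff(image, axis=1, prepend=image[:, :1]))
    gy = np.abs(np.diff(image, axis=0, prepend=image[:1, :]))
    return (gx + gy) > thresh

# A vertical step edge: only the transition column is kept, so the
# stored representation is far sparser than the full texture.
img = np.zeros((4, 6))
img[:, 3:] = 1.0
em = edge_map(img)
print(int(em.sum()))  # -> 4 (one edge pixel per row)
```

Only the edges of each dropped view need to be transmitted; the synthesis network then fills the texture back in from the high-quality central view.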

Multiview video quality enhancement without depth information

Signal Processing: Image Communication, Jul 1, 2019

The past decade has witnessed fast development in multiview 3D video technologies such as Three-Dimensional Video (3DV), Virtual Reality (VR), and Free Viewpoint Video (FVV). However, large information redundancy and a vast amount of multiview video data need to be stored or transmitted, which poses a serious problem for multiview video systems. Asymmetric multiview video compression can alleviate this problem by coding views with different qualities: only a few viewpoints are kept at high quality, while the other views are highly compressed to low quality. However, highly compressed views may incur severe quality degradation. Thus, it is necessary to enhance the visual quality of highly compressed views at the decoder side. Exploiting similarities among the multiview images is the key to efficiently reconstructing the compressed views. In this paper, we propose a novel method for multiview quality enhancement, which directly learns an end-to-end mapping between the low-quality and high-quality views and recovers the details of the low-quality view. The mapping is realized using a deep convolutional neural network (MVENet). MVENet takes a low-quality image of one view and a high-quality image of another view of the same scene as inputs and outputs an enhanced image for the low-quality view. To the best of our knowledge, this is the first work on multiview video enhancement in which neither a depth map nor a projected virtual view is required in the enhancement process. Experimental results on both computer-generated and real datasets demonstrate the effectiveness of the proposed approach, with a peak signal-to-noise ratio (PSNR) gain of up to 2 dB over low-quality compressed views using HEVC and up to 3.7 dB over low-quality compressed views using JPEG on the Cityscapes benchmark.
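The PSNR gains quoted above follow the standard definition, 10·log10(peak²/MSE). A minimal sketch of computing it for a pair of 8-bit images (the helper name `psnr` and the toy images are illustrative assumptions):

```python
import numpy as np

def psnr(reference, distorted, peak=255.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    mse = np.mean((reference.astype(np.float64)
                   - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")       # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 128, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 138                 # one pixel off by 10 -> MSE = 100/16 = 6.25
print(round(psnr(ref, noisy), 2))  # -> 40.17
```

A 2 dB PSNR gain corresponds to reducing the mean squared error by a factor of about 1.58 (10^0.2), which is why even small dB improvements are considered significant.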

Disparity Estimation Using Convolutional Neural Networks with Multi-scale Correlation

Neural Information Processing (Lecture Notes in Computer Science), 2017

Disparity estimation is a long-standing task in computer vision, and multiple approaches have been proposed to solve this problem. A recent work based on convolutional neural networks, which uses a correlation layer to perform the matching process, has achieved state-of-the-art results for the disparity estimation task. This correlation layer employs a single correlation kernel, which is not suitable for low-texture content and repeated patterns. In this paper we tackle this problem with a multi-scale correlation layer comprising several correlation kernels of different scales. The main goal is to integrate the information of the local matching process by combining the benefits of a small correlation scale for fine details and larger scales for bigger areas. Furthermore, we investigate a training approach using horizontally elongated patches that fits the disparity estimation task. The results obtained demonstrate the benefits of the proposed approach on both synthetic and real images.
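The idea of correlating the two feature maps at more than one kernel scale can be sketched as follows. The normalised dot-product score, the two kernel sizes, and the toy shifted signal are illustrative assumptions, not the paper's exact layer:

```python
import numpy as np

def correlation(feat_a, feat_b, max_disp, patch):
    # Normalised dot product between a horizontal patch of feat_a at x
    # and the patch of feat_b displaced by d, for every (x, d) pair.
    h, w = feat_a.shape
    half = patch // 2
    scores = np.zeros((h, w, max_disp + 1))
    for d in range(max_disp + 1):
        for x in range(half + d, w - half):
            a = feat_a[:, x - half:x + half + 1]
            b = feat_b[:, x - d - half:x - d + half + 1]
            norm = (np.linalg.norm(a, axis=1)
                    * np.linalg.norm(b, axis=1) + 1e-9)
            scores[:, x, d] = np.sum(a * b, axis=1) / norm
    return scores

rng = np.random.default_rng(0)
left = rng.random((8, 32))
right = np.roll(left, -2, axis=1)       # true disparity of 2 pixels
fine = correlation(left, right, max_disp=4, patch=3)    # fine details
coarse = correlation(left, right, max_disp=4, patch=7)  # larger areas
est = (fine + coarse)[:, 10:28, :].argmax(axis=2)
print(bool(np.all(est == 2)))           # -> True
```

Summing the fine and coarse cost volumes before taking the argmax is one simple way to combine the scales; a learned layer would instead feed both volumes into subsequent convolutions.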

Multi-resolution for disparity estimation with convolutional neural networks

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017

Estimation of stereovision disparity maps is important for many applications that require information about objects' position and geometry. For example, as a depth surrogate, disparity maps are essential for 3D shape reconstruction and other applications that require a three-dimensional representation of a scene. Recently, deep learning (DL) methodology has enabled novel approaches to disparity estimation, with some focus on the real-time processing requirement that is critical for applications in robotics and autonomous navigation; previously, that constraint was not always addressed. Furthermore, for robust disparity estimation the occlusion effects should be explicitly modelled. In the described method, effective detection of occlusion regions is achieved through disparity estimation in both forward and backward correspondence models, with two matching deep subnetworks. These two subnetworks are trained jointly in a single training process. Initially the subnetworks are trained on simulated data with known ground truth; then, to improve generalisation, the whole model is fine-tuned in an unsupervised fashion on real data. During the unsupervised training, the model is equipped with a bilinear interpolation warping function to directly measure the quality of the correspondence given the disparity maps estimated for both the left and right images. During this phase a forward-backward consistency loss is also applied to regularise the disparity estimators for non-occluded pixels. The described network model computes, at the same time, the forward and backward disparity maps as well as the corresponding occlusion masks. It showed improved results on simulated and real images with occluded objects when compared with the results obtained without the forward-backward consistency loss.
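The forward-backward consistency check described above can be sketched at the level of disparity maps: a pixel is flagged as occluded or inconsistent when its forward disparity disagrees with the backward disparity sampled at the matched location. The helper name `occlusion_mask`, the nearest-pixel rounding (standing in for bilinear warping), and the threshold `tau` are assumptions for illustration:

```python
import numpy as np

def occlusion_mask(disp_fwd, disp_bwd, tau=1.0):
    # True where the forward map is inconsistent with the backward map
    # (occluded pixels or mismatches), False where the check passes.
    h, w = disp_fwd.shape
    xs = np.arange(w)
    mask = np.ones((h, w), dtype=bool)   # default: inconsistent
    for y in range(h):
        matched = xs - np.round(disp_fwd[y]).astype(int)  # pos. in other view
        valid = (matched >= 0) & (matched < w)
        diff = np.abs(disp_fwd[y, valid] - disp_bwd[y, matched[valid]])
        mask[y, valid] = diff > tau
    return mask

disp_f = np.full((2, 8), 2.0)   # every pixel shifts 2 toward the other view
disp_b = np.full((2, 8), 2.0)
disp_b[:, 3] = 7.0              # one backward estimate disagrees
m = occlusion_mask(disp_f, disp_b)
print(m[0].tolist())  # -> [True, True, False, False, False, True, False, False]
```

During training, a loss of this kind is applied only where the mask is False, so pixels without a consistent match do not drag the disparity estimators toward wrong correspondences.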
