Label Refinement Network for Coarse-to-Fine Semantic Segmentation

Gated Feedback Refinement Network for Coarse-to-Fine Dense Semantic Image Labeling

ArXiv, 2018

Effective integration of local and global contextual information is crucial for semantic segmentation and dense image labeling. We develop two encoder-decoder based deep learning architectures to address this problem. We first propose a network architecture called Label Refinement Network (LRN) that predicts segmentation labels in a coarse-to-fine fashion at several spatial resolutions. We also define loss functions at several stages of the network to provide supervision throughout training. However, there are limits to the quality of refinement possible if ambiguous information is passed forward. To address this limitation, we also propose the Gated Feedback Refinement Network (G-FRNet). Initially, G-FRNet makes a coarse-grained prediction, which it progressively refines to recover details by effectively integrating local and global contextual information during the refinement stages. This is achieved by gate units proposed in this work, ...
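The gate units described above can be sketched in a few lines. This is a minimal illustration, not the exact G-FRNet formulation: the channel-mixing matrices stand in for learned 1x1 convolutions, and shapes and the elementwise product are simplifying assumptions. The idea shown is that a deeper, more semantic feature map modulates (gates) a shallower one before it is passed forward, suppressing ambiguous shallow responses where the deep evidence is weak.

```python
import numpy as np

def gate_unit(shallow, deep, rng):
    """Toy gate unit: project both feature maps, apply ReLU, then take the
    elementwise product so the deep map gates the shallow one.
    The random channel-mixing matrices stand in for learned 1x1 convs."""
    c_out = shallow.shape[0]
    w_s = rng.standard_normal((c_out, shallow.shape[0])) * 0.1
    w_d = rng.standard_normal((c_out, deep.shape[0])) * 0.1
    s = np.maximum(np.einsum('oc,chw->ohw', w_s, shallow), 0)  # ReLU
    d = np.maximum(np.einsum('oc,chw->ohw', w_d, deep), 0)     # ReLU
    return s * d  # deep evidence gates shallow detail

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))  # high-resolution encoder feature
deep = rng.standard_normal((8, 16, 16))     # upsampled deeper feature
gated = gate_unit(shallow, deep, rng)
```

In the full network, the gated feature would then be concatenated or summed into the corresponding decoder stage before the next coarse-to-fine refinement step.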

From image-level to pixel-level labeling with Convolutional Networks

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

We are interested in inferring object segmentation by leveraging only object class information, and by considering only minimal priors on the object segmentation task. This problem could be viewed as a kind of weakly supervised segmentation task, and naturally fits the Multiple Instance Learning (MIL) framework: every training image is known to have (or not) at least one pixel corresponding to the image class label, and the segmentation task can be rewritten as inferring the pixels belonging to the class of the object (given one image and its object class). We propose a Convolutional Neural Network-based model, which is constrained during training to put more weight on pixels which are important for classifying the image. We show that at test time, the model has learned to discriminate the right pixels well enough that it performs very well on an existing segmentation benchmark, by adding only a few smoothing priors. Our system is trained using a subset of the Imagenet dataset and the segmentation experiments are performed on the challenging Pascal VOC dataset (with no fine-tuning of the model on Pascal VOC). Our model beats the state-of-the-art results in the weakly supervised object segmentation task by a large margin. We also compare the performance of our model with state-of-the-art fully-supervised segmentation approaches.
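A standard way to realize this MIL constraint, and one used in this line of work, is to aggregate per-pixel class scores into a single image-level score with Log-Sum-Exp (LSE) pooling, so that an image-level classification loss pushes weight onto the most discriminative pixels. The sketch below is illustrative: the sharpness parameter r=5.0 is an assumed value, not a tuned hyper-parameter from the paper.

```python
import numpy as np

def lse_pool(pixel_scores, r=5.0):
    """Log-Sum-Exp pooling: aggregate per-pixel class scores (C, H, W)
    into one image-level score per class. Large r approaches max pooling
    (only the best pixel counts); small r approaches average pooling.
    Implemented in the numerically stable shifted form."""
    c = pixel_scores.shape[0]
    flat = pixel_scores.reshape(c, -1)
    m = flat.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(r * (flat - m)).mean(axis=1, keepdims=True)) / r).ravel()

scores = np.random.default_rng(1).standard_normal((3, 5, 5))  # 3 classes
image_scores = lse_pool(scores)  # one score per class; feed to a softmax loss
```

Because LSE lies between the per-class mean and max, gradients still reach many pixels early in training while the most important pixels dominate as training sharpens.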

Enhancing Semantic Segmentation through Cycle-consistent Label Propagation in Video

Research Square, 2023

To perform semantic image segmentation using deep learning models, a significant quantity of data and meticulous manual annotation is necessary [1], and the process consumes a lot of resources, including time and money. To resolve such issues, we introduce a unique label propagation method that utilizes cycle consistency across time to propagate labels over longer time horizons with higher accuracy. Additionally, we acknowledge that dense pixel annotation is a noisy process [2], whether performed manually or automatically. To address this, we present a principled approach that accounts for label uncertainty when training with labels from multiple noisy labeling processes. We introduce two new approaches, Warp-Refine Propagation and Uncertainty-Aware Training, for improving label propagation and handling noisy labels, respectively, and support them with quantitative and qualitative evaluations and theoretical justification. Our contributions are validated on the Cityscapes and ApolloScape datasets, where we achieve encouraging results. Future work should extend these approaches to other noisy augmentation processes, such as image-based rendering methods [3], building on the noisy-label learning approach.
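The cycle-consistency idea above can be sketched as a simple forward-backward check: a propagated label is trusted at a pixel only if following the forward flow and then the backward flow returns (approximately) to the starting point. This is a toy illustration under assumptions, not the paper's Warp-Refine Propagation: dense flows are taken as given, the forward lookup is nearest-neighbour, and the tolerance is an arbitrary value.

```python
import numpy as np

def cycle_consistent_mask(fwd_flow, bwd_flow, tol=0.5):
    """Return a boolean mask of pixels whose forward-then-backward round
    trip lands within `tol` pixels of where it started. Flows are (H, W, 2)
    arrays of (dy, dx) displacements."""
    h, w = fwd_flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # forward step (rounded to the nearest pixel for simplicity)
    yf = np.clip(np.rint(ys + fwd_flow[..., 0]).astype(int), 0, h - 1)
    xf = np.clip(np.rint(xs + fwd_flow[..., 1]).astype(int), 0, w - 1)
    # backward step, read from the landing position
    yb = yf + bwd_flow[yf, xf, 0]
    xb = xf + bwd_flow[yf, xf, 1]
    err = np.hypot(yb - ys, xb - xs)
    return err < tol

# a constant shift and its exact inverse are cycle-consistent away from the border
fwd = np.ones((8, 8, 2))
bwd = -np.ones((8, 8, 2))
mask = cycle_consistent_mask(fwd, bwd)
```

In a propagation pipeline, labels warped forward in time would be kept only where this mask is true, which is what lets labels travel over longer horizons without accumulating drift.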

Semantic segmentation using reinforced fully convolutional densenet with multiscale kernel

Multimedia Tools and Applications, 2019

In recent years, semantic segmentation has become one of the most active tasks in the computer vision field. Its goal is to group image pixels into semantically meaningful regions. Deep learning methods, in particular those that use convolutional neural networks (CNNs), have shown great success in the semantic segmentation task. In this paper, we introduce a semantic segmentation system using a reinforced fully convolutional densenet with a multiscale kernel prediction method. Our main contribution is to build an encoder-decoder based architecture where we increase the width of the dense block in the encoder part by adding recurrent connections inside the dense block. The resulting structure is called a wider dense block, where each layer takes not only the output of the previous layer but also the initial input of the dense block. This recurrent structure emulates the human brain system and helps to strengthen the extraction of the target features. As a result, our network becomes deeper and wider with no additional parameters thanks to weight sharing. Moreover, a multiscale convolutional layer is applied after the last dense block of the decoder part to perform model averaging over different spatial scales and to provide a more flexible method. The proposed method has been evaluated on two semantic segmentation benchmarks: CamVid and Cityscapes. Our method outperforms many recent works from the state of the art.
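The wider dense block can be sketched as a recurrence with one shared weight set: at every step the same convolution is applied to the concatenation of the previous output and the block's initial input, which is why depth and width grow with no additional parameters. This is a toy sketch under assumptions: the random channel-mixing matrix stands in for a learned conv+BN+ReLU layer, and the channel counts are illustrative.

```python
import numpy as np

def wider_dense_block(x0, n_steps, rng):
    """Recurrent 'wider dense block' sketch: one weight set `w` is reused
    at every step (weight sharing), and each step sees [previous output;
    initial input x0]. Outputs of all steps are concatenated, as in a
    dense block."""
    c = x0.shape[0]
    w = rng.standard_normal((c, 2 * c)) * 0.1  # ONE weight set, reused each step
    h = x0
    outs = []
    for _ in range(n_steps):
        inp = np.concatenate([h, x0], axis=0)          # previous output + initial input
        h = np.maximum(np.einsum('oc,chw->ohw', w, inp), 0)  # shared conv + ReLU
        outs.append(h)
    return np.concatenate(outs, axis=0)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
y = wider_dense_block(x0, 3, rng)  # 3 steps, 4 channels each -> 12 output channels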

On the Iterative Refinement of Densely Connected Representation Levels for Semantic Segmentation

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018

State-of-the-art semantic segmentation approaches increase the receptive field of their models by using either a downsampling path composed of poolings/strided convolutions or successive dilated convolutions. However, it is not clear which operation leads to the best results. In this paper, we systematically study the differences introduced by distinct receptive field enlargement methods and their impact on the performance of a novel architecture, called Fully Convolutional DenseResNet (FC-DRN). FC-DRN has a densely connected backbone composed of residual networks. Following standard image segmentation architectures, receptive field enlargement operations that change the representation level are interleaved among residual networks. This allows the model to exploit the benefits of both residual and dense connectivity patterns, namely: gradient flow, iterative refinement of representations, multi-scale feature combination and deep supervision. In order to highlight the potential of our model, we test it on the challenging CamVid urban scene understanding benchmark and make the following observations: 1) downsampling operations outperform dilations when the model is trained from scratch, 2) dilations are useful during the finetuning step of the model, 3) coarser representations require fewer refinement steps, and 4) ResNets (by model construction) are good regularizers, since they can reduce the model capacity when needed. Finally, we compare our architecture to alternative methods and report state-of-the-art results on the CamVid dataset with at least half as many parameters.
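The two receptive-field enlargement routes being compared, downsampling versus dilation, can be made concrete with the standard receptive-field recurrence: each layer with kernel k, stride s and dilation d contributes (d*(k-1)) input pixels scaled by the cumulative stride ("jump") of the layers before it. The layer stacks below are illustrative examples, not the FC-DRN configuration.

```python
def receptive_field(layers):
    """Analytic receptive field of a chain of conv/pool layers, each given
    as (kernel, stride, dilation). Recurrence: the effective kernel is
    d*(k-1)+1; rf grows by (effective_kernel - 1) * jump; jump multiplies
    by the stride."""
    rf, jump = 1, 1
    for k, s, d in layers:
        eff_k = d * (k - 1) + 1
        rf += (eff_k - 1) * jump
        jump *= s
    return rf

# downsampling route: 3x3 conv, 2x2 stride-2 pool, 3x3 conv
down_rf = receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)])
# dilation route: 3x3 conv, then 3x3 conv with dilation 2 (no resolution loss)
dilated_rf = receptive_field([(3, 1, 1), (3, 1, 2)])
```

The key trade-off the paper studies falls out of the arithmetic: downsampling grows `jump`, so later layers gain receptive field cheaply but lose spatial resolution, while dilation grows the effective kernel and keeps full resolution at higher compute cost.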

Multi-Scale Convolutional Architecture for Semantic Segmentation

2015

Advances in 3D sensing technologies have made RGB and depth information more readily available than before, which can greatly assist in the semantic segmentation of 2D scenes. There are many works in the literature that perform semantic segmentation in such scenes, but few address environments with a high degree of clutter in general, e.g. indoor scenes. In this paper, we explore the use of depth information along with RGB and a deep convolutional network for indoor scene understanding through semantic labeling. Our work exploits the geocentric encoding of a depth image and uses a multi-scale deep convolutional neural network architecture that captures high- and low-level features of a scene to generate rich semantic labels. We apply our method to indoor RGBD images from the NYUD2 dataset [1] and achieve a competitive performance of 70.45% accuracy in labeling four object classes compared with some prior approaches. The results show our system is capable of generating a pixe...

Multiscale Fully Convolutional DenseNet for Semantic Segmentation

Journal of WSCG, 2018

In the computer vision field, semantic segmentation represents a very interesting task. Convolutional Neural Network methods have shown great performance in comparison with other semantic segmentation methods. In this paper, we propose a multiscale fully convolutional DenseNet approach for semantic segmentation. Our approach is based on the successful fully convolutional DenseNet method. It is reinforced by integrating a multiscale kernel prediction after the last dense block, which performs model averaging over different spatial scales and gives our network more flexibility to capture more information. Experiments on two semantic segmentation benchmarks, CamVid and Cityscapes, have shown the effectiveness of our approach, which has outperformed many recent works.
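The multiscale kernel prediction can be sketched as scoring the same final feature map at several kernel sizes and averaging the per-pixel predictions, which is the "model averaging over spatial scales" described above. In this toy sketch, box filters stand in for learned convolutions and the kernel sizes are assumed, not taken from the paper.

```python
import numpy as np

def multiscale_kernel_prediction(feat, kernel_sizes=(1, 3, 5, 7)):
    """Score a (C, H, W) feature map with one k x k filter per scale and
    average the results. Box filters (uniform k x k averaging) stand in
    for learned convolutions; edge padding keeps the output size fixed."""
    c, h, w = feat.shape
    preds = []
    for k in kernel_sizes:
        pad = k // 2
        padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode='edge')
        out = np.zeros_like(feat)
        for dy in range(k):          # naive k x k box filter
            for dx in range(k):
                out += padded[:, dy:dy + h, dx:dx + w]
        preds.append(out / (k * k))
    return np.mean(preds, axis=0)    # model averaging over scales

feat = np.random.default_rng(0).standard_normal((2, 6, 6))
pred = multiscale_kernel_prediction(feat)
```

Larger kernels contribute smoother, more context-aware scores while the 1x1 branch preserves sharp per-pixel detail; averaging them trades off the two.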

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

State-of-the-art semantic segmentation methods were almost exclusively trained on images within a fixed resolution range. These segmentations are inaccurate for very high-resolution images since using bicubic upsampling of low-resolution segmentation does not adequately capture high-resolution details along object boundaries. In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data. The key insight is our CascadePSP network which refines and corrects local boundaries whenever possible. Although our network is trained with low-resolution segmentation data, our method is applicable to any resolution even for very high-resolution images larger than 4K. We present quantitative and qualitative studies on different datasets to show that CascadePSP can reveal pixel-accurate segmentation boundaries using our novel refinement module without any finetuning. Thus, our method can be regarded as class-agnostic. Finally, we demonstrate the application of our model to scene parsing in multi-class segmentation.
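The global-then-local cascade can be sketched as a two-level loop: one global pass fixes the overall mask structure, then full-resolution crops are refined individually so boundary detail is recovered without ever processing the whole high-resolution image at once. This is a toy sketch under assumptions: the `refine` function below merely sharpens a soft mask toward 0/1 as a stand-in for the trained refinement module, and the crop size and non-overlapping tiling are simplifications (the real method uses overlapping crops and multiple cascade levels).

```python
import numpy as np

def refine(seg, image):
    """Stand-in for the learned refinement module: pushes soft mask values
    toward 0 or 1. The `image` argument is unused here but kept to mirror
    the module's real inputs (mask + image crop)."""
    return np.clip((seg - 0.5) * 2.0 + 0.5, 0.0, 1.0)

def cascade_refine(seg, image, crop=64):
    """Global pass over the whole (soft) mask, then crop-wise local passes
    at full resolution. Crops are tiled without overlap for brevity."""
    out = refine(seg, image)                 # global pass (in practice downscaled)
    h, w = seg.shape
    for y in range(0, h, crop):
        for x in range(0, w, crop):
            patch = out[y:y + crop, x:x + crop]
            out[y:y + crop, x:x + crop] = refine(patch, image[y:y + crop, x:x + crop])
    return out

seg = np.full((128, 128), 0.3)    # uncertain low-resolution prediction
image = np.zeros((128, 128))
refined = cascade_refine(seg, image)
```

Because each local pass only ever sees a crop, the same module trained at low resolution can be applied to arbitrarily large images, which is what makes the approach resolution-agnostic.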

How deep learning is empowering semantic segmentation

Multimedia Tools and Applications

Semantic segmentation involves extracting meaningful information from images or from video frames by classifying the content pixel by pixel. It yields the more accurate, fine-grained detail needed for further evaluation. Formerly, segmentation relied on a few unsupervised learning techniques or conventional image processing methods. Over time, techniques have improved, and we now have more efficient methods for segmentation. Plain image segmentation is slightly simpler than semantic segmentation from a technical perspective, since semantic segmentation operates at the pixel level: each detected region is masked according to its label and associated with a defined class name and a designated color. In this paper, we review the supervised and unsupervised learning algorithms proposed for semantic segmentation, from early methods to the more advanced and efficient ones. As far as deep learning is concerned, many techniques have already been developed. We have studied around 120 papers in this research area and conclude how deep learning helps solve the critical issues of semantic segmentation and yields more efficient results. We have also comprehensively reviewed existing surveys on semantic segmentation, specifically those using deep learning.

DNS: A multi-scale deconvolution semantic segmentation network for joint detection and segmentation

MATEC Web of Conferences, 2019

Real-time semantic segmentation has become crucial in many applications such as medical image analysis and autonomous driving. In this paper, we introduce a single semantic segmentation network, called DNS, for the joint object detection and segmentation task. We take advantage of a multi-scale deconvolution mechanism to perform real-time computation. To this end, down-scale and up-scale streams are utilized to combine the multi-scale features for the final detection and segmentation task. By using the proposed DNS, both the trade-off between accuracy and cost and the balance between detection and segmentation performance are addressed. Experimental results on the PASCAL VOC dataset show competitive performance for the joint object detection and segmentation task.
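The down-scale/up-scale feature combination can be sketched as pooling features down for context, bringing them back up, and concatenating them with the originals so that shared detection and segmentation heads both see multi-scale information. This is a minimal sketch under assumptions: average pooling and nearest-neighbour upsampling stand in for the network's strided and deconvolution layers, only one pyramid level is shown, and the heads themselves are omitted.

```python
import numpy as np

def downscale(x):
    """Down-scale stream step: 2x2 average pooling on a (C, H, W) map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upscale(x):
    """Up-scale stream step: nearest-neighbour 2x upsampling, a stand-in
    for a learned deconvolution."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def dns_features(x):
    """Combine the two streams: coarse contextual features are brought back
    to full resolution and stacked with the fine features, doubling the
    channel count for the shared detection/segmentation heads."""
    coarse = downscale(x)
    return np.concatenate([x, upscale(coarse)], axis=0)

x = np.random.default_rng(0).standard_normal((3, 8, 8))
fused = dns_features(x)  # (6, 8, 8): fine + upsampled coarse channels
```

Sharing this fused representation between the two heads is what lets one forward pass serve both tasks in real time.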