Weakly Supervised Semantic Segmentation by Multi Image Model (original) (raw)

Weakly Supervised Semantic Segmentation with a Multi-image Model

2011

We propose a novel method for weakly supervised semantic segmentation. Training images are labeled only by the classes they contain, not by their location in the image. On test images instead, the method predicts a class label for every pixel. Our main innovation is a multi-image model (MIM)-a graphical model for recovering the pixel labels of the training images. The model connects superpixels from all training images in a data-driven fashion, based on their appearance similarity. For generalizing to new test images ...

A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains

International Journal of Computer Vision, 2020

Recently proposed methods for weakly-supervised semantic segmentation have achieved impressive performance in predicting pixel classes despite being trained with only image labels which lack positional information. Because image annotations are cheaper and quicker to generate, weak supervision is more practical than full supervision for training segmentation algorithms. These methods have been predominantly developed to solve the background separation and partial segmentation problems presented by natural scene images and it is unclear whether they can be simply transferred to other domains with different characteristics, such as histopathology and satellite images, and still perform well. This paper evaluates state-of-the-art weakly-supervised semantic segmentation methods on natural scene, histopathology, and satellite image datasets and analyzes how to determine which method is most suitable for a given dataset. Our experiments indicate that histopathology and satellite images present a different set of problems for weakly-supervised semantic segmentation than natural scene images, such as ambiguous boundaries and class co-occurrence. Methods perform well for datasets they were developed on, but tend to perform poorly on other datasets. We present some practical techniques for these methods on unseen datasets and argue that more work is needed for a generalizable approach to weakly-supervised semantic segmentation. Our full code implementation is available on GitHub: https://github. com/lyndonchan/wsss-analysis.

Weakly-Supervised Image Semantic Segmentation Based on Superpixel Region Merging

Big Data and Cognitive Computing, 2019

In this paper, we propose a semantic segmentation method based on superpixel region merging and convolutional neural network (CNN), referred to as regional merging neural network (RMNN). Image annotation has always been an important role in weakly-supervised semantic segmentation. Most methods use manual labeling. In this paper, super-pixels with similar features are combined using the relationship between each pixel after super-pixel segmentation to form a plurality of super-pixel blocks. Rough predictions are generated by the fully convolutional networks (FCN) so that certain super-pixel blocks will be labeled. We perceive and find other positive areas in an iterative way through the marked areas. This reduces the feature extraction vector and reduces the data dimension due to super-pixels. The algorithm not only uses superpixel merging to narrow down the target’s range but also compensates for the lack of weakly-supervised semantic segmentation at the pixel level. In the training...

NoPeopleAllowed: The Three-Step Approach to Weakly Supervised Semantic Segmentation

ArXiv, 2020

We propose a novel approach to weakly supervised semantic segmentation, which consists of three consecutive steps. The first two steps extract high-quality pseudo masks from image-level annotated data, which are then used to train a segmentation model on the third step. The presented approach also addresses two problems in the data: class imbalance and missing labels. Using only image-level annotations as supervision, our method is capable of segmenting various classes and complex objects. It achieves 37.34 mean IoU on the test set, placing 3rd at the LID Challenge in the task of weakly supervised semantic segmentation.

Towards Weakly Supervised Semantic Segmentation by Means of Multiple Instance and Multitask Learning.

We address the task of learning a semantic segmentation from weakly supervised data. Our aim is to devise a system that predicts an object label for each pixel by making use of only image level labels during training -- the information whether a certain object is present or not in the image. Such coarse tagging of images is faster and easier to obtain as opposed to the tedious task of pixelwise labeling required in state of the art systems. We cast this task naturally as a multiple instance learning (MIL) problem. We use Semantic Texton Forest (STF) as the basic framework and extend it for the MIL setting. We make use of multitask learning (MTL) to regularize our solution. Here, an external task of geometric context estimation is used to improve on the task of semantic segmentation. We report experimental results on the MSRC21 and the very challenging VOC2007 datasets. On MSRC21 dataset we are able, by using 276 weakly labeled images, to achieve the performance of a supervised STF trained on pixelwise labeled training set of 56 images, which is a significant reduction in supervision needed.

Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

This paper presents a novel approach for learning instance segmentation with image-level class labels as supervision. Our approach generates pseudo instance segmentation labels of training images, which are used to train a fully supervised model. For generating the pseudo labels, we first identify confident seed areas of object classes from attention maps of an image classification model, and propagate them to discover the entire instance areas with accurate boundaries. To this end, we propose IRNet, which estimates rough areas of individual instances and detects boundaries between different object classes. It thus enables to assign instance labels to the seeds and to propagate them within the boundaries so that the entire areas of instances can be estimated accurately. Furthermore, IRNet is trained with interpixel relations on the attention maps, thus no extra supervision is required. Our method with IRNet achieves an outstanding performance on the PASCAL VOC 2012 dataset, surpassing not only previous state-of-the-art trained with the same level of supervision, but also some of previous models relying on stronger supervision.

The Effect of Scene Context on Weakly Supervised Semantic Segmentation

2020 International Conference on Machine Vision and Image Processing (MVIP), 2020

Image semantic segmentation is parsing image into several partitions in such a way that each region of which involves a semantic concept. In a weakly supervised manner, since only image-level labels are available, discriminating objects from the background is challenging, and in some cases, much more difficult. More specifically, some objects which are commonly seen in one specific scene (e.g. "train" typically is seen on "railroad track") are much more likely to be confused. In this paper, we propose a method to add the target-specific scenes in order to overcome the aforementioned problem. Actually, we propose a scene recommender which suggests to add some specific scene contexts to the target dataset in order to train the model more accurately. It is notable that this idea could be a complementary part of the baselines of many other methods. The experiments validate the effectiveness of the proposed method for the objects for which the scene context is added.

Weakly Supervised Structured Output Learning for Semantic Segmentation

2012

Abstract We address the problem of weakly supervised semantic segmentation. The training images are labeled only by the classes they contain, not by their location in the image. On test images instead, the method must predict a class label for every pixel. Our goal is to enable segmentation algorithms to use multiple visual cues in this weakly supervised setting, analogous to what is achieved by fully supervised methods.

Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation

Computer Vision – ECCV 2016, 2016

Pixel-level annotations are expensive and time consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recently, CNN-based methods have proposed to fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, was alleviated by making use of objectness priors to generate foreground/background masks. Unfortunately these priors either require training pixel-level annotations/bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract markedly more accurate masks from the pre-trained network itself, forgoing external objectness modules. This is accomplished using the activations of the higher-level convolutional layers, smoothed by a dense CRF. We demonstrate that our method, based on these masks and a weakly-supervised loss, outperforms the state-of-the-art tag-based weakly-supervised semantic segmentation techniques. Furthermore, we introduce a new form of inexpensive weak supervision yielding an additional accuracy boost.