Revisiting CycleGAN for semi-supervised segmentation

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

arXiv, 2021

We introduce a method that automatically segments images into semantically meaningful regions without human supervision. Derived regions are consistent across different images and coincide with human-defined semantic classes on some datasets. In cases where semantic regions are hard for humans to define and label consistently, our method is still able to find meaningful and consistent semantic classes. In our work, we use a pretrained StyleGAN2 [1] generative model: clustering in the feature space of the generative model allows us to discover semantic classes. Once classes are discovered, a synthetic dataset of generated images and corresponding segmentation masks can be created. A segmentation model is then trained on the synthetic dataset and is able to generalize to real images. Additionally, by using CLIP [2] we are able to use prompts written in natural language to discover desired semantic classes. We test our method on publicly available datasets and sh...
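
A minimal sketch of the core idea, assuming per-pixel generator features have already been collected: cluster them into pseudo-semantic classes. The k-means choice and tensor layout are illustrative assumptions, not the authors' released pipeline.

```python
# Sketch: cluster per-pixel StyleGAN2-style features into pseudo-classes.
# Collecting `features` from the generator is assumed to happen elsewhere.
import torch
from sklearn.cluster import KMeans

def discover_classes(features: torch.Tensor, n_classes: int) -> torch.Tensor:
    """features: (N, C, H, W) activations from generated images.
    Returns integer pseudo-label masks of shape (N, H, W)."""
    n, c, h, w = features.shape
    flat = features.permute(0, 2, 3, 1).reshape(-1, c).cpu().numpy()
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(flat)
    return torch.from_numpy(labels).view(n, h, w)
```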

Pixel Level Data Augmentation for Semantic Image Segmentation Using Generative Adversarial Networks

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Semantic segmentation is one of the fundamental topics in computer vision; it aims to assign a semantic label to every pixel of an image. An unbalanced semantic label distribution can have a negative influence on segmentation accuracy. In this paper, we investigate a data augmentation approach that balances the semantic label distribution in order to improve segmentation performance. We propose using generative adversarial networks (GANs) to generate realistic images for improving the performance of semantic segmentation networks. Experimental results show that the proposed method not only improves segmentation performance on classes with low accuracy, but also obtains a 1.3% to 2.1% increase in average segmentation accuracy. This shows that the augmentation method can boost accuracy and is easily applicable to other segmentation models.
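
One way to read the balancing step: estimate per-class pixel frequencies and give rarer classes a larger share of the synthetic-image budget. This is a hedged sketch of that bookkeeping only, not the paper's exact procedure.

```python
# Sketch: allocate a synthetic-image budget inversely to class pixel frequency.
import numpy as np

def class_pixel_frequencies(masks: np.ndarray, n_classes: int) -> np.ndarray:
    """masks: (N, H, W) integer label maps; returns per-class pixel fractions."""
    counts = np.bincount(masks.ravel(), minlength=n_classes).astype(float)
    return counts / counts.sum()

def synthetic_budget(freqs: np.ndarray, total: int) -> np.ndarray:
    """Rarer classes get proportionally more of `total` synthetic images."""
    inv = 1.0 / np.maximum(freqs, 1e-6)
    return np.round(total * inv / inv.sum()).astype(int)
```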

Generating Synthetic Training Datasets using Conditional Generative Adversarial Network to Improve Images Segmentation

Jurnal Aplikasi Statistika & Komputasi Statistik

A limited amount of training data in deep learning research can impact the accuracy of the resulting models. This situation can cause overfitting, so the model cannot work correctly. The conditional Generative Adversarial Network (CGAN) was introduced to generate synthetic data by considering certain conditions. This study aims to generate additional synthetic training data to improve the accuracy of an image segmentation model. First, we evaluated the accuracy of the CGAN-based dataset generator against several open datasets. Then, we applied the generator to train two object segmentation models, i.e., an FCN and a CNN U-Net. Our evaluation shows that the CGAN can generate synthetic datasets well; complex datasets require more training iterations. It also improves the validation loss and validation accuracy of both segmentation models, although other metrics still need further improvement.
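
As a rough illustration of the conditioning idea, a generator can take a one-hot label map as input so the synthesized image respects the requested layout. Layer sizes and depth below are illustrative assumptions, not the study's architecture.

```python
# Sketch: a mask-conditioned generator in the CGAN/pix2pix spirit.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskConditionedGenerator(nn.Module):
    def __init__(self, n_classes: int, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_classes, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, 3, 3, padding=1), nn.Tanh(),  # RGB in [-1, 1]
        )

    def forward(self, one_hot_mask: torch.Tensor) -> torch.Tensor:
        return self.net(one_hot_mask)

# Usage: turn an integer mask (N, H, W) into the conditioning input.
# one_hot = F.one_hot(mask, n_classes).permute(0, 3, 1, 2).float()
# fake = MaskConditionedGenerator(n_classes)(one_hot)
```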

SS-CPGAN: Self-Supervised Cut-and-Pasting Generative Adversarial Network for Object Segmentation

Sensors

This paper proposes a novel self-supervised Cut-and-Paste GAN that performs foreground object segmentation and generates realistic composite images without manual annotations. We accomplish this goal with a simple yet effective self-supervised approach coupled with a U-Net discriminator. The proposed method extends the ability of standard discriminators to learn not only global data representations via real/fake classification but also semantic and structural information through pseudo-labels created by the self-supervised task. The method empowers the generator to create meaningful masks by forcing it to learn informative per-pixel and global image feedback from the discriminator. Our experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods on standard benchmark datasets.
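
The compositing step at the heart of cut-and-paste training can be written in one line: a soft mask pastes a foreground onto a background, and the discriminator judges the composite, so realistic composites require meaningful masks. A minimal sketch, with tensor shapes as assumptions:

```python
# Sketch: soft cut-and-paste compositing used to train the mask generator.
import torch

def composite(fg: torch.Tensor, bg: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """fg, bg: (N, 3, H, W) images; mask: (N, 1, H, W) soft mask in [0, 1]."""
    return mask * fg + (1.0 - mask) * bg
```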

Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)

Recent advances in generative models and adversarial training have led to a flourishing image-to-image (I2I) translation literature. Current I2I translation approaches require training images from the two domains that are either all paired (supervised) or all unpaired (unsupervised). In practice, obtaining paired training data in sufficient quantities is often very costly and cumbersome. Therefore, solutions that employ unpaired data, while less accurate, are largely preferred. In this paper, we aim to bridge the gap between supervised and unsupervised I2I translation, with application to semantic image segmentation. We build upon pix2pix and CycleGAN, two seminal, state-of-the-art I2I translation techniques. We propose a method to select (very few) paired training samples and achieve significant improvements in both supervised and unsupervised I2I translation settings over random selection. Further, we boost the performance by incorporating both (selected) paired and unpaired samples in the training process. Our experiments show that an extremely weak supervised I2I translation solution using only one paired training sample can achieve a quantitative performance much better than the unsupervised CycleGAN model, and comparable to that of the supervised pix2pix model trained on thousands of pairs.
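
One way such a mixed objective can look: a supervised reconstruction term on the few selected pairs plus a cycle-consistency term on unpaired images. The weights and mapping names below are assumptions, not the paper's exact formulation.

```python
# Sketch: combining a paired (pix2pix-style) loss with an unpaired
# (CycleGAN-style) cycle loss. G_ab maps domain A -> B, G_ba maps B -> A.
import torch
import torch.nn.functional as F

def mixed_loss(G_ab, G_ba, x_paired, y_paired, x_unpaired,
               lam_sup: float = 10.0, lam_cyc: float = 10.0) -> torch.Tensor:
    sup = F.l1_loss(G_ab(x_paired), y_paired)            # supervised pair term
    cyc = F.l1_loss(G_ba(G_ab(x_unpaired)), x_unpaired)  # cycle-consistency term
    return lam_sup * sup + lam_cyc * cyc
```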

Attentively Conditioned Generative Adversarial Network for Semantic Segmentation

IEEE Access

Generative Adversarial Networks have proven to produce state-of-the-art results by framing a generative modeling task as a supervised learning problem. In this paper, we propose the Attentively Conditioned Generative Adversarial Network (ACGAN) for semantic segmentation, designing a segmentor model that generates probability maps from images and a discriminator model that distinguishes the segmentor's output from the ground-truth labels. Additionally, we condition the discriminator's dual inputs with extra information, as in a conditional adversarial model: an attention-derived probability distribution over the segmentor's feature maps is incorporated, and the ground truth is accompanied by a class-label vector. We demonstrate that the proposed model provides better semantic segmentation results while stabilizing the discriminator to model long-range dependencies, as a result of the supplementary inputs to the network. The attention network provides additional insight by extracting cues from feature locations and, alongside the class-label vector, gives the model better spectral sensitivity. Experiments on the PASCAL VOC 2012 and CamVid datasets show that our adversarial training technique yields improved accuracy.
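
A common way to feed a class-label vector into a discriminator is projection conditioning; the sketch below uses that scheme as an illustrative stand-in for the paper's conditioning design, with layer sizes as assumptions.

```python
# Sketch: a projection-conditioned discriminator for segmentation maps.
import torch
import torch.nn as nn

class ConditionedDiscriminator(nn.Module):
    def __init__(self, in_ch: int, n_classes: int, base: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.score = nn.Linear(base, 1)
        self.embed = nn.Linear(n_classes, base)  # embeds the class-label vector

    def forward(self, maps: torch.Tensor, label_vec: torch.Tensor) -> torch.Tensor:
        h = self.features(maps)
        # Projection term: inner product of features and the label embedding.
        return self.score(h) + (h * self.embed(label_vec)).sum(dim=1, keepdim=True)
```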

Instance Semantic Segmentation Benefits from Generative Adversarial Networks

2020

In the design of instance segmentation networks that reconstruct masks, segmentation is often taken at its literal definition: assigning each pixel a label. This has led to treating the problem as template matching, with the goal of minimizing the loss between the reconstructed and ground-truth pixels. Rethinking the reconstruction network as a generator, we cast mask prediction as a GAN game: a segmentation network generates masks, and a discriminator network judges their quality. To demonstrate this game, we show effective modifications to the general segmentation framework in Mask R-CNN. We find that playing the game in feature space is more effective than in pixel space, leading to stable training between the discriminator and the generator; that predicting object coordinates should be replaced by predicting contextual regions for objects; and that, overall, the adversarial loss helps the performance and removes the need for any custom sett...
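
The feature-space game can be sketched with standard non-saturating GAN losses computed on mask embeddings rather than raw pixels; the encoder producing those embeddings and the discriminator `D` are assumptions here, not the paper's architecture.

```python
# Sketch: non-saturating adversarial losses on mask feature embeddings.
import torch
import torch.nn.functional as F

def mask_adversarial_losses(D, feat_real: torch.Tensor, feat_fake: torch.Tensor):
    """feat_real/feat_fake: (N, C) embeddings of GT and predicted masks."""
    real_logits = D(feat_real)
    fake_logits = D(feat_fake.detach())  # detach: discriminator update only
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    gen_logits = D(feat_fake)            # generator tries to fool D
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss
```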

Deep Co-Training for Semi-Supervised Image Segmentation

Pattern Recognition

In this paper, we aim to improve the performance of semantic image segmentation in a semi-supervised setting where training is performed with a reduced set of annotated images and additional non-annotated images. We present a method based on an ensemble of deep segmentation models. Models are trained on subsets of the annotated data and use non-annotated images to exchange information with each other, similar to co-training. Diversity across models is enforced with the use of adversarial samples. We demonstrate the potential of our method on two challenging image segmentation problems, and illustrate its ability to share information between simultaneously trained models, while preserving their diversity. Results indicate clear advantages in terms of performance compared to recently proposed semi-supervised methods for segmentation.
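
A toy version of the exchange: each model is supervised on its own labelled subset and additionally fits its peer's predictions on unlabelled images. The loss weight and the absence of confidence filtering are simplifying assumptions.

```python
# Sketch: one co-training update for model_a, using model_b's pseudo-labels.
import torch
import torch.nn.functional as F

def co_training_step(model_a, model_b, x_lab, y_lab, x_unlab, lam: float = 0.5):
    sup = F.cross_entropy(model_a(x_lab), y_lab)         # own labelled subset
    with torch.no_grad():
        pseudo = model_b(x_unlab).argmax(dim=1)          # peer's pixel-wise labels
    unsup = F.cross_entropy(model_a(x_unlab), pseudo)    # agreement term
    return sup + lam * unsup
```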

ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real-world applications, there is indeed a large gap between data distributions in the train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss, respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging "synthetic-2-real" setups and show that the approach can also be used for detection.
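
The direct entropy term is simple to state: the mean per-pixel Shannon entropy of the softmax output, minimized on target-domain images. A minimal sketch; the log C normalization is one common choice.

```python
# Sketch: entropy loss on pixel-wise predictions for the target domain.
import torch
import torch.nn.functional as F

def entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """logits: (N, C, H, W). Mean per-pixel entropy, normalized by log C."""
    p = F.softmax(logits, dim=1)
    log_p = F.log_softmax(logits, dim=1)
    ent = -(p * log_p).sum(dim=1)                        # (N, H, W)
    return ent.mean() / torch.log(torch.tensor(float(logits.shape[1])))
```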

The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017

State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions. Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion, then the network will be more accurate and easier to train. In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module or pretraining. Moreover, due to the smart construction of the model, our approach has far fewer parameters than the currently published best entries for these datasets.
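
The dense connectivity the Tiramisu builds on is easy to sketch: each layer sees the concatenation of all earlier feature maps, and the block returns the newly produced maps. Sizes below are illustrative assumptions.

```python
# Sketch: a dense block where every layer consumes all preceding features.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch: int, growth: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, 3, padding=1),
            ))
            ch += growth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats[1:], dim=1)  # new feature maps only
```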