A Semantic-Based Scene Segmentation Using Convolutional Neural Networks
Related papers
Convolution-deconvolution architecture with the pyramid pooling module for semantic segmentation
Multimedia Tools and Applications, 2019
Recognizing the content of an image is an important challenge in machine vision. Semantic segmentation is one of the most important ways to overcome this challenge. It is utilized in different applications such as autonomous driving, indoor navigation, virtual or augmented reality systems, and recognition tasks. In this paper, a novel and practical deep fully convolutional neural network architecture was introduced for semantic pixel-wise segmentation, termed P-DecovNet. The proposed architecture combines the Convolution-Deconvolution Neural Network architecture with the Pyramid Pooling Module. In this work, high-level features were extracted from the image using the Convolutional Neural Network. To reinforce local information, the Pyramid Pooling module was added to the architecture. The CamVid road scene dataset was used to evaluate the performance of P-DecovNet. With respect to different criteria (including, but not limited to, accuracy and mIoU), the experimental results demonstrated that P-DecovNet performs well in the domain of Convolution-Deconvolution Networks. To achieve such performance, this work uses fewer training images and fewer iterations than the existing methods.
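The pyramid pooling idea mentioned in this abstract can be illustrated with a plain NumPy sketch: average-pool the feature map over several grid resolutions, upsample each pooled grid back to the input size, and concatenate everything along the channel axis. This is only an illustrative sketch, not the P-DecovNet implementation; the function name, the bin sizes, the nearest-neighbour upsampling, and the assumption that each bin size divides the spatial dimensions are all choices made for the example.

```python
import numpy as np

def pyramid_pooling(feature_map, bin_sizes=(1, 2, 4)):
    """Pool a (H, W, C) feature map over several grid resolutions,
    upsample each pooled grid back to (H, W) by nearest-neighbour
    repetition, and concatenate along the channel axis.
    Assumes every bin size divides H and W evenly."""
    h, w, c = feature_map.shape
    pooled_levels = [feature_map]
    for bins in bin_sizes:
        # Average-pool into a bins x bins grid.
        level = np.zeros((bins, bins, c))
        for i in range(bins):
            for j in range(bins):
                rs, re = i * h // bins, (i + 1) * h // bins
                cs, ce = j * w // bins, (j + 1) * w // bins
                level[i, j] = feature_map[rs:re, cs:ce].mean(axis=(0, 1))
        # Nearest-neighbour upsample back to the input resolution.
        up = level.repeat(h // bins, axis=0).repeat(w // bins, axis=1)
        pooled_levels.append(up)
    return np.concatenate(pooled_levels, axis=-1)

fm = np.random.default_rng(0).random((8, 8, 3))
out = pyramid_pooling(fm)
print(out.shape)  # (8, 8, 12): 3 original channels + 3 pooled levels of 3 each
```

The bins=1 level is a global average pool, which is what injects image-wide context into every spatial position.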
Deep-Learning-Based Approaches for Semantic Segmentation of Natural Scene Images: A Review
Electronics
The task of semantic segmentation holds a fundamental position in the field of computer vision. Assigning a semantic label to each pixel in an image is a challenging task. In recent times, significant advancements have been achieved in the field of semantic segmentation through the application of Convolutional Neural Networks (CNN) techniques based on deep learning. This paper presents a comprehensive and structured analysis of approximately 150 methods of semantic segmentation based on CNN within the last decade. Moreover, it examines 15 well-known datasets in the semantic segmentation field. These datasets consist of 2D and 3D image and video frames, including general, indoor, outdoor, and street scenes. Furthermore, this paper mentions several recent techniques, such as SAM, UDA, and common post-processing algorithms, such as CRF and MRF. Additionally, this paper analyzes the performance evaluation of reviewed state-of-the-art methods, pioneering methods, common backbone networks...
Multi-Scale Convolutional Architecture for Semantic Segmentation
2015
Advances in 3D sensing technologies have made RGB and depth information more readily available than before, which can greatly assist in the semantic segmentation of 2D scenes. Many works in the literature perform semantic segmentation in such scenes, but few address environments with a high degree of clutter, e.g. indoor scenes. In this paper, we explore the use of depth information along with RGB and a deep convolutional network for indoor scene understanding through semantic labeling. Our work exploits the geocentric encoding of a depth image and uses a multi-scale deep convolutional neural network architecture that captures high- and low-level features of a scene to generate rich semantic labels. We apply our method to indoor RGB-D images from the NYUD2 dataset [1] and achieve a competitive performance of 70.45% accuracy in labeling four object classes compared with some prior approaches. The results show our system is capable of generating a pixe...
IJERT-A Survey on Semantic Segmentation using Deep Learning Techniques
International Journal of Engineering Research and Technology (IJERT), 2021
https://www.ijert.org/a-survey-on-semantic-segmentation-using-deep-learning-techniques https://www.ijert.org/research/a-survey-on-semantic-segmentation-using-deep-learning-techniques-IJERTCONV9IS05011.pdf Semantic segmentation is a challenging task in the field of computer vision. It is the process of classifying each pixel as belonging to a particular label. It has many challenging applications such as autonomous vehicles, human-computer interaction, robot navigation, medical research and so on, which motivates us to survey the different semantic segmentation architectures. Most of these methods have been built using deep learning techniques. In this paper we review some state-of-the-art Convolutional Neural Network (CNN) architectures such as AlexNet, GoogleNet, VGGNet, and ResNet, which form the basis for semantic segmentation. Further, we present different semantic segmentation architectures such as Fully Convolutional Network (FCN), ParseNet, Deconvolution Network, U-Net, Feature Pyramid Network (FPN), and Mask R-CNN. Finally, we compare the performances of all these architectures.
Street Scene understanding via Semantic Segmentation Using Deep Learning
Maǧallaẗ al-handasaẗ wa-al-tiknūlūǧiyā, 2022
Scene classification is an essential perception task used by robotics for understanding the environment. Deep learning techniques have been shown to play a great role in challenging scene understanding applications. Like the street scene, the outdoor scene is composed of images with depth that have a greater variety than iconic object images. Image semantic segmentation is an important task for autonomous driving and mobile robotics applications because it provides the rich information needed for safe navigation and complex reasoning. This paper provides a model for semantic segmentation of outdoor scenes to classify each object in the scene. The proposed hybrid model combines U-Net with Xception networks to work on the 2.5-dimensional Cityscapes dataset, which is used for 3D applications. The process contains two stages. The first is a pre-processing operation on the RGB-D dataset: data augmentation to increase the dataset size, and k-means clustering as a preprocessor for the input data. The second stage designs the hybrid model, which achieves a pixel accuracy of 0.7874. The output model was generated using a computer with an NVIDIA GeForce RTX 2060 6 GB GPU, programmed with Python 3.7.
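The k-means pre-processing step mentioned in this abstract can be sketched as colour quantization: cluster the pixel vectors and replace each pixel with its cluster centre before feeding the image to the network. The paper does not specify its exact clustering setup, so this is a generic Lloyd's-algorithm sketch with assumed parameters (k, iteration count, initialisation):

```python
import numpy as np

def kmeans_quantize(pixels, k=3, iters=10, seed=0):
    """Cluster (N, D) pixel vectors into k groups with plain Lloyd's
    algorithm and return each pixel replaced by its cluster centre,
    plus the per-pixel cluster labels."""
    rng = np.random.default_rng(seed)
    # Initialise centres from k distinct random pixels.
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign every pixel to its nearest centre.
        dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its assigned pixels.
        for c in range(k):
            if np.any(labels == c):
                centres[c] = pixels[labels == c].mean(axis=0)
    return centres[labels], labels

rgb = np.random.default_rng(1).random((100, 3))  # 100 pixels, RGB in [0, 1)
quantized, labels = kmeans_quantize(rgb, k=4)
print(quantized.shape)  # (100, 3), but with at most 4 distinct colours
```

Quantizing colours this way reduces input variability, which is presumably the motivation for using it as a preprocessor here.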
A CNN Architecture for Efficient Semantic Segmentation of Street Scenes
2018 IEEE 8th International Conference on Consumer Electronics - Berlin (ICCE-Berlin), 2018
In recent years, representation learning approaches have disrupted many multimedia computing tasks. Among those approaches, deep convolutional neural networks (CNNs) have notably reached human-level expertise on some constrained image classification tasks. Nonetheless, training CNNs from scratch for a new task, or simply for new data, turns out to be complex and time-consuming. Recently, transfer learning has emerged as an effective methodology for adapting pretrained CNNs to new data and classes by only retraining the last classification layer. This paper focuses on improving this process in order to better transfer knowledge between CNN architectures for faster training in the case of fine-tuning for image classification. This is achieved by combining and transferring supplementary weights, based on similarity considerations between source and target classes. The study includes a comparison between semantic and content-based similarities, and highlights increased initial performance and training speed, along with superior long-term performance when limited training samples are available.
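The core transfer-learning move described above, freezing the pretrained feature extractor and retraining only the final classification layer, can be sketched in NumPy as softmax regression on pre-extracted features. This is a generic sketch, not the paper's weight-transfer method; the learning rate, epoch count, and synthetic data are assumptions for the example:

```python
import numpy as np

def retrain_last_layer(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit a fresh softmax classification layer on frozen, pre-extracted
    CNN features via full-batch gradient descent on cross-entropy.
    The feature extractor itself is never touched."""
    n, d = features.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # softmax cross-entropy gradient
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b

# Toy "features" from two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 4)), rng.normal(2, 1, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
w, b = retrain_last_layer(X, y, num_classes=2)
print(((X @ w + b).argmax(axis=1) == y).mean())  # high training accuracy
```

Because only w and b are updated, training is cheap even when the underlying network is large, which is what makes this fine-tuning regime attractive.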
The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017
State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions. Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion, then the network will be more accurate and easier to train. In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any post-processing module or pretraining. Moreover, due to the smart construction of the model, our approach has far fewer parameters than the currently published best entries for these datasets.
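The dense connectivity pattern described above, where each layer receives the concatenation of all earlier feature maps, can be sketched with a toy NumPy dense block. The random 1x1 projections stand in for real convolutions; the layer count, growth rate, and ReLU choice are assumptions for the example, not details of the Tiramisu model:

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, seed=0):
    """Toy dense block on a (H, W, C) tensor: each 'layer' is a random
    1x1 projection producing growth_rate new channels, and its input is
    the concatenation of the block input and all earlier layer outputs."""
    rng = np.random.default_rng(seed)
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)         # all earlier maps
        w = rng.normal(size=(inp.shape[-1], growth_rate))
        out = np.maximum(inp @ w, 0.0)                  # 1x1 "conv" + ReLU
        features.append(out)
    return np.concatenate(features, axis=-1)

x = np.random.default_rng(2).random((4, 4, 3))
y = dense_block(x, num_layers=4, growth_rate=2)
print(y.shape)  # (4, 4, 11): 3 input channels + 4 layers x 2 new channels
```

Note how the channel count grows only by growth_rate per layer while every layer still sees all earlier features; this reuse is why DenseNet-style models can stay small in parameter count.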
A Fast Panoptic Segmentation Network for Self-Driving Scene Understanding
Computer Systems Science and Engineering
In recent years, scene understanding has gained popularity and significance due to fast-paced progress in computer vision techniques and technologies. The primary focus of computer-vision-based scene understanding is to label each and every pixel in an image with the category of the object it belongs to, so segmentation and detection must be combined in a single framework. Recently, many successful computer vision methods have been developed to aid scene understanding for a variety of real-world applications. Scene understanding systems typically involve detection and segmentation of different natural and man-made things. A lot of research has been performed in recent years, mostly focused on things (well-defined objects that have shape, orientation, and size) with less focus on stuff classes (amorphous regions that lack a clear shape, size, or other characteristics). Stuff regions describe many aspects of a scene, such as its type, situation, and environment, and hence can be very helpful in scene understanding. Existing methods for scene understanding still have a challenging path to cover to cope with the demands of computational time, accuracy, and robustness at varying levels of scene complexity. A robust scene understanding method has to deal effectively with imbalanced class distributions, overlapping objects, fuzzy object boundaries, and poorly localized objects. The proposed method performs panoptic segmentation on the Cityscapes dataset. MobileNet-V2, pre-trained on ImageNet, is used as the backbone for feature extraction. MobileNet-V2 with the state-of-the-art encoder-decoder architecture of DeepLabV3+, with some customization and optimization, is employed; atrous convolution along with spatial pyramid pooling is also utilized to make the method more accurate and robust.
Very promising and encouraging results have been achieved, indicating the potential of the proposed method for robust scene understanding in a fast and reliable way.
Global-and-Local Context Network for Semantic Segmentation of Street View Images
Sensors, 2020
Semantic segmentation of street view images is an important step in scene understanding for autonomous vehicle systems. Recent works have made significant progress in pixel-level labeling using Fully Convolutional Network (FCN) framework and local multi-scale context information. Rich global context information is also essential in the segmentation process. However, a systematic way to utilize both global and local contextual information in a single network has not been fully investigated. In this paper, we propose a global-and-local network architecture (GLNet) which incorporates global spatial information and dense local multi-scale context information to model the relationship between objects in a scene, thus reducing segmentation errors. A channel attention module is designed to further refine the segmentation results using low-level features from the feature map. Experimental results demonstrate that our proposed GLNet achieves 80.8% test accuracy on the Cityscapes test dataset...
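The channel attention module mentioned in this abstract can be sketched in the squeeze-and-excitation style: global-average-pool the feature map to a channel descriptor, pass it through a small two-layer bottleneck, and rescale each channel by a sigmoid gate. This is a generic sketch of channel attention, not GLNet's exact module; the weight shapes and bottleneck ratio are assumptions for the example:

```python
import numpy as np

def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention on a (H, W, C) map:
    global-average-pool to a (C,) descriptor, run it through a two-layer
    bottleneck (w1: C -> C/r, w2: C/r -> C), and rescale every channel
    by the resulting sigmoid gate in (0, 1)."""
    squeeze = feature_map.mean(axis=(0, 1))        # (C,) channel descriptor
    hidden = np.maximum(squeeze @ w1, 0.0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # (C,) sigmoid gate
    return feature_map * gate                      # broadcast over H, W

rng = np.random.default_rng(3)
fm = rng.normal(size=(4, 4, 8))
w1 = rng.normal(size=(8, 2))   # reduction ratio r = 4 here
w2 = rng.normal(size=(2, 8))
out = channel_attention(fm, w1, w2)
print(out.shape)  # (4, 4, 8), same shape, channels re-weighted
```

Because the gate lies in (0, 1), the module can only attenuate channels, which is how it suppresses low-level features that would otherwise add noise to the segmentation.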
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
The ability to perform pixel-wise semantic segmentation in real time is of paramount importance in practical mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low-latency operation. ENet is up to 18× faster, requires 75× fewer FLOPs, has 79× fewer parameters, and provides similar or better accuracy than existing models. We have tested it on the CamVid, Cityscapes and SUN datasets and report comparisons with existing state-of-the-art methods, and the trade-offs between accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster.
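One source of the parameter and FLOP savings claimed by efficient architectures like ENet is factorizing a k×k convolution into a k×1 followed by a 1×k convolution. The arithmetic behind that saving can be checked directly; the channel counts below are illustrative, not ENet's actual layer sizes:

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Parameter count of a single 2-D convolution layer:
    one (k_h x k_w x c_in) kernel per output channel, plus biases."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)

# A standard 3x3 conv versus an asymmetric 3x1 + 1x3 factorization,
# both mapping 128 channels to 128 channels.
full = conv_params(3, 3, 128, 128)
factored = conv_params(3, 1, 128, 128) + conv_params(1, 3, 128, 128)
print(full, factored)  # 147584 vs 98560: the factorized pair is ~33% smaller
```

The same counting applies per spatial position to FLOPs, so the factorization shrinks both model size and compute at the cost of a slightly less expressive layer.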