UnetDVH-Linear: Linear Feature Segmentation by Dilated Convolution with Vertical and Horizontal Kernels
Related papers
Deep residual coalesced convolutional network for efficient semantic road segmentation
IPSJ Transactions on Computer Vision and Applications, 2017
This paper proposes an efficient and compact deep learning solution for the road scene segmentation problem, named the deep residual coalesced convolutional network (RCC-Net). Initially, RCC-Net performs dimensionality reduction to compress the input and extract relevant features, which are subsequently delivered to the encoder. The encoder adopts the residual network style for a compact model size. At the core of each residual block, three different convolutional layers are coalesced simultaneously to capture broader contextual information. The decoder then upsamples the encoder output for pixel-wise mapping from the input images to the segmented output. Experimental results reveal the efficacy of the proposed network over state-of-the-art methods and its capability to be deployed on an average system.
Efficient lightweight residual network for real-time road semantic segmentation
IAES International Journal of Artificial Intelligence (IJ-AI)
Intelligent transportation systems (ITS) are currently among the most discussed topics in scientific research. ITS offers advanced monitoring capabilities such as vehicle counting and pedestrian detection. Lately, convolutional neural networks (CNNs) have been used extensively in computer vision tasks, including segmentation, classification, and detection. Image semantic segmentation in particular is a critical issue in computer vision applications. For example, self-driving vehicles require high accuracy with low parameter counts to segment road scene objects in real time. However, most related work focuses on only one side, accuracy or parameter requirements, which makes CNN models difficult to use in real-time applications. To resolve this issue, we propose the efficient lightweight residual network (ELRNet), a novel asymmetrical encoder-decoder architecture. In this network, we compare four varieties of the proposed factorized block, and t...
A new CNN-based semantic object segmentation for autonomous vehicles in urban traffic scenes
International Journal of Multimedia Information Retrieval, 2024
Semantic segmentation is the most important stage of making sense of the visual traffic scene for autonomous driving. In recent years, convolutional neural network (CNN)-based methods for semantic segmentation of urban traffic scenes have been among the trending studies. However, the methods developed so far are insufficient in terms of accuracy. In this study, a new CNN-based semantic segmentation method with higher accuracy is proposed. A new module, the Attentional Atrous Feature Pooling (AAFP) module, has been developed for the proposed method. This module is located between the encoder and decoder in the general network structure and aims to obtain multi-scale information and add attentional features to large and small objects. In experimental tests with the CamVid dataset, an accuracy approximately 2% higher than other state-of-the-art methods was achieved, with an mIoU of 70.59%. Therefore, the proposed method can semantically segment objects in urban traffic scenes better than other methods.
Multi-Feature View-Based Shallow Convolutional Neural Network for Road Segmentation
IEEE Access, 2020
This study presents a shallow and robust road segmentation model. Computer-aided real-time applications, such as driver assistance, require fast and accurate processing. Current studies use deep convolutional neural networks (DCNN) for road segmentation. However, a DCNN requires high computational power and a lot of labeled data to learn abstract features in its deeper layers: the deeper the layer, the more abstract the information it tends to learn. Moreover, the prediction time of a DCNN is an important consideration for autonomous vehicles. To overcome these issues, a Multi-feature View-based Shallow Convolutional Neural Network (MVS-CNN) is proposed that utilizes abstract features extracted from explicitly derived representations of the input image. Gradient information of the input image is used as additional channels to enhance the learning process of the proposed architecture. The multi-feature views are fed to a fully connected neural network to accurately segment the road regions. The testing accuracy demonstrates that the proposed MVS-CNN achieves an improvement of 2.7% compared to a baseline CNN consisting of only RGB inputs. Furthermore, comparison with the popular semantic segmentation network SegNet has shown that the proposed scheme performs better while being more efficient during training and evaluation. Unlike traditional segmentation techniques based on the encoder-decoder architecture, the proposed MVS-CNN consists of only the encoder network. MVS-CNN has been trained and validated on two well-known datasets, the KITTI Vision Benchmark and the Cityscapes dataset, and the results have been compared with state-of-the-art deep learning architectures. The proposed MVS-CNN outperforms them in terms of model accuracy, processing time, and segmentation accuracy.
Based on the experimental results, the proposed architecture can be considered an efficient road segmentation architecture for autonomous vehicle systems.
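MVS-CNN's extra input views are derived from image gradients. As a minimal sketch of that idea only (the exact gradient operator used by MVS-CNN is not specified in the abstract; simple forward differences are assumed here), the following computes x/y gradient channels that could be stacked with the RGB channels as additional views:

```python
def gradient_channels(gray):
    """Forward-difference x/y gradients of a 2D grayscale image,
    intended as extra input channels alongside the RGB image."""
    h, w = len(gray), len(gray[0])
    # Replicate the last row/column at the border so output size matches input.
    gx = [[gray[r][min(c + 1, w - 1)] - gray[r][c] for c in range(w)]
          for r in range(h)]
    gy = [[gray[min(r + 1, h - 1)][c] - gray[r][c] for c in range(w)]
          for r in range(h)]
    return gx, gy

# Tiny 2x2 example: horizontal and vertical intensity changes.
gray = [[0, 1], [2, 3]]
gx, gy = gradient_channels(gray)
print(gx, gy)
```

A full pipeline would stack `gx` and `gy` with the three RGB planes to form the multi-channel input the paper describes.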
A Deep Learning-Based Semantic Segmentation Architecture for Autonomous Driving Applications
Wireless Communications and Mobile Computing
In recent years, the development of smart transportation has accelerated research on semantic segmentation, one of the most important problems in this area. A large receptive field has always been a central focus when designing convolutional neural networks for semantic segmentation. A majority of recent techniques have used max-pooling to increase the receptive field of a network at the expense of decreasing its spatial resolution. Although this idea has shown improved results in object detection applications, when it comes to semantic segmentation, high spatial resolution also needs to be preserved. To address this issue, this paper proposes a new deep learning model, M-Net, which provides both high spatial resolution and a large enough receptive field while keeping the model size to a minimum. The proposed network is based on an encoder-decoder architecture. The encoder uses atrous convolution to encode the features at full resolution, and i...
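Why atrous (dilated) convolution enlarges the receptive field without pooling can be seen from the standard effective-kernel-size formula, k + (k - 1)(d - 1) for kernel size k and dilation rate d. A minimal sketch:

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Span (in pixels) covered by a size-k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

# A 3x3 kernel: dilation 1 is ordinary convolution and spans 3 pixels,
# while dilation 4 spans 9 pixels -- a larger receptive field at full
# spatial resolution, with no extra parameters and no max-pooling.
for d in (1, 2, 4):
    print(f"dilation {d}: effective kernel size {effective_kernel_size(3, d)}")
```

Stacking such layers with growing dilation rates lets an encoder like M-Net's reach a large receptive field while keeping feature maps at full resolution.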
2019
Recent research on pixel-wise semantic segmentation uses deep neural networks to improve the accuracy and speed of these networks, in order to increase their efficiency in practical applications such as automatic driving. These approaches have used deep architectures to predict pixel labels, but the results obtained remain unsatisfactory. These unsatisfactory results are mainly due to the max-pooling operators, which reduce the resolution of the feature maps. In this paper, we present a convolutional neural network composed of encoder-decoder segments based on the successful SegNet network. The encoder section has a depth of 2; its first part has 5 convolutional layers, each with 64 filters of size 3×3. In the decoding section, the dimensions of the decoding filters are adjusted according to the convolutions used at each step of the encoding. So, at each step, 64 filters of size 3×3 are used for coding, where the weights of the...
A Brief Survey and an Application of Semantic Image Segmentation for Autonomous Driving
Handbook of Deep Learning Applications, 2019
Deep learning is a fast-growing machine learning approach to perceiving and understanding large amounts of data. This paper gives general information about the deep learning approach, which has attracted much attention in the field of machine learning in recent years, and carries out an application of semantic image segmentation to support autonomous driving. The application is implemented with Fully Convolutional Network (FCN) architectures obtained by modifying Convolutional Neural Network (CNN) architectures based on deep learning. The experimental studies utilize four different FCN architectures, named FCN-AlexNet, FCN-8s, FCN-16s, and FCN-32s. The FCNs are first trained separately, and the validation accuracies of these trained network models on the dataset used are compared. In addition, image segmentation inferences are visualized to assess how precisely the FCN architectures can segment objects.
IJERT-A Survey on Semantic Segmentation using Deep Learning Techniques
International Journal of Engineering Research and Technology (IJERT), 2021
https://www.ijert.org/a-survey-on-semantic-segmentation-using-deep-learning-techniques
https://www.ijert.org/research/a-survey-on-semantic-segmentation-using-deep-learning-techniques-IJERTCONV9IS05011.pdf
Semantic segmentation is a challenging task in the field of computer vision. It is the process of classifying each pixel as belonging to a particular label. It has many challenging applications such as autonomous vehicles, human-computer interaction, robot navigation, and medical research, which motivates us to survey the different semantic segmentation architectures. Most of these methods have been built using deep learning techniques. In this paper we review some state-of-the-art Convolutional Neural Network (CNN) architectures, such as AlexNet, GoogLeNet, VGGNet, and ResNet, which form the basis for semantic segmentation. Further, we present different semantic segmentation architectures such as the Fully Convolutional Network (FCN), ParseNet, Deconvolution Network, U-Net, Feature Pyramid Network (FPN), and Mask R-CNN. Finally, we compare the performance of all these architectures.
Convolution-deconvolution architecture with the pyramid pooling module for semantic segmentation
Multimedia Tools and Applications, 2019
Recognizing the content of an image is an important challenge in machine vision, and semantic segmentation is one of the most important ways to address it. It is utilized in applications such as autonomous driving, indoor navigation, virtual or augmented reality systems, and recognition tasks. In this paper, a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation, termed P-DecovNet, is introduced. The proposed architecture combines a convolution-deconvolution neural network with the Pyramid Pooling Module. High-level features are extracted from the image using the convolutional network, and the pooling module is added to the architecture to reinforce local information. The CamVid road scene dataset was used to evaluate the performance of P-DecovNet. With respect to different criteria (including but not limited to accuracy and mIoU), the experimental results demonstrate that P-DecovNet performs well in the domain of convolution-deconvolution networks, while using fewer training images and fewer iterations than existing methods.
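The Pyramid Pooling Module pools the same feature map at several grid resolutions and concatenates the results as multi-scale context. A minimal pure-Python sketch of that idea (grid sizes 1, 2, and 4 are illustrative, not the exact configuration of P-DecovNet):

```python
def avg_pool_grid(feat, bins):
    """Average-pool a 2D feature map into a bins x bins grid."""
    h, w = len(feat), len(feat[0])
    pooled = []
    for i in range(bins):
        row = []
        for j in range(bins):
            # Cell (i, j) covers an even slice of the rows and columns.
            r0, r1 = i * h // bins, (i + 1) * h // bins
            c0, c1 = j * w // bins, (j + 1) * w // bins
            vals = [feat[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        pooled.append(row)
    return pooled

# Pyramid pooling: pool one 4x4 map at grid sizes 1, 2, and 4, then
# concatenate the flattened results as a multi-scale context vector.
feat = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
context = []
for bins in (1, 2, 4):
    context.extend(v for row in avg_pool_grid(feat, bins) for v in row)
print(len(context))  # 1 + 4 + 16 = 21 pooled values
```

In a real network, each pooled grid is upsampled back to the feature-map size and concatenated channel-wise before the final prediction layers.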
ESSN: Enhanced Semantic Segmentation Network by Residual Concatenation of Feature Maps
IEEE Access, 2020
Semantic segmentation performs pixel-level classification of multiple classes in the input image. Previous studies on semantic segmentation have used various methods such as multi-scale images, encoder-decoder architectures, attention, spatial pyramid pooling, conditional random fields, and generative models. However, the contexts of various sizes and types in diverse environments limit their ability to robustly detect and classify objects. To address this problem, we propose an enhanced semantic segmentation network (ESSN) robust to various objects, contexts, and environments. The ESSN extracts multi-scale information well by concatenating residual feature maps with various receptive fields extracted from sequential convolution blocks, and it improves semantic segmentation performance without additional modules such as loss or attention terms during training. We performed experiments with two open databases, the Stanford background dataset (SBD) and the Cambridge-driving labeled video database (CamVid). Experimental results demonstrated a pixel accuracy of 92.74%, class accuracy of 79.66%, and mIoU of 71.67% on CamVid, and a pixel accuracy of 87.46%, class accuracy of 81.51%, and mIoU of 71.56% on SBD, which are higher than those of existing state-of-the-art methods. In addition, the average processing times were 31.12 ms and 92.46 ms on a desktop computer and the Jetson TX2 embedded system, respectively, confirming that ESSN is applicable both to desktop computers and to the Jetson TX2 embedded system widely used in autonomous vehicles.
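Several of the papers above report mIoU (mean intersection-over-union), the standard segmentation metric: per-class IoU averaged across classes. A minimal sketch on flattened label maps (the two-class example is illustrative only, not data from any of these papers):

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Two flattened 2-class label maps (e.g. road vs. background pixels).
pred   = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(round(mean_iou(pred, target, 2), 3))  # 0.5: IoU is 2/4 for each class
```

Pixel accuracy, also quoted above, would instead count matching positions over all positions, which is why the two metrics can differ substantially on imbalanced classes.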