Learning Dual Convolutional Neural Networks for Low-Level Vision (original) (raw)

A taxonomy of Deep Convolutional Neural Nets for Computer Vision

Traditional architectures for solving computer vision problems and the degree of success they enjoyed have been heavily reliant on hand-crafted features. However, of late, deep learning techniques have offered a compelling alternative – that of automatically learning problem-specific features. With this new paradigm, every problem in computer vision is now being re-examined from a deep learning perspective. Therefore, it has become important to understand what kind of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm (i.e., deep-networks) exist, a survey specific to computer vision is missing. We specifically consider one form of deep networks widely used in computer vision – convolutional neural networks (CNNs). We start with “AlexNet” as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep-learning techniques for computer vision.

Deep network in network

The different CNN models use many layers that typically include a stack of linear convolution layers combined with pooling and normalization layers to extract the characteristics of the images. Unlike these models, and instead of using a linear filter for convolution, the network in network (NiN) model uses a multilayer perception (MLP), a nonlinear function, to replace the linear filter. This article presents a new deep network in network (DNIN) model based on the NiN structure, NiN drag a universal approximator, (MLP) with rectified linear unit (ReLU) to improve classification performance. The use of MLP leads to an increase in the density of the connection. This makes learning more difficult and time learning slower. In this article, instead of ReLU, we use the linear exponential unit (eLU) to solve the vanishing gradient problem that can occur when using ReLU and to speed up the learning process. In addition, a reduction in the convolution filters size by increasing the depth is used in order to reduce the number of parameters. Finally, a batch normalization layer is applied to reduce the saturation of the eLUs and the dropout layer is applied to avoid overfitting. The experimental results on the CIFAR-10 database show that the DNIN can reduce the complexity of implementation due to the reduction in the adjustable parameters. Also the reduction in the filters size shows an improvement in the recognition accuracy of the model. Keywords Exponential linear unit (ELU) Á Convolutional neural networks (CNNs) Á Deep MLPconv Á Image recognition Á Network in Network (NiN)

Convolution Neural Networks by Image Net

We trained a large, deep convolution neural network to classify the 1.2 million high-resolution images in the Image Net LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolution layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way soft ax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called ―dropout‖ that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry Introduction Current approaches to object recognition make essential use of machine learning methods. To improve their performance, we can collect larger datasets, learn more powerful models, and use better techniques for preventing overfitting. Until recently, datasets of labeled images were relatively small — on the order of tens of thousands of images (e.g., NORB [16], Caltech-101/256 [8, 9], and CIFAR-10/100 [12]). Simple recognition tasks can be solved quite well with datasets of this size, especially if they are augmented with label-preserving transformations. For example, the currentbest error rate on the MNIST digit-recognition task Despite the attractive qualities of CNNs, and despite the relative efficiency of their local architecture, they have still been prohibitively expensive to apply in large scale to high-resolution images. Luckily, current GPUs, paired with a highly-optimized implementation of 2D convolution, are powerful enough to facilitate the training of interestingly-large CNNs, and recent datasets such as ImageNet contain enough labeled examples to train such models without severe overfitting. The specific contributions of this paper are as follows: we trained one of the largest convolutional neural networks to date on the subsets of ImageNet used in the ILSVRC-2010 and ILSVRC-2012 competitions [2] and achieved by far the best results ever reported on these datasets. We wrote a highly-optimized GPU implementation of 2D convolution and all the other operations inherent in training convolutional neural networks, which we make available publicly1. Our network contains a number of new and unusual features which improve its performance and reduce its training time, which are detailed in Section 3. The size of our network made overfitting a significant problem, even with 1.2 million labeled training examples, so we used several effective techniques for preventing overfitting, which are described in Section 4.

Wide deep residual networks in networks

The Deep Residual Network in Network (DrNIN) model [18] is an important extension of the convolutional neural network (CNN). They have proven capable of scaling up to dozens of layers. This model exploits a nonlinear function, to replace linear filter, for the convolution represented in the layers of multilayer perceptron (MLP) [23]. Increasing the depth of DrNIN can contribute to improved classification and detection accuracy. However, training the deep model becomes more difficult, the training time slows down, and a problem of decreasing feature reuse arises. To address these issues, in this paper, we conduct a detailed experimental study on the architecture of DrMLPconv blocks, based on which we present a new model that represents a wider model of DrNIN. In this model, we increase the width of the DrNINs and decrease the depth. We call the result module (WDrNIN). On the CIFAR-10 dataset, we will provide an experimental study showing that WDrNIN models can gain accuracy through increased width. Moreover, we demonstrate that even a single WDrNIN outperforms all network-based models in MLPconv network models in accuracy and efficiency with an accuracy equivalent to 93.553% for WDrNIN-4-2.