Complexity of Deep Convolutional Neural Networks in Mobile Computing (original) (raw)

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

ArXiv, 2018

Deploying deep neural networks on mobile devices is a challenging task. Current model compression methods such as matrix decomposition effectively reduce the deployed model size, but still cannot satisfy real-time processing requirement. This paper first discovers that the major obstacle is the excessive execution time of non-tensor layers such as pooling and normalization without tensor-like trainable parameters. This motivates us to design a novel acceleration framework: DeepRebirth through "slimming" existing consecutive and parallel non-tensor and tensor layers. The layer slimming is executed at different substructures: (a) streamline slimming by merging the consecutive non-tensor and tensor layer vertically; (b) branch slimming by merging non-tensor and tensor branches horizontally. The proposed optimization operations significantly accelerate the model execution and also greatly reduce the run-time memory cost since the slimmed model architecture contains less hidden...

Data-Driven Compression of Convolutional Neural Networks

2019

Deploying trained convolutional neural networks (CNNs) to mobile devices is a challenging task because of the simultaneous requirements of the deployed model to be fast, lightweight and accurate. Designing and training a CNN architecture that does well on all three metrics is highly non-trivial and can be very time-consuming if done by hand. One way to solve this problem is to compress the trained CNN models before deploying to mobile devices. This work asks and answers three questions on compressing CNN models automatically: a) How to control the trade-off between speed, memory and accuracy during model compression? b) In practice, a deployed model may not see all classes and/or may not need to produce all class labels. Can this fact be used to improve the trade-off? c) How to scale the compression algorithm to execute within a reasonable amount of time for many deployments? The paper demonstrates that a model compression algorithm utilizing reinforcement learning with architecture...

Deep compression of convolutional neural networks with low-rank approximation

ETRI Journal

The application of deep neural networks (DNNs) to connect the world with cyber physical systems (CPSs) has attracted much attention. However, DNNs require a large amount of memory and computational cost, which hinders their use in the relatively low-end smart devices that are widely used in CPSs. In this paper, we aim to determine whether DNNs can be efficiently deployed and operated in lowend smart devices. To do this, we develop a method to reduce the memory requirement of DNNs and increase the inference speed, while maintaining the performance (for example, accuracy) close to the original level. The parameters of DNNs are decomposed using a hybrid of canonical polyadic-singular value decomposition, approximated using a tensor power method, and fine-tuned by performing iterative one-shot hybrid fine-tuning to recover from a decreased accuracy. In this study, we evaluate our method on frequently used networks. We also present results from extensive experiments on the effects of several fine-tuning methods, the importance of iterative fine-tuning, and decomposition techniques. We demonstrate the effectiveness of the proposed method by deploying compressed networks in smartphones.

Deep learning on mobile devices: a review

Mobile Multimedia/Image Processing, Security, and Applications 2019, 2019

Recent breakthroughs in deep learning and artificial intelligence technologies have enabled numerous mobile applications. While traditional computation paradigms rely on mobile sensing and cloud computing, deep learning implemented on mobile devices provides several advantages. These advantages include low communication bandwidth, small cloud computing resource cost, quick response time, and improved data privacy. Research and development of deep learning on mobile and embedded devices has recently attracted much attention. This paper provides a timely review of this fast-paced field to give the researcher, engineer, practitioner, and graduate student a quick grasp on the recent advancements of deep learning on mobile devices. In this paper, we discuss hardware architectures for mobile deep learning, including Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuit (ASIC), and recent mobile Graphic Processing Units (GPUs). We present Size, Weight, Area and Power (SWAP) considerations and their relation to algorithm optimizations, such as quantization, pruning, compression, and approximations that simplify computation while retaining performance accuracy. We cover existing systems and give a state-of-the-industry review of TensorFlow, MXNet, Mobile AI Compute Engine (MACE), and Paddle-mobile deep learning platform. We discuss resources for mobile deep learning practitioners, including tools, libraries, models, and performance benchmarks. We present applications of various mobile sensing modalities to industries, ranging from robotics, healthcare and multimedia, biometrics to autonomous drive and defense. We address the key deep learning challenges to overcome, including low quality data, and small training/adaptation data sets. In addition, the review provides numerous citations and links to existing code bases implementing various technologies. These resources lower the user's barrier to entry into the field of mobile deep learning.

Small and Slim Deep Convolutional Neural Network for Mobile Device

IEEE Access, 2020

Recent development of deep convolutional neural networks (DCNN) devoted in creating a slim model for devices with lower specification such as embedded, mobile hardware, or microcomputer. Slim model can be achieved by minimizing computational complexity which theoretically will make processing time faster. Therefore, our focus is to build an architecture with minimum floating-point operation per second (FLOPs). In this work, we propose a small and slim architecture which later will be compared to state-of-the-art models. This architecture will be implemented into two models which are CustomNet and CustomNet2. Each of these models implements 3 convolutional blocks which reduce the computational complexity while maintains its accuracy and able to compete with state-of-the-art DCNN models. These models will be trained using ImageNet, CIFAR 10, CIFAR 100 and other datasets. The result will be compared based on accuracy, complexity, size, processing time, and trainable parameter. From the result, we found that one of our models which is CustomNet2, is better than MobileNet, MobileNet-v2, DenseNet, NASNetMobile in accuracy, trainable parameter, and complexity. For future implementation, this architecture can be adapted using region based DCNN for multiple object detection. INDEX TERMS Artificial neural network, image recognition, machine learning, deep learning.

Empirical Compression Features of Mobile Computing and Data Applications Using Deep Neural Networks

Security and Communication Networks

Due to the enormous data sizes involved in mobile computing and multimedia data transfer, it is possible that more data traffic may be generated, necessitating the use of data compression. As a result, this paper investigates how mobile computing data are compressed under all transmission scenarios. The suggested approach integrates deep neural networks (DNN) at high weighting functionalities for compression modes. The proposed method employs appropriate data loading and precise compression ratios for successful data compression. The accuracy of multimedia data that must be conveyed to various users is higher even though compression ratios are higher. The same data are transferred at significantly higher compression ratios, which save time while also minimizing data mistakes that may occur at the receiver. The DNN process also includes a visible parameter for handling high data-weight situations. The visible parameter optimizes the data results, allowing simulation tools to readily ...

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle , to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet [12] on Ima-geNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ∼13× actual speedup over AlexNet while maintaining comparable accuracy.

A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions

ArXiv, 2020

Deep Neural Network (DNN) has gained unprecedented performance due to its automated feature extraction capability. This high order performance leads to significant incorporation of DNN models in different Internet of Things (IoT) applications in the past decade. However, the colossal requirement of computation, energy, and storage of DNN models make their deployment prohibitive on resource constraint IoT devices. Therefore, several compression techniques were proposed in recent years for reducing the storage and computation requirements of the DNN model. These techniques on DNN compression have utilized a different perspective for compressing DNN with minimal accuracy compromise. It encourages us to make a comprehensive overview of the DNN compression techniques. In this paper, we present a comprehensive review of existing literature on compressing DNN model that reduces both storage and computation requirements. We divide the existing approaches into five broad categories, i.e., ne...

DropNet: Reducing Neural Network Complexity via Iterative Pruning

2020

Modern deep neural networks require a significant amount of computing time and power to train and deploy, which limits their usage on edge devices. Inspired by the iterative weight pruning in the Lottery Ticket Hypothesis (Frankle & Carbin, 2018), we propose DropNet, an iterative pruning method which prunes nodes/filters to reduce network complexity. DropNet iteratively removes nodes/filters with the lowest average post-activation value across all training samples. Empirically, we show that DropNet is robust across diverse scenarios, including MLPs and CNNs using the MNIST, CIFAR-10 and Tiny ImageNet datasets. We show that up to 90% of the nodes/filters can be removed without any significant loss of accuracy. The final pruned network performs well even with reinitialization of the weights and biases. DropNet also has similar accuracy to an oracle which greedily removes nodes/filters one at a time to minimise training loss, highlighting its effectiveness.

Challenges and Obstacles Towards Deploying Deep Learning Models on Mobile Devices

2021

From computer vision and speech recognition to forecasting trajectories in autonomous vehicles, deep learning approaches are at the forefront of so many domains. Deep learning models are developed using plethora of high-level, generic frameworks and libraries. Running those models on the mobile devices require hardware-aware optimizations and in most cases converting the models to other formats or using a third-party framework. In reality, most of the developed models need to undergo a process of conversion, adaptation, and, in some cases, full retraining to match the requirements and features of the framework that is deploying the model on the target platform. Variety of hardware platforms with heterogeneous computing elements, from wearable devices to high-performance GPU clusters are used to run deep learning models. In this paper, we present the existing challenges, obstacles, and practical solutions towards deploying deep learning models on mobile devices.