Energy-Aware Inference Offloading for DNN-Driven Applications in Mobile Edge Clouds
Related papers
Cost-effective Machine Learning Inference Offload for Edge Computing
ArXiv, 2020
Computing at the edge is increasingly important since a massive amount of data is generated there, which poses challenges in transporting all of it to remote data centers and clouds for processing and analysis. On the other hand, harnessing edge data is essential for offering data-driven and machine learning-based applications, provided that challenges such as device capabilities, connectivity, and heterogeneity can be mitigated. Machine learning applications are very compute-intensive and require processing large amounts of data. However, edge devices are often resource-constrained in terms of compute, power, storage, and network connectivity, which limits their ability to run state-of-the-art deep neural network (DNN) models, which are becoming larger and more complex, efficiently and accurately. This paper proposes a novel offloading mechanism that leverages installed-base on-premises (edge) computational resources. The proposed mechanism allows the e...
JointDNN: An Efficient Training and Inference Engine for Intelligent Mobile Cloud Computing Services
Deep neural networks are among the most influential architectures of deep learning algorithms, being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants (IPAs), autonomous cars, and smart home services, often employ either simple local models or complex remote models on the cloud. Mobile-only and cloud-only computations are currently the status-quo approaches. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and the cloud for DNNs in both the inference and training phases. JointDNN not only provides an energy- and performance-efficient method of querying DNNs for the mobile side, but also benefits the cloud server by reducing the amount of its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward and backward propagation in DNNs, which can adapt to mobile battery limitations, cloud server load constraints, and quality of service. JointDNN achieves up to 18× and 32× reductions in the latency and mobile energy consumption of querying DNNs compared to the status-quo approaches, respectively.
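The layer-granularity formulation lends itself to a simple dynamic program: for a linear chain of layers, the decision is where each layer runs (mobile or cloud), paying a compute cost per layer per location plus a transfer cost whenever the intermediate activation crosses the link. The sketch below is a minimal illustration of that idea, not JointDNN's actual engine; the per-layer costs and the weighting of energy versus latency are assumed placeholder inputs.

```python
# Minimal sketch of layer-granularity mobile/cloud partitioning for a
# linear DNN, in the spirit of JointDNN's formulation. All costs are
# hypothetical inputs combining weighted latency and energy.

def best_partition(mobile_cost, cloud_cost, tx_cost):
    """mobile_cost[i], cloud_cost[i]: cost of running layer i on each
    side; tx_cost[i]: cost of moving layer i's input over the link
    (tx_cost[n] is the cost of returning the final result)."""
    n = len(mobile_cost)
    INF = float("inf")
    # best[loc]: cheapest cost with the current activation held at loc
    # (0 = mobile, 1 = cloud); inference starts and ends on the mobile.
    best, choice = [0.0, INF], []
    for i in range(n):
        run_mobile = min(best[0], best[1] + tx_cost[i]) + mobile_cost[i]
        run_cloud = min(best[1], best[0] + tx_cost[i]) + cloud_cost[i]
        choice.append((run_mobile, run_cloud))
        best = [run_mobile, run_cloud]
    total = min(best[0], best[1] + tx_cost[n])  # bring result back
    # Recover the per-layer placement by walking the table backwards.
    plan = []
    loc = 0 if best[0] <= best[1] + tx_cost[n] else 1
    for i in range(n - 1, -1, -1):
        plan.append("cloud" if loc == 1 else "mobile")
        if i > 0:
            prev_m, prev_c = choice[i - 1]
            stay = prev_m if loc == 0 else prev_c
            switch = (prev_c if loc == 0 else prev_m) + tx_cost[i]
            loc = loc if stay <= switch else 1 - loc
    return total, plan[::-1]

# Example: a cheap cloud, an expensive uplink after layer 0.
total, plan = best_partition([4, 6, 6], [1, 1, 1], [3, 8, 2, 1])
```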
A Deep Learning Approach for Energy Efficient Computational Offloading in Mobile Edge Computing
IEEE Access, 2019
Mobile edge computing (MEC) has shown tremendous potential as a means of supporting computationally intensive mobile applications by partially or entirely offloading their computations to a nearby server, minimizing the energy consumption of user equipment (UE). However, selecting an optimal set of components to offload, considering the amount of data transfer as well as the latency in communication, is a complex problem. In this paper, we propose a novel energy-efficient deep learning based offloading scheme (EEDOS) to train a deep learning based smart decision-making algorithm that selects an optimal set of application components based on the remaining energy of UEs, the energy consumption of application components, network conditions, computational load, amount of data transfer, and delays in communication. We formulate the cost function involving all the aforementioned factors, obtain the cost for all possible combinations of component offloading policies, select the optimal policies over an exhaustive dataset, and train a deep learning network as an alternative to the extensive computations involved. Simulation results show that our proposed model is promising in terms of accuracy and energy consumption of UEs.
Index Terms: Computational offloading, deep learning, energy efficient offloading, mobile edge computing, user equipment.
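At its core, the training pipeline described above amounts to labeling each observed context with the cheapest of all 2^N component-offloading policies and then fitting a network to imitate that exhaustive search. A minimal sketch of the labeling step, with an assumed additive energy-plus-delay cost model (not EEDOS's exact cost function), might look as follows.

```python
from itertools import product

def label_best_policy(components, bandwidth, remote_speedup,
                      w_energy=1.0, w_delay=1.0, tx_power=0.5):
    """Exhaustively score every offloading policy (one bit per
    component: 1 = offload) and return the cheapest one. components is
    a hypothetical per-component profile of (local_energy, local_delay,
    tx_data_bits). This expensive step is what the trained DNN replaces
    at run time."""
    best_policy, best_cost = None, float("inf")
    for policy in product([0, 1], repeat=len(components)):
        energy = delay = 0.0
        for offload, (e_loc, d_loc, bits) in zip(policy, components):
            if offload:
                tx_time = bits / bandwidth
                energy += tx_power * tx_time        # radio energy only
                delay += tx_time + d_loc / remote_speedup
            else:
                energy += e_loc
                delay += d_loc
        cost = w_energy * energy + w_delay * delay
        if cost < best_cost:
            best_policy, best_cost = policy, cost
    return best_policy, best_cost
```

Pairs of (context features, best policy) produced this way form the supervised dataset; the trained network then answers in one forward pass instead of 2^N cost evaluations.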
Sensors, 2021
In mobile edge computing (MEC), partial computational offloading can reduce the energy consumption and service delay of user equipment (UE) by dividing a single task into different components: some of the components execute locally on the UE while the remaining ones are offloaded to a mobile edge server (MES). In this paper, we investigate the partial offloading technique in MEC using a supervised deep learning approach. The proposed technique, a comprehensive and energy-efficient deep learning-based offloading technique (CEDOT), intelligently selects both the partial offloading policy and the size of each component of a task to reduce the service delay and energy consumption of UEs. We use deep learning to simultaneously find the best partitioning of a single task and the best offloading policy. The deep neural network (DNN) is trained on a comprehensive dataset, generated from our mathematical model, which reduces the time delay and energy consump...
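The partial-offloading idea can be made concrete with a one-dimensional search: split a task of D bits so a fraction alpha runs locally and 1 − alpha is shipped to the MES, then pick the alpha that minimizes a weighted sum of delay and UE energy. The cost model and parameters below are illustrative assumptions, not CEDOT's trained model.

```python
def best_split(D, f_local, f_mes, bandwidth, p_cpu, p_tx,
               w_delay=0.5, w_energy=0.5, steps=100):
    """Scan the local fraction alpha in [0, 1]. Local and remote parts
    run in parallel, so delay is the max of the two branches; UE energy
    is local compute energy plus radio energy while transmitting.
    f_local/f_mes are processing rates (bits/s), bandwidth is the
    uplink rate, p_cpu/p_tx are UE power draws (W)."""
    best = (None, float("inf"))
    for i in range(steps + 1):
        alpha = i / steps
        t_local = alpha * D / f_local
        t_tx = (1 - alpha) * D / bandwidth
        t_remote = t_tx + (1 - alpha) * D / f_mes
        delay = max(t_local, t_remote)
        energy = p_cpu * t_local + p_tx * t_tx
        cost = w_delay * delay + w_energy * energy
        if cost < best[1]:
            best = (alpha, cost)
    return best  # (best local fraction, its cost)
```

CEDOT's contribution is to learn this decision (jointly with the offloading policy) so the UE does not have to evaluate the cost model online.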
Enabling DNN Acceleration with Data and Model Parallelization over Ubiquitous End Devices
IEEE Internet of Things Journal, 2021
Deep neural networks (DNNs) show great promise in providing more intelligence to ubiquitous end devices. However, existing partition-offloading schemes adopt data-parallel or model-parallel collaboration between devices and the cloud, which does not make full use of the resources of end devices for deep-level parallel execution. This paper proposes eDDNN (i.e., enabling Distributed DNN), a collaborative inference scheme over heterogeneous end devices using cross-platform web technology, moving the computation close to ubiquitous end devices, improving resource utilization, and reducing the computing pressure on data centers. eDDNN implements D2D communication and collaborative inference among heterogeneous end devices with the WebRTC protocol, divides the data and the corresponding DNN model into pieces simultaneously, and then executes inference almost independently by establishing a layer dependency table. Besides, eDDNN provides a dynamic allocation algorithm based on deep reinforcemen...
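The data-parallel half of such a scheme reduces, in its simplest form, to carving the input among devices in proportion to their measured throughput, with an overlap ("halo") so each slice can still be convolved independently. A rough sketch under those assumptions (the halo width and throughput measurements are assumed inputs, not eDDNN's allocation algorithm):

```python
def partition_rows(height, throughputs, halo=1):
    """Split an image of `height` rows across devices proportionally
    to their throughput (e.g., measured rows/s), padding each slice
    with `halo` extra rows at interior cut lines so convolutions near
    the boundary see the pixels they need. Returns (start, stop) row
    ranges, one per device."""
    total = sum(throughputs)
    cuts, acc = [0], 0.0
    for t in throughputs[:-1]:
        acc += height * t / total
        cuts.append(round(acc))
    cuts.append(height)
    return [(max(0, a - halo), min(height, b + halo))
            for a, b in zip(cuts, cuts[1:])]

# A fast device gets three quarters of the rows:
print(partition_rows(8, [3, 1]))  # [(0, 7), (5, 8)]
```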
Journal of Information and Data Management
Internet-of-Things (IoT) applications based on Artificial Intelligence, such as mobile object detection and recognition from images and videos, may greatly benefit from inferences made by state-of-the-art Deep Neural Network (DNN) models. However, adopting such models in IoT applications poses an important challenge, since DNNs usually require substantial computational resources (i.e., memory, disk, CPU/GPU, and power), which may prevent them from running on resource-limited edge devices. On the other hand, moving the heavy computation to the Cloud may significantly increase the running costs and latency of IoT applications. Among the possible strategies to tackle this challenge are: (i) DNN model partitioning between edge and cloud; and (ii) running simpler models on the edge and more complex ones in the cloud, with information exchange between the models when needed. Variations of strategy (i) also include: running the entire DNN on the edge device (sometimes not feasible) and running the entire DNN ...
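Strategy (ii) is commonly realized as a confidence-gated cascade: the edge runs a small model and escalates to the cloud model only when its prediction is uncertain. A minimal sketch, with hypothetical `edge_model`/`cloud_model` callables returning class probabilities:

```python
import numpy as np

def cascade_predict(x, edge_model, cloud_model, threshold=0.8):
    """Run the cheap edge model first; fall back to the cloud model
    only when the edge model's top-class probability is below the
    threshold. Returns (label, where_it_was_decided)."""
    probs = edge_model(x)                 # shape: (num_classes,)
    if np.max(probs) >= threshold:
        return int(np.argmax(probs)), "edge"
    probs = cloud_model(x)                # slower, more accurate
    return int(np.argmax(probs)), "cloud"
```

The threshold trades accuracy against cloud traffic and latency; in practice it is tuned on a validation set.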
AoDNN: An Auto-Offloading Approach to Optimize Deep Inference for Fostering Mobile Web
IEEE INFOCOM 2022 - IEEE Conference on Computer Communications
Employing today's deep neural networks (DNNs) in the cross-platform web via offloading has become a promising means to alleviate the tension between intensive inference and limited computing resources. However, it is still challenging to directly leverage distributed DNN execution in web apps, owing to the following limitations: (1) special computing tasks such as DNN inference are hard to offload in a fine-grained and efficient way from the inefficient JavaScript-based environment; (2) existing approaches lack the ability to balance latency and mobile energy when partitioning the inference to meet various web applications' requirements; and (3) they ignore that DNN inference is vulnerable to the operating environment and to mobile devices' computing capability, especially for dedicated web apps. This paper designs AoDNN, an automatic offloading framework that orchestrates DNN inference across the mobile web and the edge server, with three main contributions. First, we design the DNN offloading around a snapshot mechanism and use multiple threads to monitor dynamic contexts, make partition decisions, trigger offloading, etc. Second, we provide a learning-based latency and mobile energy prediction framework supporting various web browsers and platforms. Third, we establish a multi-objective optimization to solve for the optimal partition by balancing latency and mobile energy.
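The third contribution, choosing a partition by balancing predicted latency and mobile energy, can be sketched as scalarizing the two objectives over the candidate split points. The predictor outputs below are assumed inputs (AoDNN learns them per browser and platform); this is one standard way to turn the multi-objective choice into a single score, not the paper's exact solver.

```python
def pick_partition(candidates, w_latency=0.5, w_energy=0.5):
    """candidates: list of (layer_index, predicted_latency,
    predicted_energy), one per feasible split point. Both objectives
    are min-max normalized so the weights are comparable; the split
    with the lowest weighted sum wins."""
    lats = [c[1] for c in candidates]
    ens = [c[2] for c in candidates]

    def norm(v, lo, hi):
        return 0.0 if hi == lo else (v - lo) / (hi - lo)

    def score(c):
        return (w_latency * norm(c[1], min(lats), max(lats))
                + w_energy * norm(c[2], min(ens), max(ens)))

    return min(candidates, key=score)[0]  # layer index to split at
```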
A Lightweight Collaborative Deep Neural Network for the Mobile Web in Edge Cloud
IEEE Transactions on Mobile Computing
Enabling deep learning technology on the mobile web can improve the user's experience of web artificial intelligence in various fields. However, heavy DNN models and the limited computing resources of the mobile web cannot support executing computationally intensive DNNs, even when they are deployed on a cloud computing platform. With the help of promising edge computing, we propose a lightweight collaborative deep neural network for the mobile web, named LcDNN, which contributes three aspects: (1) we design a composite collaborative DNN that reduces the model size, accelerates inference, and reduces mobile energy cost by executing a lightweight binary neural network (BNN) branch on the mobile web; (2) we provide a joint training method for LcDNN and implement an energy-efficient inference library for executing the BNN branch on the mobile web; (3) to further promote the resource utilization of the edge cloud, we develop a DRL-based online scheduling scheme to obtain an optimal allocation for LcDNN. The experimental results show that LcDNN outperforms existing approaches, reducing the model size by about 16× to 29×. It also reduces the end-to-end latency and mobile energy cost with acceptable accuracy, and improves the throughput and resource utilization of the edge cloud.
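The energy savings of a BNN branch come from replacing floating-point multiply-accumulates with sign operations: weights and activations are constrained to ±1, so a dot product reduces to XNOR-and-popcount in a native kernel. A toy NumPy rendering of one binarized layer (illustrative only; LcDNN's actual inference library targets the mobile web):

```python
import numpy as np

def binarize(x):
    """Deterministic binarization: +1 for non-negative, -1 otherwise."""
    return np.where(x >= 0, 1.0, -1.0)

def bnn_layer(x, real_weights):
    """Forward pass of one binarized fully connected layer. The ±1
    matmul stands in for the XNOR + popcount kernel a real BNN library
    would use; a per-output scale (mean |w|) recovers some dynamic
    range, as in XNOR-Net-style schemes."""
    wb = binarize(real_weights)            # (in, out) in {-1, +1}
    scale = np.abs(real_weights).mean(axis=0)
    return binarize(x) @ wb * scale        # (batch, out)
```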
A Deep Learning Approach for Task Offloading in Multi-UAV Aided Mobile Edge Computing
IEEE Access
Computation offloading has proven to be an effective method for facilitating resource-intensive tasks on IoT mobile edge nodes with limited processing capabilities. Additionally, in the context of Mobile Edge Computing (MEC) systems, edge nodes can offload their computation-intensive tasks to a suitable edge server, thereby reducing energy cost and speeding up processing. Despite the numerous efforts devoted to task offloading problems in the Internet of Things (IoT), the problem remains a research gap, mainly because of its NP-hardness and the unrealistic assumptions in many proposed solutions. Deep Learning (DL) is a promising method for accurately extracting information from the raw sensor data of IoT devices deployed in complicated contexts. Therefore, in this paper, an approach based on Deep Reinforcement Learning (DRL) is presented to optimize the offloading process for IoT in MEC environments; this approach can achieve the optimal offloading decision. A Markov Decision Process (MDP) is used to formulate the offloading problem. Delay time and consumed energy are the main optimization targets in this work. The proposed approach has been verified using extensive simulations. Simulation results demonstrate that the proposed model can effectively improve MEC system latency and energy consumption, and significantly outperforms the Deep Q Network (DQN) and Actor-Critic (AC) approaches.
Index Terms: Deep learning, deep reinforcement learning, Internet of Things, mobile edge computing, task offloading.
I. INTRODUCTION
5G-era networks have been realized based on networking technologies, innovations, and new computing and communication paradigms [1]. Mobile Edge Computing (MEC) is one of the key technologies for computation distribution that boosts the performance of 5G cellular networks [2]. The main role of MEC is the minimization of communication latency between the user and the server, which is of great importance for Internet of Things (IoT) environments. IoT has become an important area of research due to its rapid adoption in our daily lives and in industry, and it faces numerous challenges, including latency reduction, storage management, energy consumption, task offloading, etc. [3].
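The MDP at the heart of such approaches is easy to state: the agent observes the task size and channel conditions, chooses where to execute (locally or on one of the edge servers), and receives the negative weighted delay-plus-energy as reward. A toy environment sketch under assumed cost models (this is scaffolding a DQN or actor-critic agent would train against, not the paper's simulator):

```python
import random

class OffloadEnv:
    """Toy offloading MDP: state = (task_bits, channel_rate);
    action 0 = execute locally, action k > 0 = offload to server k.
    Reward is -(w_d * delay + w_e * energy), the quantity the DRL
    agent learns to maximize."""

    def __init__(self, f_local, servers, p_cpu, p_tx, w_d=1.0, w_e=1.0):
        self.f_local, self.servers = f_local, servers  # bits/s each
        self.p_cpu, self.p_tx = p_cpu, p_tx            # UE power (W)
        self.w_d, self.w_e = w_d, w_e

    def reset(self):
        self.state = (random.uniform(1e6, 1e7),   # task size, bits
                      random.uniform(1e6, 5e6))   # uplink rate, bits/s
        return self.state

    def step(self, action):
        bits, rate = self.state
        if action == 0:                           # local execution
            delay = bits / self.f_local
            energy = self.p_cpu * delay
        else:                                     # offload to server k
            tx = bits / rate
            delay = tx + bits / self.servers[action - 1]
            energy = self.p_tx * tx               # UE pays radio only
        reward = -(self.w_d * delay + self.w_e * energy)
        return self.reset(), reward               # next i.i.d. task
```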
Dynamic Hierarchical Neural Network Offloading in IoT Edge Networks
2021 10th IFIP International Conference on Performance Evaluation and Modeling in Wireless and Wired Networks (PEMWN), 2021
In recent developments in machine learning, a trend has emerged where larger models achieve better performance. At the same time, deploying these models in real-life scenarios is difficult due to the parallel trend of pushing them onto end-user or IoT devices with strong resource limitations. In this work, we develop a novel technique for executing parts of a single model successively across multiple devices (IoT, edge, cloud) while respecting each device's resource limitations. For that, we introduce a new offloading mechanism where, during computation, a decision can be made to offload work, together with the ability to exit the computation early with intermediate results. The decision itself is tuned through Deep Q-Learning.
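Operationally, such a mechanism interleaves inference with decisions: after each model stage the controller may continue locally, hand the remaining layers to the next tier, or take an early exit with the intermediate classifier's result. A schematic of that control loop, with the Q-function and stage models as assumed callables (the paper tunes the decision with Deep Q-Learning; everything named here is hypothetical scaffolding):

```python
CONTINUE, OFFLOAD, EXIT = 0, 1, 2

def run_with_offload(x, stages, exits, q_values, tiers):
    """stages[i]: callable computing stage i's features; exits[i]:
    early-exit head returning class probabilities; q_values(state):
    learned scores for the three actions given (stage, tier); tiers:
    the device chain (IoT -> edge -> cloud) the residual model can be
    shipped along."""
    h, tier = x, 0
    for i, stage in enumerate(stages):
        h = stage(h)
        action = max((CONTINUE, OFFLOAD, EXIT),
                     key=lambda a: q_values((i, tier))[a])
        if action == EXIT:
            return exits[i](h)        # accept the intermediate result
        if action == OFFLOAD and tier + 1 < len(tiers):
            tier += 1                 # ship h (and the remaining model
            # partition) to the next device; modeled here as a state
            # change only.
    return exits[-1](h)               # ran to the final classifier
```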