Benchmarking Convolutional Neural Network Inference on Low-Power Edge Devices

DeepEdgeBench: Benchmarking Deep Neural Networks on Edge Devices

2021

EdgeAI (Edge computing based Artificial Intelligence) has been actively researched in recent years to handle a variety of massively distributed AI applications under strict latency requirements. Meanwhile, many companies have released edge devices with small form factors (low power consumption and limited resources), such as the popular Raspberry Pi and Nvidia's Jetson Nano, to act as compute nodes in edge computing environments. Although edge devices are limited in computing power and hardware resources, many are equipped with accelerators to enhance their performance. It is therefore interesting to see how AI-based Deep Neural Networks perform on such resource-limited devices. In this work, we present and compare the performance, in terms of inference time and power consumption, of four SoCs: Asus Tinker Edge R, Raspberry Pi 4, Google Coral Dev Board, and Nvidia Jetson Nano, and one microcontroller: Arduino Nano 33 BLE, on different ...
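As a rough illustration of this kind of benchmark, the sketch below times repeated inferences of a TensorFlow Lite model on-device. The model file name, warm-up count, and run count are placeholder assumptions, not the paper's setup.

```python
# Minimal inference-latency benchmark sketch, assuming a TensorFlow Lite
# model file ("model.tflite") and the tflite_runtime package on the device.
import time
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random input matching the model's expected shape and dtype.
dummy = np.random.random_sample(inp["shape"]).astype(inp["dtype"])

# Warm-up runs so one-time initialization costs are excluded.
for _ in range(10):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()

# Timed runs; report the mean single-inference latency.
runs = 100
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
elapsed = time.perf_counter() - start
print(f"mean inference time: {1000 * elapsed / runs:.2f} ms")
```

Power consumption, the paper's other metric, requires an external power monitor sampled in parallel with a loop like this one.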

TinyML: Enabling of Inference Deep Learning Models on Ultra-Low-Power IoT Edge Devices for AI Applications

Micromachines

Recently, the Internet of Things (IoT) has gained a lot of attention, since IoT devices are deployed in many fields. Many of these devices rely on machine learning (ML) models, which render them intelligent and able to make decisions. However, IoT devices typically have limited resources, which restricts the execution of complex ML models such as deep learning (DL) models on them. In addition, connecting IoT devices to the cloud to transfer raw data and perform processing delays system responses, exposes private data, and increases communication costs. To tackle these issues, a new technology called Tiny Machine Learning (TinyML) has paved the way to meet the challenges of IoT devices. This technology allows data to be processed locally on the device without the need to send it to the cloud, and it permits the inference of ML models, including DL models, on microcontrollers with limited resources. The aim of this paper is to p...
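To make the TinyML workflow concrete, here is a hedged sketch of the usual first step: converting a small Keras model into a fully int8-quantized TensorFlow Lite flatbuffer suitable for a microcontroller runtime such as TensorFlow Lite Micro. The toy model and random calibration data are illustrative assumptions, not taken from the paper.

```python
# Sketch: full-integer quantization of a tiny Keras CNN for MCU deployment.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

def representative_data():
    # A few representative samples calibrate the quantization ranges;
    # random data stands in for a real calibration set here.
    for _ in range(100):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

The resulting flatbuffer is typically embedded in firmware as a C array and executed with the TensorFlow Lite Micro interpreter.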

Performance Prediction for Convolutional Neural Networks in Edge Devices

arXiv, 2020

Running Convolutional Neural Network (CNN) based applications on edge devices near the source of data can meet latency and privacy challenges. However, due to their reduced computing resources and energy constraints, these edge devices can hardly satisfy CNNs' processing and data storage needs. For these platforms, choosing the CNN with the best trade-off between accuracy and execution time while respecting hardware constraints is crucial. In this paper, we present and compare five (5) of the widely used machine learning based methods for execution time prediction of CNNs on two (2) edge GPU platforms. For these 5 methods, we also examine the time needed to train them and to tune their hyperparameters. Finally, we compare the times to run the prediction models on different platforms. These methods greatly facilitate design-space exploration by quickly identifying the best CNN for a target edge GPU. Experimental results show that eXtreme Gr...
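As a sketch of what such a predictor can look like (gradient-boosted trees are among the methods the abstract points to), the snippet below fits an XGBoost regressor that maps simple CNN descriptors to measured execution time. The feature set and synthetic training data are assumptions for illustration only, not the paper's dataset.

```python
# Sketch: execution-time prediction from coarse CNN descriptors.
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Features per CNN: [num_layers, params (M), FLOPs (G), input resolution]
X = rng.uniform([5, 0.5, 0.1, 96], [200, 150.0, 30.0, 512], size=(500, 4))
# Synthetic ground truth: latency dominated by FLOPs, plus noise.
y = 2.0 * X[:, 2] + 0.01 * X[:, 1] + rng.normal(0, 0.5, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R^2 on held-out CNNs:", model.score(X_te, y_te))
```

With a trained predictor, a design-space search can rank candidate CNNs without deploying each one to the target GPU.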

Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices

Proceedings of ACM/IEEE Symposium on Edge Computing (SEC), 2023

Today, deep learning optimization is primarily driven by research focused on achieving high inference accuracy and reducing latency. However, the energy efficiency aspect is often overlooked, possibly due to a lack of sustainability mindset in the field and the absence of a holistic energy dataset. In this paper, we conduct a threefold study, including energy measurement, prediction, and efficiency scoring, with the objective of fostering transparency in power and energy consumption within deep learning across various edge devices. Firstly, we present a detailed, first-of-its-kind measurement study that uncovers the energy consumption characteristics of on-device deep learning. This study results in the creation of three extensive energy datasets for edge devices, covering a wide range of kernels, state-of-the-art DNN models, and popular AI applications. Secondly, we design and implement the first kernel-level energy predictors for edge devices based on our kernel-level energy dataset. Evaluation results demonstrate the ability of our predictors to provide consistent and accurate energy estimations on unseen DNN models. Lastly, we introduce two scoring metrics, PCS and IECS, developed to convert the complex power and energy consumption data of an edge device into an easily understandable form for edge device end-users. We hope our work can help shift the mindset of both end-users and the research community towards sustainability in edge computing, a principle that drives our research. Find data, code, and more up-to-date information at https://amai-gsu.github.io/DeepEn2023.
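The measurement idea generalizes to any device with a readable power sensor: sample instantaneous power while the workload runs, then integrate over time to obtain energy. Below is a minimal sketch under that assumption; read_power_watts() is a hypothetical stand-in for a device-specific sensor query (e.g., an external power monitor or an on-board INA sensor), not an interface from the paper.

```python
# Sketch: energy = integral of sampled power over the workload's runtime.
import time
import threading

def read_power_watts():
    return 4.2  # placeholder reading; replace with a real sensor query

def measure_energy(workload, period_s=0.01):
    samples, running = [], True

    def sampler():
        # Background thread polling the power sensor at a fixed period.
        while running:
            samples.append((time.perf_counter(), read_power_watts()))
            time.sleep(period_s)

    t = threading.Thread(target=sampler)
    t.start()
    workload()
    running = False
    t.join()
    # Trapezoidal integration of power over time gives energy in joules.
    return sum((t2 - t1) * (p1 + p2) / 2
               for (t1, p1), (t2, p2) in zip(samples, samples[1:]))

print(f"energy: {measure_energy(lambda: time.sleep(1.0)):.3f} J")
```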

User Driven FPGA-Based Design Automated Framework of Deep Neural Networks for Low-Power Low-Cost Edge Computing

IEEE Access

Deep learning techniques have been successfully applied to solve many Artificial Intelligence (AI) application problems. However, owing to topologies with many hidden layers, Deep Neural Networks (DNNs) have high computational complexity, which makes their deployment difficult in contexts tightly constrained by requirements such as performance, real-time processing, or energy efficiency. Numerous hardware/software optimization techniques using GPUs, ASICs, and reconfigurable computing (i.e., FPGAs) have been proposed in the literature. With FPGAs, very specialized architectures have been developed to provide an optimal balance between high speed and low power. However, when targeting edge computing, user requirements and hardware constraints must be met efficiently. Therefore, in this work we focus on reconfigurable embedded systems based on the Xilinx ZYNQ SoC and on popular DNNs that can be implemented at the embedded edge, improving performance per watt while maintaining accuracy. In this context, we propose an automated framework for the implementation of hardware-accelerated DNN architectures. This framework provides an end-to-end solution that facilitates the efficient deployment of topologies on FPGAs by combining custom hardware scalability with optimization strategies. Cutting-edge comparisons and experimental results demonstrate that the architectures developed by our framework offer the best compromise between performance, energy consumption, and system cost. For instance, the low-power (0.266 W) DNN topologies generated for the MNIST database achieved a high throughput of 3,626 FPS.

Index terms: deep learning, electronic design automation, edge computing, FPGA, low power systems.
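For scale, the abstract's headline numbers imply a throughput-per-watt figure of roughly 13,600 FPS/W:

```python
# Worked check of the reported MNIST result: 3,626 FPS at 0.266 W.
fps, power_w = 3626, 0.266
print(f"{fps / power_w:,.0f} FPS/W")  # ≈ 13,632 FPS/W
```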

DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

As the number of edge devices with computing resources (e.g., embedded GPUs, mobile phones, and laptops) increases, recent studies demonstrate that it can be beneficial to collaboratively run convolutional neural network (CNN) inference on more than one edge device. However, these studies make strong assumptions about the devices' conditions, and their application is far from practical. In this work, we propose a general method, called DistrEdge, to provide CNN inference distribution strategies in environments with multiple IoT edge devices. By addressing heterogeneity in devices, network conditions, and the nonlinear characteristics of CNN computation, DistrEdge adapts to a wide range of cases (e.g., different network conditions, various device types) using deep reinforcement learning. We utilize the latest embedded AI computing devices (e.g., NVIDIA Jetson products) to construct cases with heterogeneous device types in the experiments. Based on our evaluations, DistrEdge properly adjusts the distribution strategy according to the devices' computing characteristics and the network conditions, achieving a 1.1× to 3× speedup compared to state-of-the-art methods.
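The core idea of distributing inference can be illustrated with a simple layer-wise split of a sequential CNN across two workers. This is only the partitioning concept; the reinforcement-learning strategy search that DistrEdge performs is not shown, and the model and split point below are illustrative assumptions.

```python
# Sketch: partition a sequential CNN at a layer boundary across two devices.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

split = 4                # layers [0, split) on device A, the rest on device B
part_a = model[:split]   # would execute on the first edge device
part_b = model[split:]   # would execute on the second edge device

x = torch.randn(1, 3, 64, 64)
intermediate = part_a(x)       # device A's output, sent over the network
logits = part_b(intermediate)  # device B completes the inference
print(logits.shape)            # torch.Size([1, 10])
```

In practice, the split point trades off each device's compute speed against the size of the intermediate tensor that must cross the network, which is exactly the search space such strategies explore.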

Optimization of Convolutional Neural Networks on Resource Constrained Devices

Implementation of convolutional neural networks (CNNs) on resource-constrained devices such as FPGAs (e.g., Xilinx Zynq) is important for intelligence in edge computing. This paper presents and discusses different hardware optimization methods that were employed to design a CNN model amenable to such devices in general. Adaptive processing and exploitation of parallelism, among other techniques, are employed to demonstrate the superior performance of the proposed methods over the state of the art.

Power Efficient Machine Learning Models Deployment on Edge IoT Devices

Sensors

Computing has undergone a significant transformation over the past two decades, shifting from a machine-based approach to a human-centric, virtually invisible service known as ubiquitous or pervasive computing. This change has been achieved by incorporating small embedded devices into a larger computational system, connected through networking and referred to as edge devices. When these devices are also connected to the Internet, they are generally called Internet-of-Things (IoT) devices. Developing Machine Learning (ML) algorithms on these types of devices allows them to provide Artificial Intelligence (AI) inference functions such as computer vision, pattern recognition, etc. However, this capability is severely limited by the device's resource scarcity: embedded devices have limited computational and power resources while they must maintain a high degree of autonomy. While there are several published studies that address the computational weakness of these small systems-m...

Efficient neural architectures for edge devices

2022

The rise of IoT networks, with numerous interconnected edge devices, has led to an increased demand for intelligent data processing closer to the data source. Deployment of neural networks at the edge is desirable, though challenging, since the edge has limited resources available. The focus of this thesis is on neural architectures for Convolutional Neural Networks (CNNs) that execute at the edge. The thesis presents Evolutionary Piecemeal Training (EPT), an algorithm for efficient Neural Architecture Search (NAS). This flexible algorithm treats NAS as an optimization problem with a variable number of objectives. To highlight the versatility of EPT, three different sets of experiments are shown in the thesis, with one, two, and four objectives respectively. The multi-objective algorithm typically involves hardware-specific objectives in addition to the accuracy of the CNN to produce a Pareto-optimal set of neural architectures. Further, the thesis examines the adaptivity of CNN-based applications running at the edge. The first work is the Scenario Based Run-time Switching (SBRS) framework, where every scenario represents an operation mode and has an associated CNN; an application may switch between scenarios to adapt synchronously to environmental changes. Additionally, a framework is presented to efficiently share and reuse CNNs in distributed IoT networks. This framework supports maintenance and adaptation of existing, deployed CNNs at the edge. To conclude, this thesis demonstrates various methodologies to improve the performance of a CNN deployed on a resource-constrained edge device. The key ideas include searching for an efficient neural architecture, adaptive applications with run-time CNN switching, and CNNs as dynamic entities in a distributed IoT network.
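A core building block of any such multi-objective NAS is selecting the non-dominated (Pareto-optimal) candidates. The sketch below shows that selection step for two minimized objectives, using made-up (error, latency) pairs; it is a generic illustration, not the EPT algorithm itself.

```python
# Sketch: keep architectures not dominated on (error, latency), lower = better.
def pareto_front(candidates):
    """Return candidates for which no other candidate is at least as good
    on every objective (and different), i.e., the non-dominated set."""
    front = []
    for c in candidates:
        dominated = any(
            all(o <= co for o, co in zip(other, c)) and other != c
            for other in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical (error rate, latency in ms) pairs for candidate architectures.
archs = [(0.08, 42.0), (0.10, 21.0), (0.09, 50.0), (0.12, 22.0), (0.07, 65.0)]
print(pareto_front(archs))  # -> [(0.08, 42.0), (0.10, 21.0), (0.07, 65.0)]
```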

Towards Enabling Dynamic Convolution Neural Network Inference for Edge Intelligence

2022 IEEE International Symposium on Circuits and Systems (ISCAS)

Deep learning applications have achieved great success in numerous real-world settings. Deep learning models, especially Convolutional Neural Networks (CNNs), are often prototyped on FPGAs because they offer high power efficiency and reconfigurability. The deployment of CNNs on FPGAs follows a design cycle that requires model parameters to be saved in on-chip memory during high-level synthesis (HLS). Recent advances in edge intelligence require CNN inference on the edge network to increase throughput and reduce latency. To provide flexibility, dynamic parameter allocation to different mobile devices is required to implement either a predefined CNN architecture or one defined on the fly. In this study, we present novel methodologies for dynamically streaming the model parameters at run time to implement a traditional CNN architecture. We further propose a library-based approach to design scalable and dynamic distributed CNN inference on the fly, leveraging partial-reconfiguration techniques, which is particularly suitable for resource-constrained edge devices. The proposed techniques are implemented on the Xilinx PYNQ-Z2 board to prove the concept using the LeNet-5 CNN model. The results show that the proposed methodologies are effective, with classification accuracy rates of 92%, 86%, and 94%, respectively.
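At a conceptual level, parameter streaming replaces synthesis-time weight baking with just-in-time fetches before each layer executes. The Python sketch below illustrates only that control flow; fetch_weights() is a hypothetical stand-in for the DMA/partial-reconfiguration machinery on the actual board, and the weights and layers are placeholders.

```python
# Sketch: fetch each layer's weights at run time instead of baking them in.
import numpy as np

WEIGHT_STORE = {  # off-chip store, e.g., DDR memory or a remote host
    "conv1": np.random.rand(8, 1, 3, 3).astype(np.float32),
    "fc1": np.random.rand(10, 8).astype(np.float32),
}

def fetch_weights(layer_name):
    # Hypothetical: on real hardware this would be a DMA read into the
    # programmable logic, possibly alongside a partial reconfiguration.
    return WEIGHT_STORE[layer_name]

def run_layer(layer_name, activation, compute):
    weights = fetch_weights(layer_name)  # stream weights in just-in-time
    return compute(activation, weights)  # then execute the layer

x = np.random.rand(1, 8)
logits = run_layer("fc1", x, lambda a, w: a @ w.T)
print(logits.shape)  # (1, 10)
```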