Toward Distributed, Global, Deep Learning Using IoT Devices
Related papers
Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT
ACM Transactions on Embedded Computing Systems, 2019
Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT). However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore, we propose Network of Neural Networks (NoNN), a new distributed IoT learning paradigm that compresses a large pretrained ‘teacher’ deep network into several disjoint and highly-compressed ‘student’ modules, without loss of accuracy. Moreover, we propose a network science-based knowledge partitioning algorithm for the teacher model, and then train individual students on the resulting disjoint partitions. Extensive experimentation on five image classification datasets, for use...
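As an illustration of the student-partitioning idea described in this abstract, the following minimal PyTorch sketch distills a pretrained teacher's embedding into several small, disjoint student modules. It is not the authors' NoNN code: the toy teacher, the `SmallStudent` architecture, and the naive channel-wise partition are illustrative stand-ins (NoNN derives its partitions from a network-science analysis of the teacher's filters).

```python
# Hypothetical sketch: distill a teacher embedding into disjoint student modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallStudent(nn.Module):
    """Tiny CNN whose output mimics one slice of the teacher's embedding."""
    def __init__(self, out_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, out_channels, 3, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )
    def forward(self, x):
        return self.net(x).flatten(1)            # (batch, out_channels)

def distill_step(teacher, students, x, optimizer):
    """One training step: each student matches its own disjoint feature slice."""
    with torch.no_grad():
        t_feat = teacher(x)                       # (batch, C) teacher embedding
    slices = torch.chunk(t_feat, len(students), dim=1)   # naive disjoint partition
    loss = sum(F.mse_loss(s(x), sl) for s, sl in zip(students, slices))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy teacher producing a 64-dim embedding; 4 students cover 16 dims each.
teacher = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
students = nn.ModuleList([SmallStudent(16) for _ in range(4)])
opt = torch.optim.Adam(students.parameters(), lr=1e-3)
x = torch.randn(8, 3, 32, 32)                     # stand-in batch of images
print(distill_step(teacher, students, x, opt))
```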
Federated Learning for Resource-Constrained IoT Devices: Panoramas and State of the Art
Adaptation, learning, and optimization, 2022
Nowadays, devices are equipped with advanced sensors with higher processing/computing capabilities. Further, widespread Internet availability enables communication among sensing devices. As a result, vast amounts of data are generated on edge devices to drive Internet-of-Things (IoT), crowdsourcing, and other emerging technologies. The extensive amount of collected data can be pre-processed, scaled, classified, and finally, used for predicting future events with machine learning (ML) methods. In traditional ML approaches, data is sent to and processed in a central server, which incurs communication overhead and processing delay and raises privacy and security issues. To overcome these challenges, each client can be trained locally based on its available data and by learning from the global model. This decentralized learning approach is referred to as federated learning (FL). However, in large-scale networks, there may be clients with varying computational resource capabilities. This may lead to implementation and scalability challenges for FL techniques. In this paper, we first introduce some recently implemented real-life applications of FL. We then emphasize the core challenges of implementing FL algorithms from the perspective of resource limitations (e.g., memory, bandwidth, and energy budget) of client devices. We finally discuss open issues associated with FL and highlight future directions in the FL area concerning resource-constrained devices.
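For readers unfamiliar with the federated learning setup this abstract refers to, here is a minimal federated averaging (FedAvg) sketch in PyTorch: each client trains a copy of the global model on its private data and the server averages the resulting weights. The toy model, client count, and synthetic data are illustrative assumptions, not part of the paper.

```python
# Minimal FedAvg sketch: local training on each client, weight averaging on the server.
import copy
import torch
import torch.nn as nn

def local_train(model, data, targets, epochs=1, lr=0.01):
    """Train a copy of the global model on one client's private data."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()
    return model.state_dict()

def fed_avg(client_states):
    """Server-side aggregation: element-wise mean of the client weights."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(0)
    return avg

global_model = nn.Linear(10, 2)                    # toy global model
clients = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(3)]
states = [local_train(global_model, x, y) for x, y in clients]
global_model.load_state_dict(fed_avg(states))      # one federated round
```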
arXiv (Cornell University), 2022
The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with constrained specifications (low memory, clock speed, and processing power), which are insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the traditional approach of programming IoT devices/products to collect and transmit data (image, audio, sensor readings, etc.) to their cloud-based ML analytics platforms. For decades, this online approach has been facing issues such as compromised data streams, non-real-time analytics due to latency, bandwidth constraints, costly subscriptions, recent privacy issues raised by users and the GDPR guidelines, etc. In this paper, to enable ultra-fast and accurate AI-based offline analytics on resource-constrained IoT devices, we present an end-to-end multi-component model optimization sequence and open-source its implementation. Researchers and developers can use our optimization sequence to optimize high-memory, computation-demanding models in multiple aspects in order to produce small-size, low-latency, low-power models that can comfortably fit and execute on resource-constrained hardware. The experimental results show that our optimization components can produce models that are: (i) compressed by 12.06x; (ii) 0.13% to 0.27% more accurate; and (iii) orders of magnitude faster, with unit inference at 0.06 ms. Our optimization sequence is generic and can be applied to any state-of-the-art model trained for anomaly detection, predictive maintenance, robotics, voice recognition, and machine vision.
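As a rough illustration of the kind of steps such a multi-component optimization sequence may include, the sketch below applies magnitude pruning followed by post-training dynamic quantization using standard PyTorch utilities. This is a generic stand-in, not the paper's released implementation; the model and the 50% sparsity level are arbitrary assumptions.

```python
# Hypothetical two-step optimization: magnitude pruning + dynamic int8 quantization.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 8))

# 1) Prune 50% of the smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")        # make the sparsity permanent

# 2) Post-training dynamic quantization: int8 weights for Linear layers.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x).shape)                     # the smaller model still runs a forward pass
```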
New Directions in Distributed Deep Learning: Bringing the Network at Forefront of IoT Design
arXiv (Cornell University), 2020
In this paper, we first highlight three major challenges to large-scale adoption of deep learning at the edge: (i) Hardware-constrained IoT devices, (ii) Data security and privacy in the IoT era, and (iii) Lack of network-aware deep learning algorithms for distributed inference across multiple IoT devices. We then provide a unified view targeting three research directions that naturally emerge from the above challenges: (1) Federated learning for training deep networks, (2) Data-independent deployment of learning algorithms, and (3) Communication-aware distributed inference. We believe that the above research directions need a network-centric approach to enable edge intelligence and, therefore, fully exploit the true potential of IoT.
EdgeAI: A Vision for Deep Learning in the IoT Era
IEEE Design & Test, 2019
The significant computational requirements of deep learning present a major bottleneck for its large-scale adoption on hardware-constrained IoT devices. Here, we envision a new paradigm called EdgeAI to address major impediments associated with deploying deep networks at the edge. Specifically, we discuss the existing directions in computation-aware deep learning and describe two new challenges in the IoT era: (1) Data-independent deployment of learning, and (2) Communication-aware distributed inference. We further present new directions from our recent research to alleviate the latter two challenges. Overcoming these challenges is crucial for rapid adoption of learning on IoT devices in order to truly enable EdgeAI.
FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning
2021
Applying Federated Learning (FL) on Internet-of-Things devices is necessitated by the large volumes of data they produce and growing concerns over data privacy. However, there are three challenges that need to be addressed to make FL efficient: (i) execution on devices with limited computational capabilities, (ii) accounting for stragglers due to computational heterogeneity of devices, and (iii) adaptation to changing network bandwidths. This paper presents FedAdapt, an adaptive offloading FL framework to mitigate the aforementioned challenges. FedAdapt accelerates local training in computationally constrained devices by leveraging layer offloading of deep neural networks (DNNs) to servers. Further, FedAdapt adopts reinforcement learning-based optimization and clustering to adaptively identify which layers of the DNN should be offloaded for each individual device onto a server to tackle the challenges of computational heterogeneity and changing network bandwidth. Experimental studies are carried out on a lab-based testbed comprising five IoT devices. By offloading a DNN from the device to the server, FedAdapt reduces the training time of a typical IoT device by over half compared to classic FL. The training time of extreme stragglers and the overall training time can be reduced by up to 57%. Furthermore, with changing network bandwidth, FedAdapt is demonstrated to reduce the training time by up to 40% when compared to classic FL, without sacrificing accuracy. FedAdapt can be downloaded from https://github.com/qub-blesson/FedAdapt.
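A minimal sketch of the layer-offloading idea is shown below: the DNN is cut at a chosen index, the device computes the early layers, and the server receives the intermediate activations and computes the rest. The split index and toy network are illustrative assumptions; FedAdapt selects split points per device with a reinforcement-learning policy, and during training the gradients would also flow back across the cut.

```python
# Illustrative layer-offloading sketch: device runs the front, server runs the rest.
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 32 * 32, 10),
)

split = 2                              # offload everything from layer 2 onward
device_part = layers[:split]           # runs on the IoT device
server_part = layers[split:]           # runs on the server

x = torch.randn(1, 3, 32, 32)          # one input sample on the device
activation = device_part(x)            # the device computes and transmits this tensor
logits = server_part(activation)       # the server finishes the forward pass
print(logits.shape)                    # during FL training, gradients would return over the same cut
```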
Toward Decentralized and Collaborative Deep Learning Inference for Intelligent IoT Devices
2021
Deep learning technologies are empowering IoT devices with an increasing number of intelligent services. However, the contradiction between resource-constrained IoT devices and intensive computing makes it common to transfer data to the cloud center for executing all DNN inference, or to dynamically allocate DNN computations between IoT devices and the cloud center. Existing approaches depend strongly on the cloud center and require a reliable and stable network; this can directly cause unreliable or even unavailable service in extreme or unstable environments. We propose DeColla, a decentralized and collaborative deep learning inference system for IoT devices, which completely migrates DNN computations from the cloud center to the IoT device side, relying on a collaborative mechanism to accelerate DNN inference that is difficult for an individual IoT device to accomplish. DeColla uses a parallel acceleration strategy via a DRL-based adaptive alloc...
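To make the collaborative-inference idea concrete, the naive sketch below partitions the output channels of a single convolution across two devices so that each computes half of the layer in parallel; the partial results are then concatenated. This is only a stand-in for the general concept: DeColla's DRL-based adaptive allocation and its communication mechanics are not modeled.

```python
# Naive collaborative-inference sketch: two devices split one conv layer's filters.
import torch
import torch.nn as nn

full = nn.Conv2d(3, 32, 3, padding=1)            # the layer to parallelize

# Give each device half of the filters (a disjoint output-channel partition).
dev_a = nn.Conv2d(3, 16, 3, padding=1)
dev_b = nn.Conv2d(3, 16, 3, padding=1)
with torch.no_grad():
    dev_a.weight.copy_(full.weight[:16]); dev_a.bias.copy_(full.bias[:16])
    dev_b.weight.copy_(full.weight[16:]); dev_b.bias.copy_(full.bias[16:])

x = torch.randn(1, 3, 32, 32)
merged = torch.cat([dev_a(x), dev_b(x)], dim=1)   # collaborative result
print(torch.allclose(merged, full(x), atol=1e-6)) # matches the undivided layer
```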
ARES: Adaptive Resource-Aware Split Learning for Internet of Things
Computer Networks, 2022
Distributed training of Machine Learning models in edge Internet of Things (IoT) environments is challenging because of three main points. First, resource-constrained devices have large training times and a limited energy budget. Second, resource heterogeneity of IoT devices slows down the training of the global model due to the presence of slower devices (stragglers). Finally, varying operational conditions, such as network bandwidth and computing resources, significantly affect training time and energy consumption. Recent studies have proposed Split Learning (SL) for distributed model training with limited resources, but its efficient implementation on resource-constrained and decentralized heterogeneous IoT devices remains minimally explored. We propose Adaptive REsource-aware Split learning (ARES), a scheme for efficient model training in IoT systems. ARES accelerates local training in resource-constrained devices and minimizes the effect of stragglers on the training through device-targeted split points while accounting for time-varying network throughput and computing resources. ARES takes application constraints into account to mitigate training optimization trade-offs in terms of energy consumption and training time. We evaluate an ARES prototype on a real testbed comprising heterogeneous IoT devices running a widely adopted deep neural network and dataset. Results show that ARES accelerates model training on IoT devices by up to 48% and reduces energy consumption by up to 61.4% compared to Federated Learning (FL) and classic SL, without sacrificing model convergence and accuracy.
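The toy sketch below shows one split-learning training step of the kind ARES builds on: the device runs the front of the network, the server runs the back, and the gradient of the intermediate ("smashed") activations is returned to the device. Split-point selection, throughput monitoring, and energy accounting, which are ARES's actual contributions, are omitted, and the tiny model and batch are assumptions.

```python
# Toy split-learning step: forward and backward passes cross the device/server cut.
import torch
import torch.nn as nn

front = nn.Sequential(nn.Linear(20, 32), nn.ReLU())   # layers on the IoT device
back = nn.Sequential(nn.Linear(32, 2))                 # layers on the server
opt = torch.optim.SGD(list(front.parameters()) + list(back.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(16, 20), torch.randint(0, 2, (16,)) # device-local batch

smashed = front(x)                           # device -> server ("smashed" activations)
smashed_srv = smashed.detach().requires_grad_()        # server-side copy of the cut tensor
loss = loss_fn(back(smashed_srv), y)
loss.backward()                              # server backprop down to the cut

smashed.backward(smashed_srv.grad)           # gradient returned to the device
opt.step()                                   # both sides update their own layers
opt.zero_grad()
print(loss.item())
```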
Edge-Cloud Computing for IoT Data Analytics: Embedding Intelligence in the Edge with Deep Learning
IEEE Transactions on Industrial Informatics, 2020
Rapid growth in the number of connected devices, including sensors, mobile, wearable, and other Internet of Things (IoT) devices, is creating an explosion of data moving across the network. To carry out machine learning (ML), IoT data are typically transferred to the cloud or another centralized system for storage and processing; however, this causes latency and increases network traffic. Edge computing has the potential to remedy those issues by moving computation closer to the network edge and data sources. On the other hand, edge computing is limited in terms of computational power and thus is not well suited for ML tasks. Consequently, this paper aims to combine edge and cloud computing for IoT data analytics by taking advantage of edge nodes to reduce data transfer. In order to process data close to the source, sensors are grouped according to locations, and feature learning is performed on the nearby edge node. For comparison, similarity-based processing is also considered. Feature learning is carried out with deep learning: the encoder part of the trained autoencoder is placed on the edge and the decoder part is placed on the cloud. The evaluation was performed on the task of human activity recognition from sensor data. The results show that when sliding windows are used in the preparation step, data can be reduced on the edge by up to 80% without significant loss in accuracy.
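A minimal sketch of the edge/cloud autoencoder split described here: a sliding window of sensor readings is encoded on the edge node, only the compact code crosses the network, and the decoder (or a downstream activity classifier) runs in the cloud. The window length and layer sizes are illustrative assumptions, not the paper's configuration.

```python
# Illustrative edge/cloud autoencoder split for sensor windows.
import torch
import torch.nn as nn

WINDOW = 128                      # sliding-window length (sensor readings), assumed
CODE = 16                         # compressed representation sent to the cloud, assumed

encoder = nn.Sequential(nn.Linear(WINDOW, 64), nn.ReLU(), nn.Linear(64, CODE))   # edge
decoder = nn.Sequential(nn.Linear(CODE, 64), nn.ReLU(), nn.Linear(64, WINDOW))   # cloud

window = torch.randn(1, WINDOW)   # one window of raw sensor data on the edge
code = encoder(window)            # edge node: ~8x fewer values to transmit
reconstruction = decoder(code)    # cloud: reconstruct or feed an activity classifier
print(window.shape, code.shape, reconstruction.shape)
```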