Piyush Yadav - Academia.edu
Papers by Piyush Yadav
2021 IEEE International Conference on Big Data (Big Data)
The concept of ML model aggregation rather than data aggregation has gained much attention as it boosts prediction performance while maintaining stability and preserving privacy. In a non-ideal scenario, a base model trained on a single device may make independent but complementary errors. To handle such cases, in this paper we implement and release the code of 8 robust ML model combining methods that achieve reliable prediction results by combining numerous base models (trained on many devices) to form a central model that effectively limits errors, built-in randomness and uncertainties. We extensively test the model combining performance through experiments on 15 heterogeneous devices and 3 datasets, which exemplify how a complex collective intelligence can be derived from the many elementary intelligences learned by distributed, ubiquitous IoT devices.
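To illustrate the idea of combining base models whose errors are complementary, here is a minimal sketch of two generic combining rules, majority voting and probability averaging. The abstract does not name its 8 methods, so these two rules, the function names and the toy inputs are assumptions chosen for illustration and are not taken from the paper's released code.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    combined = []
    for votes in zip(*predictions):  # one tuple of votes per sample
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def average_probabilities(prob_sets):
    """Average per-model class-probability rows and return the argmax class per sample."""
    combined = []
    for rows in zip(*prob_sets):  # one tuple of probability rows per sample
        n_classes = len(rows[0])
        mean = [sum(r[c] for r in rows) / len(rows) for c in range(n_classes)]
        combined.append(max(range(n_classes), key=mean.__getitem__))
    return combined


# Hypothetical toy case: the true labels are [1, 0, 1] and each base model
# is wrong on a different sample, so the vote recovers all three labels.
model_a = [0, 0, 1]
model_b = [1, 1, 1]
model_c = [1, 0, 0]
print(majority_vote([model_a, model_b, model_c]))  # [1, 0, 1]
```

Voting only needs hard labels from each device, while probability averaging needs each base model to expose class probabilities; which is preferable depends on what the devices can transmit.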
2021 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/IOP/SCI), 2021
The majority of Internet of Things (IoT) devices are tiny embedded systems with a micro-controller unit (MCU) as their brain. The memory footprint (SRAM, Flash, and EEPROM) of such MCU-based devices is often very limited, restricting onboard Machine Learning (ML) model training for large train sets with high feature dimensions. To cope with memory issues, current edge analytics approaches train high-quality ML models on cloud GPUs (using large volumes of historical data), then deploy deeply optimized versions of the resulting models on edge devices for inference. Such approaches are inefficient in concept drift situations, where the data generated at the device level varies frequently and trained models do not know how to behave when previously unseen data arrives. In this paper, we present Train++, an incremental training algorithm that trains ML models locally at the device level (e.g., on MCUs and small CPUs) using the full n-samples of high-dimensional data. Train++ transforms even the most resource-constrained MCU-based IoT edge devices into intelligent devices that can locally build their own knowledge base on the fly using live data, thus creating smart self-learning and autonomous problem-solving devices. The Train++ algorithm is extensively evaluated on 5 popular MCU boards, using 7 datasets of varying sizes and feature dimensions. A few notable findings from the evaluation results are: (i) the proposed method reduces the onboard binary classifier training time by ≈ 10-226 sec across various commodity MCUs; (ii) Train++ can perform inference on MCUs for the entire test set in real time (1 ms); (iii) the accuracy improved by 5.15-7.3%, since the incremental characteristic of Train++ enabled the loading of the full n-samples of the high-dimensional datasets even on MCUs with only a few hundred kB of memory.
Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, 2021
Transmitting updates of high-dimensional models between client IoT devices and the central aggregating server has always been a bottleneck in collaborative learning, especially in uncertain real-world IoT networks where congestion, latency and bandwidth issues are common. In this scenario, gradient quantization is an effective way to reduce the bit count when transmitting each model update, but with the trade-off of an elevated error floor due to the higher variance of the stochastic gradients. In this paper, we propose ElastiCL, an elastic quantization strategy that achieves transmission efficiency plus a low error floor by dynamically altering the number of quantization levels during training on distributed IoT devices. Experiments on training ResNet-18 and a Vanilla CNN show that ElastiCL can converge with far fewer transmitted bits than a fixed quantization level, with little or no compromise on training and test accuracy. CCS Concepts: Computing methodologies → Distributed algorithms.
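The trade-off described above, fewer bits per update against a larger quantization error, can be sketched with a generic unbiased stochastic quantizer whose number of levels changes over training. This is a minimal sketch under stated assumptions: the abstract does not give ElastiCL's actual rule, so the linear schedule in `levels_for_round`, its default bounds and all function names are hypothetical.

```python
import math
import random


def quantize(grad, levels, rng):
    """Stochastically round each entry onto `levels` uniform steps of the largest magnitude."""
    scale = max(abs(g) for g in grad)
    if scale == 0.0:
        return list(grad)
    out = []
    for g in grad:
        x = abs(g) / scale * levels  # position in [0, levels]
        low = math.floor(x)
        # Round up with probability equal to the fractional part (unbiased in expectation).
        q = low + (1 if rng.random() < x - low else 0)
        out.append(math.copysign(q * scale / levels, g))
    return out


def bits_per_entry(levels):
    """Bits for one quantized entry: a sign bit plus an index in 0..levels."""
    return 1 + math.ceil(math.log2(levels + 1))


def levels_for_round(round_idx, total_rounds, min_levels=2, max_levels=32):
    """Hypothetical schedule (not the paper's rule): grow the level count linearly."""
    frac = round_idx / max(total_rounds - 1, 1)
    return round(min_levels + frac * (max_levels - min_levels))


rng = random.Random(0)
grad = [0.8, -0.3, 0.05, -0.6]
for r in (0, 5, 9):
    s = levels_for_round(r, total_rounds=10)
    print(r, s, bits_per_entry(s), quantize(grad, s, rng))
```

With few levels each entry costs only a couple of bits but can be off by up to one step of size `scale / levels`; raising the level count shrinks that step at the cost of more bits, which is the quantity an elastic strategy adjusts during training.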
Advances in Geographic Information Science, 2017
Sound visualization techniques for spatio-temporal data have the potential to reveal interesting insights from GIS data. Visual analytics has hence captured the attention of GIS researchers in the recent past. Further, developments in free tools such as the Google Maps API and OpenStreetMap provide easy access to geospatial data that can be leveraged by visual analytics. In this paper we propose such a system, which utilizes free GIS APIs to visualize spatio-temporal data effectively. Our application allows the user to create a time slide bar control that connects the time and position of various GIS objects on the map and displays them in animated mode. The control can handle vector and raster data with equal ease. Because of the effective visualization, it is very easy to understand some complex spatio-temporal patterns, as exemplified in the paper.
Today cloud computing offers highly scalable and dynamic services over the internet. It provides services to users with more flexibility and at lower cost. But there are various security concerns about moving applications to the cloud. Many technical security concerns have been raised in recent years, such as browser security, cloud malware injection attacks and various integrity and binding issues. A great deal of users' personal and sensitive information resides in the cloud, which makes it vulnerable. Privacy is very important in terms of legal framework, data governance and user trust. This paper presents the various security, privacy and trust issues and the legal methodologies that should be drafted and adopted so that these problems can be overcome.