hardik Sharma - Academia.edu

Papers by hardik Sharma

BitCom: A Commerce Model on Blockchain

2020 6th International Conference on Signal Processing and Communication (ICSC)

The supply chain is a complex framework of facilities working together to gain profits. Rapid changes in technology have made commerce highly competitive and strongly affected by price fluctuation. BitCom is a proposed architecture bolstered by the emerging concept of blockchain. It aims to provide a clean and efficient supply chain to decrease system-wide costs. It is a framework deriving its roots from both permissioned and permissionless blockchains to achieve a feasible solution to the problem of automating commerce. The model stays true to the roots of the supply chain while maintaining security. It uses flexible smart contracts for negotiations and transactions, while permissionless entry is provided to users. A credibility score is introduced to help establish trust in a decentralised network.

Increasing Performance of Boolean Retrieval Model by Data Parallelism Technique

Recent Developments in Artificial Intelligence and Communication Technologies

Information retrieval (IR) identifies documents of non-uniform behavior that fulfill information requirements from a huge repository (maintained in computer systems). Different models have been defined to retrieve information: for example, the Boolean model; the Statistical model, which focuses on vector space and probabilistic retrieval; and the Linguistic and Knowledge-based retrieval models. The Boolean model is defined as the “perfect match” model. If the queries are not accurate, they retrieve some irrelevant documents. This is measured by the precision (p) rate, which is the proportion of retrieved documents that are relevant. The Boolean method provides good techniques to elaborate or condense a query. The Boolean method works well for the search process because of the clarity between the concepts. The Boolean retrieval model processes queries whose terms are in the form of Boolean expressions, that is, in which the terms of the user query are combined with AND (...
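
The AND-style matching and the precision rate described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation and does not include its data-parallel technique; the function names `build_index`, `boolean_and`, and `precision`, and the three sample documents, are assumptions made for illustration only.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def precision(retrieved, relevant):
    """Proportion of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical sample collection, not taken from the paper.
docs = {
    1: "boolean retrieval model",
    2: "vector space retrieval model",
    3: "boolean algebra",
}
index = build_index(docs)
hits = boolean_and(index, ["boolean", "model"])
```

Here `hits` contains only document 1, the single document that holds both terms. Because each posting list is an independent set, the lookups could in principle be split across workers, which is the general direction the title suggests.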

The impact of 3D stacking on GPU-accelerated deep neural networks: An experimental study

2016 IEEE International 3D Systems Integration Conference (3DIC)

In this work, we present a two-tier air-cooled thermal testbed composed of an NVIDIA Tesla K40 GPU and a heater/thermometer top die. The top die has four independently controllable heaters, which can emulate a wide range of components, ranging from low-power memory to high-performance multi-core processor cores. The performance and temperature of the bottom-tier GPU on several deep neural network workloads are investigated as a function of increasing top-die power dissipation, and the implications for 3DIC cooling are discussed. Index Terms: three-dimensional integrated circuits, deep neural networks, thermal management of electronics, thermal resistance.

Bit-Parallel Vector Composability for Neural Acceleration

2020 57th ACM/IEEE Design Automation Conference (DAC), 2020

Conventional neural accelerators rely on isolated self-sufficient functional units that perform an atomic operation while communicating the results through an operand delivery-aggregation logic. Each unit processes all the bits of its operands atomically and produces all the bits of the results in isolation. This paper explores a different design style, where each unit is only responsible for a slice of the bit-level operations, to interleave and combine the benefits of bit-level parallelism with the abundant data-level parallelism in deep neural networks. A dynamic collection of these units cooperates at runtime to generate bits of the results collectively. Such cooperation requires extracting new groupings between the bits, which is only possible if the operands and operations are vectorizable. The abundance of data-level parallelism and mostly repeated execution patterns provide a unique opportunity to define and leverage this new dimension of Bit-Parallel Vector Composability. This design intersperses bit parallelism within data-level parallelism and dynamically interweaves the two together. As such, the building block of our neural accelerator is a Composable Vector Unit that is a collection of Narrower-Bitwidth Vector Engines, which are dynamically composed or decomposed at the bit granularity. Using six diverse CNN and LSTM deep networks, we evaluate this design style across four design points: with and without algorithmic bitwidth heterogeneity, and with and without availability of a high-bandwidth off-chip memory. Across these four design points, Bit-Parallel Vector Composability brings 1.4× to 3.5× speedup and 1.1× to 2.7× energy reduction. We also comprehensively compare our design style to Nvidia's RTX 2080 Ti GPU, which also supports INT-4 execution. The benefits range between 28.0× and 33.7× improvement in Performance-per-Watt.
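
The arithmetic identity behind splitting operands into bit slices can be sketched in software. This is only an illustration of the math for unsigned integers, not the accelerator design from the paper; the function names `bit_slices` and `sliced_dot`, the 8-bit operand width, and the 2-bit slice width are assumptions.

```python
def bit_slices(value, slice_bits, num_slices):
    """Split an unsigned integer into num_slices slices, least significant first."""
    mask = (1 << slice_bits) - 1
    return [(value >> (i * slice_bits)) & mask for i in range(num_slices)]


def sliced_dot(xs, ws, total_bits=8, slice_bits=2):
    """Dot product of unsigned vectors built from narrow slice-level dot products.

    Each (i, j) pair of slice positions yields one narrow dot product across
    the whole vector; the partial results are shifted by the combined slice
    significance and added together.
    """
    num = total_bits // slice_bits
    x_slices = [bit_slices(x, slice_bits, num) for x in xs]
    w_slices = [bit_slices(w, slice_bits, num) for w in ws]
    acc = 0
    for i in range(num):
        for j in range(num):
            partial = sum(xs_[i] * ws_[j] for xs_, ws_ in zip(x_slices, w_slices))
            acc += partial << ((i + j) * slice_bits)
    return acc


# Hypothetical 8-bit operands chosen for illustration.
xs = [13, 200, 7]
ws = [5, 3, 255]
result = sliced_dot(xs, ws)
```

For these inputs `result` equals the ordinary dot product, 2450. Each inner `partial` only ever multiplies 2-bit values, which is the property that lets narrow engines cooperate on a wider operation.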

Indian single pellet injection system for plasma fuelling studies

Secure Distributed Data Storage for Industrial Employee Health

International journal of scientific research in science, engineering and technology, Apr 6, 2019

Scale-out acceleration for machine learning

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

The growing scale and complexity of Machine Learning (ML) algorithms have resulted in prevalent use of distributed general-purpose systems. In a rather disjoint effort, the community is focusing mostly on high-performance single-node accelerators for learning. This work bridges these two paradigms and offers CoSMIC, a full computing stack constituting language, compiler, system software, template architecture, and circuit generators, that enables programmable acceleration of learning at scale. CoSMIC enables programmers to exploit scale-out acceleration using FPGAs and Programmable ASICs (P-ASICs) from a high-level and mathematical Domain-Specific Language (DSL). Nonetheless, CoSMIC does not require programmers to delve into the onerous task of system software development or hardware design. CoSMIC achieves three conflicting objectives of efficiency, automation, and programmability by integrating a novel multi-threaded template accelerator architecture and a cohesive stack that generates the hardware and software code from its high-level DSL. CoSMIC can accelerate a wide range of learning algorithms that are most commonly trained using parallel variants of gradient descent. The key is to distribute partial gradient calculations of the learning algorithms across the accelerator-augmented nodes of the scale-out system. Additionally, CoSMIC leverages the parallelizability of the algorithms to offer multi-threaded acceleration within each node. Multi-threading allows CoSMIC to efficiently exploit the numerous resources that are becoming available on modern FPGAs/P-ASICs by striking a balance between multi-threaded parallelism and single-threaded performance. CoSMIC takes advantage of algorithmic properties of ML to offer a specialized system software that optimizes task allocation, role assignment, thread management, and internode communication.
We evaluate the versatility and efficiency of CoSMIC for 10 different machine learning applications from various domains. On average, a 16-node CoSMIC with UltraScale+ FPGAs offers 18.8× speedup over a 16-node Spark system with Xeon processors while the programmer only writes 22-55 lines of code. CoSMIC offers higher scalability compared to the state-of-the-art Spark; scaling from 4 to 16 nodes with CoSMIC yields 2.7× improvements whereas Spark offers 1.8×. These results confirm that the full-stack approach of CoSMIC takes an effective and vital step towards enabling scale-out acceleration for machine learning.
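
The idea of distributing partial gradient calculations across nodes and then aggregating them can be sketched as follows. This is a sequential simulation under stated assumptions, not CoSMIC's DSL, compiler, or system software: the one-parameter linear model, the names `partial_gradient`, `distributed_step`, and `train`, and the sample shards are all hypothetical.

```python
def partial_gradient(w, shard):
    """Sum of squared-error gradients for the model y = w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)


def distributed_step(w, shards, lr):
    """One gradient-descent step: every shard (node) contributes a partial sum.

    The loop over shards stands in for work that separate nodes would do
    in parallel; only the partial sums need to be communicated.
    """
    total = sum(partial_gradient(w, shard) for shard in shards)
    count = sum(len(shard) for shard in shards)
    return w - lr * total / count


def train(shards, lr=0.05, steps=200):
    """Run repeated aggregated steps starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w = distributed_step(w, shards, lr)
    return w


# Hypothetical data generated from y = 3x, split across three "nodes".
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)], [(4.0, 12.0)]]
w = train(shards)
```

With this data `w` converges to approximately 3.0. The aggregated gradient equals the gradient over the unsharded data, which is why the partition can be chosen for load balance without changing the result.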

Campus Placement Prediction System Using Deep Neural Networks

Advances in Intelligent Systems and Computing, 2021

Geophysical Exploration of Aquifer Depth at Gopali, Kharagpur-I Block, Paschim Midnapore District, West Bengal, India

SSRN Electronic Journal, 2021

The overuse of water from aquifers has resulted in a decrease in the groundwater table. Gopali, a small village on the outskirts of Kharagpur, has some clean drinking water requirements. Hence, the goal of the study was to determine the depth of the aquifer in this location that would generate a large volume of water, so that a tube well could be drilled to provide enough water for the settlement. We identified seven favorable locations for Vertical Electrical Soundings (VES1 to VES7) in order to better understand the area's subsurface geology and assess the probability of an aquifer. The study dealt with collecting resistivity data, assessing it, and surveying the aquifer layers using this method. The electrode spacing was determined using a Schlumberger electrode approach. An IGIS DDR3 DC resistivity meter with 600 m transverse extension (AB) was used to conduct the exploration at Gopali, block Kharagpur-I, Paschim Midnapore distr...
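
For readers unfamiliar with Schlumberger soundings, the standard apparent-resistivity calculation can be sketched as below. The abstract does not give the study's processing steps, so this is the textbook geometric-factor formula rather than the authors' workflow; the function name and the sample readings are assumptions.

```python
import math


def schlumberger_apparent_resistivity(ab, mn, delta_v, current):
    """Apparent resistivity in ohm-metres for a Schlumberger array.

    ab      -- distance between the current electrodes A and B, in metres
    mn      -- distance between the potential electrodes M and N, in metres
    delta_v -- measured potential difference, in volts
    current -- injected current, in amperes
    """
    if not 0 < mn < ab:
        raise ValueError("expected 0 < MN < AB")
    if current == 0:
        raise ValueError("current must be non-zero")
    # Geometric factor K = pi * ((AB/2)^2 - (MN/2)^2) / MN
    k = math.pi * ((ab / 2) ** 2 - (mn / 2) ** 2) / mn
    return k * delta_v / current


# Hypothetical reading: AB = 100 m, MN = 10 m, 50 mV at 100 mA.
rho = schlumberger_apparent_resistivity(100.0, 10.0, 0.05, 0.1)
```

Repeating the measurement while widening AB gives apparent resistivity as a function of electrode spacing, which is the curve a sounding interprets for layer depths.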

Face Mask Detection: A Real-Time Android Application Based on Deep Learning Modeling

2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2021

The accelerated spread of the COVID-19 (coronavirus) disease has put stress on healthcare systems. Safety measures such as keeping social distance and wearing a mask can help curb transmission and save lives. This paper aims to detect, using video surveillance, whether a person is wearing a mask, in order to enforce health and safety regulations in real time. We propose a solution for face mask detection using two deep learning models, MobileNetV2 and the Modified Convolutional Neural Network (MCNN). The trained models are converted to TensorFlow Lite for deployment in an Android application. Our models can achieve up to 99% accuracy. An analysis of the number of individuals not wearing masks is provided by capturing the face and storing it on a mobile-backend-as-a-service. Our application can be adopted to increase health measures in real time and control the spread of COVID-19.

Stability Analysis of Motorcycle by Varying Castor Angle and Trail - A Technical Review

International Journal for Scientific Research and Development, 2015

In today’s era, the population of motorcycles has increased sharply compared to any other means of routine transportation; hence steerability, handling and stability have become important aspects of designing the vehicle. Motorcycles make up a complex dynamic system that requires careful analysis to understand their working and behaviour in varied conditions. Two-wheelers are statically unstable and portray more complex behaviour dynamically. The relation between castor angle and trail has proven to be intricate yet pivotal in defining the stability of the motorcycle. However, stability depends not only on these two factors but also on various other parameters.

Cancer-Gene Commonality Network based on Efficient Retrieval Methods

2019 International Conference on Computing, Power and Communication Technologies (GUCON), 2019

Over the years, it has been established by various researchers that cancer is a genetic disease. Many current investigations are dedicated to finding new genes affecting cancerous cell development and growth. KEGG is one of the many repositories of diseases and their related genes that facilitate this research. But KEGG is not updated in a timely manner and fails to account for the various aliases of the same gene. Hence, this paper provides a repository that addresses the shortcomings of KEGG while also acting as a visualisation aid for linking various cancers together on a genetic level. We have developed an algorithm that efficiently collects cancers and genes reported in different research papers using machine learning methods. Our research has established genetic commonality between contrasting types of cancers. It aims to visualise a network of common genes that affect different types of cancers and prove that disparate types of cancers are in fact not as discrete as hoped. R...

Accelerated Deep Learning for the Edge-to-Cloud continuum: a Specialized Full Stack derived from Algorithms

Analysis of deficits of highway to propose for better geometry

International Journal of Advance Research, Ideas and Innovations in Technology, 2019

In the design of any roadway, many governing parameters affect the total efficiency as perceived by the driver. The roadway should be designed for sound geometric features. This paper analyses an existing roadway to locate its deficits and proposes an improved geometric design for it, using DMRB as the basis for the design. Much research is under way on such deficiency improvements, and considerable scope for further improvement remains in order to achieve the complete safety of a proposed roadway. Considering the ongoing research on geometric improvements to existing roadways, an optimal user-friendly as well as eco-friendly design is to be opted for at our site.

A New Model for Emotion Prediction in Music

2020 6th International Conference on Signal Processing and Communication (ICSC), 2020

Music-based sentiment analysis has various applications, such as music recommendation systems, sales and advertisement. Various studies have dealt with lyrics and used Natural Language Processing to perform sentiment analysis. Others have directed their focus on audio features to find relevant answers. But the biggest challenge faced while predicting music emotions is that no music depicts only a single emotion. Therefore, in this study, Russell’s scale is used to predict arousal and valence, rather than emotion. Audio feature selection via Multi-linear Regression is performed, and a comparative study is done between Linear Support Vector Machine, Decision Tree, Kernel SVM, K-nearest neighbours (K-NN), Naive Bayes, Logistic Regression and Random Forest on the audio features. Moreover, a hybrid model based on Multi-Layer Perceptron is proposed to enhance the precision of the predictions. The data set for this research has been taken from the PMEmo 2019 data.

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic

Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 2020

Albeit low-power, mixed-signal circuitry suffers from significant overhead of Analog to Digital (A/D) conversion, limited range for information encoding, and susceptibility to noise. This paper aims to address these challenges by offering and leveraging the following mathematical insight regarding vector dot-product, the basic operator in Deep Neural Networks (DNNs). This operator can be reformulated as a wide regrouping of spatially parallel low-bitwidth calculations that are interleaved across the bit partitions of multiple elements of the vectors. As such, the computational building block of our accelerator becomes a wide bit-interleaved analog vector unit comprising a collection of low-bitwidth multiply-accumulate modules that operate in the analog domain and share a single A/D converter (ADC). This bit-partitioning results in a lower-resolution ADC, while the wide regrouping alleviates the need for A/D conversion per operation, amortizing its cost across multiple bit-partitions of the vector elements. Moreover, the low-bitwidth modules require a smaller encoding range and also provide larger margins for noise mitigation. We also utilize the switched-capacitor design for our bit-level reformulation of DNN operations. The proposed switched-capacitor circuitry performs the regrouped multiplications in the charge domain and accumulates the results of the group in its capacitors over multiple cycles. The capacitive accumulation combined with wide bit-partitioned regrouping reduces the rate of A/D conversions, further improving the overall efficiency of the design. With such mathematical reformulation and its switched-capacitor implementation, we define one possible 3D-stacked microarchitecture, dubbed BiHiwe (Bit-Partitioned and Interleaved Hierarchy of Wide Acceleration through Electrical Charge), that leverages clustering and hierarchical design to best utilize the power-efficiency of the mixed-signal domain and 3D stacking.
We also build models for noise, computational nonidealities, and variations. For ten DNN benchmarks, BiHiwe delivers 5.5× speedup over a leading purely-digital 3D-stacked accelerator...

Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks

2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020

Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on learning patterns of data and are permeating into different industries and markets. Cloud infrastructure and accelerators that offer INFerence-as-a-Service (INFaaS) have become the enabler of this rather quick and invasive shift in the industry. To that end, mostly accelerator-based INFaaS (Google's TPU [1], NVIDIA T4 [2], Microsoft Brainwave [3], etc.) has become the backbone of many real-life applications. However, as the demand for such services grows, merely scaling out the number of accelerators is not cost-effective. Although multi-tenancy has propelled datacenter scalability, it has not been a primary factor in designing DNN accelerators due to the arms race for higher speed and efficiency. This paper sets out to explore this timely requirement of multi-tenancy through a new dimension: dynamic architecture fission. To that end, we define Planaria, which can dynamically fission (break) into multiple smaller yet full-fledged DNN engines at runtime. This microarchitectural capability enables spatially co-locating multiple DNN inference services on the same hardware, offering simultaneous multi-tenant DNN acceleration. To realize this dynamic reconfigurability, we first devise breakable omnidirectional systolic arrays for DNN acceleration that allow omnidirectional flow of data. Second, it uses this capability and a unique organization of on-chip memory, interconnection, and compute resources to enable fission in systolic-array-based DNN accelerators. Architecture fission and its associated flexibility enable an extra degree of freedom for task scheduling that even allows breaking the accelerator with regard to the server load, DNN topology, and task priority. As such, it can simultaneously co-locate DNNs to enhance utilization, throughput, QoS, and fairness.
We compare the proposed design to PREMA [4], a recent effort that offers multi-tenancy by time-multiplexing the DNN accelerator across multiple tasks. We use the same frequency and the same amount of compute and memory resources for both accelerators. The results show significant benefits with (soft, medium, hard) QoS requirements, in throughput (7.4×, 7.2×, 12.2×), SLA satisfaction rate (45%, 15%, 16%), and fairness (2.1×, 2.3×, 1.9×).

Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network

2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018

Hardware acceleration of Deep Neural Networks (DNNs) aims to tame their enormous compute intensity. Fully realizing the potential of acceleration in this domain requires understanding and leveraging algorithmic properties of DNNs. This paper builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent loss of accuracy, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer individually. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth requirements, or inevitably lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of Bit Fusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle-accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss [1] and Stripes [2]. In the same area, frequency, and process technology, Bit Fusion offers 3.9× speedup and 5.1× energy savings over Eyeriss. Compared to Stripes, Bit Fusion provides 2.6× speedup and 3.9× energy reduction at the 45 nm node when Bit Fusion area and frequency are set to those of Stripes.
Scaling to the GPU technology node of 16 nm, Bit Fusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while Bit Fusion merely consumes 895 milliwatts of power.

Gauss Jordan method for balancing chemical equation for different materials

Materials Today: Proceedings, 2021

The Gauss-Jordan method is used in this study to balance chemical reactions using a system of linear equations. One of the most common topics in chemistry is balancing chemical reaction equations. Students typically struggle with balancing chemical equations and find the topic difficult to understand; sometimes teachers struggle as well. The results of the equation balancing comply with the law of conservation of matter and confirm that the existing methods for balancing chemical equations do not contradict each other. To solve the mathematical problem, the Gauss-Jordan method was used. Any chemical reaction with given reactants and products can be handled using this method. Several other features can be added to the Python application, such as checking whether the chemical equation is already balanced and categorising the type of chemical reaction.
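
A minimal sketch of balancing by Gauss-Jordan elimination is given below. It is not the authors' Python application: the function names `rref` and `balance`, the element-by-species matrix layout (products entered with negative counts), and the restriction to reactions with exactly one free variable are assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def rref(matrix):
    """Gauss-Jordan elimination with exact fractions.

    Returns the reduced row echelon form and the list of pivot columns.
    """
    m = [[Fraction(v) for v in row] for row in matrix]
    pivots = []
    row = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[row], m[pivot] = m[pivot], m[row]
        pivot_value = m[row][col]
        m[row] = [v / pivot_value for v in m[row]]
        for r in range(len(m)):
            if r != row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[row])]
        pivots.append(col)
        row += 1
        if row == len(m):
            break
    return m, pivots


def balance(matrix):
    """Smallest integer coefficients for a reaction with one free variable.

    Rows are elements, columns are species; product columns are negated.
    """
    reduced, pivots = rref(matrix)
    n = len(matrix[0])
    if pivots != list(range(n - 1)):
        raise ValueError("expected exactly one free variable in the last column")
    coeffs = [-reduced[i][n - 1] for i in range(n - 1)] + [Fraction(1)]
    lcm = reduce(lambda a, b: a * b // gcd(a, b), (c.denominator for c in coeffs), 1)
    ints = [int(c * lcm) for c in coeffs]
    common = reduce(gcd, ints)
    return [v // common for v in ints]


# CH4 + O2 -> CO2 + H2O; columns: CH4, O2, CO2, H2O; rows: C, H, O.
methane = [
    [1, 0, -1, 0],
    [4, 0, 0, -2],
    [0, 2, -2, -1],
]
coefficients = balance(methane)
```

For the methane example `coefficients` is `[1, 2, 1, 2]`, that is, CH4 + 2 O2 -> CO2 + 2 H2O. Using `Fraction` keeps the elimination exact, so the final scaling to integers does not depend on floating-point rounding.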

Influence of nutrient sources on growth, fruit quality and economics of guava under Chhattisgarh plain

Indian Journal of Horticulture, 2019

Research paper thumbnail of BitCom: A Commerce Model on Blockchain

2020 6th International Conference on Signal Processing and Communication (ICSC)

The supply chain is a complex framework of facilities working together to gain profits. Rapid cha... more The supply chain is a complex framework of facilities working together to gain profits. Rapid changes in technology have made the commerce a place of high competition and hugely affected by price fluctuation. BitCom is a proposed architecture bolstered by the emerging concept of blockchain. It aims to provide a clean and efficient supply chain to decrease system-wide costs. It is a framework deriving its roots form both- permissioned and permissionless blockchains to achieve a feasible solution to the problem of automating commerce. The model stays true to the roots of the supply chain while maintaining security. It uses flexible smart contracts for negotiations and transactions while permissionless entry is provided to the users. The concept of a credibility score has been introduced to help introduce trust in a decentralised network.

Research paper thumbnail of Increasing Performance of Boolean Retrieval Model by Data Parallelism Technique

Recent Developments in Artificial Intelligence and Communication Technologies

Information retrieval (IR) is to identify documents of non-uniform behaviorthat fulfill informati... more Information retrieval (IR) is to identify documents of non-uniform behaviorthat fulfill information requirements from the huge repository (maintained in computersystems). Different models have been defined to retrieve/fetch information. Forexample, the Boolean model, the Statistical model, which focuses on the vector spaceand probabilistic retrieval, and the Linguistic and Knowledge-based retrieval models.The Boolean model is defined as the “perfect match” model. If the queries are notaccurate, they retrieve/fetch some irrelevant documents. This is called the precision (p)rate, which is the proportion of the relevant retrieved documents. The Boolean methodprovides good techniques to elaborate or concise a query. The Boolean method workswell for the search process because of the clarity between the concepts. The Booleanretrieval model processes the queries in which terms of the queries are in the form ofBoolean expressions, that is, in which terms of the user query combined with AND(...

Research paper thumbnail of The impact of 3D stacking on GPU-accelerated deep neural networks: An experimental study

2016 IEEE International 3D Systems Integration Conference (3DIC)

In this work, we present a two-tier air-cooled thermal testbed composed of an NVIDIA Tesla K40 GP... more In this work, we present a two-tier air-cooled thermal testbed composed of an NVIDIA Tesla K40 GPU and a heater/thermometer top die. The top die has four independentlycontrollable heaters, which can emulate a wide range of components, ranging from low power memory to high-performance multi-core processor cores. The performance and temperature of the bottom-tier GPU on several deep neural network workloads is investigated as a function of increasing top-die power dissipation, and the implications for 3DIC cooling are discussed. Index Terms-Three-dimensional integrated circuits, deep neural networks, thermal management of electronics, thermal resistance.

Research paper thumbnail of Bit-Parallel Vector Composability for Neural Acceleration

2020 57th ACM/IEEE Design Automation Conference (DAC), 2020

Conventional neural accelerators rely on isolated self-sufficient functional units that perform a... more Conventional neural accelerators rely on isolated self-sufficient functional units that perform an atomic operation while communicating the results through an operand delivery-aggregation logic. Each single unit processes all the bits of their operands atomically and produce all the bits of the results in isolation. This paper explores a different design style, where each unit is only responsible for a slice of the bit-level operations to interleave and combine the benefits of bit-level parallelism with the abundant data-level parallelism in deep neural networks. A dynamic collection of these units cooperate at runtime to generate bits of the results, collectively. Such cooperation requires extracting new grouping between the bits, which is only possible if the operands and operations are vectorizable. The abundance of Data-Level Parallelism and mostly repeated execution patterns, provides a unique opportunity to define and leverage this new dimension of Bit-Parallel Vector Composability. This design intersperses bit parallelism within data-level parallelism and dynamically interweaves the two together. As such, the building block of our neural accelerator is a Composable Vector Unit that is a collection of Narrower-Bitwidth Vector Engines, which are dynamically composed or decomposed at the bit granularity. Using six diverse CNN and LSTM deep networks, we evaluate this design style across four design points: with and without algorithmic bitwidth heterogeneity and with and without availability of a high-bandwidth off-chip memory. Across these four design points, Bit-Parallel Vector Composability brings (1.4×to 3.5×) speedup and (1.1×to 2.7×) energy reduction. We also comprehensively compare our design style to the Nvidia's RTX 2080 TI GPU, which also supports INT-4 execution. The benefits range between 28.0×and 33.7×improvement in Performance-per-Watt.

Research paper thumbnail of Indian single pellet injection system for plasma fuelling studies

Research paper thumbnail of Secure Distributed Data Storage for Industrial Employee Health

International journal of scientific research in science, engineering and technology, Apr 6, 2019

Research paper thumbnail of Scale-out acceleration for machine learning

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

The growing scale and complexity of Machine Learning (ML) algorithms has resulted in prevalent us... more The growing scale and complexity of Machine Learning (ML) algorithms has resulted in prevalent use of distributed general-purpose systems. In a rather disjoint effort, the community is focusing mostly on high performance single-node accelerators for learning. This work bridges these two paradigms and offers CoSMIC, a full computing stack constituting language, compiler, system software, template architecture, and circuit generators, that enable programmable acceleration of learning at scale. CoSMIC enables programmers to exploit scale-out acceleration using FPGAs and Programmable ASICs (P-ASICs) from a high-level and mathematical Domain-Specific Language (DSL). Nonetheless, CoSMIC does not require programmers to delve into the onerous task of system software development or hardware design. CoSMIC achieves three conflicting objectives of efficiency, automation, and programmability, by integrating a novel multi-threaded template accelerator architecture and a cohesive stack that generates the hardware and software code from its high-level DSL. CoSMIC can accelerate a wide range of learning algorithms that are most commonly trained using parallel variants of gradient descent. The key is to distribute partial gradient calculations of the learning algorithms across the accelerator-augmented nodes of the scale-out system. Additionally, CoSMIC leverages the parallelizability of the algorithms to offer multi-threaded acceleration within each node. Multi-threading allows CoSMIC to efficiently exploit the numerous resources that are becoming available on modern FPGAs/P-ASICs by striking a balance between multi-threaded parallelism and singlethreaded performance. CoSMIC takes advantage of algorithmic properties of ML to offer a specialized system software that optimizes task allocation, role-assignment, thread management, and internode communication. 
We evaluate the versatility and efficiency of CoSMIC for 10 different machine learning applications from various domains. On average, a 16-node CoSMIC with UltraScale+ FPGAs offers 18.8× speedup over a 16-node Spark system with Xeon processors while the programmer only writes 22-55 lines of code. CoSMIC offers higher scalability compared to the state-of-the-art Spark; scaling from 4 to 16 nodes with CoSMIC yields 2.7× improvements whereas Spark offers 1.8×. These results confirm that the full-stack approach of CoSMIC takes an effective and vital step towards enabling scale-out acceleration for machine learning.
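The idea of distributing partial gradient calculations across nodes can be illustrated with a small sketch. This is not CoSMIC's DSL or system software: the linear model, the function names (`partial_gradient`, `distributed_step`, `train`), and the sharding scheme are all assumptions chosen for illustration, and the "nodes" are simulated sequentially in one process.

```python
def partial_gradient(weights, shard):
    """Sum of squared-error gradients of a linear model over one data shard."""
    grad = [0.0] * len(weights)
    for features, target in shard:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad


def distributed_step(weights, shards, learning_rate):
    """One gradient-descent step built from per-shard partial gradients."""
    # In a scale-out system each shard would be processed on its own node;
    # here the nodes are simulated one after another (illustrative assumption).
    partials = [partial_gradient(weights, shard) for shard in shards]
    # Aggregate the partial gradients element-wise, then average over all samples.
    total = [sum(column) for column in zip(*partials)]
    num_samples = sum(len(shard) for shard in shards)
    return [w - learning_rate * g / num_samples for w, g in zip(weights, total)]


def train(data, num_nodes=4, steps=200, learning_rate=0.5):
    """Split the data across num_nodes shards and run gradient descent."""
    shards = [data[i::num_nodes] for i in range(num_nodes)]
    weights = [0.0] * len(data[0][0])
    for _ in range(steps):
        weights = distributed_step(weights, shards, learning_rate)
    return weights
```

Because the full gradient is a sum over samples, the aggregated update matches the single-node update up to floating-point rounding, regardless of how the samples are sharded.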

Campus Placement Prediction System Using Deep Neural Networks

Advances in Intelligent Systems and Computing, 2021

Geophysical Exploration of Aquifer Depth at Gopali, Kharagpur-I Block, Paschim Midnapore District, West Bengal, India

SSRN Electronic Journal, 2021

The overuse of water from aquifers has resulted in a decrease in the groundwater table. Gopali, a small village on the outskirts of Kharagpur, has some clean drinking water requirements. Hence, the goal of the study was to determine the depth of the aquifer in this location that would generate a large volume of water, so that a tube well could be drilled to provide enough water for the settlement. To that end, we identified seven favorable locations for Vertical Electrical Soundings (VES1 to VES7) in order to better understand the area's subsurface geology and assess the probability of an aquifer. The study dealt with collecting resistivity data, assessing them, and surveying the aquifer layers using this method. The electrode spacing was determined using a Schlumberger electrode approach. An IGIS DDR3 DC resistivity meter with a 600 m transverse extension (AB) was used to conduct the exploration at Gopali, block Kharagpur-I, Paschim Midnapore distr...

Face Mask Detection: A Real-Time Android Application Based on Deep Learning Modeling

2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2021

The accelerated spread of the COVID-19 (coronavirus) disease has put stress on healthcare systems. Safety measures such as keeping social distance and wearing a mask can help curb transmission and save lives. This paper aims to detect, through video surveillance, whether a person is wearing a mask, in order to enforce health and safety regulations in real time. We propose a solution for face mask detection using two deep learning models, MobileNetV2 and the Modified Convolutional Neural Network (MCNN). The trained models are converted to TensorFlow Lite for deployment in an Android application. Our models can achieve up to 99% accuracy. The paper also provides an analysis of the number of individuals not wearing masks by capturing each face and storing it on a mobile-backend-as-a-service. Our application can be adopted to strengthen health measures in real time and control the spread of COVID-19.

Stability Analysis of Motorcycle by Varying Castor Angle and Trail - A Technical Review

International Journal for Scientific Research and Development, 2015

In today’s era, the population of motorcycles has increased far more than that of any other means of routine transportation; hence steerability, handling, and stability have become important aspects of designing the vehicle. Motorcycles make up a complex dynamic system that requires careful analysis to understand their working and behaviour in varied conditions. Two-wheelers are statically unstable and exhibit even more complex behaviour dynamically. The relation between castor angle and trail has proven to be intricate yet pivotal in defining the stability of the motorcycle. However, stability depends not only on these two factors but also on various other parameters.

Cancer-Gene Commonality Network based on Efficient Retrieval Methods

2019 International Conference on Computing, Power and Communication Technologies (GUCON), 2019

Over the years, it has been established by various researchers that cancer is a genetic disease. Many current investigations are dedicated to finding new genes affecting cancerous cell development and growth. KEGG is one of the many repositories of diseases and their related genes that facilitate this research. But KEGG is not updated in a timely manner and fails to account for the various aliases of the same gene. Hence, this paper provides a repository that addresses the shortcomings of KEGG while also acting as a visualisation aid for linking various cancers together on a genetic level. We have developed an algorithm that efficiently collects cancers and genes reported in different research papers using machine learning methods. Our research has established genetic commonality between contrasting types of cancers. It aims to visualise a network of common genes that affect different types of cancers and prove that disparate types of cancers are in fact not as discrete as hoped. R...

Accelerated Deep Learning for the Edge-to-Cloud continuum: a Specialized Full Stack derived from Algorithms

Analysis of deficits of highway to propose for better geometry

International Journal of Advance Research, Ideas and Innovations in Technology, 2019

For the design of any roadway, there are many governing parameters that affect the total efficiency as perceived by the driver. The roadway should be designed with sound geometric features. This paper analyses an existing roadway to locate its deficits and proposes an improved geometry for it, using DMRB as the basis for the design. Much research is ongoing on deficiency improvement, and considerable scope for improvement remains before complete safety of a proposed roadway can be achieved. Considering the ongoing research on geometric improvements to existing roadways, a user-friendly as well as eco-friendly design is to be opted for at our site.

A New Model for Emotion Prediction in Music

2020 6th International Conference on Signal Processing and Communication (ICSC), 2020

Music-based sentiment analysis has various applications, such as music recommendation systems, sales, and advertisement. Various studies have dealt with lyrics and used Natural Language Processing to perform sentiment analysis. Others have focused on audio features to find relevant answers. But the biggest challenge in predicting music emotions is that no music depicts only a single emotion. Therefore, in this study, Russell’s scale is used to predict arousal and valence, rather than emotion. Audio feature selection is performed via Multi-linear Regression, and a comparative study is done between Linear Support Vector Machine, Decision Tree, Kernel SVM, K-nearest neighbours (K-NN), Naive Bayes, Logistic Regression, and Random Forest on the audio features. Moreover, a hybrid model based on Multi-Layer Perceptron is proposed to enhance the precision of the predictions. The data set for this research is taken from the PMEmo 2019 data.

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic

Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 2020

Albeit low-power, mixed-signal circuitry suffers from significant overhead of Analog to Digital (A/D) conversion, limited range for information encoding, and susceptibility to noise. This paper aims to address these challenges by offering and leveraging the following mathematical insight regarding vector dot-product, the basic operator in Deep Neural Networks (DNNs). This operator can be reformulated as a wide regrouping of spatially parallel low-bitwidth calculations that are interleaved across the bit partitions of multiple elements of the vectors. As such, the computational building block of our accelerator becomes a wide bit-interleaved analog vector unit comprising a collection of low-bitwidth multiply-accumulate modules that operate in the analog domain and share a single A/D converter (ADC). This bit-partitioning results in a lower-resolution ADC while the wide regrouping alleviates the need for A/D conversion per operation, amortizing its cost across multiple bit-partitions of the vector elements. Moreover, the low-bitwidth modules require smaller encoding range and also provide larger margins for noise mitigation. We also utilize the switched-capacitor design for our bit-level reformulation of DNN operations. The proposed switched-capacitor circuitry performs the regrouped multiplications in the charge domain and accumulates the results of the group in its capacitors over multiple cycles. The capacitive accumulation combined with wide bit-partitioned regrouping reduces the rate of A/D conversions, further improving the overall efficiency of the design. With such mathematical reformulation and its switched-capacitor implementation, we define one possible 3D-stacked microarchitecture, dubbed BiHiwe (Bit-Partitioned and Interleaved Hierarchy of Wide Acceleration through Electrical Charge), that leverages clustering and hierarchical design to best utilize power-efficiency of the mixed-signal domain and 3D stacking.
We also build models for noise, computational nonidealities, and variations. For ten DNN benchmarks, BiHiwe delivers 5.5× speedup over a leading purely-digital 3D-stacked accelerator. This work is licensed under a Creative Commons Attribution International 4.0 License.
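The arithmetic identity behind the bit-partitioned reformulation of the dot product can be checked in software. The sketch below is only an illustration of the mathematics, not of the analog circuitry: the function names, the restriction to unsigned integers, and the 8-bit operands with 2-bit partitions are assumptions made for the example. Each inner sum plays the role of one wide group of low-bitwidth multiply-accumulates whose result is converted and shifted once.

```python
def split_bits(value, part_bits, num_parts):
    """Split an unsigned integer into num_parts fields of part_bits each (LSB first)."""
    mask = (1 << part_bits) - 1
    return [(value >> (part_bits * k)) & mask for k in range(num_parts)]


def bit_partitioned_dot(xs, ws, total_bits=8, part_bits=2):
    """Dot product of unsigned vectors computed from low-bitwidth partial products."""
    num_parts = total_bits // part_bits
    x_parts = [split_bits(x, part_bits, num_parts) for x in xs]
    w_parts = [split_bits(w, part_bits, num_parts) for w in ws]
    result = 0
    for j in range(num_parts):
        for k in range(num_parts):
            # Regroup: accumulate the (j, k) partial products across ALL vector
            # elements first, using only part_bits x part_bits multiplications.
            group_sum = sum(xp[j] * wp[k] for xp, wp in zip(x_parts, w_parts))
            # Apply the shared significance of this group once per group,
            # rather than once per element.
            result += group_sum << (part_bits * (j + k))
    return result
```

For inputs that fit in `total_bits`, the result equals `sum(x * w for x, w in zip(xs, ws))`, since each operand is the sum of its partitions weighted by powers of two.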

Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks

2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020

Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on learning patterns of data and are permeating into different industries and markets. Cloud infrastructure and accelerators that offer INFerence-as-a-Service (INFaaS) have become the enabler of this rather quick and invasive shift in the industry. To that end, mostly accelerator-based INFaaS (Google's TPU [1], NVIDIA T4 [2], Microsoft Brainwave [3], etc.) has become the backbone of many real-life applications. However, as the demand for such services grows, merely scaling out the number of accelerators is not cost-effective. Although multi-tenancy has propelled datacenter scalability, it has not been a primary factor in designing DNN accelerators due to the arms race for higher speed and efficiency. This paper sets out to explore this timely requirement of multi-tenancy through a new dimension: dynamic architecture fission. To that end, we define Planaria, which can dynamically fission (break) into multiple smaller yet full-fledged DNN engines at runtime. This microarchitectural capability enables spatially co-locating multiple DNN inference services on the same hardware, offering simultaneous multi-tenant DNN acceleration. To realize this dynamic reconfigurability, we first devise breakable omnidirectional systolic arrays for DNN acceleration that allow omnidirectional flow of data. Second, Planaria uses this capability and a unique organization of on-chip memory, interconnection, and compute resources to enable fission in systolic-array-based DNN accelerators. Architecture fission and its associated flexibility enable an extra degree of freedom for task scheduling that even allows breaking the accelerator with regard to the server load, DNN topology, and task priority. As such, it can simultaneously co-locate DNNs to enhance utilization, throughput, QoS, and fairness.
We compare the proposed design to PREMA [4], a recent effort that offers multi-tenancy by time-multiplexing the DNN accelerator across multiple tasks. We use the same frequency, the same amount of compute and memory resources for both accelerators. The results show significant benefits with (soft, medium, hard) QoS requirements, in throughput (7.4×, 7.2×, 12.2×), SLA satisfaction rate (45%, 15%, 16%), and fairness (2.1×, 2.3×, 1.9×).

Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network

2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018

Hardware acceleration of Deep Neural Networks (DNNs) aims to tame their enormous compute intensity. Fully realizing the potential of acceleration in this domain requires understanding and leveraging algorithmic properties of DNNs. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent loss of accuracy, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer individually. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth requirements, or inevitably lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator, that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of Bit Fusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss [1] and Stripes [2]. In the same area, frequency, and process technology, Bit Fusion offers 3.9× speedup and 5.1× energy savings over Eyeriss. Compared to Stripes, Bit Fusion provides 2.6× speedup and 3.9× energy reduction at 45 nm node when Bit Fusion area and frequency are set to those of Stripes.
Scaling to GPU technology node of 16 nm, Bit Fusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while Bit Fusion merely consumes 895 milliwatts of power.

Gauss Jordan method for balancing chemical equation for different materials

Materials Today: Proceedings, 2021

The Gauss-Jordan method is used in this study to balance chemical reactions using a system of linear equations. Balancing chemical reaction equations is one of the most common topics in Chemistry. Students typically struggle with balancing chemical equations and find the topic difficult to understand; sometimes teachers struggle with it as well. The results of the equation balancing comply with the law of conservation of matter and confirm that the existing methods for balancing chemical equations do not contradict each other. To solve the mathematical problem, the Gauss-Jordan method was used. Any chemical reaction with given reactants and products can be handled using this method. Several other features can be added to the Python application, such as checking whether a chemical equation is already balanced and categorising the type of chemical reaction.
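A minimal sketch of how Gauss-Jordan elimination can balance an equation is given below. It is not the authors' Python application: the function names, the dictionary-based input format (element to atom count per species), and the assumption that the reaction has exactly one free coefficient are all choices made for this illustration. Each element contributes one linear equation, with reactant counts positive and product counts negative, and exact fractions avoid rounding errors.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def gauss_jordan(matrix):
    """Return the reduced row echelon form of matrix and its pivot columns."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        scale = rows[r][c]
        rows[r] = [v / scale for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    return rows, pivots


def balance(reactants, products):
    """Balance a reaction; each species is a dict mapping element -> atom count."""
    species = reactants + products
    elements = sorted({e for s in species for e in s})
    # One row per element: reactant counts positive, product counts negative.
    matrix = [[s.get(e, 0) if i < len(reactants) else -s.get(e, 0)
               for i, s in enumerate(species)] for e in elements]
    rref, pivots = gauss_jordan(matrix)
    free = [c for c in range(len(species)) if c not in pivots]
    if len(free) != 1:
        raise ValueError("this sketch assumes exactly one free coefficient")
    coeffs = [Fraction(0)] * len(species)
    coeffs[free[0]] = Fraction(1)
    for row, p in zip(rref, pivots):
        coeffs[p] = -row[free[0]]
    # Scale the rational solution to the smallest whole-number coefficients.
    common = reduce(lambda a, b: a * b // gcd(a, b), (c.denominator for c in coeffs))
    ints = [int(c * common) for c in coeffs]
    divisor = reduce(gcd, ints)
    return [v // divisor for v in ints]


# CH4 + O2 -> CO2 + H2O balances as CH4 + 2 O2 -> CO2 + 2 H2O
print(balance([{"C": 1, "H": 4}, {"O": 2}], [{"C": 1, "O": 2}, {"H": 2, "O": 1}]))
```

The returned list gives the coefficients in the order the species were supplied, reactants first.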

Influence of nutrient sources on growth, fruit quality and economics of guava under Chhattisgarh plain

Indian Journal of Horticulture, 2019