JI LI | University of Southern California

Hardware Realization of Deep Learning System by JI LI

Research paper thumbnail of Normalization and Dropout for Stochastic Computing-Based Deep Convolutional Neural Networks

Recently, Deep Convolutional Neural Network (DCNN) has been recognized as the most effective model for pattern recognition and classification tasks. With the fast-growing Internet of Things (IoT) and wearable devices, it becomes attractive to implement DCNNs in embedded and portable systems. However, novel computing paradigms are urgently required to deploy DCNNs, which have high power consumption and complex topologies, in systems with limited area and power supply. Recent works have demonstrated that Stochastic Computing (SC) can radically simplify the hardware implementation of arithmetic units and has the potential to bring the success of DCNNs to embedded systems. This paper introduces normalization and dropout, which are essential techniques for state-of-the-art DCNNs, to existing SC-based DCNN frameworks. In this work, the feature extraction block of DCNNs is implemented using an approximate parallel counter, a near-max pooling block and an SC-based rectified linear activation unit. A novel SC-based normalization design is proposed, which includes a square and summation unit, an activation unit and a division unit. The dropout technique is integrated into the training phase and the learned weights are adjusted during the hardware implementation. Experimental results on AlexNet with the ImageNet dataset show that the SC-based DCNN with the proposed normalization and dropout techniques achieves 3.26% top-1 and 3.05% top-5 accuracy improvement compared with the SC-based DCNN without these two essential techniques, confirming the effectiveness of our normalization and dropout designs.
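The weight adjustment for dropout mentioned above follows the standard inference-time rescaling convention; a minimal NumPy sketch of that idea, assuming the common keep-probability formulation rather than the paper's exact hardware mapping:

```python
import numpy as np

def dropout_train(x, p_drop, rng):
    """Randomly zero activations during training, each with probability p_drop."""
    mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
    return x * mask

def adjust_weights_for_inference(w, p_drop):
    """Scale learned weights by the keep probability before deployment,
    so expected activations match the training-time statistics."""
    return w * (1.0 - p_drop)
```

The hardware implementation would bake the scaled weights into the SC datapath, so no dropout logic is needed at inference time.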

Research paper thumbnail of Softmax Regression Design for Stochastic Computing Based Deep Convolutional Neural Networks

Proc. of ACM Great Lakes Symp. on VLSI (GLSVLSI), 2017

Recently, Deep Convolutional Neural Networks (DCNNs) have made tremendous advances, achieving accuracy close to, or even better than, human-level perception in various tasks. Stochastic Computing (SC), as an alternative to the conventional binary computing paradigm, has the potential to enable massively parallel and highly scalable hardware implementations of DCNNs. In this paper, we design and optimize the SC-based Softmax Regression (SR) function. Experimental results show that, compared with a binary SR, the proposed SC-SR with longer bit streams can reach the same level of accuracy with improvements of 295X, 62X and 2617X in terms of power, area and energy, respectively. Binary SR is suggested for future DCNNs with short bit stream length inputs, whereas SC-SR is recommended for longer bit streams.
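For reference, the softmax regression function being mapped to SC hardware computes normalized exponentials over the class scores; a minimal software sketch of the underlying math (not the paper's SC implementation):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()
```

The classifier's prediction is the index of the largest output probability, which the SC design must preserve even under bit-stream quantization noise.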

Research paper thumbnail of Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks

International Joint Conference on Neural Networks (IJCNN), 2017

Recently, Deep Convolutional Neural Networks (DCNNs) have made unprecedented progress, achieving accuracy close to, or even better than, human-level perception in various tasks. There is a timely need to map the latest software DCNNs to application-specific hardware, in order to achieve orders-of-magnitude improvement in performance, energy efficiency and compactness. Stochastic Computing (SC), as a low-cost alternative to the conventional binary computing paradigm, has the potential to enable massively parallel and highly scalable hardware implementation of DCNNs. One major challenge in SC-based DCNNs is designing accurate nonlinear activation functions, which have a significant impact on the network-level accuracy but cannot be implemented accurately by existing SC computing blocks. In this paper, we design and optimize SC-based neurons, and we propose highly accurate activation designs for the three most frequently used activation functions in software DCNNs, i.e., hyperbolic tangent, logistic, and rectified linear units. Experimental results on LeNet-5 using the MNIST dataset demonstrate that, compared with a binary ASIC hardware DCNN, the DCNN with the proposed SC neurons can achieve up to 61X, 151X, and 2X improvement in terms of area, power, and energy, respectively, at the cost of small precision degradation. In addition, the SC approach achieves up to 21X and 41X of the area, 41X and 72X of the power, and 198200X and 96443X of the energy, compared with CPU and GPU approaches, respectively, while the error is increased by less than 3.07%. ReLU activation is suggested for future SC-based DCNNs considering its superior performance under a small bit stream length.
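The three activation functions targeted above have simple software definitions, which serve as the accuracy reference that the SC activation designs approximate; a sketch of the reference math only, not the SC hardware:

```python
import math

def tanh_act(x):
    """Hyperbolic tangent: squashes inputs into (-1, 1)."""
    return math.tanh(x)

def logistic(x):
    """Logistic (sigmoid): squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit: passes positive inputs, zeros negative ones."""
    return max(0.0, x)
```

ReLU's piecewise-linear shape is one reason it is easier to approximate accurately with short bit streams than the two saturating functions.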

Research paper thumbnail of DSCNN: Hardware-Oriented Optimization for Stochastic Computing Based Deep Convolutional Neural Networks

IEEE International Conference on Computer Design (ICCD), 2016

Deep Convolutional Neural Networks (DCNNs), a branch of Deep Neural Networks that use a deep graph with multiple processing layers, enable the convolutional model to finely abstract the high-level features behind an image. Large-scale applications using DCNNs mainly operate in high-performance server clusters, GPUs or FPGA clusters; extending these applications to mobile/wearable devices and Internet-of-Things (IoT) entities is restricted by high power/energy consumption. Stochastic Computing (SC), used in specialized hardware-based systems, is a promising method to overcome this shortcoming: many complex arithmetic operations can be implemented with very simple hardware logic in the SC framework, which alleviates the extensive computation complexity. The exploration of network-wise optimization and the revision of network structure with respect to stochastic computing based hardware design have not been discussed in previous work. In this paper, we investigate the Deep Stochastic Convolutional Neural Network (DSCNN), a DCNN design using stochastic computing. The essential calculation components using SC are designed and evaluated. We propose a joint optimization method to coordinate components, guaranteeing high calculation accuracy in each stage of the network. The structure of the original DSCNN is revised to accommodate the simplicity of the SC hardware design. Experimental results show that, as opposed to the software-inspired feature extraction block in DSCNN, an optimized hardware-oriented feature extraction block improves calculation precision by as much as 59.27%, and the optimized DSCNN achieves a network test error rate of only 3.48%, compared to 27.83% for the baseline DSCNN using the software-inspired feature extraction block.

Research paper thumbnail of Towards Acceleration of Deep Convolutional Neural Networks using Stochastic Computing

22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017

In recent years, Deep Convolutional Neural Network (DCNN) has become the dominant approach for almost all recognition and detection tasks and has outperformed humans on certain tasks. Nevertheless, high power consumption and complex topologies have hindered the widespread deployment of DCNNs, particularly in wearable devices and embedded systems with limited area and power budgets. This paper presents a fully parallel and scalable hardware-based DCNN design using Stochastic Computing (SC), which leverages the energy-accuracy trade-off by optimizing SC components in different layers. We first conduct a detailed investigation of the Approximate Parallel Counter (APC) based neuron and the multiplexer-based neuron using SC, and analyze the impacts of various design parameters, such as bit stream length and input number, on the energy/power/area/accuracy of the neuron cell. Then, from an architecture perspective, the influence of the inaccuracy of neurons in different layers on the overall DCNN accuracy (i.e., software accuracy of the entire DCNN) is studied. Accordingly, a structure optimization method is proposed for a general DCNN architecture, in which neurons in different layers are implemented with optimized SC components, so as to reduce the area, power, and energy of the DCNN while maintaining the overall network performance in terms of accuracy. Experimental results show that the proposed approach can find a satisfactory DCNN configuration, which achieves 55X, 151X, and 2X improvement in terms of area, power and energy, respectively, while the error is increased by 2.86%, compared with the conventional binary ASIC implementation.

Research paper thumbnail of Structural Design Optimization for Deep Convolutional Neural Networks using Stochastic Computing

Proc. of Design Automation and Test in Europe (DATE), 2017

Deep Convolutional Neural Networks (DCNNs) have been demonstrated as effective models for understanding image content. The computation behind DCNNs relies heavily on the capability of hardware resources due to the deep structure. DCNNs have been implemented on different large-scale computing platforms; however, there is a trend toward embedding DCNNs into lightweight local systems, which requires low power/energy consumption and small hardware footprints. Stochastic Computing (SC) radically simplifies the hardware implementation of arithmetic units and has the potential to satisfy the small-footprint, low-power needs of DCNNs. Local connectivities and down-sampling operations make DCNNs more complex to implement using SC. In this paper, eight feature extraction designs for DCNNs using SC in two groups are explored and optimized in detail from the perspective of calculation precision, where we permute two SC implementations for inner-product calculation, two down-sampling schemes, and two structures of DCNN neurons. We evaluate each DCNN using one of the eight feature extraction designs in terms of network accuracy and hardware performance. Through exploration and optimization, the accuracies of SC-based DCNNs are guaranteed compared with software implementations on CPU/GPU/binary-based ASIC synthesis, while area, power, and energy are reduced by up to 776×, 190×, and 32835×, respectively.

Research paper thumbnail of SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2017

With the recent advance of wearable devices and the Internet of Things (IoT), it becomes attractive to implement Deep Convolutional Neural Networks (DCNNs) in embedded and portable systems. Currently, executing software-based DCNNs requires high-performance servers, restricting their widespread deployment on embedded and mobile IoT devices. To overcome this obstacle, considerable research efforts have been made to develop highly parallel and specialized DCNN accelerators using GPGPUs, FPGAs or ASICs. Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing DCNNs with high scalability and ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power (energy) and hardware footprint can be achieved compared to conventional binary arithmetic implementations. The tremendous savings in power (energy) and hardware resources open an immense design space for enhancing the scalability and robustness of hardware DCNNs. This paper presents SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, using a bottom-up approach. We first present the designs of function blocks that perform the basic operations in DCNNs, including inner product, pooling, and activation function. Then we propose four designs of feature extraction blocks, which are in charge of extracting features from input feature maps, by connecting different basic function blocks with joint optimization. Moreover, efficient weight storage methods are proposed to reduce the area and power (energy) consumption.
Putting it all together, with feature extraction blocks carefully selected, SC-DCNN is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy. Experimental results demonstrate that the LeNet-5 implemented in SC-DCNN consumes only 17 mm² of area and 1.53 W of power, and achieves a throughput of 781250 images/s, an area efficiency of 45946 images/s/mm², and an energy efficiency of 510734 images/J.
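The bit-stream arithmetic described above can be simulated in software; a minimal sketch using unipolar encoding for simplicity (a value in [0, 1] is the probability of a 1 in the stream), where a bitwise AND of independent streams multiplies and a 2:1 multiplexer performs scaled addition. This illustrates the principle only — the bipolar [-1, 1] format mentioned above uses XNOR gates for multiplication instead, and this is not the SC-DCNN circuitry itself:

```python
import random

def encode(x, n, rng):
    """Unipolar stochastic stream: each bit is 1 with probability x, for x in [0, 1]."""
    return [1 if rng.random() < x else 0 for _ in range(n)]

def decode(stream):
    """Recover the encoded value by counting ones."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """Bitwise AND of independent streams: P(a AND b) = P(a) * P(b)."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b, sel):
    """A 2:1 multiplexer with P(sel = 1) = 0.5 computes (P(a) + P(b)) / 2."""
    return [x if s else y for x, y, s in zip(a, b, sel)]
```

Longer streams reduce the random decoding error, which is the bit-stream-length versus accuracy trade-off the paper optimizes.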

IoT by JI LI

Research paper thumbnail of Fundamental Challenges Towards Making IoT a Reachable Reality: A Model-centric Investigation

The constantly advancing integration capability is paving the way to the construction of an extremely large-scale continuum of internet where entities, or "things", from vastly varied domains are uniquely addressable and interact seamlessly to form a giant networked system of systems, known as the Internet of Things (IoT). In contrast to such a visionary networked system paradigm, prior research efforts on IoT are still very fragmented and confined to disjoint explorations in different application, architectural, security, services, protocol and economic domains, preventing design exploration and optimization from a unified and global perspective. In this context, this survey article first proposes a mathematical modeling framework that is rich in expressivity to capture IoT characteristics from a global perspective. Then a list of fundamental challenges in i) sensing, ii) decentralized computation, iii) robustness, iv) energy efficiency and v) hardware security is identified and formulated based on the proposed modeling framework. Solutions are discussed to shed light on the development of future IoT system paradigms.

Low Power FinFET Circuit Design by JI LI

Research paper thumbnail of Deadline-Aware Joint Optimization of Sleep Transistor and Supply Voltage for FinFET Based Embedded Systems

Proc. of ACM Great Lakes Symp. on VLSI (GLSVLSI), 2017

Leakage power consumption has recently become a great concern for modern embedded systems. FinFET technologies, power gating, and near- and super-threshold regimes can significantly reduce power consumption; however, a comprehensive analysis of jointly applying these power-saving techniques has been lacking. In this paper, we investigate the application of power gating to FinFET circuits operating in near- and super-threshold voltage regimes for embedded system applications. A joint optimization algorithm is proposed to determine the width/length, position and threshold type of the sleep transistor together with the operating voltage, subject to a given deadline and with the goal of minimizing energy per operation. Experimental results demonstrate that the proposed algorithm achieves up to 99.9% energy reduction compared to the near-threshold approach without power gating, and 95.3% compared to deadline-free optimization.
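The intuition behind minimizing energy per operation over the supply voltage can be captured with a toy model: lowering the voltage cuts dynamic energy roughly quadratically, but the resulting slowdown makes each operation accumulate more leakage energy, so an interior optimum emerges. All constants and the delay formula below are illustrative assumptions, not the paper's device models:

```python
def energy_per_op(v, c_eff=1e-12, i_leak=1e-5, k_delay=1e-9, v_th=0.3):
    """Toy energy model: dynamic C*V^2 plus leakage I*V*t_delay, where the gate
    delay blows up as the supply voltage V approaches the threshold v_th.
    All parameter values are hypothetical, chosen only to show the trade-off."""
    t_delay = k_delay * v / (v - v_th) ** 2
    return c_eff * v ** 2 + i_leak * v * t_delay

def best_voltage(candidates):
    """Pick the candidate supply voltage minimizing energy per operation."""
    return min(candidates, key=energy_per_op)
```

A deadline constraint would further restrict the candidate voltages to those whose total delay still meets the deadline, which is the joint problem the paper solves together with sleep transistor sizing.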

Research paper thumbnail of Leakage Power Reduction for Deeply-Scaled FinFET Circuits Operating in Multiple Voltage Regimes Using Fine-Grained Gate- Length Biasing Technique

Proc. of Design Automation and Test in Europe (DATE), 2015

With the aggressive downscaling of process technologies and the importance of battery-powered systems, reducing leakage power consumption has become one of the most crucial design challenges for IC designers. This paper presents a device-circuit cross-layer framework that utilizes fine-grained gate-length-biased FinFETs for circuit leakage power reduction in the near- and super-threshold operation regimes. The impacts of Gate-Length Biasing (GLB) on circuit speed and leakage power are first studied using one of the most advanced technology nodes, a 7nm FinFET technology. Then multiple standard cell libraries using different leakage reduction techniques, such as GLB and Dual-VT, are built in multiple operating regimes at this technology node. It is demonstrated that, compared to Dual-VT, GLB is a more suitable technique for the advanced 7nm FinFET technology due to its capability of delivering a finer-grained trade-off between leakage power and circuit speed, in addition to its lower manufacturing cost. The circuit synthesis results of a variety of ISCAS benchmark circuits using the presented GLB 7nm FinFET cell libraries show up to 70% leakage improvement with zero degradation in circuit speed in the near- and super-threshold regimes, compared to the standard 7nm FinFET cell library.

Research paper thumbnail of An Exploration of Applying Gate-Length-Biasing Techniques to Deeply-Scaled FinFETs Operating in Multiple Voltage Regimes

With the aggressive downscaling of process technologies and the importance of battery-powered systems, reducing leakage power consumption has become a crucial design challenge for IC designers. In addition, traditional bulk CMOS technologies face significant challenges related to short-channel effects and process variations. FinFET devices have attracted a lot of attention as an alternative to bulk CMOS in sub-32nm technology nodes. This paper presents a device-circuit cross-layer framework that utilizes fine-grained gate-length-biased FinFETs for circuit leakage power reduction in the near- and super-threshold operation regimes. The impacts of cell-level and transistor-level Gate-Length Biasing (GLB) on circuit speed and leakage power are studied using a 7nm FinFET technology.

Smart Grid Papers by JI LI

Research paper thumbnail of CTS2M: concurrent task scheduling and storage management for residential energy consumers under dynamic energy pricing

Dynamic energy pricing introduces real-time, power-consumption-reflective pricing in the smart grid in order to incentivise energy consumers to schedule electricity-consuming applications (tasks) more prudently to minimise electric bills. This has become a particularly interesting problem with the availability of photovoltaic (PV) power generation facilities and controllable energy storage systems. This study addresses the problem of concurrent task scheduling and storage management for residential energy consumers with PV and storage systems, in order to minimise the electric bill. A general type of dynamic pricing scenario is assumed where the energy price is both time-of-use and power dependent. Tasks are allowed to support suspend-now and resume-later operations. A negotiation-based iterative approach is proposed: in each iteration, all tasks are ripped up and rescheduled under a fixed storage charging/discharging scheme, and the storage control scheme is then derived based on the latest task schedule. The concept of congestion is introduced to gradually adjust the schedule of each task, whereas dynamic programming is used to find the optimal schedule. A near-optimal storage control algorithm is efficiently implemented. Experimental results demonstrate that the proposed algorithm achieves up to 60.95% reduction in total energy cost compared with various baseline methods.
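The pricing model assumed above, both time-of-use and power-dependent, can be written as a simple cost function under which spreading load out is cheaper than concentrating it; a sketch with a hypothetical linear power-dependent term (the exact price form in the paper may differ):

```python
def electric_bill(power, base_price, alpha=0.01):
    """Total bill when the unit price in slot t is base_price[t] + alpha * power[t].
    `alpha` is a hypothetical coefficient for the power-dependent component."""
    return sum((base_price[t] + alpha * p) * p for t, p in enumerate(power))

def shift_task(power, src, dst, load):
    """Move `load` units of a task's demand from slot src to slot dst."""
    shifted = list(power)
    shifted[src] -= load
    shifted[dst] += load
    return shifted
```

Because the per-slot cost is quadratic in power under this model, rescheduling tasks away from congested slots strictly lowers the bill, which is the pressure the congestion mechanism exploits.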

Research paper thumbnail of Negotiation-Based Task Scheduling to Minimize User's Electricity Bills under Dynamic Energy Prices

Proc. of Online Green Communications Conference (Online GreenComm), 2014

Dynamic energy pricing is a promising technique in the Smart Grid that incentivizes energy consumers to consume electricity more prudently in order to minimize their electric bills while satisfying their energy requirements. This has become a particularly interesting problem with the introduction of residential photovoltaic (PV) power generation facilities. This paper addresses the problem of task scheduling for (a collection of) energy consumers with PV power generation facilities, in order to minimize the electricity bill. A general type of dynamic pricing scenario is assumed where the energy price is both time-of-use and total-power-consumption dependent. A negotiation-based iterative approach is proposed that is inspired by state-of-the-art Field-Programmable Gate Array (FPGA) routing algorithms. More specifically, the negotiation-based algorithm is used to rip up and reschedule all tasks in each iteration, and the concept of congestion is introduced to dynamically adjust the schedule of each task based on the historical scheduling results as well as the (historical) total power consumption in each time slot. Experimental results demonstrate that the proposed algorithm achieves up to 51.8% electric bill reduction compared with baseline methods.

Research paper thumbnail of Negotiation-Based Task Scheduling and Storage Control Algorithm to Minimize User's Electric Bills under Dynamic Prices

Asia and South Pacific Design Automation Conference (ASP-DAC), 2015

Dynamic energy pricing is a promising technique in the Smart Grid to alleviate the mismatch between electricity generation and consumption. Energy consumers are incentivized to shape their power demands, or more specifically, to schedule their electricity-consuming applications (tasks) more prudently to minimize their electric bills. This has become a particularly interesting problem with the availability of residential photovoltaic (PV) power generation facilities and controllable energy storage systems. This paper addresses the problem of joint task scheduling and energy storage control for energy consumers with PV and energy storage facilities, in order to minimize the electricity bill. A general type of dynamic pricing scenario is assumed where the energy price is both time-of-use and power-dependent, and various energy loss components are considered, including power dissipation in the power conversion circuitry as well as the rate capacity effect in the storage system. A negotiation-based iterative approach is proposed for joint residential task scheduling and energy storage control, inspired by state-of-the-art Field-Programmable Gate Array (FPGA) routing algorithms. In each iteration, it rips up and reschedules all tasks under a fixed storage control scheme, and then derives a new charging/discharging scheme for the energy storage based on the latest task schedule. The concept of congestion is introduced to dynamically adjust the schedule of each task based on the historical results as well as the current scheduling status, and a near-optimal storage control algorithm is efficiently implemented by solving convex optimization problem(s) with polynomial time complexity. Experimental results demonstrate that the proposed algorithm achieves up to 64.22% reduction in total energy cost compared with the baseline methods.

Soft Error Rate Evaluation Papers by JI LI

Research paper thumbnail of Accelerating Soft-Error-Rate (SER) Estimation in the Presence of Single Event Transients

Proc. of Design Automation Conf. (DAC), 2016

Radiation-induced soft errors have posed an ever-increasing reliability challenge as device dimensions keep shrinking in advanced CMOS technology. It is therefore imperative to devise fast and accurate soft error rate (SER) estimation methods. Previous works mainly focus on improving the accuracy of SER results, whereas speed improvement is limited to partitioning and parallel processing. This paper presents an efficient SER estimation framework for combinational logic circuits in the presence of single-event transients (SETs). A novel top-down memoization algorithm is proposed to accelerate the propagation of SETs. Experimental results on a variety of benchmark circuits demonstrate that the proposed approach achieves up to 560.2X speedup with less than 3% difference in SER results compared with the baseline algorithm.
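The memoization idea above — caching results so each subcircuit is traversed only once instead of re-propagating for every fault site — can be illustrated on a simpler, related problem: signal-probability propagation through a gate netlist. This sketch shows only the caching pattern, not the paper's SET propagation algorithm, and it assumes independent gate inputs (reconvergent fanout is ignored):

```python
def propagate(circuit, node, memo=None):
    """Compute P(node = 1) for a combinational netlist, caching each node's result.
    `circuit` maps a node name to ("input", probability) or (gate_type, input_names)."""
    if memo is None:
        memo = {}
    if node in memo:          # memo hit: skip the whole subcircuit below this node
        return memo[node]
    kind, args = circuit[node]
    if kind == "input":
        p = args
    else:
        ps = [propagate(circuit, i, memo) for i in args]
        if kind == "and":
            p = ps[0] * ps[1]
        elif kind == "or":
            p = ps[0] + ps[1] - ps[0] * ps[1]
        elif kind == "not":
            p = 1.0 - ps[0]
        else:
            raise ValueError(f"unknown gate type: {kind}")
    memo[node] = p
    return p
```

Sharing the memo across queries for different outputs is what turns repeated traversals into near-linear work, the same structural trick that yields the large speedups reported above.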

Research paper thumbnail of Joint Soft-Error-Rate (SER) Estimation for Combinational Logic and Sequential Elements

IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016

With drastic device shrinking, low operating voltages, increasing complexity, and high-speed operation, radiation-induced soft errors have posed an ever-increasing reliability challenge to both combinational and sequential circuits in advanced CMOS technologies. It is therefore imperative to devise efficient soft error rate (SER) estimation methods, in order to evaluate soft error vulnerabilities for cost-effective robust circuit design. Previous works either analyze only the SER of combinational circuits or evaluate soft error vulnerabilities in sequential elements. In this paper, a joint SER estimation framework is proposed, which considers single-event transients (SETs) in combinational logic and multiple cell upsets (MCUs) in sequential components. Various masking effects are considered in the combinational SER estimation process, and several typical radiation-hardened and non-hardened flip-flop structures are analyzed and compared as the sequential elements. A schematic and layout co-simulation approach is proposed to model the MCUs in redundant sequential storage structures. Experimental results on a variety of ISCAS benchmark circuits using the Nangate 45nm CMOS standard cell library demonstrate the difference in soft error resilience among designs using different sequential elements and the importance of modeling MCUs in redundant structures. Keywords: soft error, hardened flip-flop, single-event upset, multiple cell upset.

Research paper thumbnail of Accelerated Soft-Error-Rate (SER) Estimation for Combinational and Sequential Circuits

Radiation-induced soft errors have posed an increasing reliability challenge to combinational and sequential circuits in advanced CMOS technologies. It is therefore imperative to devise fast, accurate and scalable soft error rate (SER) estimation methods as part of cost-effective robust circuit design. This paper presents an efficient SER estimation framework for combinational and sequential circuits, which considers single-event transients (SETs) in combinational logic and multiple cell upsets (MCUs) in sequential elements. A novel top-down memoization algorithm is proposed to accelerate the propagation of SETs, and a general schematic and layout co-simulation approach is proposed to model the MCUs in redundant sequential storage structures. The feedback in sequential logic is analyzed with an efficient time-frame expansion method. Experimental results on various ISCAS85 combinational benchmark circuits demonstrate that the proposed approach achieves up to 560.2X speedup with less than 3% difference in SER results compared with the baseline algorithm. The average runtime of the proposed framework on a variety of ISCAS89 benchmark circuits is 7.20s, and the runtime is 119.23s for the largest benchmark circuit with more than 3,000 flip-flops and 17,000 gates. CCS Concepts: Hardware → Signal integrity and noise analysis; Transient errors and upsets. Additional Key Words and Phrases: soft error, single-event upset, multiple cell upset, hardened flip-flop, algorithm. ACM Reference Format: Ji Li and Jeffrey Draper, 2016. Accelerated Soft-Error-Rate (SER) estimation for combinational and sequential circuits.

Energy Harvesting by JI LI

Research paper thumbnail of Multi-Source In-Door Energy Harvesting for Non-Volatile Processors

This paper adopts a multi-source energy harvesting system that combines thermal, kinetic, and indoor photovoltaic sources to provide a stable power supply. We derive the best power extraction policy and a converter parameter optimization technique. Since energy harvesting prediction is important in helping the task scheduler compensate for intermittent energy harvesting, we conduct a comprehensive investigation of different neural-network-based prediction techniques for both single and multiple energy harvesting sources using measured harvesting traces. Experimental results demonstrate the effectiveness of the power extraction and converter parameter optimization techniques. Moreover, directly predicting the total output power achieves the highest prediction accuracy. Keywords: non-volatile processor (NVP), neural network, energy harvesting, multiple energy sources.

Research paper thumbnail of Algorithm Accelerations for Luminescent Solar Concentrator-Enhanced Reconfigurable Onboard Photovoltaic System

22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017

Electric vehicles (EVs) and hybrid electric vehicles (HEVs) are growing in popularity. Onboard photovoltaic (PV) systems have been proposed to overcome the limited all-electric driving range of EVs/HEVs. However, obstacles to the wide adoption of onboard PV systems remain, such as low efficiency, high cost, and low compatibility. To tackle these limitations, we propose adopting semiconductor nanomaterial-based luminescent solar concentrator (LSC)-enhanced PV cells in onboard PV systems. In this paper, we investigate methods of accelerating the reconfiguration algorithm for the LSC-enhanced onboard PV system to reduce computational/energy overhead and capital cost. First, in the system design stage, we group LSC-enhanced PV cells into macrocells and reconfigure the onboard PV system based on macrocells. Second, we simplify the partial shading scenario by assuming an LSC-enhanced PV cell is either lighted or completely shaded (Algorithm 1). Third, we exploit the observation that the conversion efficiency of the charger is high and nearly constant as long as its input voltage exceeds a threshold value (Algorithm 2). We test and evaluate the effectiveness of the two proposed algorithms by comparing them with the optimal PV array reconfiguration algorithm and simulating an LSC-enhanced reconfigurable onboard PV system using solar irradiance traces actually measured during vehicle driving. Experiments demonstrate that the output power of Algorithm 1 in the first scenario is on average 9.0% lower than that of the optimal PV array reconfiguration algorithm. In the second scenario, we observe an average 1.16X performance improvement from the proposed Algorithm 2.

Cloud Computing Papers by JI LI

Research paper thumbnail of DRL-Cloud: Deep Reinforcement Learning-Based Resource Provisioning and Task Scheduling for Cloud Service Providers

Cloud computing has become an attractive computing paradigm in both academia and industry. Through virtualization technology, Cloud Service Providers (CSPs) that own data centers can structure physical servers into Virtual Machines (VMs) to provide services, resources, and infrastructure to users. Profit-driven CSPs charge users for service access and VM rental, and reduce power consumption and electric bills so as to increase profit margin. The key challenge faced by CSPs is data center energy cost minimization. Prior works proposed various algorithms to reduce energy cost through Resource Provisioning (RP) and/or Task Scheduling (TS). However, they have scalability issues or do not consider TS with task dependencies, which is a crucial factor for ensuring correct parallel execution of tasks. This paper presents DRL-Cloud, a novel Deep Reinforcement Learning (DRL)-based RP and TS system, to minimize energy cost for large-scale CSPs with very large numbers of servers that receive enormous numbers of user requests per day. A deep Q-learning-based two-stage RP-TS processor is designed to automatically generate the best long-term decisions by learning from a changing environment, such as user request patterns and realistic electricity prices. With training techniques such as target networks, experience replay, and exploration and exploitation, the proposed DRL-Cloud achieves remarkably high energy cost efficiency, a low reject rate, and low runtime with fast convergence. Compared with one of the state-of-the-art energy-efficient algorithms, the proposed DRL-Cloud achieves up to 320% energy cost efficiency improvement while maintaining a lower reject rate on average. For an example CSP setup with 5,000 servers and 200,000 tasks, compared to a fast round-robin baseline, the proposed DRL-Cloud achieves up to 144% runtime reduction.
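The deep Q-learning loop at the core of such an RP-TS processor can be illustrated with a deliberately tiny tabular sketch: an agent learns when to run a single task under time-varying electricity prices. Everything here is a simplified assumption for illustration (made-up prices, one task, a Q-table instead of the paper's deep networks, no target network or experience replay):

```python
import random

# Toy Q-learning sketch: pick the cheapest time slot to run one task.
# Hypothetical prices; reward is negative energy cost, so cheap slots win.
PRICES = [0.30, 0.10, 0.25, 0.05]   # electricity price per slot (made up)
ACTIONS = [0, 1]                    # 0 = defer, 1 = run the task now

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(len(PRICES) + 1) for a in ACTIONS}
    for _ in range(episodes):
        slot, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < eps \
                else max(ACTIONS, key=lambda x: q[(slot, x)])
            if a == 1:                      # run: pay this slot's price
                reward, done = -PRICES[slot], True
            elif slot == len(PRICES) - 1:   # deferred past the horizon
                reward, done = -1.0, True   # big penalty: task never ran
            else:
                reward = 0.0
            nxt = slot + 1
            best_next = 0.0 if done else max(q[(nxt, x)] for x in ACTIONS)
            # Standard Q-learning temporal-difference update
            q[(slot, a)] += alpha * (reward + gamma * best_next - q[(slot, a)])
            slot = nxt
    return q

q = train()
# Greedy rollout: defer while running looks worse than deferring.
slot = 0
while slot < len(PRICES) - 1 and q[(slot, 1)] < q[(slot, 0)]:
    slot += 1
print(slot)  # the agent waits for the cheapest slot, index 3
```

With the reward set to the negative energy cost, the learned greedy policy defers the task to the cheapest slot; DRL-Cloud applies the same update rule at data-center scale with neural function approximation.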

Research paper thumbnail of Softmax Regression Design for Stochastic Computing Based Deep Convolutional Neural Networks

Proc. of ACM Great Lakes Symp. on VLSI (GLSVLSI), 2017

Recently, Deep Convolutional Neural Networks (DCNNs) have made tremendous advances, achieving accuracy close to, or even better than, human-level perception in various tasks. Stochastic Computing (SC), as an alternative to the conventional binary computing paradigm, has the potential to enable massively parallel and highly scalable hardware implementations of DCNNs. In this paper, we design and optimize an SC-based Softmax Regression (SR) function. Experimental results show that, compared with a binary SR, the proposed SC-SR under longer bit streams can reach the same level of accuracy with improvements of 295X, 62X, and 2617X in terms of power, area, and energy, respectively. Binary SR is suggested for future DCNNs with short bit stream length inputs, whereas SC-SR is recommended for longer bit streams.
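For reference, the softmax regression function that the SC-SR design maps into hardware is, in software, just normalized exponentials. A minimal pure-Python version with the usual max-subtraction trick for numerical stability:

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The outputs sum to one and preserve the ordering of the logits, which is why the final classification only needs the arg-max of the result.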

Research paper thumbnail of Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks

International Joint Conference on Neural Networks (IJCNN), 2017

Recently, Deep Convolutional Neural Networks (DCNNs) have made unprecedented progress, achieving accuracy close to, or even better than, human-level perception in various tasks. There is a timely need to map the latest software DCNNs to application-specific hardware, in order to achieve orders of magnitude improvement in performance, energy efficiency, and compactness. Stochastic Computing (SC), as a low-cost alternative to the conventional binary computing paradigm, has the potential to enable massively parallel and highly scalable hardware implementation of DCNNs. One major challenge in SC-based DCNNs is designing accurate nonlinear activation functions, which have a significant impact on network-level accuracy but cannot be implemented accurately by existing SC computing blocks. In this paper, we design and optimize SC-based neurons, and we propose highly accurate activation designs for the three most frequently used activation functions in software DCNNs, i.e., hyperbolic tangent, logistic, and rectified linear units. Experimental results on LeNet-5 using the MNIST dataset demonstrate that, compared with a binary ASIC hardware DCNN, the DCNN with the proposed SC neurons can achieve up to 61X, 151X, and 2X improvement in terms of area, power, and energy, respectively, at the cost of a small precision degradation. In addition, the SC approach achieves up to 21X and 41X improvements in area, 41X and 72X in power, and 198200X and 96443X in energy, compared with CPU and GPU approaches, respectively, while the error is increased by less than 3.07%. ReLU activation is suggested for future SC-based DCNNs considering its superior performance under a small bit stream length.

Research paper thumbnail of DSCNN: Hardware-Oriented Optimization for Stochastic Computing Based Deep Convolutional Neural Networks

IEEE International Conference on Computer Design (ICCD), 2016

Deep Convolutional Neural Networks (DCNNs), a branch of Deep Neural Networks that use deep graphs with multiple processing layers, enable convolutional models to finely abstract the high-level features behind an image. Large-scale applications using DCNNs mainly operate on high-performance server clusters, GPUs, or FPGA clusters; extending these applications to mobile/wearable devices and Internet-of-Things (IoT) entities is restricted by high power/energy consumption. Stochastic Computing (SC), used in specific hardware-based systems, is a promising method to overcome this shortcoming. Many complex arithmetic operations can be implemented with very simple hardware logic in the SC framework, which alleviates the extensive computation complexity. The exploration of network-wise optimization and the revision of network structure with respect to stochastic computing based hardware design have not been discussed in previous work. In this paper, we investigate the Deep Stochastic Convolutional Neural Network (DSCNN), a DCNN implemented using stochastic computing. The essential calculation components using SC are designed and evaluated. We propose a joint optimization method to coordinate components, guaranteeing high calculation accuracy in each stage of the network. The structure of the original DSCNN is revised to accommodate the simplicity of SC hardware design. Experimental results show that, as opposed to a software-inspired feature extraction block in DSCNN, an optimized hardware-oriented feature extraction block achieves up to 59.27% higher calculation precision. The optimized DSCNN achieves a network test error rate of only 3.48%, compared to 27.83% for the baseline DSCNN using the software-inspired feature extraction block.

Research paper thumbnail of Towards Acceleration of Deep Convolutional Neural Networks using Stochastic Computing

22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017

In recent years, Deep Convolutional Neural Networks (DCNNs) have become the dominant approach for almost all recognition and detection tasks and have outperformed humans on certain tasks. Nevertheless, high power consumption and complex topologies have hindered the widespread deployment of DCNNs, particularly in wearable devices and embedded systems with limited area and power budgets. This paper presents a fully parallel and scalable hardware-based DCNN design using Stochastic Computing (SC), which leverages the energy-accuracy trade-off through optimizing SC components in different layers. We first conduct a detailed investigation of the Approximate Parallel Counter (APC) based neuron and the multiplexer-based neuron using SC, and analyze the impacts of various design parameters, such as bit stream length and input number, on the energy/power/area/accuracy of the neuron cell. Then, from an architecture perspective, the influence of inaccuracy of neurons in different layers on the overall DCNN accuracy (i.e., software accuracy of the entire DCNN) is studied. Accordingly, a structure optimization method is proposed for a general DCNN architecture, in which neurons in different layers are implemented with optimized SC components, so as to reduce the area, power, and energy of the DCNN while maintaining the overall network performance in terms of accuracy. Experimental results show that the proposed approach can find a satisfactory DCNN configuration, which achieves 55X, 151X, and 2X improvement in terms of area, power, and energy, respectively, while the error is increased by 2.86%, compared with the conventional binary ASIC implementation.
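The difference between the two neuron front-ends compared here can be simulated in a few lines. In the unipolar encoding, a multiplexer adds bit streams but scales the result by the number of inputs, while a parallel counter outputs the exact per-cycle sum in binary. The sketch below is illustrative only (arbitrary values and stream length, plain unipolar streams rather than the paper's exact APC design):

```python
import random

rng = random.Random(7)
n = 20000
vals = [0.2, 0.6, 0.4, 0.8]   # unipolar input values in [0, 1] (made up)
streams = [[1 if rng.random() < v else 0 for _ in range(n)] for v in vals]

# Mux-based addition: each cycle, pass one randomly selected input bit.
# The output stream encodes the *scaled* sum (a+b+c+d)/4.
mux_out = [streams[rng.randrange(4)][t] for t in range(n)]
mux_val = sum(mux_out) / n

# Parallel-counter-based addition: count ones across the inputs each
# cycle, giving the unscaled sum as a binary number per cycle.
apc_val = sum(sum(s[t] for s in streams) for t in range(n)) / n

print(round(mux_val * 4, 1), round(apc_val, 1))  # each close to 2.0, within stochastic noise
```

The mux pays for its simplicity with a 1/4 down-scaling (and the extra variance that comes with it), which is one reason the APC-based neuron fares better at short stream lengths.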

Research paper thumbnail of Structural Design Optimization for Deep Convolutional Neural Networks using Stochastic Computing

Proc. of Design Automation and Test in Europe (DATE), 2017

Deep Convolutional Neural Networks (DCNNs) have been demonstrated as effective models for understanding image content. The computation behind DCNNs relies heavily on the capability of hardware resources due to the deep structure. DCNNs have been implemented on different large-scale computing platforms. However, there is a trend of embedding DCNNs into lightweight local systems, which requires low power/energy consumption and small hardware footprints. Stochastic Computing (SC) radically simplifies the hardware implementation of arithmetic units and has the potential to satisfy the small, low-power needs of DCNNs. Local connectivities and down-sampling operations make DCNNs more complex to implement using SC. In this paper, eight feature extraction designs for DCNNs using SC in two groups are explored and optimized in detail from the perspective of calculation precision, where we permute two SC implementations for inner-product calculation, two down-sampling schemes, and two structures of DCNN neurons. We evaluate each DCNN, using one of the eight feature extraction designs, in terms of network accuracy and hardware performance. Through exploration and optimization, the accuracies of SC-based DCNNs are guaranteed compared with software implementations on CPU/GPU/binary-based ASIC synthesis, while area, power, and energy are significantly reduced, by up to 776×, 190×, and 32835×, respectively.

Research paper thumbnail of SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2017

With the recent advance of wearable devices and Internet of Things (IoT), it becomes attractive to implement Deep Convolutional Neural Networks (DCNNs) in embedded and portable systems. Currently, executing software-based DCNNs requires high-performance servers, restricting widespread deployment on embedded and mobile IoT devices. To overcome this obstacle, considerable research efforts have been made to develop highly parallel and specialized DCNN accelerators using GPGPUs, FPGAs, or ASICs. Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing DCNNs with high scalability and ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power (energy) and hardware footprint can be achieved compared to conventional binary arithmetic implementations. The tremendous savings in power (energy) and hardware resources allow an immense design space for enhancing the scalability and robustness of hardware DCNNs. This paper presents SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, using a bottom-up approach. We first present the designs of function blocks that perform the basic operations in a DCNN, including inner product, pooling, and activation function. Then we propose four designs of feature extraction blocks, which are in charge of extracting features from input feature maps, by connecting different basic function blocks with joint optimization. Moreover, efficient weight storage methods are proposed to reduce the area and power (energy) consumption.
Putting it all together, with feature extraction blocks carefully selected, SC-DCNN is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy. Experimental results demonstrate that the LeNet-5 implemented in SC-DCNN consumes only 17 mm² of area and 1.53 W of power, and achieves a throughput of 781250 images/s, an area efficiency of 45946 images/s/mm², and an energy efficiency of 510734 images/J.
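The bit-stream arithmetic described above is easy to demonstrate in software. The sketch below uses the simpler unipolar encoding, where a value in [0, 1] is the probability of a 1 and multiplication of independent streams is a bitwise AND; for the bipolar [-1, 1] encoding mentioned in the abstract, the multiplier becomes an XNOR instead. Values and stream length here are arbitrary illustration choices:

```python
import random

def to_stream(p, n, rng):
    """Unipolar SC encoding: value p in [0, 1] becomes an n-bit stream
    whose fraction of ones approximates p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def from_stream(bits):
    """Decode a unipolar stream by counting ones."""
    return sum(bits) / len(bits)

rng = random.Random(42)
n = 10000
a = to_stream(0.5, n, rng)
b = to_stream(0.4, n, rng)
# Multiplying two independent unipolar streams is a single AND gate per bit.
prod = [x & y for x, y in zip(a, b)]
print(from_stream(prod))  # close to 0.5 * 0.4 = 0.2, within stochastic noise
```

A conventional fixed-point multiplier needs hundreds of gates; here the whole multiplier is one AND gate, at the cost of a long stream and some random error, which is exactly the trade-off SC-DCNN optimizes.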

Research paper thumbnail of Fundamental Challenges Towards Making IoT a Reachable Reality: A Model-centric Investigation

The constantly advancing integration capability is paving the way to the construction of an extremely large-scale continuum of the internet, where entities, or "things", from vastly varied domains are uniquely addressable and interact seamlessly to form a giant networked system of systems, known as the Internet-of-Things (IoT). In contrast to such a visionary networked system paradigm, prior research efforts on IoT are still very fragmented and confined to disjoint explorations in different application, architectural, security, services, protocol, and economic domains, thus preventing design exploration and optimization from a unified and global perspective. In this context, this survey article first proposes a mathematical modeling framework that is rich in expressivity to capture IoT characteristics from a global perspective. Then a list of fundamental challenges in i) sensing, ii) decentralized computation, iii) robustness, iv) energy-efficiency, and v) hardware security is identified and formulated based on the proposed modeling framework. Solutions are discussed to shed light on future IoT system paradigm development.

Research paper thumbnail of Deadline-Aware Joint Optimization of Sleep Transistor and Supply Voltage for FinFET Based Embedded Systems

Proc. of ACM Great Lakes Symp. on VLSI (GLSVLSI), 2017

Leakage power consumption has recently become a great concern for modern embedded systems. FinFET technologies, power gating, and near- and super-threshold regimes can significantly reduce power consumption. However, a comprehensive analysis of jointly applying these power-saving techniques has been lacking. In this paper, we investigate the application of power gating to FinFET circuits operating in near- and super-threshold voltage regimes for embedded system applications. A joint optimization algorithm is proposed to determine the width/length, position, and threshold type of the sleep transistor together with the operating voltage, constrained to a certain deadline and with the goal of minimizing energy per operation. Experimental results demonstrate that the proposed algorithm achieves up to 99.9% energy reduction when compared to the near-threshold approach without power gating and 95.3% when compared to deadline-free optimization.

Research paper thumbnail of Leakage Power Reduction for Deeply-Scaled FinFET Circuits Operating in Multiple Voltage Regimes Using Fine-Grained Gate- Length Biasing Technique

Proc. of Design Automation and Test in Europe (DATE), 2015

With the aggressive downscaling of process technologies and the importance of battery-powered systems, reducing leakage power consumption has become one of the most crucial design challenges for IC designers. This paper presents a device-circuit cross-layer framework that utilizes fine-grained gate-length biased FinFETs for circuit leakage power reduction in the near- and super-threshold operation regimes. The impacts of Gate-Length Biasing (GLB) on circuit speed and leakage power are first studied using one of the most advanced technology nodes, a 7nm FinFET technology. Then multiple standard cell libraries using different leakage reduction techniques, such as GLB and Dual-VT, are built in multiple operating regimes at this technology node. It is demonstrated that, compared to Dual-VT, GLB is a more suitable technique for the advanced 7nm FinFET technology due to its capability of delivering a finer-grained trade-off between leakage power and circuit speed, not to mention the lower manufacturing cost. The circuit synthesis results of a variety of ISCAS benchmark circuits using the presented GLB 7nm FinFET cell libraries show up to 70% leakage improvement with zero degradation in circuit speed in the near- and super-threshold regimes, compared to the standard 7nm FinFET cell library.

Research paper thumbnail of An Exploration of Applying Gate-Length-Biasing Techniques to Deeply-Scaled FinFETs Operating in Multiple Voltage Regimes

With the aggressive downscaling of process technologies and the importance of battery-powered systems, reducing leakage power consumption has become a crucial design challenge for IC designers. In addition, traditional bulk CMOS technologies face significant challenges related to short-channel effects and process variations. FinFET devices have attracted a lot of attention as an alternative to bulk CMOS in sub-32nm technology nodes. This paper presents a device-circuit cross-layer framework that utilizes fine-grained gate-length biased FinFETs for circuit leakage power reduction in near- and super-threshold operation regimes. The impacts of cell-level and transistor-level Gate-Length Biasing (GLB) on circuit speed and leakage power are studied using a 7nm FinFET technology.

Research paper thumbnail of CTS2M: concurrent task scheduling and storage management for residential energy consumers under dynamic energy pricing

Dynamic energy pricing policy introduces real-time, power-consumption-reflective pricing in the smart grid in order to incentivise energy consumers to schedule electricity-consuming applications (tasks) more prudently to minimise electric bills. This has become a particularly interesting problem with the availability of photovoltaic (PV) power generation facilities and controllable energy storage systems. This study addresses the problem of concurrent task scheduling and storage management for residential energy consumers with PV and storage systems, in order to minimise the electric bill. A general type of dynamic pricing scenario is assumed, where the energy price is both time-of-use and power dependent. Tasks are allowed to support suspend-now and resume-later operations. A negotiation-based iterative approach is proposed. In each iteration, all tasks are ripped up and rescheduled under a fixed storage charging/discharging scheme, and then the storage control scheme is derived based on the latest task schedule. The concept of congestion is introduced to gradually adjust the schedule of each task, whereas dynamic programming is used to find the optimal schedule. A near-optimal storage control algorithm is effectively implemented. Experimental results demonstrate that the proposed algorithm can achieve up to 60.95% total energy cost reduction compared with various baseline methods.
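To see why the pricing model is the hard part: if the energy price depended only on time (not on total power), the optimal schedule for a single suspend/resume task would degenerate to picking the cheapest slots before its deadline, as sketched below with made-up prices. It is the power-dependent pricing and inter-task congestion that the paper's negotiation and dynamic programming machinery exists to handle.

```python
# Degenerate single-task case: a preemptible task needing `duration`
# unit slots before a deadline, under purely time-of-use prices
# (hypothetical numbers). With suspend/resume allowed, the optimum is
# simply the cheapest `duration` slots in the window.
prices = [0.30, 0.12, 0.28, 0.09, 0.22, 0.11]  # price per slot (made up)
duration, deadline = 3, 6                       # run 3 slots within 6

window = prices[:deadline]
chosen = sorted(range(len(window)), key=lambda i: window[i])[:duration]
cost = sum(window[i] for i in chosen)
print(sorted(chosen), round(cost, 2))  # slots [1, 3, 5], cost 0.32
```

Once the price also depends on the total power drawn in a slot, one task's choice changes every other task's prices, which is exactly the "congestion" that the iterative rip-up-and-reschedule negotiation resolves.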

Research paper thumbnail of Negotiation-Based Task Scheduling to Minimize User's Electricity Bills under Dynamic Energy Prices

Proc. of Online Green Communications Conference (Online GreenComm), 2014

Dynamic energy pricing is a promising technique in the Smart Grid that incentivizes energy consumers to consume electricity more prudently in order to minimize their electric bills while satisfying their energy requirements. This has become a particularly interesting problem with the introduction of residential photovoltaic (PV) power generation facilities. This paper addresses the problem of task scheduling for (a collection of) energy consumers with PV power generation facilities, in order to minimize the electricity bill. A general type of dynamic pricing scenario is assumed, where the energy price is both time-of-use and total-power-consumption dependent. A negotiation-based iterative approach is proposed that is inspired by state-of-the-art Field-Programmable Gate Array (FPGA) routing algorithms. More specifically, the negotiation-based algorithm is used to rip up and reschedule all tasks in each iteration, and the concept of congestion is effectively introduced to dynamically adjust the schedule of each task based on the historical scheduling results as well as the (historical) total power consumption in each time slot. Experimental results demonstrate that the proposed algorithm achieves up to 51.8% improvement in electric bill reduction compared with baseline methods.

Research paper thumbnail of Negotiation-Based Task Scheduling and Storage Control Algorithm to Minimize User's Electric Bills under Dynamic Prices

Asia and South Pacific Design Automation Conference (ASP-DAC), 2015

Dynamic energy pricing is a promising technique in the Smart Grid to alleviate the mismatch between electricity generation and consumption. Energy consumers are incentivized to shape their power demands, or more specifically, schedule their electricity-consuming applications (tasks) more prudently to minimize their electric bills. This has become a particularly interesting problem with the availability of residential photovoltaic (PV) power generation facilities and controllable energy storage systems. This paper addresses the problem of joint task scheduling and energy storage control for energy consumers with PV and energy storage facilities, in order to minimize the electricity bill. A general type of dynamic pricing scenario is assumed, where the energy price is both time-of-use and power dependent, and various energy loss components are considered, including power dissipation in the power conversion circuitry as well as the rate capacity effect in the storage system. A negotiation-based iterative approach is proposed for joint residential task scheduling and energy storage control that is inspired by state-of-the-art Field-Programmable Gate Array (FPGA) routing algorithms. In each iteration, it rips up and reschedules all tasks under a fixed storage control scheme, and then derives a new charging/discharging scheme for the energy storage based on the latest task schedule. The concept of congestion is introduced to dynamically adjust the schedule of each task based on the historical results as well as the current scheduling status, and a near-optimal storage control algorithm is effectively implemented by solving convex optimization problem(s) with polynomial time complexity. Experimental results demonstrate that the proposed algorithm achieves up to 64.22% total energy cost reduction compared with the baseline methods.

Research paper thumbnail of Accelerating Soft-Error-Rate (SER) Estimation in the Presence of Single Event Transients

Proc. of Design Automation Conf. (DAC), 2016

Radiation-induced soft errors have posed an ever increasing reliability challenge as device dimensions keep shrinking in advanced CMOS technologies. Therefore, it is imperative to devise fast and accurate soft error rate (SER) estimation methods. Previous works mainly focus on improving the accuracy of the SER results, whereas speed improvement is limited to partitioning and parallel processing. This paper presents an efficient SER estimation framework for combinational logic circuits in the presence of single-event transients (SETs). A novel top-down memoization algorithm is proposed to accelerate the propagation of SETs. Experimental results on a variety of benchmark circuits demonstrate that the proposed approach achieves up to 560.2X speedup with less than 3% difference in SER results compared with the baseline algorithm.
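Top-down memoization over a circuit DAG, the general idea behind the proposed acceleration, can be sketched on a toy netlist: caching each net's result means a gate with fan-out is evaluated once instead of once per path. (This sketch evaluates plain logic values for illustration; the paper memoizes SET propagation, not logic evaluation.)

```python
from functools import lru_cache

# Toy combinational netlist as a DAG: gate -> (op, input nets).
CIRCUIT = {
    "g1": ("and", ("a", "b")),
    "g2": ("or",  ("g1", "c")),
    "g3": ("xor", ("g1", "g2")),   # g1 has a fan-out of 2
}
INPUTS = {"a": 1, "b": 1, "c": 0}
OPS = {"and": lambda x, y: x & y,
       "or":  lambda x, y: x | y,
       "xor": lambda x, y: x ^ y}

calls = 0  # counts actual (non-cached) evaluations

@lru_cache(maxsize=None)
def value(net):
    """Top-down memoized evaluation: each net is computed exactly once."""
    global calls
    calls += 1
    if net in INPUTS:
        return INPUTS[net]
    op, ins = CIRCUIT[net]
    return OPS[op](*(value(i) for i in ins))

print(value("g3"), calls)  # g1=1, g2=1, g3=0; six nets, six evaluations
```

Without the cache, `g1` would be recomputed via both of its fan-out paths; on reconvergent circuits the saved work grows exponentially with depth, which is where the reported speedups come from.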

Research paper thumbnail of Joint Soft-Error-Rate (SER) Estimation for Combinational Logic and Sequential Elements

IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016

With drastic device shrinking, low operating voltages, increasing complexity, and high-speed operation, radiation-induced soft errors have posed an ever increasing reliability challenge to both combinational and sequential circuits in advanced CMOS technologies. Therefore, it is imperative to devise efficient soft error rate (SER) estimation methods in order to evaluate soft error vulnerabilities for cost-effective robust circuit design. Previous works either analyze only SER in combinational circuits or evaluate soft error vulnerabilities in sequential elements. In this paper, a joint SER estimation framework is proposed, which considers single-event transients (SETs) in combinational logic and multiple cell upsets (MCUs) in sequential components. Various masking effects are considered in the combinational SER estimation process, and several typical radiation-hardened and non-hardened flip-flop structures are analyzed and compared as the sequential elements. A schematic and layout co-simulation approach is proposed to model the MCUs for redundant sequential storage structures. Experimental results on a variety of ISCAS benchmark circuits using the Nangate 45nm CMOS standard cell library demonstrate the difference in soft error resilience among designs using different sequential elements and the importance of modeling MCUs in redundant structures. Keywords: soft error, hardened flip-flop, single-event upset, multiple cell upset.

Research paper thumbnail of Accelerated Soft-Error-Rate (SER) Estimation for Combinational and Sequential Circuits

Radiation-induced soft errors have posed an increasing reliability challenge to combinational and sequential circuits in advanced CMOS technologies. Therefore, it is imperative to devise fast, accurate, and scalable soft error rate (SER) estimation methods as part of cost-effective robust circuit design. This paper presents an efficient SER estimation framework for combinational and sequential circuits, which considers single-event transients (SETs) in combinational logic and multiple cell upsets (MCUs) in sequential elements. A novel top-down memoization algorithm is proposed to accelerate the propagation of SETs, and a general schematic and layout co-simulation approach is proposed to model the MCUs for redundant sequential storage structures. The feedback in sequential logic is analyzed with an efficient time frame expansion method. Experimental results on various ISCAS85 combinational benchmark circuits demonstrate that the proposed approach achieves up to 560.2X speedup with less than 3% difference in SER results compared with the baseline algorithm. The average runtime of the proposed framework on a variety of ISCAS89 benchmark circuits is 7.20s, and the runtime is 119.23s for the largest benchmark circuit with more than 3,000 flip-flops and 17,000 gates. CCS Concepts: Hardware → Signal integrity and noise analysis; Transient errors and upsets. Additional Key Words and Phrases: soft error, single-event upset, multiple cell upset, hardened flip-flop, algorithm. ACM Reference Format: Ji Li and Jeffrey Draper, 2016. Accelerated Soft-Error-Rate (SER) estimation for combinational and sequential circuits.

Research paper thumbnail of Multi-Source In-Door Energy Harvesting for Non-Volatile Processors

This paper adopts a multi-source energy harvesting system that combines thermal, kinetic, and indoor photovoltaic sources to provide a stable power supply. We derive the best power extraction policy and a converter parameter optimization technique. Since energy harvesting prediction is important in assisting the task scheduler to compensate for intermittent energy harvesting, we conduct a comprehensive investigation of different neural network-based prediction techniques for both single and multiple energy harvesting sources using measured harvesting traces. Experimental results demonstrate the effectiveness of the power extraction and converter parameter optimization techniques. Moreover, directly predicting the total output power achieves the highest prediction accuracy. Keywords: non-volatile processor (NVP), neural network, energy harvesting, multiple energy sources.

Research paper thumbnail of Algorithm Accelerations for Luminescent Solar Concentrator-Enhanced Reconfigurable Onboard Photovoltaic System

22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017

Electric vehicles (EVs) and hybrid electric vehicles (HEVs) are growing in popularity. Onboard photovoltaic (PV) systems have been proposed to overcome the limited all-electric driving range of EVs/HEVs. However, there exist obstacles to the wide adoption of onboard PV systems, such as low efficiency, high cost, and low compatibility. To tackle these limitations, we propose to adopt semiconductor nanomaterial-based luminescent solar concentrator (LSC)-enhanced PV cells in onboard PV systems. In this paper, we investigate methods of accelerating the reconfiguration algorithm for the LSC-enhanced onboard PV system to reduce computational/energy overhead and capital cost. First, in the system design stage, we group LSC-enhanced PV cells into macrocells and reconfigure the onboard PV system based on macrocells. Second, we simplify the partial shading scenario by assuming an LSC-enhanced PV cell is either lighted or completely shaded (Algorithm 1). Third, we make use of the observation that the conversion efficiency of the charger is high and nearly constant as long as its input voltage exceeds a threshold value (Algorithm 2). We test and evaluate the effectiveness of the two proposed algorithms by comparing them with the optimal PV array reconfiguration algorithm and simulating an LSC-enhanced reconfigurable onboard PV system using solar irradiance traces actually measured during vehicle driving. Experiments demonstrate that the output power of Algorithm 1 in the first scenario is 9.0% lower on average than that of the optimal PV array reconfiguration algorithm. In the second scenario, we observe an average 1.16X performance improvement from the proposed Algorithm 2.
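The binary-shading simplification can be illustrated with a toy reconfiguration model, assuming each cell (or macrocell) is either lighted (1) or fully shaded (0) and the array is regrouped into series strings wired in parallel. The per-cell voltage and current values are invented for illustration and are not from the paper.

```python
V_CELL, I_LIT = 0.6, 3.0   # per-cell voltage (V) and lit-cell current (A), made up

def array_power(strings):
    """A series string's current is limited by its weakest cell; parallel
    strings add their currents. Power = string voltage * total current."""
    cells_per_string = len(strings[0])
    total_current = sum(I_LIT * min(s) for s in strings)
    return cells_per_string * V_CELL * total_current

# Six cells, two of them shaded, wired as three strings of two cells each.
grouped     = [[1, 1], [1, 1], [0, 0]]   # shaded cells share one string
interleaved = [[1, 0], [1, 0], [1, 1]]   # shading spread across strings
```

Regrouping so the shaded cells share a string (as a reconfiguration algorithm would) keeps the lighted strings at full current, which is the benefit the reconfiguration search is chasing.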

Research paper thumbnail of DRL-Cloud: Deep Reinforcement Learning-Based Resource Provisioning and Task Scheduling for Cloud Service Providers

Cloud computing has become an attractive computing paradigm in both academia and industry. Through virtualization technology, Cloud Service Providers (CSPs) that own data centers can structure physical servers into Virtual Machines (VMs) to provide services, resources, and infrastructures to users. Profit-driven CSPs charge users for service access and VM rental, and reduce power consumption and electric bills so as to increase profit margin. The key challenge faced by CSPs is data center energy cost minimization. Prior works proposed various algorithms to reduce energy cost through Resource Provisioning (RP) and/or Task Scheduling (TS). However, they have scalability issues or do not consider TS with task dependencies, which is a crucial factor that ensures correct parallel execution of tasks. This paper presents DRL-Cloud, a novel Deep Reinforcement Learning (DRL)-based RP and TS system, to minimize energy cost for large-scale CSPs with a very large number of servers that receive enormous numbers of user requests per day. A deep Q-learning-based two-stage RP-TS processor is designed to automatically generate the best long-term decisions by learning from the changing environment, such as user request patterns and realistic electricity prices. With training techniques such as target networks, experience replay, and exploration and exploitation, the proposed DRL-Cloud achieves remarkably high energy cost efficiency, a low reject rate, and low runtime with fast convergence. Compared with one of the state-of-the-art energy-efficient algorithms, the proposed DRL-Cloud achieves up to 320% energy cost efficiency improvement while maintaining a lower reject rate on average. For an example CSP setup with 5,000 servers and 200,000 tasks, compared to a fast round-robin baseline, the proposed DRL-Cloud achieves up to 144% runtime reduction.
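The Q-learning machinery named above (target network, experience replay, epsilon-greedy exploration) can be sketched on a toy scheduling environment. Everything here is illustrative, not DRL-Cloud itself: a tabular Q replaces the deep network, the state is just a two-level electricity price, and the actions are "run the task now" versus "defer it".

```python
import random

random.seed(0)

# Toy environment: state = price level (0 = off-peak, 1 = peak), which
# alternates each step. Action 0 runs the task now (pay the current
# price); action 1 defers it (fixed penalty). All numbers are made up.
PRICE = {0: 1.0, 1: 5.0}
DEFER_COST = 2.0

def step(state, action):
    reward = -PRICE[state] if action == 0 else -DEFER_COST
    return reward, 1 - state             # price level alternates

GAMMA, ALPHA, EPS = 0.9, 0.1, 0.2
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
target_Q = dict(Q)                       # target network: a frozen copy
replay = []                              # experience replay buffer

state = 0
for t in range(5000):
    # epsilon-greedy: explore with probability EPS, else exploit
    if random.random() < EPS:
        action = random.randrange(2)
    else:
        action = max((0, 1), key=lambda a: Q[(state, a)])
    reward, nxt = step(state, action)
    replay.append((state, action, reward, nxt))
    # learn from a random minibatch of past transitions, bootstrapping
    # the TD target from the frozen copy for stability
    for s, a, r, n in random.sample(replay, min(8, len(replay))):
        td_target = r + GAMMA * max(target_Q[(n, 0)], target_Q[(n, 1)])
        Q[(s, a)] += ALPHA * (td_target - Q[(s, a)])
    if t % 100 == 0:                     # periodically sync the target copy
        target_Q = dict(Q)
    state = nxt
```

After training, the learned policy runs tasks off-peak and defers them at peak price, which is the qualitative behavior an RP-TS processor needs at scale.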

Research paper thumbnail of Negotiation-Based Resource Provisioning and Task Scheduling Algorithm for Cloud Systems

International Symposium on Quality Electronic Design (ISQED), 2016

Cloud computing has drawn significant attention from both academia and industry as an emerging computing paradigm where data, applications, or processing power are provided as services through the Internet. Cloud computing extends the existing computing infrastructure owned by the cloud service providers (CSPs) to achieve economies of scale through virtualization and aggregated computing resources. End users, on the other hand, can reach these services through an elastic utility computing environment with minimal upfront investment. Nevertheless, pervasive use of cloud computing and the resulting rise in the number of data centers have brought forth concerns about energy consumption and carbon emission. Therefore, this paper addresses the problem of resource provisioning and task scheduling on a cloud platform under given service level agreements, in order to minimize the electric bills and maximize the profitability for the CSP. User task graphs and dependencies are randomly generated, whereas user requests for CPU and memory resources are extracted from the Google cluster trace. A general type of dynamic pricing scenario is assumed, where the energy price is both time-of-use and total power consumption-dependent. A negotiation-based iterative approach, inspired by a routing algorithm, is proposed for the resource provisioning and task scheduling. More specifically, in each iteration, decisions made in the previous iteration are ripped up and re-decided, while a congestion model is introduced to dynamically adjust the resource provisioning decisions and the schedule of each task based on the historical results as well as the current state of affairs. Experimental results demonstrate that the proposed algorithm achieves up to 63% improvement in the total electrical energy bill of an exemplary data center compared to the baseline.
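The rip-up-and-re-decide loop with a congestion model, in the spirit of negotiation-based routing, can be sketched on a toy instance: independent unit tasks compete for time slots with different energy prices and limited capacity. The prices, capacity, and penalty weight are illustrative, and task dependencies are omitted.

```python
PRICES = [1.0, 5.0, 2.0]     # hypothetical energy price per time slot
CAPACITY = 2                 # tasks a slot can host
N_TASKS = 4

def negotiate(iterations=20):
    """Each iteration rips up every assignment and re-decides it, with a
    congestion penalty on oversubscribed slots and a history term that
    grows on slots that keep being oversubscribed (routing-style)."""
    history = [0.0] * len(PRICES)
    assignment = []
    for _ in range(iterations):
        load = [0] * len(PRICES)
        assignment = []
        for _task in range(N_TASKS):
            def cost(s):
                over = max(0, load[s] + 1 - CAPACITY)   # present congestion
                return PRICES[s] * (1 + 10 * over) + history[s]
            slot = min(range(len(PRICES)), key=cost)
            load[slot] += 1
            assignment.append(slot)
        for s in range(len(PRICES)):    # accumulate congestion history
            history[s] += max(0, load[s] - CAPACITY)
    return assignment
```

Without the congestion term, all four tasks would pile into the cheapest slot; with it, the schedule settles on the two cheapest slots at capacity and avoids the expensive one.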

Research paper thumbnail of Fast and Energy-Aware Resource Provisioning and Task Scheduling for Cloud Systems

International Symposium on Quality Electronic Design (ISQED), 2017

Cloud computing has become an attractive computing paradigm in recent years, offering on-demand computing resources for users worldwide. Through Virtual Machine (VM) technologies, cloud service providers (CSPs) can provide users the infrastructure, platform, and software at quite low cost. With the drastically growing number of data centers, energy efficiency has drawn global attention, as CSPs are faced with the high energy cost of data centers. Many previous works have contributed to improving the energy efficiency of data centers. However, their computational complexity may lead to unacceptable runtime. In this paper, we propose a fast and energy-aware resource provisioning and task scheduling algorithm to achieve low energy cost with reduced computational complexity for CSPs. In our iterative algorithm, we divide the provisioning and scheduling into multiple steps, which effectively reduces the complexity and minimizes the runtime while achieving a reasonable energy cost. Experimental results demonstrate that, compared to the baseline algorithm, the proposed algorithm can achieve up to 79.94% runtime improvement with an acceptable increase in energy cost.
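The complexity-reduction idea of splitting provisioning and scheduling into separate steps can be sketched with a toy cost model: instead of searching all (server, slot) pairs jointly, pick a server first, then a slot. The server list and slot prices are invented, and the per-task cost is deliberately separable, so this sketch loses no energy at all, whereas the paper trades a small energy increase for the runtime gain.

```python
SERVERS = [("A", 1.0), ("B", 0.8), ("C", 1.2)]   # (name, power per task), made up
SLOT_PRICE = [5.0, 2.0, 1.0, 4.0]                # electricity price per slot

def joint(tasks):
    """Exhaustive baseline: O(servers * slots) evaluations per task."""
    evals, cost = 0, 0.0
    for _ in range(tasks):
        best = float("inf")
        for _, p in SERVERS:
            for price in SLOT_PRICE:
                evals += 1
                best = min(best, p * price)
        cost += best
    return cost, evals

def stepwise(tasks):
    """Decomposed: provision the most efficient server, then schedule the
    cheapest slot: O(servers + slots) evaluations per task."""
    evals, cost = 0, 0.0
    for _ in range(tasks):
        p = min(SERVERS, key=lambda s: s[1])[1]; evals += len(SERVERS)
        price = min(SLOT_PRICE);                 evals += len(SLOT_PRICE)
        cost += p * price
    return cost, evals
```

On this separable instance both approaches reach the same energy cost, while the step-wise version does far fewer evaluations, which is the runtime effect the paper targets.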