Hybrid dynamic energy and thermal management in heterogeneous embedded multiprocessor SoCs

Package-Aware Scheduling of embedded workloads for temperature and Energy management on heterogeneous MPSoCs

2010 IEEE International Conference on Computer Design, 2010

In this paper, we present PASTEMP, a solution for Package Aware Scheduling for Thermal and Energy management using Multi-Parametric programming in heterogeneous embedded multiprocessor SoCs (MPSoCs). Based on the current thermal state of the system and the current performance requirements of the workload, PASTEMP finds thermally safe and energy-efficient voltage/frequency configurations for the cores on an MPSoC. Tasks are assigned to cores depending on their performance demand and the current voltage/frequency of each core. The voltage/frequency settings of the cores are chosen through an optimization process based on the instantaneous thermal model we introduce, which decouples the effect of package temperature from the temperature changes caused by the power consumption of the cores. To find the best voltage/frequency settings at runtime, we use multi-parametric programming to separate the optimization into offline and online phases. According to our experimental results, compared to similar DTM techniques, PASTEMP achieves up to 23% energy savings and 26% throughput improvement and reduces deadline misses by more than half while meeting all thermal constraints.
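The offline/online split described in this abstract can be illustrated with a minimal sketch. It is not PASTEMP itself: a brute-force lookup table stands in for the multi-parametric solver, only one core is modeled, and the voltage/frequency levels, thermal resistance, temperature limit, and all function names are hypothetical values chosen for illustration.

```python
from bisect import bisect_left

# Hypothetical V/f levels as (frequency in GHz, power in W), sorted by rising power.
LEVELS = [(0.6, 0.4), (1.0, 0.9), (1.4, 1.8), (1.8, 3.0)]
R_TH = 8.0     # assumed core thermal resistance in K/W
T_MAX = 85.0   # assumed thermal limit in deg C


def best_level(pkg_temp, demand_ghz):
    """Index of the lowest-power level that meets the demand under T_MAX."""
    for idx, (freq, power) in enumerate(LEVELS):
        # Core temperature is modeled as package temperature plus a
        # power-dependent rise (an assumed, simplified steady-state model).
        if freq >= demand_ghz and pkg_temp + R_TH * power <= T_MAX:
            return idx
    return None  # no thermally safe level satisfies the demand


# Offline phase: tabulate the answer over a grid of the two parameters.
TEMP_GRID = [40.0, 50.0, 60.0, 70.0, 80.0]
DEMAND_GRID = [0.6, 1.0, 1.4, 1.8]
TABLE = {(t, d): best_level(t, d) for t in TEMP_GRID for d in DEMAND_GRID}


def round_up(grid, value):
    """Smallest grid point >= value; values beyond the grid clamp to the last point."""
    return grid[min(bisect_left(grid, value), len(grid) - 1)]


def online_lookup(pkg_temp, demand_ghz):
    """Online phase: a table lookup instead of solving the optimization."""
    key = (round_up(TEMP_GRID, pkg_temp), round_up(DEMAND_GRID, demand_ghz))
    return TABLE[key]
```

Rounding both parameters up keeps the table answer feasible for the actual state within the grid range, since a hotter package and a higher demand only tighten the constraints. A true multi-parametric solution would instead store the optimizer as a piecewise function over regions of the parameter space.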

Thermal-aware application scheduling on device-heterogeneous embedded architectures

The challenges of the Power Wall manifest in mobile and embedded processors due to their inherent thermal and form-factor constraints. The power dissipated over a fixed area, namely the power density, directly affects acceptable core temperatures even for low-power devices. In this paper, we examine techniques to counter this power density increase with device and microarchitecture-level heterogeneity. We explore the design space in which various parameters such as frequency and micro-architectural complexity can be traded off against each other in order to achieve the optimal configuration for a fixed temperature limit. Since conventional CMOS technology based cores may not satisfy our performance and power requirements, especially under tight thermal constraints, we propose a heterogeneous CMOS-Tunnel FET multicore for obtaining the optimal operating points under power and thermal limitations. Using a profiling based static assignment scheme, we demonstrate the improvement obtained by coupling this device-level heterogeneity to architectural modifications. We also propose an instruction slack-based scheme to map applications on the heterogeneous multicore. Our schemes show an improvement of up to 47% in performance and 30% in energy over the best homogeneous configuration.

Temperature and energy aware scheduling of heterogeneous processors

2016 Ninth International Conference on Contemporary Computing (IC3), 2016

Modern computing requires faster and more powerful processing. Faster and more powerful processors have resulted in higher heat dissipation and power consumption. In this paper we present an offline algorithm called Temperature and Energy aware Dynamic Level Scheduling (TEDLS). It is able to schedule tasks in a heterogeneous environment with DVS enabled processors to minimize execution time, energy consumption and heat dissipation. We use a heat model to estimate the final temperature of a processor executing a task. This estimation of temperature is based on processor characteristics, which aids in choosing the cooler processors. Our simulation results have shown that the TEDLS algorithm not only results in processors having lower temperatures but also produces lower energy consumption as compared to the previous offline algorithms. The TEDLS algorithm also produces lower application execution time when the application size is small.
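As an illustration of estimating a processor's final temperature from its characteristics, the sketch below uses a first-order lumped RC thermal model. This is an assumed model, not necessarily the heat model used by TEDLS; the function names, dictionary keys, and parameter values are hypothetical.

```python
import math


def final_temperature(t_start, power, duration, r_th, c_th, t_amb=25.0):
    """Temperature after running at constant power for `duration` seconds.

    First-order RC model (an assumption): the temperature relaxes
    exponentially from t_start toward the steady state t_amb + power * r_th
    with time constant r_th * c_th.
    """
    t_steady = t_amb + power * r_th
    return t_steady + (t_start - t_steady) * math.exp(-duration / (r_th * c_th))


def coolest_processor(processors, task_cycles):
    """Name of the processor with the lowest estimated final temperature."""
    def estimate(p):
        duration = task_cycles / p["freq_hz"]  # execution time on this processor
        return final_temperature(p["temp"], p["power"], duration,
                                 p["r_th"], p["c_th"])
    return min(processors, key=estimate)["name"]


# Usage with two hypothetical processors that differ only in starting temperature.
procs = [
    {"name": "A", "temp": 60.0, "power": 2.0, "freq_hz": 1e9, "r_th": 10.0, "c_th": 1.0},
    {"name": "B", "temp": 40.0, "power": 2.0, "freq_hz": 1e9, "r_th": 10.0, "c_th": 1.0},
]
chosen = coolest_processor(procs, task_cycles=1e9)
```

A scheduler that also minimizes execution time and energy would combine this estimate with those terms rather than picking on temperature alone.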

PROMETHEUS: A Proactive Method for Thermal Management of Heterogeneous MPSoCs

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2000

In this paper, we propose PROMETHEUS, a framework for proactive temperature-aware scheduling of embedded workloads on single-ISA (instruction set architecture) heterogeneous Multi-Processor Systems-on-Chip (MPSoCs). It systematically combines temperature-aware task assignment, task migration, and dynamic voltage and frequency scaling (DVFS). PROMETHEUS is based on our novel low-overhead temperature prediction technique, Tempo. In contrast to previous work, Tempo allows accurate estimation of the potential thermal effects of future scheduling decisions without requiring any runtime adaptation. It reduces the maximum prediction error by up to an order of magnitude. Using Tempo, the PROMETHEUS framework provides two temperature-aware scheduling techniques which proactively avoid power states leading to future thermal emergencies while matching performance to the workload requirements. The first technique, TempoMP, integrates Tempo with an online multi-parametric optimization method to guide decisions on task assignment, migration, and setting core power states in a temperature-aware fashion. Our second scheduling technique, TemPrompt, uses Tempo in a heuristic algorithm which provides comparable efficiency at lower overhead. On average, these two techniques reduce the lateness of the tasks by 2.5X and the energy-lateness product by 5X compared to the previous work.

Temperature-Aware Scheduling for Embedded Heterogeneous MPSoCs with Special Purpose IP Cores

2011 Proceedings of 20th International Conference on Computer Communications and Networks (ICCCN), 2011

Many embedded heterogeneous MPSoCs integrate general purpose cores along with special purpose cores. The power states of these cores are usually controlled by an internal hardware controller rather than the central operating system. In this paper, we propose a thermal management technique which reduces the performance penalty of central thermal management by considering these special purpose cores and the performance requirements of the tasks running on them. Our experimental results show that for workloads with high-priority special purpose tasks, our technique can reduce the occurrence of thermal violations by at least 3X while improving the weighted execution time by up to 24%.

Thermal-aware scheduling for future chip multiprocessors

EURASIP Journal on Embedded …, 2007

The increasing complexity and operating frequency of current single-chip microprocessors are yielding diminishing performance improvements. Consequently, major manufacturers offer chip multiprocessor (CMP) architectures in order to keep up with the expected performance gains. This architecture is successfully being introduced in many markets, including that of embedded systems. Nevertheless, the integration of several cores onto the same chip may lead to increased heat dissipation and consequently additional cooling costs, higher power consumption, decreased reliability, and thermal-induced performance loss, among others. In this paper, we analyze the evolution of the thermal issues for future chip multiprocessor architectures and show that as the number of on-chip cores increases, the thermal-induced problems will worsen. In addition, we present several scenarios that result in excessive thermal stress to the CMP chip or significant performance loss. In order to minimize or even eliminate these problems, we propose thermal-aware scheduler (TAS) algorithms. When assigning processes to cores, TAS takes their temperature and cooling ability into account in order to avoid thermal stress and at the same time improve performance. Experimental results have shown that a TAS algorithm that also considers the temperatures of neighboring cores is able to significantly reduce the temperature-induced performance loss while also decreasing the chip's temperature across many different operation and configuration scenarios.
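A neighbor-aware core selection of the kind described above might look like the following sketch. The scoring rule (own temperature plus a weighted mean of neighbor temperatures), the weight, and the function name are assumptions made for illustration, not the TAS algorithms from the paper.

```python
def pick_core(temps, neighbors, idle, neighbor_weight=0.5):
    """Pick the idle core with the lowest thermal score.

    temps:      list of current core temperatures, indexed by core id
    neighbors:  dict mapping core id to a list of adjacent core ids
    idle:       iterable of core ids available for a new process
    The score is the core's own temperature plus neighbor_weight times the
    mean temperature of its neighbors (an assumed heuristic).
    """
    def score(core):
        adjacent = neighbors[core]
        if adjacent:
            adjacent_mean = sum(temps[n] for n in adjacent) / len(adjacent)
        else:
            adjacent_mean = temps[core]
        return temps[core] + neighbor_weight * adjacent_mean

    candidates = list(idle)
    if not candidates:
        return None
    return min(candidates, key=score)


# Four cores in a row; core 3 is the coolest but sits next to a hot core.
temps = [40.0, 52.0, 90.0, 50.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
naive_choice = pick_core(temps, neighbors, idle=[1, 3], neighbor_weight=0.0)
aware_choice = pick_core(temps, neighbors, idle=[1, 3])
```

With the neighbor term disabled the scheduler picks core 3; with it enabled, core 1 wins because its neighbors are cooler on average, which is the effect the abstract attributes to considering neighboring cores.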

System Level Power and Thermal Management on Embedded Processors

2012

Semiconductor technology scaling has led to a sharp growth in transistor counts. This has resulted in an exponential increase in both power dissipation and heat flux (or power density) in modern microprocessors. These microprocessors are integrated as the major components in many modern embedded devices, which offer richer features and attain higher performance than ever before. Therefore, power and thermal management have become significant design considerations for modern embedded devices. Dynamic voltage/frequency scaling (DVFS) and dynamic power management (DPM) are two well-known hardware capabilities offered by modern embedded processors. However, power or thermal aware performance optimization has not been fully explored for mainstream embedded processors with discrete DVFS and DPM capabilities. Many key questions remain unanswered. What is the maximum performance that an embedded processor can achieve under a power or thermal constraint for a periodic application? Does there exist an efficient algorithm for the power or thermal management problems with a guaranteed quality bound? These questions are hard to answer because the discrete settings of DVFS and DPM increase the complexity of many power and thermal management problems, which are generally NP-hard. The dissertation presents a comprehensive study of these NP-hard power and thermal management problems for embedded processors with discrete DVFS and DPM capabilities. In the domain of power management, the dissertation addresses the power minimization problem for real-time schedules, the energy-constrained make-span minimization problem on homogeneous and heterogeneous chip multiprocessor (CMP) architectures, and the battery-aware energy management problem with a nonlinear battery discharging model. In the domain of thermal management, the work addresses several thermal-constrained performance maximization problems for periodic embedded applications. 
All the addressed problems are proved to be NP-hard or strongly NP-hard in the study. The work then focuses on the design of off-line optimal or polynomial time approximation algorithms as solutions in the problem design space. Several of the addressed NP-hard problems are tackled by dynamic programming, yielding optimal solutions with pseudo-polynomial run time complexity. Because the optimal algorithms are not efficient in the worst case, fully polynomial time approximation algorithms are provided as more efficient solutions. Some efficient heuristic algorithms are also presented as solutions to several of the addressed problems.
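To make the idea of a pseudo-polynomial dynamic program over discrete DVFS levels concrete, here is a minimal sketch. It is not taken from the dissertation: the problem variant (pick one level per sequential task to minimize energy under a deadline), the time quantization into ticks, and every name and number are assumptions.

```python
import math


def min_energy_schedule(task_cycles, levels, deadline_ticks, tick_s):
    """Choose one DVFS level per task to minimize energy under a deadline.

    task_cycles:    cycle count of each task (tasks run back to back)
    levels:         list of (freq_hz, power_w) discrete settings
    deadline_ticks: deadline expressed in ticks of tick_s seconds
    Returns (energy_joules, level_index_per_task), or None if infeasible.
    Run time is O(tasks * deadline_ticks * levels), i.e. pseudo-polynomial.
    """
    # best[t] = (minimum energy, choices) for the tasks handled so far
    # using at most t ticks; with no tasks handled, the energy is zero.
    best = [(0.0, [])] * (deadline_ticks + 1)
    for cycles in task_cycles:
        nxt = [(math.inf, None)] * (deadline_ticks + 1)
        for t in range(deadline_ticks + 1):
            for idx, (freq, power) in enumerate(levels):
                exec_s = cycles / freq
                ticks = math.ceil(exec_s / tick_s)  # round execution time up
                if ticks > t:
                    continue
                prev_energy, prev_choices = best[t - ticks]
                if prev_choices is None:
                    continue  # earlier tasks cannot fit in the remaining budget
                energy = prev_energy + power * exec_s
                if energy < nxt[t][0]:
                    nxt[t] = (energy, prev_choices + [idx])
        best = nxt
    energy, choices = best[deadline_ticks]
    return None if choices is None else (energy, choices)


# Two equal tasks, a slow low-power level and a fast high-power level.
levels = [(0.5e9, 1.0), (1.0e9, 3.0)]
result = min_energy_schedule([1e9, 1e9], levels, deadline_ticks=6, tick_s=0.5)
```

In this example both tasks at the slow level would need 8 ticks, so the cheapest feasible plan runs one task fast and one slow. The table size grows with the numeric value of the deadline rather than with the input length, which is what makes the run time pseudo-polynomial.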

Thermal-aware system-level modeling and management for Multi-Processor Systems-on-Chip

2011 IEEE International Symposium of Circuits and Systems (ISCAS), 2011

Multi-Processor Systems-on-Chip (MPSoCs) are penetrating the electronics market as a powerful, yet commercially viable, solution to answer the strong and steadily growing demand for scalable and high performance systems, at limited design complexity. However, it is critical to develop dedicated system-level design methodologies for multi-core architectures that seamlessly address their thermal modeling, analysis and management. In this work, we first formulate the problem of system-level thermal modeling and then link it to a global thermal management formulation posed as a discrete-time optimal control problem, which can be solved using finite-horizon model-predictive control (MPC) techniques while adapting to the actual time-varying, unbalanced MPSoC workload requirements. Finally, we compare the system-level MPC-based thermal modeling and management approaches on an industrial 8-core MPSoC design and show their different trade-offs regarding performance while respecting operating temperature bounds.

TEEM: Online Thermal- and Energy-Efficiency Management on CPU-GPU MPSoCs

2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Heterogeneous Multiprocessor Systems-on-Chip (MPSoCs) are progressively becoming predominant in most modern mobile devices. These devices are required to process applications within thermal, energy and performance constraints. However, most stock power and thermal management mechanisms either neglect some of these constraints or rely on frequency scaling to achieve energy efficiency and temperature reduction on the device. Although this inefficient technique can reduce the temporal thermal gradient, it also hurts the performance of the executing task. In this paper, we propose a thermal and energy management mechanism which achieves a reduction in thermal gradient as well as energy efficiency through resource mapping and thread partitioning of applications with online optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is experimentally appraised using different applications from the Polybench benchmark suite on the Odroid-XU4 development platform. Results show a 28% performance improvement, a 28.32% energy saving and a reduction in thermal variance of over 76% when compared to existing approaches. Additionally, the method is able to free more than 90% of the memory storage on the MPSoC that would previously have been used to store several task-to-thread mapping configurations.

Heterogeneity exploration for peak temperature reduction on multi-core platforms

Fifteenth International Symposium on Quality Electronic Design, 2014

As IC technology continues to evolve and more transistors are integrated into a single chip, high chip temperature due to high power density not only increases packaging/cooling cost, but also severely degrades the reliability and performance of computing systems. In the meantime, as transistor feature size continues to shrink, it becomes difficult to precisely control the manufacturing process. The manufacturing variations can cause significant differences from core to core and chip to chip. We believe that the heterogeneity due to manufacturing variations, if handled properly, can in fact improve the design objectives of real-time applications. In this paper, we study the problem of how to reduce the peak temperature of a real-time application by judiciously mirroring the physical architecture of an individual device to the logical architecture for which the application was initially designed. We develop three computationally efficient algorithms for deploying applications to individual devices. Our simulation study clearly shows that, by taking advantage of the uniqueness of each individual physical chip, the proposed approaches significantly reduce the peak temperature. The experiments also show that these approaches are efficient and have low operational cost.