Jack Sampson - Academia.edu
Papers by Jack Sampson
Nonvolatile processors (NVPs) are processors with integrated nonvolatile memory (NVM) to preserve task-intermediate on-chip state during power emergencies. NVPs hide data backup and restoration from the executing software in order to provide an execution mode that will always eventually complete the current task. NVPs are emerging as a promising solution for energy harvesting scenarios, in which the available power supply is unstable and intermittent, because they can ensure that even short periods of sufficient power, on the order of tens of instructions, result in net forward progress. The article explores the design space for an NVP across different architectures, different input power sources, and policies for maximizing forward progress in a framework calibrated using measured results from a fabricated NVP, and proposes a heterogeneous-microarchitecture solution that more efficiently capitalizes on ephemeral power surpluses.
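As a rough illustration of the forward-progress claim above, here is a minimal toy model, not the article's calibrated framework. It assumes power arrives in bursts, each worth a fixed number of instruction-equivalents, and that an NVP pays a hypothetical `restore_cost` when a burst begins and a hypothetical `backup_cost` before it ends, while a volatile processor restarts the task after every outage. The function name, the costs, and the burst sizes are all illustrative assumptions.

```python
def bursts_to_finish(bursts, task_length, nonvolatile,
                     backup_cost=3, restore_cost=3):
    """Return the 1-based index of the burst in which the task completes,
    or None if it never completes within the given bursts.

    bursts: energy per power burst, in instruction-equivalents (assumed unit).
    """
    progress = 0  # instructions whose results survive the next outage
    for index, budget in enumerate(bursts, start=1):
        if nonvolatile:
            # No restore is needed before any state has been saved.
            restore = restore_cost if progress > 0 else 0
            if budget - restore >= task_length - progress:
                return index  # finishes before power fails, no backup needed
            # Otherwise spend the rest on work, reserving energy for backup.
            # If the burst cannot even cover the overheads, the previously
            # saved state is assumed to remain intact.
            progress += max(0, budget - restore - backup_cost)
        else:
            # Volatile state is lost at every outage, so the whole task
            # must fit inside a single burst.
            if budget >= task_length:
                return index
    return None


if __name__ == "__main__":
    short_bursts = [20] * 10
    print("NVP:", bursts_to_finish(short_bursts, 100, nonvolatile=True))
    print("Volatile:", bursts_to_finish(short_bursts, 100, nonvolatile=False))
```

With ten bursts of 20 instruction-equivalents and a 100-instruction task, the NVP in this toy model accumulates progress across outages and finishes, whereas the volatile processor never does because no single burst covers the whole task.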
For any given application, there is an optimal throughput point in the space of per-processor performance and the number of such processors given to that application. However, due to thermal, yield, and other constraints, not all of these optimal points can plausibly be constructed with a given technology. In this paper, we look at how emerging steep slope devices, 3D circuit integration, and trends in process technology scaling will combine to shift both the boundaries of attainable performance and the optimal set of technologies to employ to achieve it. We propose a heterogeneous-technology 3D architecture capable of operating efficiently at an expanded number of points in this larger design space and devise a heterogeneity- and thermal-aware scheduling algorithm to exploit its potential. Our heterogeneous mapping techniques are capable of producing speedups ranging from 17% for high-end server workloads running at around 90 °C to over 160% for embedded systems running below 60 °C.
Video applications are becoming ubiquitous in mobile and embedded systems. Wearable video systems such as Google Glass require capabilities for real-time video analytics and prolonged battery lifetimes. Further, the increasing resolution of image sensors in these mobile systems places an increasing demand on both memory storage and computational power. In this work, we present the Refresh Enabled Video Analytics (REVA) system, an embedded architecture for multi-object scene understanding, and exploit the unique opportunities provided by real-time embedded video analytics applications for reducing DRAM refresh energy. We compare our design with the existing design space and show savings of 88% in refresh power and 15% in total power, as compared to a standard DRAM refresh scheme.
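The two savings figures above can be related with a back-of-envelope calculation. The sketch below assumes, which the abstract does not state, that the entire total-power saving comes from reduced refresh power; under that assumption the ratio of the two figures gives refresh's share of baseline total power. The function name is illustrative.

```python
def implied_refresh_share(refresh_saving, total_saving):
    """Fraction of baseline total power spent on refresh, ASSUMING the
    total saving is due to refresh reduction alone:
        total_saving = refresh_share * refresh_saving
    """
    if not 0 < refresh_saving <= 1:
        raise ValueError("refresh_saving must be in (0, 1]")
    return total_saving / refresh_saving


if __name__ == "__main__":
    share = implied_refresh_share(refresh_saving=0.88, total_saving=0.15)
    print(f"Implied refresh share of total power: {share:.1%}")
```

If other components also contribute to the 15% figure, the true refresh share would be lower than this estimate.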
Steep slope devices, Heterojunction Tunnel FETs (TFETs) in particular, have been proposed as a viable solution to overcome the subthreshold slope limitation in existing CMOS technology and achieve ultra-low voltage operation with acceptable performance. However, state-of-the-art FinFET technologies continue to demonstrate better performance than steep slope devices in application domains demanding peak single-threaded performance. In this context, we examine different computing paradigms where TFET technologies can be used, not just as a 'drop in' replacement, but as an additional parameter to augment the architectural design space. This greatly widens the scope of optimizations for performance and power. We investigate the tradeoffs between devices and architectures in general-purpose processors when performance, power, and temperature are individually constrained. We also synthesize examples of domain-specific accelerators used in computer vision using in-house TFET standard cell libraries to demonstrate the energy benefits of designing TFET-based accelerators. We demonstrate that synthesizing these accelerators using TFETs reduces energy by over 6X in comparison to an equivalent iso-voltage CMOS-based design and by over 30% in comparison to an iso-performance CMOS design.
Energy harvesting has been widely investigated as a promising method of providing power for ultra-low-power applications. Such energy sources include solar energy, radio-frequency (RF) radiation, piezoelectricity, and thermal gradients. However, the power supplied by these sources is highly unreliable and dependent upon ambient environmental factors. Hence, it is necessary to develop specialized systems that tolerate this power variation and can still make forward progress on their computation tasks. The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.
The challenges of the Power Wall manifest in mobile and embedded processors due to their inherent thermal and form-factor constraints. The power dissipated over a fixed area, namely the power density, directly affects acceptable core temperatures even for low-power devices. In this paper, we examine techniques to counter this power density increase with device- and microarchitecture-level heterogeneity. We explore the design space in which parameters such as frequency and microarchitectural complexity can be traded off against each other in order to achieve the optimal configuration for a fixed temperature limit. Since cores based on conventional CMOS technology may not satisfy our performance and power requirements, especially under tight thermal constraints, we propose a heterogeneous CMOS-Tunnel FET multicore for obtaining the optimal operating points under power and thermal limitations. Using a profiling-based static assignment scheme, we demonstrate the improvement obtained by coupling this device-level heterogeneity to architectural modifications. We also propose an instruction slack-based scheme to map applications onto the heterogeneous multicore. Our schemes show improvements of up to 47% in performance and 30% in energy over the best homogeneous configuration.