Partitioned cache architectures for reduced NBTI-induced aging
Related papers
Investigating the impact of NBTI on different power saving cache strategies
2010
The occupancy of caches has tended to be dominated by the logic bit value '0' approximately 75% of the time. Periodic bit flipping can reduce this to 50%. Combining cache power saving strategies with bit flipping can lower the effective logic bit value '0' occupancy ratios even further. We investigate how Negative Bias Temperature Instability (NBTI) affects different power saving cache strategies employing symmetric and asymmetric 6-transistor (6T) and 8T Static Random Access Memory (SRAM) cells. We ...
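The bit-flipping idea above can be sketched in a few lines: if a word is stored inverted half the time, with a flag recording the current polarity, each cell's duty cycle approaches 50% regardless of the data. A minimal illustrative sketch (class and method names are invented for this example, not taken from the paper):

```python
# Illustrative sketch of periodic bit flipping for NBTI duty-cycle balancing.

WORD_BITS = 8
MASK = (1 << WORD_BITS) - 1

class FlippingWord:
    """Stores a word either directly or inverted; a flag records polarity."""
    def __init__(self, value=0):
        self.raw = value & MASK   # what the SRAM cells actually hold
        self.inverted = False     # polarity flag (one extra bit per word)

    def flip(self):
        """Periodic flip: invert the stored bits and toggle the flag.
        The logical value is unchanged, but each cell's duty cycle
        moves toward 50% '0' / 50% '1'."""
        self.raw = ~self.raw & MASK
        self.inverted = not self.inverted

    def read(self):
        return (~self.raw & MASK) if self.inverted else self.raw

cell = FlippingWord(0b11110000)
before = cell.read()
cell.flip()                 # stored bits change, logical value does not
assert cell.read() == before
```

A real implementation would flip entire cache lines on a timer or refresh-like schedule; the per-word flag here just keeps the sketch self-contained.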
Buffering of frequent accesses for reduced cache aging
Proceedings of the 21st Great Lakes Symposium on VLSI (GLSVLSI '11), 2011
Previous works have shown that typical power management knobs such as voltage scaling or power gating can also be exploited to reduce aging phenomena caused by Negative Bias Temperature Instability (NBTI). We propose a scheme for power-managed caches that significantly reduces cache aging by means of a small buffer that stores a copy of the lines most critical for aging, that is, those with the least opportunity of being power-managed; by accessing the buffer instead of the cache for these critical lines, the original cache is preserved and its lifetime is significantly prolonged. As a side effect, the scheme also improves total power, since the less energy-hungry buffer is accessed most of the time. Experimental analysis shows that this scheme achieves significant lifetime extensions for the cache (>3x on average), with a concurrent energy saving between 18% and 24%, depending on cache size.
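The buffering scheme can be illustrated with a toy model: lines that are accessed often enough are promoted to a small buffer, so later hits never touch the main array and it can stay power-managed. The structure, promotion threshold, and names below are invented for illustration; the paper's actual policy may differ.

```python
# Hedged sketch: serve hot lines from a small buffer so the main cache
# array can remain in a low-power state longer.

class BufferedCache:
    def __init__(self, buffer_size=4, hot_threshold=3):
        self.cache = {}          # main cache array: addr -> data
        self.buffer = {}         # small low-leakage buffer: addr -> data
        self.access_count = {}   # per-line access counter
        self.buffer_size = buffer_size
        self.hot_threshold = hot_threshold
        self.main_cache_accesses = 0

    def read(self, addr):
        if addr in self.buffer:          # hot line: main array untouched
            return self.buffer[addr]
        self.main_cache_accesses += 1
        data = self.cache.get(addr)
        self.access_count[addr] = self.access_count.get(addr, 0) + 1
        # Promote lines accessed often enough into the buffer.
        if (self.access_count[addr] >= self.hot_threshold
                and len(self.buffer) < self.buffer_size):
            self.buffer[addr] = data
        return data

c = BufferedCache()
c.cache[0x40] = "line"
for _ in range(10):
    c.read(0x40)
# After promotion, further reads of 0x40 never touch the main array.
```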
Sleepy-LRU: extending the lifetime of non-volatile caches by reducing activity of age bits
The Journal of Supercomputing
Emerging non-volatile memories (NVMs) are promising alternatives to SRAMs in on-chip caches. However, their limited write endurance is a major challenge when NVMs are employed in these frequently written structures. Early wear-out of NVM cells makes cache lifetime far too short for today's computational systems. Previous studies addressed only the lifetime of the data part of the cache. This paper first demonstrates that the age-bits field of the cache replacement algorithm is the most frequently written part of a cache block, with a lifetime more than 27× shorter than that of the data part. Second, it investigates the effect of age-bit wear-out on cache operation and shows that performance is severely degraded once even a small portion of the age bits become non-operational. Third, a novel cache replacement algorithm, called Sleepy-LRU, is proposed to reduce the write activity of the age bits with negligible overheads. The evaluations show that Sleepy-LRU extends the lifetime of instruction and data caches by 3.63× and 3.00×, respectively, with an average performance overhead of 0.06%. In addition, Sleepy-LRU imposes no area or power consumption overhead.
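The abstract does not spell out Sleepy-LRU's mechanism, but the problem it targets can be illustrated: under plain LRU, the age bits of a set are rewritten on every access, including hits to the line that is already most recently used. One way to cut those writes (an assumption for illustration, not necessarily the paper's algorithm) is to skip the update when the accessed line is already MRU:

```python
# Illustrative sketch (NOT the paper's algorithm): counting age-bit
# writes under plain LRU vs. a variant that skips redundant updates.

class LRUSet:
    def __init__(self, ways=4, lazy=False):
        self.order = list(range(ways))   # order[0] is the MRU way
        self.lazy = lazy
        self.age_bit_writes = 0

    def touch(self, way):
        if self.lazy and self.order[0] == way:
            return                       # already MRU: no age-bit write
        self.order.remove(way)
        self.order.insert(0, way)
        self.age_bit_writes += 1

plain, lazy = LRUSet(), LRUSet(lazy=True)
for way in [1, 1, 1, 2, 2, 3]:           # access stream with repeated hits
    plain.touch(way)
    lazy.touch(way)
assert lazy.age_bit_writes < plain.age_bit_writes
```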
Cache aging reduction with improved performance using dynamically re-sizable cache
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014
Aging of transistors is a limiting factor for the long-term reliability of devices in sub-100nm technologies. Aging is a worst-case metric: the lifetime of a device is determined by its earliest failing component. The impact is most serious in memory arrays, where the failure of a single SRAM cell can cause the failure of the whole system. Previous works have shown that partitioning strategies based on power management techniques can effectively control aging effects and significantly extend cache lifetime. However, this benefit comes at the cost of performance, which degrades progressively over time. To address this problem and provide a single solution that concurrently improves the aging, energy, and performance of the cache, we propose an architectural solution that combines the dynamically re-sizable cache [5] with cache partitioning approaches. In this strategy, the cache is dynamically re-sized and reconfigured whenever a cache block becomes unreliable. Coupling this aging-mitigation technique with the dynamically re-sizable cache provides a 30% lifetime improvement on average with less than 0.4x performance degradation, whereas in previous solutions performance degradation sometimes reaches 10x.
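The re-sizing idea, down-sizing the cache rather than failing outright when a block ages out, can be sketched at the level of one set. The class and method names below are invented for illustration:

```python
# Sketch of down-sizing a set-associative cache when a way becomes
# unreliable: the set keeps operating with fewer usable ways instead
# of causing a whole-system failure.

class ResizableSet:
    def __init__(self, ways=4):
        self.lines = [None] * ways
        self.usable = [True] * ways

    def mark_unreliable(self, way):
        """Retire an aged-out way and evict whatever it held."""
        self.usable[way] = False
        self.lines[way] = None

    def capacity(self):
        """Number of ways still available for allocation."""
        return sum(self.usable)

s = ResizableSet(ways=4)
s.mark_unreliable(2)
assert s.capacity() == 3   # the cache shrinks instead of failing
```

A full implementation would also steer replacement decisions away from retired ways; the sketch only shows the bookkeeping.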
Aging effects of leakage optimizations for caches
Besides static power consumption, sub-90nm devices must also account for NBTI effects, which are one of the major concerns for system reliability. Some of the factors that regulate power consumption also impact NBTI-induced aging; however, the extent to which traditional low-power techniques can mitigate NBTI issues has not been investigated thoroughly. This is especially true for cache memories, which are the target of this work. We show how leakage optimization techniques can also be leveraged to extend the lifetime of a cache. Experimental analysis shows that, while achieving a total energy reduction of up to 80%, managing static power can also provide a 5x lifetime extension.
Energy-optimal caches with guaranteed lifetime
Proceedings of the 2012 ACM/IEEE International Symposium on Low Power Electronics and Design - ISLPED '12, 2012
This work addresses the aging of the memory sub-system due to NBTI (Negative Bias Temperature Instability) in systems that must provide a guaranteed level of service, and specifically, a guaranteed lifetime. Our approach leverages a novel cache architecture in which a smart joint use of redundancy and power management yields caches that meet a desired lifetime target with minimal energy consumption. This is made possible by putting the cache sub-block used for redundancy into a deep low-power state, thus allowing more energy saving than a regular architecture. Sacrificing a portion of the cache for aging mitigation only marginally affects performance, thanks to the non-linear dependency of miss rate on cache size, which allows finding the cache size that best satisfies the objective. Simulation results show that it is possible to meet the target lifetime while achieving energy reductions (measured over the lifetime of the system) ranging from 3X to 10X (2X to 8X) for a lifetime target of 15 (25) years, with marginal miss rate overhead.
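The non-linear miss-rate/size trade-off the abstract relies on can be shown with a toy optimization: given a miss-rate curve and a leakage cost proportional to the active size, the energy-optimal active size often sits well below the full capacity. All numbers and names below are invented; the paper's energy model is far more detailed.

```python
# Toy illustration: picking the active cache size that minimizes total
# energy, exploiting the non-linear miss-rate vs. size curve.

# (size_kB, miss_rate) pairs: miss rate falls off non-linearly with size.
miss_curve = [(8, 0.20), (16, 0.08), (32, 0.05), (64, 0.045)]

LEAK_PER_KB = 1.0     # leakage energy units per kB kept active (invented)
MISS_PENALTY = 400.0  # energy units per unit of miss rate (invented)

def total_energy(size_kb, miss_rate):
    """Leakage grows linearly with active size; miss energy shrinks
    non-linearly, so an intermediate size wins."""
    return LEAK_PER_KB * size_kb + MISS_PENALTY * miss_rate

best = min(miss_curve, key=lambda p: total_energy(*p))
# With these invented numbers, a 16 kB active partition beats both the
# smallest (miss-dominated) and the largest (leakage-dominated) options.
```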
iRMW: A low-cost technique to reduce NBTI-dependent parametric failures in L1 data caches
2014 IEEE 32nd International Conference on Computer Design (ICCD), 2014
Negative bias temperature instability (NBTI) is a major cause of concern for chip designers because of its ability to drastically reduce silicon reliability over the lifetime of the processor. Coupled with statistical variations of process parameters, it can potentially render systems dysfunctional in certain scenarios. Data caches suffer the most from this phenomenon because of the unbalanced duty cycle ratio of SRAM cells and their maximum intrinsic susceptibility to process variations. In this paper, we propose a novel NBTI-aware technique, invert-Read-Modify-Write (iRMW), that can significantly improve the functional yield of the data cache over its lifetime. Using architecture-level benchmarks, we first analyse the impact of activity factor and workload variation on NBTI-induced failures in data caches. iRMW is then used as a means to balance the duty cycle by alternating between recovery and stress cycles upon successive read accesses to the cache line. The highly transient nature of the data stored in the L1 data cache aids this process of recovery under iRMW. A unique feature of iRMW is its intelligent use of low-leakage, NBTI-tolerant embedded-DRAM cells as an alternative to SRAM cells for storing important state information. Our experiments conducted using SPEC2006 and PhysicsBench workloads show that, on average, the cache failure probability can be reduced by 22%, 33% and 36% after two, four and eight years of processor usage, respectively. In addition to being extremely power-frugal, the use of eDRAM greatly reduces the total area footprint of iRMW.
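The invert-on-read idea can be sketched concisely: each read rewrites the line with opposite polarity, so successive reads alternate each cell between stress and recovery while the logical value stays the same. The class below is invented for illustration (the paper stores the polarity flag in eDRAM, not modeled here):

```python
# Sketch of invert-Read-Modify-Write: each read writes the line back
# inverted, alternating every cell between stress and recovery.

MASK = 0xFF  # one byte per line, for illustration

class IRMWLine:
    def __init__(self, data):
        self.stored = data & MASK
        self.flag = False   # True: line is currently held inverted

    def read(self):
        value = (~self.stored & MASK) if self.flag else self.stored
        # Read-Modify-Write: write the line back with opposite polarity.
        self.stored = ~self.stored & MASK
        self.flag = not self.flag
        return value

line = IRMWLine(0xA5)
assert line.read() == 0xA5
assert line.read() == 0xA5   # value is stable while stored bits alternate
```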
Proceedings of the 51st Annual Design Automation Conference - DAC '14, 2014
NVM has commonly been used to address the requirements of increasingly large last-level caches (LLCs) by reducing leakage. However, frequent data-writing operations result in increased energy consumption. In this context, a promising memory technology, non-volatile SRAM (nvSRAM), enables normal and standby operation modes which can be used to store various types of data. However, nvSRAM suffers from high dynamic energy usage due to frequent switching between operation modes. In this paper, we propose a redundant store elimination (RSE) scheme which, on average, discards 94% of needless bit-write operations. Moreover, we present a retention-aware cache management policy to reduce data updates of cache blocks, based on the correlation between data lifetime and cache types. Experimental results demonstrate that our proposal can reduce the energy consumption of SRAM-based and RRAM-based LLCs by 57% and 31%, respectively.
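The core of redundant store elimination is a comparison: only the bits that actually differ between the incoming and stored data need a write. A minimal sketch of that idea (function names invented for this example):

```python
# Sketch of redundant store elimination: XOR the old and new words and
# write only the differing bit positions.

def changed_bits(old, new):
    """Bit mask of the positions that actually need a write."""
    return old ^ new

def write_with_rse(old, new):
    """Return the new word plus the number of bit-writes performed."""
    mask = changed_bits(old, new)
    writes = bin(mask).count("1")   # population count of the diff mask
    return new, writes

stored = 0b10110010
value, writes = write_with_rse(stored, 0b10110011)
assert writes == 1   # only one bit differs, so only one bit-write
```

When the incoming word equals the stored one, the mask is zero and the whole store is discarded, which is how redundant stores are eliminated entirely.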
Performance and Power Solutions for Caches Using 8T SRAM Cells
2012 45th Annual IEEE/ACM International Symposium on Microarchitecture Workshops, 2012
Voltage scaling can reduce power dissipation significantly. SRAM cells (which are traditionally implemented using six-transistor cells) can limit voltage scaling due to stability concerns. Eight-transistor (8T) cells were proposed to enhance cell stability under voltage scaling. 8T cells, however, suffer from costly write operations caused by the column selection issue. Previous work has proposed Read-Modify-Write (RMW) to address this issue at the expense of an increase in cache access frequency. In this work, we introduce two microarchitectural solutions to address this inefficiency. Our solutions rely on grouping write accesses and bypassing read accesses made to the same cache set. We reduce cache access frequency by up to 55%.
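Grouping and bypassing can be sketched with a small model: pending writes to the same address are coalesced before reaching the array, and reads of a pending address are served from the pending entry rather than the array. The structure and names are invented for illustration; the paper operates at cache-set granularity with RMW details not modeled here.

```python
# Sketch of write grouping and read bypassing: coalesce pending writes
# and serve reads of pending data without touching the array.

class WriteGroupCache:
    def __init__(self):
        self.pending = {}        # addr -> data awaiting the array write
        self.array_accesses = 0

    def write(self, addr, data):
        self.pending[addr] = data    # coalesce with any earlier write

    def read(self, addr):
        if addr in self.pending:     # bypass: no array access needed
            return self.pending[addr]
        self.array_accesses += 1     # would access the array here
        return None

    def drain(self):
        """Flush the grouped writes: one array access per address."""
        self.array_accesses += len(self.pending)
        self.pending.clear()

c = WriteGroupCache()
c.write(0x10, "a")
c.write(0x10, "b")           # coalesced: still one pending array write
assert c.read(0x10) == "b"   # bypassed, no array access
c.drain()
assert c.array_accesses == 1
```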
An Approach for Adaptive DRAM Temperature and Power Management
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010
With rising capacities and higher access frequencies, high-performance DRAMs are providing increasing memory access bandwidth to processors. However, the increasing DRAM performance comes at the price of higher power consumption and temperature in DRAM chips. Traditional low-power approaches for DRAM systems focus on utilizing low-power modes, which are not always suitable for high-performance systems. Existing DRAM temperature management techniques, on the other hand, utilize generic temperature management methods inherited from those applied to processor cores. These methods reduce DRAM temperature by controlling the number of DRAM accesses, similar to throttling the processor core, which incurs a significant performance penalty. In this paper, we propose a customized low-power technique for high-performance DRAM systems, namely the Page Hit Aware Write Buffer (PHA-WB). The PHA-WB improves DRAM page hit rate by buffering write operations that may incur page misses. This approach reduces DRAM system power consumption and temperature without any performance penalty. Our proposed Throughput-Aware PHA-WB (TAP) dynamically configures the write buffer for different applications and workloads, thus achieving the best trade-off between DRAM power reduction and buffer power overhead. Our experiments show that a system with TAP could reduce total DRAM power consumption by up to 18.36% (8.64% on average). The steady-state temperature can be reduced by as much as 5.10°C, and by 1.93°C on average, across eight representative workloads.
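The page-hit-aware buffering idea can be sketched as follows: a write to the currently open DRAM row goes through immediately (a page hit), while a write to any other row is held in the buffer and drained once that row is opened anyway. The model below is invented for illustration and ignores per-bank state and buffer capacity limits:

```python
# Sketch of a page-hit-aware write buffer: defer writes that would
# cause a page miss, and drain them when their row opens.

class PHAWriteBuffer:
    def __init__(self):
        self.open_row = None
        self.pending = {}       # row -> list of buffered writes
        self.page_misses = 0

    def write(self, row, data):
        if row == self.open_row:
            return              # page hit: write goes straight to DRAM
        self.pending.setdefault(row, []).append(data)  # defer the miss

    def activate(self, row):
        """A row is opened (e.g. by a read); drain writes buffered for it."""
        if row != self.open_row:
            self.page_misses += 1
            self.open_row = row
        return self.pending.pop(row, [])

buf = PHAWriteBuffer()
buf.activate(7)                   # a read opens row 7
buf.write(7, "a")                 # page hit: immediate
buf.write(9, "b")                 # would be a page miss: buffered
assert buf.activate(9) == ["b"]   # drained when row 9 opens anyway
```

The point of the deferral is that the buffered write costs no extra row activation of its own, which is where the power and temperature savings come from.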