AC-DIMM
Related papers
ReMAM: Low Energy Resistive Multi-Stage Associative Memory for Energy Efficient Computing
The Internet of Things (IoT) significantly increases the volume of computations and the number of running applications on processors, from mobile devices to servers. Big data computation requires massive parallel processing and acceleration. In parallel processing, associative memories represent a promising solution to improve energy efficiency by eliminating redundant computations. However, the tradeoff between memory size and search energy consumption limits their applications. In this paper, we propose a novel low-energy Resistive Multi-stage Associative Memory (ReMAM) architecture, which significantly reduces search energy consumption by employing selective row activation and in-advance precharging techniques. ReMAM splits the search in the Ternary Content Addressable Memory (TCAM) into a number of shorter searches performed in consecutive stages. It then selectively activates TCAM rows at each stage based on the hits of previous stages, thus enabling energy saving. The proposed in-advance precharging technique mitigates the delay of the sequential TCAM search and limits the number of precharges to two low-cost steps. Our experimental evaluation on AMD Southern Islands GPUs shows that ReMAM reduces energy consumption by 38.2% on average, which is 1.62X larger than the saving achieved by a GPGPU with conventional single-stage associative memory.
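To make the staged-search idea above concrete, here is a minimal Python sketch of a ReMAM-style multi-stage associative search, in which each stage compares only a slice of the word and only the rows that hit in earlier stages stay active. The stage width, table contents, and function names are illustrative assumptions, not details from the paper.

```python
def multistage_search(stored_words, query, stage_bits=8, word_bits=32):
    """Return (hit rows, row activations) for a staged TCAM-style search.

    Each stage compares `stage_bits` of the word; only rows that matched
    every earlier stage are activated again, which is the energy-saving idea.
    """
    active_rows = list(range(len(stored_words)))   # all rows active at stage 0
    activations = 0                                # crude proxy for search energy
    for offset in range(0, word_bits, stage_bits):
        mask = ((1 << stage_bits) - 1) << offset   # bits compared in this stage
        q_slice = query & mask
        survivors = []
        for r in active_rows:
            activations += 1                       # this row was precharged/compared
            if (stored_words[r] & mask) == q_slice:
                survivors.append(r)
        active_rows = survivors                    # later stages touch fewer rows
        if not active_rows:
            break
    return active_rows, activations

table = [0x12345678, 0x12340000, 0xDEADBEEF, 0x12345679]
print(multistage_search(table, 0x12345678))   # ([0], 7) vs. 16 if all rows ran every stage
```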
Resistive Configurable Associative Memory for Approximate Computing
Modern computing machines are increasingly characterized by large-scale parallelism in hardware (such as GP-GPUs) and the advent of large-scale and innovative memory blocks. Parallelism enables expanded performance tradeoffs, whereas memories enable reuse of computational work. To be effective, however, one needs to ensure energy efficiency with minimal reuse overheads. In this paper, we describe a resistive configurable associative memory (ReCAM) that enables selective approximation and asymmetric voltage overscaling to manage delivered efficiency. The ReCAM structure matches an input pattern with pre-stored ones by applying an approximate search on selected bit indices (bitline-configurable) or on selected pre-stored patterns (row-configurable). To further reduce energy, we explore proper ReCAM sizing, various configurable search operations with low-overhead voltage overscaling, and different ReCAM update policies. Experimental results on AMD Southern Islands GPUs for eight applications show that bitline-configurable and row-configurable ReCAM achieve on average 43.6% and 44.5% energy savings, respectively, with an acceptable quality loss of 10%.
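The two configurability knobs described in this abstract can be illustrated with a small functional sketch: one function ignores selected bit indices, the other restricts the comparison to a chosen subset of rows. The word width, pattern values, and function names below are assumptions made for the example, not the paper's implementation.

```python
WORD_BITS = 16

def bitline_configurable_match(stored, query, ignored_bits):
    """Approximate match that masks out the given bit indices before comparing."""
    mask = (1 << WORD_BITS) - 1
    for b in ignored_bits:
        mask &= ~(1 << b)                     # drop ignored bitlines from the compare
    return [i for i, w in enumerate(stored) if (w & mask) == (query & mask)]

def row_configurable_match(stored, query, active_rows):
    """Exact match restricted to the subset of rows selected for this search."""
    return [i for i in active_rows if stored[i] == query]

patterns = [0xA5F0, 0xA5F3, 0x0F0F, 0xA5F0]
print(bitline_configurable_match(patterns, 0xA5F1, ignored_bits={0, 1}))  # [0, 1, 3]
print(row_configurable_match(patterns, 0xA5F0, active_rows=[0, 2]))       # [0]
```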
EE-TCAM: An Energy-Efficient SRAM-Based TCAM on FPGA
Electronics, 2018
Ternary content-addressable memories (TCAMs) are used to design high-speed search engines. TCAMs are implemented on application-specific integrated circuits (native TCAMs) and on field-programmable gate arrays (FPGAs) as static random-access memory (SRAM)-based TCAMs, but both platforms have the drawback of high power consumption. This paper presents a pre-classifier-based architecture for an energy-efficient SRAM-based TCAM. The first classification stage divides the TCAM table into several sub-tables of balanced size. The second SRAM-based implementation stage maps each of the resultant TCAM sub-tables to a separate row of configured SRAM blocks in the architecture. The proposed architecture selectively activates at most one row of SRAM blocks for each incoming TCAM word. Compared with the existing SRAM-based TCAM designs on FPGAs, the proposed design consumes significantly less energy, as it activates only a part of the SRAM memory used for lookup rather than the entire SRAM memory as in previous designs.
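The pre-classification idea reads naturally as a two-step lookup: a cheap first stage picks one sub-table, and only that sub-table's SRAM blocks are then searched. The sketch below models this with a toy classifier that uses the top bits of the word; the partitioning rule and table contents are illustrative assumptions rather than the paper's balancing algorithm.

```python
SELECT_BITS = 2                       # top bits used by the toy pre-classifier
NUM_SUBTABLES = 1 << SELECT_BITS
WORD_BITS = 16

def classify(word):
    """First stage: pick a sub-table from the top SELECT_BITS of the word."""
    return word >> (WORD_BITS - SELECT_BITS)

def build_subtables(entries):
    subtables = [[] for _ in range(NUM_SUBTABLES)]
    for e in entries:
        subtables[classify(e)].append(e)
    return subtables

def lookup(subtables, query):
    """Second stage: search only the sub-table the classifier selected."""
    selected = subtables[classify(query)]
    return query in selected, len(selected)   # (hit?, rows actually compared)

entries = [0x1234, 0x1300, 0x8001, 0xC0DE, 0x4242]
tables = build_subtables(entries)
print(lookup(tables, 0x8001))   # (True, 1): only one small sub-table was activated
```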
Approximate Computing using Multiple-Access Single-Charge Associative Memory
Memory-based computing using associative memory is a promising way to reduce the energy consumption of important classes of streaming applications by avoiding redundant computations. A set of frequent patterns that represent basic functions are pre-stored in Ternary Content Addressable Memory (TCAM) and reused. The primary limitation to using associative memory in modern parallel processors is the large search energy required by TCAMs. In conventional TCAMs, all rows except the hit rows precharge and discharge for every search operation, resulting in high energy consumption. In this paper, we propose a new Multiple-Access Single-Charge (MASC) TCAM architecture which is capable of searching TCAM contents multiple times with only a single precharge cycle. In contrast to previous designs, the MASC TCAM keeps the match-line voltage of all miss rows high and uses their charge for the next search operation, while only the hit rows discharge. We use periodic refresh to control the accuracy of the search. We also implement a new type of approximate associative memory by setting longer refresh times for MASC TCAMs, which yields search results within a 1-2 bit Hamming distance of the exact value. To further decrease the energy consumption of MASC TCAM and reduce its area, we implement MASC with crossbar TCAMs. Our evaluation on an AMD Southern Islands GPU shows that using MASC (crossbar MASC) associative memory can improve the average energy efficiency of the floating point units by 33.4%, 38.1%, and 36.7% (37.7%, 42.6%, and 43.1%) for exact matching, selective 1-HD, and 2-HD approximations, respectively, while providing an acceptable quality of service (PSNR > 30 dB and average relative error < 10%). This shows that MASC (crossbar MASC) can achieve 1.77X (1.93X) higher energy savings compared to a state-of-the-art GPGPU implementation that uses voltage overscaling on TCAM.
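The approximate search modes mentioned above (selective 1-HD and 2-HD matching) amount to accepting stored patterns within a small Hamming distance of the search key. The sketch below models only that matching semantics; it does not model the match-line charge reuse or the refresh mechanism, and the pattern values are made up for the example.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")

def masc_search(stored_patterns, query, max_hd=0):
    """Return indices of patterns within `max_hd` bit flips of `query`.

    max_hd=0 is exact matching; 1 and 2 correspond to the 1-HD / 2-HD modes."""
    return [i for i, p in enumerate(stored_patterns)
            if hamming_distance(p, query) <= max_hd]

patterns = [0b10110010, 0b10110011, 0b01001100]
print(masc_search(patterns, 0b10110010, max_hd=0))  # [0]      exact match only
print(masc_search(patterns, 0b10110010, max_hd=1))  # [0, 1]   1-HD approximate match
```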
CAP: Configurable Resistive Associative Processor for Near-Data Computing
IEEE International Symposium on Quality Electronic Design (ISQED)
The Internet of Things is capable of generating huge amounts of data, causing high overhead in terms of energy and performance if run on traditional CPUs and GPUs. This inefficiency comes from the limited cache size and memory bandwidth, which result in a large amount of data movement through the memory hierarchy. In this paper, we propose a configurable associative processor, called CAP, which accelerates computation using multiple parallel memory-based cores capable of approximate or exact matching. CAP is integrated next to the main memory, so it fetches data directly from DRAM. To exploit data locality, the CAMs adaptively split into highly frequent and less frequent components and update at runtime. To further improve CAP efficiency, we integrate a novel signature-based associative memory (SIGAM) beside each processing core to store highly frequent patterns and retrieve them at runtime in exact or approximate modes. Our experimental evaluations show that CAP in approximate (exact) mode can achieve 9.4x and 5.3x (7.2x and 4.2x) energy improvement, and 4.1x and 1.3x speedup compared to an AMD GPU and an ASIC CMOS-based design, respectively, while providing acceptable quality of service.
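At its core, the computation reuse that CAP exploits can be pictured as an associative memoization table: frequent operand patterns and their precomputed results are stored, and an exact or approximate lookup replaces recomputation on a hit. The sketch below is a software analogy under that assumption; the class, the stand-in computation, and the "ignore low bits" approximation rule are illustrative and do not describe the SIGAM hardware.

```python
def expensive_op(x):
    return x * x + 3 * x + 7              # stand-in for a costly computation

class AssociativeCache:
    """Toy model of computation reuse via an associative (pattern -> result) table."""

    def __init__(self, approx_bits=0):
        self.table = {}                    # stored pattern -> precomputed result
        self.approx_bits = approx_bits     # in approximate mode, low bits may differ

    def compute(self, x):
        key = x >> self.approx_bits        # drop low bits in approximate mode
        if key in self.table:
            return self.table[key]         # reuse: no recomputation on a hit
        result = expensive_op(x)
        self.table[key] = result
        return result

cache = AssociativeCache(approx_bits=2)    # approximate mode: ignore 2 low operand bits
print(cache.compute(100))                  # miss: computed and stored
print(cache.compute(101))                  # approximate hit: reuses 100's result
```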
An Efficient, Low Power 256X8 T-SRAM Architecture
High-speed lookup operations are performed by ternary content-addressable memories (TCAMs). However, TCAMs are limited by low storage density, relatively slow access time, low scalability, and complex circuitry, and are very expensive in comparison with static random-access memories (SRAMs). The benefits of SRAM can be availed by configuring additional logic that enables an SRAM to behave like a TCAM. T-SRAM is a proposed novel memory architecture that emulates TCAM functionality with SRAM. T-SRAM logically partitions the classical TCAM table along columns and rows into hybrid TCAM sub-tables, which are then processed and mapped onto their corresponding memory blocks. A 256x8 T-SRAM is implemented that consumes 0.024 W.
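The column-wise partitioning that T-SRAM relies on follows the standard recipe for emulating a TCAM with SRAM: each slice of the search key addresses an SRAM block whose stored word is a bit-vector of the rules matching that slice, and the per-slice vectors are ANDed together. The sketch below shows that recipe for ternary (value, care-mask) rules; the slice width, rule encoding, and names are assumptions made for the example.

```python
SLICE_BITS = 4                               # each SRAM block is addressed by 4 key bits
SLICE_MASK = (1 << SLICE_BITS) - 1

def build_sram_blocks(rules, word_bits=8):
    """rules: list of (value, care_mask) ternary entries.

    Returns one lookup table per slice: slice value -> bit-vector of matching rules."""
    blocks = []
    for s in range(word_bits // SLICE_BITS):
        shift = s * SLICE_BITS
        table = {}
        for addr in range(1 << SLICE_BITS):          # every possible slice value
            vec = 0
            for i, (value, care) in enumerate(rules):
                v = (value >> shift) & SLICE_MASK
                c = (care >> shift) & SLICE_MASK
                if (addr & c) == (v & c):             # "don't care" bits are ignored
                    vec |= 1 << i
            table[addr] = vec
        blocks.append(table)
    return blocks

def tcam_search(blocks, key, num_rules):
    result = (1 << num_rules) - 1                     # start with every rule matching
    for s, table in enumerate(blocks):
        result &= table[(key >> (s * SLICE_BITS)) & SLICE_MASK]
    return result                                     # bit i set => rule i matches the key

# rule 0 matches 1010xxxx, rule 1 matches exactly 10100110
rules = [(0b10100000, 0b11110000), (0b10100110, 0b11111111)]
blocks = build_sram_blocks(rules)
print(bin(tcam_search(blocks, 0b10100110, len(rules))))   # 0b11 -> both rules match
print(bin(tcam_search(blocks, 0b10101111, len(rules))))   # 0b1  -> only rule 0 matches
```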