The feasibility of using compression to increase memory system performance

Improving system performance with compressed memory

Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001, 2001

This paper summarizes our research on implementing a compressed memory in computer systems. The basic premise is that the throughput of applications whose working set does not fit in main memory degrades significantly because of an increase in the number of page faults. Hence we propose compressing memory pages that need to be paged out and storing them in memory. This hides the large latency of a disk access, since on a page fault the page merely has to be decompressed. Our implementation is in the form of a device driver for Linux. We show results with some applications from the SPEC CPU2000 benchmark suite and a computing kernel. Speed-ups ranging from 5% to 250% can be obtained, depending on the paging behavior of the application.
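The core idea (keep evicted pages compressed in RAM so a page fault costs a decompression rather than a disk read) can be modeled in a few lines. The following is a minimal user-space sketch, not the paper's Linux device driver: it assumes 4 KiB pages, uses zlib as a stand-in compressor, and the class and method names (`CompressedPageStore`, `page_out`, `page_in`) are hypothetical.

```python
import zlib
from typing import Optional

PAGE_SIZE = 4096  # assumed page size in bytes


class CompressedPageStore:
    """Hypothetical in-RAM store for evicted pages, standing in for the swap device."""

    def __init__(self, level: int = 1):
        self.level = level  # zlib level: low levels trade compression ratio for speed
        self.pages = {}     # page number -> compressed bytes

    def page_out(self, page_no: int, data: bytes) -> bool:
        """Compress an evicted page and keep it in memory.
        Returns False if the page does not shrink, so the caller should write it to disk."""
        if len(data) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        blob = zlib.compress(data, self.level)
        if len(blob) >= PAGE_SIZE:
            return False
        self.pages[page_no] = blob
        return True

    def page_in(self, page_no: int) -> Optional[bytes]:
        """On a page fault, decompress the page if it is held here.
        None means the page is not in the store and must be read from disk."""
        blob = self.pages.pop(page_no, None)
        return None if blob is None else zlib.decompress(blob)

    def bytes_used(self) -> int:
        """Total RAM currently spent on compressed pages."""
        return sum(len(b) for b in self.pages.values())
```

For example, paging out an all-zero page with `store.page_out(7, bytes(PAGE_SIZE))` keeps it in well under 4 KiB, and a later `store.page_in(7)` returns the original bytes. Refusing pages that do not shrink reflects the fact that, as the abstract notes, the benefit depends on the application's data and paging behavior.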

Cache Compression for Microprocessor Performance

Computer systems and microarchitecture researchers have proposed using hardware data compression units within the memory hierarchies of microprocessors in order to improve performance, energy efficiency, and functionality. However, most past work, and all work on cache compression, has made unsubstantiated assumptions about the performance, power consumption, and area overheads of the proposed compression algorithms and hardware. In this work, I present a lossless compression algorithm designed for fast online data compression, and for cache compression in particular. The algorithm has several novel features tailored to this application, including combining pairs of compressed lines into one cache line and allowing parallel compression of multiple words with a single dictionary and without degrading the compression ratio. We reduced the proposed algorithm to a register-transfer-level hardware design, permitting performance, power consumption, and area estimation.
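To make the idea of a dictionary-based, word-granular line compressor concrete, here is a minimal software sketch. It is not the algorithm from this work: it assumes 32-bit words, a hypothetical 16-entry FIFO dictionary, and three hypothetical codes (zero word, full dictionary match, raw word). It omits partial matches, pairing of compressed lines, and parallel multi-word compression. The function names are illustrative.

```python
from collections import deque

WORD_BITS = 32
DICT_ENTRIES = 16                # hypothetical dictionary size
INDEX_BITS = 4                   # log2(DICT_ENTRIES)

# Hypothetical code lengths in bits: a 2-bit prefix plus any payload.
ZERO_BITS = 2                    # all-zero word: prefix only
MATCH_BITS = 2 + INDEX_BITS      # full dictionary match: prefix + index
RAW_BITS = 2 + WORD_BITS         # no match: prefix + the word itself


def compress_line(words: list[int]) -> tuple[list[tuple], int]:
    """Encode a cache line word by word; return (tokens, compressed size in bits)."""
    dictionary = deque(maxlen=DICT_ENTRIES)  # oldest entry is evicted when full
    tokens, bits = [], 0
    for w in words:
        if w == 0:
            tokens.append(("zero",))
            bits += ZERO_BITS
        elif w in dictionary:
            tokens.append(("match", dictionary.index(w)))
            bits += MATCH_BITS
        else:
            tokens.append(("raw", w))
            bits += RAW_BITS
            dictionary.append(w)  # the decoder performs the same update
    return tokens, bits


def decompress_line(tokens: list[tuple]) -> list[int]:
    """Rebuild the words by replaying the same dictionary updates as the encoder."""
    dictionary = deque(maxlen=DICT_ENTRIES)
    words = []
    for kind, *payload in tokens:
        if kind == "zero":
            words.append(0)
        elif kind == "match":
            words.append(dictionary[payload[0]])
        else:
            words.append(payload[0])
            dictionary.append(payload[0])
    return words
```

For the eight-word line `[0, 5, 5, 0, 7, 5, 0, 0]` (256 bits uncompressed), this encoding needs 88 bits. Because encoder and decoder update the dictionary identically, no dictionary has to be stored alongside the compressed line.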

Performance of Hardware Compressed Main Memory

2000

A novel memory subsystem called Memory Expansion Technology (MXT) has been built for compressing main memory contents. This effectively expands memory, presenting a "real" memory larger than the physically available memory. This paper provides an overview of the architecture and OS support and an in-depth analysis of the performance impact of memory compression using the SPEC2000 benchmarks. Our

Hardware compressed main memory: operating system support and performance evaluation

IEEE Transactions on Computers, 2001

A new memory subsystem, called Memory Xpansion Technology (MXT), has been built for compressing main memory contents. MXT effectively doubles the physically available memory transparently to the CPUs, input/output devices, device drivers, and application software. An average compression ratio of two or greater has been observed for many applications. Since compressibility of memory contents varies dynamically, the size of the memory managed by the operating system is not fixed. In this paper, we describe operating system techniques that can deal with such dynamically changing memory sizes. We also demonstrate the performance impact of memory compression using the SPEC CPU2000 and SPECweb99 benchmarks. Results show that the hardware compression of memory has a negligible performance penalty compared to a standard memory for many applications. For memory-starved applications and benchmarks such as SPECweb99, memory compression improves performance significantly. Results also show that the memory contents of many applications can be compressed, usually by a factor of two to one.
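Why a varying compression ratio is an OS problem can be shown with a little arithmetic: the same number of "real" pages consumes more physical DRAM when the ratio drops. The sketch below assumes a simple high/low watermark policy for deciding how many real pages to reclaim. The policy, thresholds, and names (`CompressedMemoryState`, `pages_to_reclaim`) are hypothetical illustrations, not the techniques described in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class CompressedMemoryState:
    physical_bytes: int       # installed DRAM actually available
    used_real_pages: int      # "real" pages the OS has currently handed out
    page_size: int = 4096     # assumed page size

    def physical_used(self, ratio: float) -> float:
        """Estimated DRAM consumed if the in-use pages compress by `ratio` (e.g. 2.0)."""
        return self.used_real_pages * self.page_size / ratio


def pages_to_reclaim(state: CompressedMemoryState, ratio: float,
                     high_water: float = 0.95, low_water: float = 0.85) -> int:
    """Hypothetical watermark policy: once estimated DRAM use crosses high_water,
    free enough real pages to bring it back down to low_water."""
    if state.physical_used(ratio) <= high_water * state.physical_bytes:
        return 0
    target_real_bytes = low_water * state.physical_bytes * ratio
    excess = state.used_real_pages * state.page_size - target_real_bytes
    return math.ceil(excess / state.page_size)
```

With 1 GiB of DRAM and 500,000 real 4 KiB pages in use, `pages_to_reclaim` returns 0 at a ratio of 4.0 but a positive count at 2.0: the workload did not change, only its compressibility did, which is exactly why the OS-managed memory size cannot be treated as fixed.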

Compresso: Pragmatic Main Memory Compression

2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018

Today, larger memory capacity and higher memory bandwidth are required for better performance and energy efficiency for many important client and datacenter applications. Hardware memory compression provides a promising direction to achieve this without increasing system cost. Unfortunately, current memory compression solutions face two significant challenges. First, keeping memory compressed requires additional memory accesses, sometimes on the critical path, which can cause performance overheads. Second, they require changing the operating system to take advantage of the increased capacity, and to handle incompressible data, which delays deployment. We propose Compresso, a hardware memory compression architecture that minimizes memory overheads due to compression, with no changes to the OS. We identify new data-movement trade-offs and propose optimizations that reduce additional memory movement to improve system efficiency. We propose a holistic evaluation for compressed systems. Our results show that Compresso achieves a 1.85x compression ratio for main memory on average, with a 24% speedup over a competitive hardware compressed system for single-core systems and 27% for multi-core systems. As compared to competitive compressed systems, Compresso not only reduces the performance overhead of compression, but also increases the performance gain from higher memory capacity.

I. INTRODUCTION

Memory compression can improve performance and reduce cost for systems with high memory demands, such as those used for machine learning, graph analytics, databases, gaming, and autonomous driving. We present Compresso, the first compressed main-memory architecture that: (1) explicitly optimizes for new trade-offs between compression mechanisms and the additional data movement required for their implementation, and (2) can be used without any modifications to either applications or the operating system.
Compressing data in main memory increases its effective capacity, resulting in fewer accesses to secondary storage, thereby boosting performance. Fewer I/O accesses also improve tail latency [1] and decrease the need to partition tasks across nodes just to reduce I/O accesses [2, 3]. Additionally, transferring compressed cache lines from memory requires fewer bytes, thereby reducing memory bandwidth usage. The saved bytes may be used to prefetch other data [4, 5], or may
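How compressed cache-line sizes turn into capacity (and into bytes moved per access) can be illustrated with a generic packing model. The sketch below is not Compresso's actual layout: it assumes a hypothetical set of allowed line sizes (0, 8, 32, 64 bytes), a hypothetical 512-byte allocation chunk, and takes per-line compressed sizes from any line compressor as input. It ignores the per-page metadata whose lookup causes the extra memory accesses discussed above.

```python
import math

LINE_BYTES = 64
LINES_PER_PAGE = 64          # 4 KiB page / 64 B lines
BINS = (0, 8, 32, 64)        # hypothetical allowed compressed-line sizes, in bytes
CHUNK_BYTES = 512            # hypothetical allocation granularity for a compressed page


def binned_size(compressed_len: int) -> int:
    """Round a compressed line size up to the next bin; lines that do not fit
    in any smaller bin are stored uncompressed at 64 B."""
    for b in BINS:
        if compressed_len <= b:
            return b
    return LINE_BYTES


def page_footprint(line_sizes: list[int]) -> int:
    """DRAM bytes allocated to one page after binning lines and rounding to chunks.
    Per-page metadata (needed to locate each line) is ignored here."""
    if len(line_sizes) != LINES_PER_PAGE:
        raise ValueError("expected one size per 64 B line in a 4 KiB page")
    total = sum(binned_size(s) for s in line_sizes)
    return math.ceil(total / CHUNK_BYTES) * CHUNK_BYTES


def page_ratio(line_sizes: list[int]) -> float:
    """Compression ratio of one page under this packing model."""
    footprint = page_footprint(line_sizes)
    return float("inf") if footprint == 0 else LINE_BYTES * LINES_PER_PAGE / footprint
```

For a page where half the lines compress to 5 bytes and half are incompressible, the footprint is 2,560 bytes, a ratio of 1.6. In this model, reading a line transfers only its binned size, which is where the bandwidth savings come from; coarser bins waste capacity but keep line locations simpler to compute.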

An analytical model for software-only main memory compression

Proceedings of the 3rd workshop on Memory performance issues in conjunction with the 31st international symposium on computer architecture - WMPI '04, 2004

Many applications with large data spaces that cannot run on a typical workstation (due to page faults) call for techniques to expand the effective memory size. One such technique is memory compression. Understanding what applications under what conditions can benefit from main memory compression is complicated due to various tradeoffs and the dynamic characteristics of applications. For instance, a large area to store compressed data increases the effective memory size considerably but also decreases the amount of memory that can hold uncompressed data. This paper presents an analytical model that states the conditions for a compressed-memory system to yield performance improvements. Parameters of the model are the compression algorithm efficiency, the amount of data being compressed, and the application memory access pattern. Such a model can be used by an operating system to compute the size of the compressed-memory level that can improve an application's performance.
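The central trade-off (a larger compressed area extends the effective memory but shrinks the space for uncompressed data) can be captured in a toy cost model. This is not the paper's analytical model. It assumes a working set of W bytes accessed uniformly at random, a fixed compression ratio r, fixed decompression and disk costs, and no cost for moving pages between regions. All names and parameters are hypothetical.

```python
def expected_fault_cost(C, M, W, r, t_decomp, t_disk):
    """Expected extra time per memory access for a memory of M bytes, of which C bytes
    hold compressed pages (compression ratio r), serving a working set of W bytes
    accessed uniformly at random."""
    uncompressed = M - C                  # bytes left for uncompressed pages
    reach = uncompressed + C * r          # working-set bytes that fit in memory at all
    p_uncompressed = min(1.0, uncompressed / W)
    p_in_memory = min(1.0, reach / W)
    p_compressed = p_in_memory - p_uncompressed   # served by decompression
    p_disk = 1.0 - p_in_memory                    # served by a disk access
    return p_compressed * t_decomp + p_disk * t_disk


def best_compressed_size(M, W, r, t_decomp, t_disk, steps=100):
    """Scan C from 0 to M and return the compressed-region size with the lowest cost."""
    best_C, best_cost = 0.0, float("inf")
    for i in range(steps + 1):
        C = M * i / steps
        cost = expected_fault_cost(C, M, W, r, t_decomp, t_disk)
        if cost < best_cost:
            best_C, best_cost = C, cost
    return best_C
```

With M = 100, W = 150, r = 2, t_decomp = 10 and t_disk = 1000 (arbitrary units), the best compressed-region size is 50: just enough to hold the whole working set, since compressing more only adds decompression cost. When W fits in M, the model picks C = 0, matching the intuition that compression only pays off under memory pressure.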

Operating system support for fast hardware compression of main memory contents

2000

Novel computer system hardware has been built for compressing main memory contents. It presents to the operating system an expanded real memory larger than the physically available memory. A compression ratio of two to one or better has been observed for most applications. As the compression ratio of applications changes dynamically, so does the real memory size managed by the OS. In this paper, we describe and evaluate the operating system techniques developed for compressed memory systems to deal with such dynamically changing memory sizes.

Transparent Dual Memory Compression Architecture

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2017

The increasing memory requirements of big data applications have been driving the precipitous growth of memory capacity in server systems. To maximize the efficiency of external memory, HW-based memory compression techniques have been proposed to increase effective memory capacity. Although such memory compression techniques can improve memory efficiency significantly, a critical trade-off exists in HW-based compression techniques. As memory blocks need to be decompressed as quickly as possible to serve cache misses, latency-optimized techniques apply compression at cacheline granularity, achieving decompression latencies of only a few cycles. However, such latency-optimized techniques can lose the potentially high compression ratios of capacity-optimized techniques, which compress larger memory blocks with longer-latency algorithms. Considering this fundamental trade-off in memory compression, this paper proposes a transparent dual memory compression (DMC) architecture, which selectively uses two compression algorithms with distinct latency and compression characteristics. Exploiting the locality of memory accesses, the proposed architecture compresses less frequently accessed blocks with a capacity-optimized compression algorithm, while keeping recently accessed blocks compressed with a latency-optimized one. Furthermore, instead of relying on support from the virtual memory system to locate compressed memory blocks, the study advocates a HW-based translation between the uncompressed address space and the compressed physical space. This OS-transparent approach eliminates conflicts between compression efficiency and the large-page support adopted to reduce TLB misses. The proposed compression architecture is applied to the Hybrid Memory Cube (HMC) with a logic layer under the stacked DRAMs.
The experimental results show that the proposed compression architecture provides 54% higher compression ratio than the state-of-the-art latency-optimized technique, with no performance degradation over the baseline system without compression.
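The hot/cold split can be sketched in software to show the trade-off. This is not the DMC hardware design and does not model its HW address translation. It uses zlib as a stand-in for both algorithms: recently used blocks are kept with each 64 B line compressed independently (fast single-line access, poorer ratio), and least recently used blocks are recompressed as a whole hypothetical 1 KiB block (better ratio, whole-block decompression on access). All names and sizes are assumptions.

```python
import zlib
from collections import OrderedDict

LINE = 64      # cache-line size in bytes
BLOCK = 1024   # hypothetical capacity-optimized block size (16 lines)


def compress_fast(block: bytes) -> list[bytes]:
    """Stand-in for a latency-optimized scheme: each 64 B line is compressed on its own,
    so a single line can be decompressed without touching its neighbours."""
    return [zlib.compress(block[i:i + LINE], 1) for i in range(0, len(block), LINE)]


def compress_dense(block: bytes) -> bytes:
    """Stand-in for a capacity-optimized scheme: the whole block is compressed at once."""
    return zlib.compress(block, 9)


class DualCompressedMemory:
    """Recently used blocks stay in the fast format; cold blocks are recompressed densely."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # block id -> per-line blobs, least recently used first
        self.cold = {}            # block id -> one blob for the whole block

    def write(self, block_id: int, data: bytes) -> None:
        if len(data) != BLOCK:
            raise ValueError("block must be exactly BLOCK bytes")
        self.cold.pop(block_id, None)
        self.hot[block_id] = compress_fast(data)
        self.hot.move_to_end(block_id)
        self._demote()

    def read_line(self, block_id: int, line_no: int) -> bytes:
        if block_id in self.cold:
            # Slow path: decompress the whole block once and promote it to the hot set.
            data = zlib.decompress(self.cold.pop(block_id))
            self.hot[block_id] = compress_fast(data)
        self.hot.move_to_end(block_id)
        line = zlib.decompress(self.hot[block_id][line_no])
        self._demote()
        return line

    def _demote(self) -> None:
        # Evict least recently used blocks from the hot set into the dense format.
        while len(self.hot) > self.hot_capacity:
            block_id, lines = self.hot.popitem(last=False)
            data = b"".join(zlib.decompress(blob) for blob in lines)
            self.cold[block_id] = compress_dense(data)

    def stored_bytes(self) -> int:
        hot = sum(len(blob) for lines in self.hot.values() for blob in lines)
        return hot + sum(len(blob) for blob in self.cold.values())
```

Choosing `hot_capacity` sets the balance: a larger hot set keeps more reads on the fast path, while a smaller one lets more data benefit from the denser whole-block format.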

Effective algorithms for cache-level compression

Proceedings of the 11th Great Lakes symposium on VLSI, 2001

Compression at the cache level has the potential to increase microprocessor performance by decreasing the cache miss rate and increasing the effective bandwidth by transmitting compressed data. This paper presents four compression algorithms that would be suitable for use in a compressed cache architecture and shows the results of using them to compress SPEC95 benchmarks. These algorithms exhibit a 7.8% to 99.8% improvement in compression ratio over an algorithm known to be effective for cache compression.
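As an illustration of what "compression ratio" means for a single cache line, the sketch below uses a simple significance-based word encoding, similar in spirit to frequent-pattern schemes. It is not one of the four algorithms presented in the paper. It assumes 32-bit words given as unsigned values and a hypothetical 2-bit prefix selecting zero, sign-extended 8-bit, sign-extended 16-bit, or full 32-bit payloads.

```python
WORD_BITS = 32
PREFIX_BITS = 2  # hypothetical: selects one of four payload sizes


def word_cost(w: int) -> int:
    """Bits needed to encode one 32-bit word (given as an unsigned value)."""
    signed = w - (1 << 32) if w & 0x8000_0000 else w  # reinterpret as two's complement
    if signed == 0:
        payload = 0
    elif -128 <= signed <= 127:
        payload = 8          # fits in a sign-extended byte
    elif -32768 <= signed <= 32767:
        payload = 16         # fits in a sign-extended halfword
    else:
        payload = WORD_BITS  # stored uncompressed
    return PREFIX_BITS + payload


def compression_ratio(line: list[int]) -> float:
    """Uncompressed size divided by compressed size for one cache line."""
    original = WORD_BITS * len(line)
    compressed = sum(word_cost(w) for w in line)
    return original / compressed
```

A 64-byte line of mostly zeros and small integers, such as `[0, 1, 0xFFFFFFFF, 300, 0x12345678, 0, 0, 0, 5, 6, 7, 8, 0, 0, 0, 0]`, needs 128 bits instead of 512, a ratio of 4.0. An "improvement in compression ratio" of X% then means the new ratio is (1 + X/100) times the baseline's.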