Application-specific file prefetching for multimedia programs

Implementation and performance of integrated application-controlled file caching, prefetching, and disk scheduling

ACM Transactions on Computer Systems, 1996

As the performance gap between disks and microprocessors continues to increase, effective utilization of the file cache becomes increasingly important. Application-controlled file caching and prefetching can apply application-specific knowledge to improve file cache management. However, supporting application-controlled file caching and prefetching is nontrivial because caching and prefetching need to be integrated carefully, and the kernel needs to allocate cache blocks among processes appropriately. This article presents the design, implementation, and performance of a file system that integrates application-controlled caching, prefetching, and disk scheduling. We use a two-level cache management strategy. The kernel uses the LRU-SP (Least-Recently-Used with Swapping and Placeholders) policy to allocate blocks to processes, and each process integrates application-specific caching and prefetching based on the controlled-aggressive policy, an algorithm previously shown in a theoretical sense to be nearly optimal. Each process also improves its disk access latency by submitting its prefetches in batches so that the requests can be scheduled to optimize disk access performance. Our measurements show that this combination of techniques greatly improves the performance of the file system. We measured that the running time is reduced by 3% to 49% (average 26%) for single-process workloads and by 5% to 76% (average 32%) for multiprocess workloads.
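
A minimal sketch of the batching idea this abstract describes, using a user-level stand-in for the kernel interface: predicted blocks are collected and sorted by disk position before being issued together, so the disk scheduler can service them in near seek-optimal order. The block numbers, batch size, and the printf stand-in for an asynchronous read are illustrative assumptions, not the paper's actual API.

```c
#include <stdio.h>
#include <stdlib.h>

#define BATCH_SIZE 8

static int cmp_block(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Issue a batch of prefetches in ascending block order, approximating
 * an elevator (SCAN) schedule on the disk. */
static void submit_prefetch_batch(long *blocks, size_t n)
{
    qsort(blocks, n, sizeof blocks[0], cmp_block);
    for (size_t i = 0; i < n; i++)
        printf("prefetch block %ld\n", blocks[i]); /* stand-in for an async read */
}

int main(void)
{
    /* Blocks the process predicts it will read next, in access order. */
    long batch[BATCH_SIZE] = { 912, 14, 530, 15, 911, 16, 531, 13 };
    submit_prefetch_batch(batch, BATCH_SIZE);
    return 0;
}
```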

Design and Implementation of a Predictive File Prefetching Algorithm

We have previously shown that the patterns in which files are accessed offer information that can accurately predict upcoming file accesses. Most modern caches ignore these patterns, thereby failing to use information that enables significant reductions in I/O latency. While prefetching heuristics that expect sequential accesses are often effective methods to reduce I/O latency, they cannot be applied across files, because the abstraction of a file has no intrinsic concept of a successor. This limits the ability of modern file systems to prefetch. Here we present our implementation of a predictive prefetching system that makes use of file access patterns to reduce I/O latency. Previously we developed a technique called Partitioned Context Modeling (PCM) [13] that efficiently models file accesses to reliably predict upcoming requests. We present our experiences in implementing predictive prefetching based on file access patterns. From the lessons learned we developed a new technique, Extended Partitioned Context Modeling (EPCM), which has even better performance. We have modified the Linux kernel to prefetch file data based on Partitioned Context Modeling and Extended Partitioned Context Modeling. With this implementation we examine how a prefetching policy that uses such models to predict upcoming accesses can result in large reductions in I/O latencies. We tested our implementation with four different application-based benchmarks and saw I/O latency reduced by 31% to 90% and elapsed time reduced by 11% to 16%.
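
To make the context-modeling idea concrete, here is a minimal sketch of a simplified order-1 successor model: count how often each file follows another, and prefetch the most frequent successor. PCM itself keeps multi-order contexts in a trie with per-partition limits, which this sketch does not attempt; file ids and the trace are made up for illustration.

```c
#include <stdio.h>

#define NFILES 4

/* count[a][b]: how often file b was opened immediately after file a */
static int count[NFILES][NFILES];

static void observe(int prev, int next) { count[prev][next]++; }

/* Return the most likely successor of prev, or -1 for no prediction
 * (in which case the caller simply skips prefetching). */
static int predict(int prev)
{
    int best = -1, best_count = 0;
    for (int f = 0; f < NFILES; f++)
        if (count[prev][f] > best_count) { best_count = count[prev][f]; best = f; }
    return best;
}

int main(void)
{
    int trace[] = { 0, 1, 2, 0, 1, 2, 0, 1 }; /* e.g. main.c -> util.h -> util.c ... */
    int n = sizeof trace / sizeof trace[0];
    for (int i = 1; i < n; i++)
        observe(trace[i - 1], trace[i]);
    printf("after file 0, prefetch file %d\n", predict(0)); /* prints 1 */
    return 0;
}
```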

Practical prefetching techniques for multiprocessor file systems

Distributed and Parallel Databases, 1993

Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper we describe experiments with practical prefetching policies that base decisions only on on-line reference history, and that can be implemented efficiently. We also test the effectiveness of those policies across a range of architectural parameters.
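
A minimal sketch of one policy in the spirit of "decisions based only on on-line reference history": watch recent references, and once a run of sequential block accesses is observed, prefetch a few blocks ahead. The thresholds and trace are illustrative assumptions, not values from the paper.

```c
#include <stdio.h>

#define RUN_THRESHOLD 3   /* sequential hits before we trust the pattern */
#define DEPTH         2   /* blocks to prefetch ahead once we do */

static long last_block = -2;
static int  run_len;

static void on_read(long block)
{
    run_len = (block == last_block + 1) ? run_len + 1 : 0;
    last_block = block;
    if (run_len >= RUN_THRESHOLD)
        for (int i = 1; i <= DEPTH; i++)
            printf("prefetch block %ld\n", block + i);
}

int main(void)
{
    long trace[] = { 7, 8, 9, 10, 11, 42, 43 };  /* sequential run, then a jump */
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        on_read(trace[i]);
    return 0;
}
```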

Adaptive prefetching for device-independent file I/O

Multimedia Computing and Networking 1998, 1997

Device-independent I/O has been a holy grail to operating system designers since the early days of UNIX. Unfortunately, existing operating systems fall short of this goal for multimedia applications. Techniques such as caching and sequential read-ahead can help mask I/O latency in some cases, but in others they increase latency and add substantial jitter. Multimedia applications, such as video players, are sensitive to vagaries in performance since I/O latency and jitter affect the quality of presentation. Our solution uses adaptive prefetching to reduce both latency and jitter. Applications submit file access plans to the prefetcher, which then generates I/O requests to the operating system and manages the buffer cache to isolate the application from variations in device performance. Our experiments show device independence can be achieved: an MPEG video player sees the same latency when reading from a local disk or an NFS server. Moreover, our approach reduces jitter substantially.
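
A minimal sketch of the access-plan idea, under assumed names and structures rather than the paper's interface: the application hands the prefetcher an ordered plan of (offset, length) reads, and the prefetcher stays a fixed number of entries ahead of the consumer so device variations are absorbed before the data is needed.

```c
#include <stdio.h>

struct plan_entry { long offset; long length; };

#define LOOKAHEAD 2  /* plan entries to keep prefetched ahead of the reader */

static void run_plan(const struct plan_entry *plan, int n)
{
    int next = 0;  /* next entry to prefetch */
    for (int pos = 0; pos < n; pos++) {
        /* Stay LOOKAHEAD entries ahead of the current read position. */
        while (next < n && next <= pos + LOOKAHEAD) {
            printf("prefetch off=%ld len=%ld\n",
                   plan[next].offset, plan[next].length);
            next++;
        }
        printf("consume  off=%ld len=%ld\n", plan[pos].offset, plan[pos].length);
    }
}

int main(void)
{
    /* e.g. a video player's per-frame reads, not necessarily sequential */
    struct plan_entry plan[] = {
        { 0, 4096 }, { 8192, 4096 }, { 4096, 4096 }, { 16384, 4096 }
    };
    run_plan(plan, (int)(sizeof plan / sizeof plan[0]));
    return 0;
}
```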

Practical Prefetching Techniques for Parallel File Systems

Conference on Parallel and Distributed Information Systems, 1991

Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper we describe experiments with practical prefetching policies.

Improving Data Prefetching Efficacy in Multimedia Applications

The workload of multimedia applications has a strong impact on cache memory performance, since the locality of memory references embedded in multimedia programs differs from that of traditional programs. In many cases, standard cache memory organization achieves poorer performance when used for multimedia. A widely-explored approach to improve cache performance is hardware prefetching, which allows the pre-loading of data in the cache before they are referenced. However, existing hardware prefetching approaches are unable to exploit the potential improvement in performance, since they are not tailored to multimedia locality. In this paper we propose novel effective approaches to hardware prefetching to be used in image processing programs for multimedia. Experimental results are reported for a suite of multimedia image processing programs including MPEG-2 decoding and encoding, convolution, thresholding, and edge chain coding.
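
For context, here is a minimal sketch of the conventional hardware baseline the abstract argues falls short for multimedia: prefetch-on-miss one-block-lookahead, where a miss on one cache line also pulls in its sequential successor. The direct-mapped organization and sizes are assumptions for illustration only.

```c
#include <stdio.h>
#include <string.h>

#define LINES 64  /* direct-mapped cache, one tag per line */

static long tag[LINES];

/* Returns 1 on hit, 0 on miss; a demand miss also prefetches line+1. */
static int access_line(long line, int is_prefetch)
{
    int slot = (int)(line % LINES);
    if (tag[slot] == line) return 1;            /* hit */
    tag[slot] = line;                           /* fetch into cache */
    if (!is_prefetch) access_line(line + 1, 1); /* OBL: pull in successor */
    return 0;
}

int main(void)
{
    memset(tag, -1, sizeof tag);  /* start with an empty cache */
    long refs[] = { 10, 11, 12, 13, 100, 101 };
    int hits = 0, n = sizeof refs / sizeof refs[0];
    for (int i = 0; i < n; i++)
        hits += access_line(refs[i], 0);
    printf("%d/%d hits with one-block-lookahead\n", hits, n);
    return 0;
}
```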

A combined hardware/software solution for stream prefetching in multimedia applications

Proc. of SPIE Multimedia …, 1998

Prefetch techniques may, in general, be applied to reduce the miss rate of a processor's data cache and thereby improve the overall performance of the processor. In particular, stream prefetch techniques can be applied to prefetch the data streams that are often encountered in multimedia applications. Stream prefetch techniques exploit the fact that data from such streams are often accessed in a regular fashion. Implementing a stream prefetch technique involves two issues, namely stream detection and stream prefetching. Each can be solved in either hardware or software. This paper presents a combined hardware/software stream prefetch technique. A special stream-prefetch instruction is introduced to alert the hardware that load instructions access a data stream. Subsequently, prefetching is handled by the hardware automatically, in such a way that the rate at which data is prefetched is synchronized with the rate at which the prefetched data is processed by the application. Stream prefetch techniques of this kind have been proposed earlier but use instruction addresses for synchronization. The technique introduced in this paper uses a different synchronization mechanism that does not suffer from the drawbacks of instruction-address synchronization.
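
A minimal sketch of rate-synchronized stream prefetching as described above, modeled in software: a stream-prefetch "instruction" registers (base, stride, length), and each demand access to the stream triggers at most one further prefetch, so the prefetch rate tracks the consumption rate. The names, warm-up distance, and printf stand-ins are assumptions, not the paper's ISA.

```c
#include <stdio.h>

struct stream { long base, stride, length, next; };

#define DISTANCE 4  /* elements to keep prefetched ahead of the consumer */

static struct stream s;

/* Stand-in for the stream-prefetch instruction: register the stream and
 * issue a warm-up burst of DISTANCE prefetches. */
static void stream_prefetch(long base, long stride, long length)
{
    s = (struct stream){ base, stride, length, 0 };
    while (s.next < DISTANCE && s.next < length) {
        printf("prefetch addr 0x%lx\n", base + s.next * stride);
        s.next++;
    }
}

/* Each demand access advances the prefetcher by one element, keeping
 * prefetching synchronized with the processing rate. */
static void demand_load(long idx)
{
    printf("load     addr 0x%lx\n", s.base + idx * s.stride);
    if (s.next < s.length) {
        printf("prefetch addr 0x%lx\n", s.base + s.next * s.stride);
        s.next++;
    }
}

int main(void)
{
    stream_prefetch(0x1000, 8, 10);  /* e.g. a stream of ten 8-byte elements */
    for (long i = 0; i < 10; i++)
        demand_load(i);
    return 0;
}
```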

Reducing Seek Overhead with Application-Directed Prefetching

An analysis of performance characteristics of modern disks finds that prefetching can improve the performance of nonsequential read access patterns by an order of magnitude or more, far more than demonstrated by prior work. Using this analysis, we design prefetching algorithms that make effective use of primary memory, and can sometimes gain additional speedups by reading unneeded data. We show when additional prefetching memory is most critical for performance. A contention controller automatically adjusts prefetching memory usage, preserving the benefits of prefetching while sharing available memory with other applications. When implemented in a library with some kernel changes, our prefetching system improves performance for some workloads of the GIMP image manipulation program and the SQLite database by factors of 4.9 to 20.
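
The paper's system is a library plus kernel changes, not reproduced here, but a minimal sketch of application-directed prefetching is possible with stock POSIX: the application sorts its known future (nonsequential) read offsets into a seek-friendly order and hints them to the kernel with posix_fadvise(POSIX_FADV_WILLNEED). The file name, offsets, and region size below are made up.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_off(const void *a, const void *b)
{
    off_t x = *(const off_t *)a, y = *(const off_t *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int fd = open("data.db", O_RDONLY);  /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    off_t plan[] = { 9 << 20, 1 << 20, 5 << 20, 2 << 20 }; /* future reads */
    size_t n = sizeof plan / sizeof plan[0];

    qsort(plan, n, sizeof plan[0], cmp_off);  /* seek-friendly order */
    for (size_t i = 0; i < n; i++)
        posix_fadvise(fd, plan[i], 1 << 20, POSIX_FADV_WILLNEED);

    /* ... the application then read()s the regions in its own order ... */
    return 0;
}
```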

Temporal analysis of cache prefetching strategies for multimedia applications

2001

Prefetching is a widely adopted technique for improving the performance of cache memories. Performance is typically affected by design parameters, such as cache size and associativity, but also by the type of locality embodied in the programs. In particular, multimedia tools and programs handling images and video are characterized by a bi-dimensional spatial locality that could be greatly exploited by the inclusion of prefetching in the cache architecture. In this paper we compare some prefetching techniques for multimedia programs (such as MPEG compression, image processing, and visual object segmentation) by performing a detailed evaluation of the memory access time. The goal is to show that a significant speedup can be achieved by using either standard prefetching techniques (such as OBL or adaptive prefetching) or some innovative, image-oriented prefetching methods, like the neighbor prefetching described in the paper. Performance is measured with the PRIMA trace-driven simulator.
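
A minimal sketch of the 2D "neighbor prefetching" idea: on a miss to the cache block holding image tile (r, c), also fetch the horizontal and vertical neighbors, matching the bi-dimensional locality of image code. The tile-to-block mapping is an assumption for illustration (bounds checks at image edges are omitted); the paper evaluates the real policy with the PRIMA trace-driven simulator.

```c
#include <stdio.h>

#define TILES_PER_ROW 64  /* image width in cache-block-sized tiles */

static long block_of(int r, int c) { return (long)r * TILES_PER_ROW + c; }

static void on_miss(int r, int c)
{
    printf("fetch    block %ld (tile %d,%d)\n", block_of(r, c), r, c);
    /* Neighbor prefetching: the tile to the right and the tile below,
     * covering both dimensions of spatial locality. */
    printf("prefetch block %ld (tile %d,%d)\n", block_of(r, c + 1), r, c + 1);
    printf("prefetch block %ld (tile %d,%d)\n", block_of(r + 1, c), r + 1, c);
}

int main(void)
{
    on_miss(3, 7);  /* e.g. a convolution kernel entering a new tile */
    return 0;
}
```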