PDS: A Performance Database Server

Conventional benchmarks as a sample of the performance spectrum

Proceedings of the Thirty-First Hawaii International Conference on System Sciences

Most benchmarks are smaller than actual application programs. One reason is to improve benchmark universality by demanding resources every computer is likely to have. But users dynamically increase the size of application programs to match the power available, whereas most benchmarks are static and of a size appropriate for computers available when the benchmark was created; this is particularly true for parallel computers. Thus, the benchmark overstates computer performance, since smaller problems spend more time in cache. Scalable benchmarks, such as HINT, examine the full spectrum of performance through various memory regimes, and express a superset of the information given by any particular fixed-size benchmark. Using 5,000 experimental measurements, we have found that performance on the NAS Parallel Benchmarks, SPEC, LINPACK, and other benchmarks is predicted accurately by subsets of the HINT performance curve. Correlations are typically better than 0.995, and the predicted ranking is often perfect.
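As a rough illustration of the kind of analysis the abstract describes, the sketch below correlates a score derived from one region of a HINT-style performance curve with a fixed-size benchmark result across several machines and checks whether the implied ranking matches. The numbers are hypothetical, not the paper's 5,000 measurements.

```python
# Hypothetical data -- not the paper's measurements.  One array is a score taken
# from a subset of a HINT-style performance curve, the other a fixed-size result.
import numpy as np

hint_subset_score = np.array([1.2, 3.4, 2.1, 5.0, 4.2])   # e.g., net QUIPS over one memory regime
fixed_size_score  = np.array([0.9, 2.8, 1.7, 4.4, 3.6])   # e.g., LINPACK Gflop/s on the same machines

# Linear correlation (the paper reports values typically better than 0.995).
r = np.corrcoef(hint_subset_score, fixed_size_score)[0, 1]

# Does the HINT-derived score predict the ranking of the fixed-size benchmark?
predicted_order = np.argsort(-hint_subset_score)
measured_order  = np.argsort(-fixed_size_score)

print(f"correlation r = {r:.3f}")
print("ranking predicted exactly:", np.array_equal(predicted_order, measured_order))
```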

Pilot: A Framework that Understands How to Do Performance Benchmarks the Right Way

Carrying out even the simplest performance benchmark requires considerable knowledge of statistics and of computer systems, two distinct skill sets, along with painstakingly following many error-prone steps; both are essential for getting statistically valid results. As a result, many performance measurements in peer-reviewed publications are flawed. Among other problems, they fall short in one or more of the following requirements: accuracy, precision, comparability, repeatability, and control of overhead. This is a serious problem because poor performance measurements misguide system design and optimization. We propose a collection of algorithms and heuristics to automate these steps. They cover the collection, storage, analysis, and comparison of performance measurements. We also implement these methods as a readily usable open source software framework called Pilot, which can help to reduce human error and shorten benchmark time. Evaluation of Pilot with various benchmarks shows that it can reduce the cost and complexity of running benchmarks and can produce better measurement results.
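The abstract does not spell out Pilot's algorithms, but a minimal sketch of one underlying idea, repeating a measurement until its confidence interval is tight enough, might look like the following. The 5% threshold and the normal-approximation shortcut are assumptions for illustration, not Pilot's actual method.

```python
# Minimal sketch of adaptive measurement: keep sampling until the ~95% confidence
# interval half-width falls below a fraction of the mean.  Not Pilot's actual code.
import math
import statistics
import time

def measure_until_precise(workload, rel_ci=0.05, z=1.96, min_runs=5, max_runs=100):
    """Run `workload` repeatedly until the CI half-width <= rel_ci * mean."""
    samples = []
    while len(samples) < max_runs:
        t0 = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - t0)
        if len(samples) >= min_runs:
            mean = statistics.mean(samples)
            sem = statistics.stdev(samples) / math.sqrt(len(samples))
            if z * sem <= rel_ci * mean:       # precise enough; stop sampling
                break
    return statistics.mean(samples), len(samples)

mean_s, runs = measure_until_precise(lambda: sum(range(100_000)))
print(f"mean = {mean_s * 1e3:.3f} ms over {runs} runs")
```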

Overview of Common Parallel Benchmark Applications and Suites

Journal of Applied Computer Science & Mathematics, 2022

In the field of computing, there has been growing interest in comparing supercomputers in terms of performance and scalability. A new generation of massively parallel systems can now match the computing power of traditional shared memory and multiprocessor architectures. To be considered ideally parallel, a computer system must be linearly scalable and capable of dealing with progressively more complex problems by increasing the number of processors while maintaining machine efficiency and keeping execution time constant. In distributed systems, time-critical and high-performance nodes are required to meet the growing demands for storing, processing, and retrieving massive amounts of data while sustaining server throughput. In this paper, we present nine benchmark suites aimed at evaluating the performance of highly parallel supercomputer systems. Performance can be expressed in terms of speedup, scalability, or efficiency.
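For reference, the speedup and efficiency figures mentioned here follow the usual definitions: speedup is the single-processor time divided by the p-processor time, and efficiency is speedup divided by p. A small sketch with hypothetical timings:

```python
# Standard parallel-performance metrics applied to made-up timings.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    return speedup(t_serial, t_parallel) / processors

t1 = 120.0                               # hypothetical single-processor time (s)
timings = {2: 62.0, 4: 33.0, 8: 19.0}    # hypothetical times on p processors
for p, tp in timings.items():
    print(f"p={p}: speedup={speedup(t1, tp):.2f}, efficiency={efficiency(t1, tp, p):.2%}")
```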

Universal benchmark suites

MASCOTS '99. Proceedings of the Seventh International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 1999

We present concepts, design, and use of universal benchmark suites. Such suites consist of benchmark programs that represent all relevant characteristic types of workload. The execution times of individual benchmarks and appropriate weights can be used to compute global performance indicators that reflect a spectrum of specific compound workloads. All such global indicators can be obtained from the same universal benchmark suite. This approach substantially reduces the cost of benchmarking.
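A minimal sketch of the weighting idea, with made-up benchmark times and workload mixes: the same suite of measurements yields a different global indicator for each compound workload simply by changing the weights.

```python
# Combine per-benchmark execution times with workload-specific weights into a
# single global indicator.  Times and weights below are hypothetical.
def weighted_indicator(times, weights):
    """Weighted arithmetic mean of execution times (lower is better)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[name] * t for name, t in times.items())

times = {"cpu_bound": 41.0, "memory_bound": 58.0, "io_bound": 23.0}   # seconds
web_server_mix = {"cpu_bound": 0.2, "memory_bound": 0.3, "io_bound": 0.5}
hpc_mix        = {"cpu_bound": 0.6, "memory_bound": 0.35, "io_bound": 0.05}

# One benchmark run, two different compound-workload indicators.
print("web server mix:", weighted_indicator(times, web_server_mix))
print("HPC mix:       ", weighted_indicator(times, hpc_mix))
```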

lmbench: Portable tools for performance analysis

1996

lmbench is a micro-benchmark suite designed to focus attention on the basic building blocks of many common system applications, such as databases, simulations, software development, and networking. In almost all cases, the individual tests are the result of analysis and isolation of a customer's actual performance problem. These tools can be, and currently are, used to compare different system implementations from different vendors. In several cases, the benchmarks have uncovered previously unknown bugs and design flaws. The results have shown a strong correlation between memory system performance and overall performance. lmbench includes an extensible database of results from systems current as of late 1995.
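lmbench itself is a C tool suite, but the shape of a micro-benchmark that isolates a single building block can be sketched in a few lines. The example below times a cheap system-call wrapper and is purely illustrative; Python interpreter overhead dominates the figure it prints.

```python
# Illustration of the micro-benchmark idea only -- real lmbench measures primitives
# such as null system calls, context switches, and memory latency in C.
import os
import time

def bench_getpid(iterations=200_000):
    """Average wall-clock time per os.getpid() call, including loop overhead."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.getpid()
    elapsed = time.perf_counter() - start
    return elapsed / iterations

print(f"~{bench_getpid() * 1e6:.2f} microseconds per call (Python overhead included)")
```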

SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance

OpenMP Shared Memory Parallel Programming, 2001

We present a new benchmark suite for parallel computers. SPEComp targets mid-size parallel servers. It includes a number of science/engineering and data processing applications. Parallelism is expressed in the OpenMP API. The suite includes two data sets, Medium and Large, of approximately 1.6 and 4 GB in size. Our overview also describes the organization developing SPEComp, issues in creating OpenMP parallel benchmarks, the benchmarking methodology underlying SPEComp, and basic performance characteristics.
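The abstract does not detail the reporting metric, but SPEC suites generally summarize results as a geometric mean of runtime ratios against a reference machine. A generic sketch of that kind of metric, with made-up times, is shown below; it is not SPEComp's official run tooling.

```python
# Generic SPEC-style summary: geometric mean of (reference time / measured time).
# Benchmark times here are invented for illustration.
import math

def spec_style_metric(measured, reference):
    ratios = [reference[name] / measured[name] for name in measured]  # >1 means faster than reference
    return math.exp(sum(map(math.log, ratios)) / len(ratios))

reference_times = {"swim": 8600.0, "mgrid": 7300.0, "applu": 4000.0}  # hypothetical seconds
measured_times  = {"swim": 1200.0, "mgrid": 1500.0, "applu":  800.0}
print(f"overall ratio = {spec_style_metric(measured_times, reference_times):.2f}")
```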

Towards using and improving the NAS parallel benchmarks

Proceedings of the 2010 Workshop on Parallel Programming Patterns, 2010

The NAS parallel benchmarks, originally developed by NASA for evaluating the performance of its high-performance computers, have become one of the most widely used benchmark suites for side-by-side comparisons of high-performance machines. However, even though the NAS parallel benchmarks have grown tremendously over the last two decades, documentation has lagged behind the rapid changes and additions to the collection of benchmark codes, driven largely by fast-evolving parallel architectures. Consequently, the learning curve for beginning graduate students, researchers, or software systems engineers picking up these benchmarks is typically steep. In this paper, we document and assess the NAS parallel benchmark suite by identifying parallel patterns within the benchmark codes. We believe that such documentation will allow researchers as well as practitioners in industry to understand, use, and modify these codes more effectively.
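As an example of what a "parallel pattern" means in this context, one pattern commonly attributed to the NAS codes is the structured-grid (stencil) computation, as in the MG benchmark. The NumPy sketch below shows only the pattern itself and is not NAS benchmark code.

```python
# Structured-grid / stencil pattern: each interior point is updated from its
# neighbors, which parallelizes naturally by partitioning the grid.
import numpy as np

def jacobi_step(u):
    """One 5-point stencil sweep over the interior of a 2-D grid."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                              u[1:-1, :-2] + u[1:-1, 2:])
    return new

grid = np.zeros((64, 64))
grid[0, :] = 1.0                 # simple boundary condition
for _ in range(100):
    grid = jacobi_step(grid)
```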

Systematic Construction, Execution, and Reproduction of Complex Performance Benchmarks

Cloud Computing – CLOUD 2019, 2019

In this work, we present the next generation of the Elba toolkit, available under a beta release, and show how we have used it for experimental research in computer systems with RUBBoS, a well-known n-tier system benchmark, as an example. In particular, we show how we have leveraged milliScope, the Elba toolkit's monitoring and instrumentation framework, to collect log data from benchmark executions at unprecedented fine granularity, and how we have specified benchmark workflows with WED-Make, a declarative workflow language whose main characteristic is to facilitate the declaration of dependencies. We also show how to execute WED-Makefiles (i.e., workflow specifications written with WED-Make), and how we have successfully reproduced the experimental verification of the millibottleneck theory of performance bugs in multiple cloud environments and systems.