Evaluating the efficacy of statistical simulation for design space exploration
Related papers
EMSim: An Extensible Simulation Environment for Studying High Performance Microarchitectures
Modern microprocessors achieve high performance through speculative execution and mechanisms that exploit instruction-level parallelism. Performance evaluation of such architectures is generally performed using detailed, cycle-by-cycle simulation. Because detailed simulation is slow, the design of recent simulators has focused on developing fast simulation engines. However, these optimized simulators are difficult to modify or extend. In addition, intensive benchmarking is required to validate simulation performance results, and this task consumes a significant amount of time even when very fast simulators are used. This paper presents a novel simulation environment for studying high-performance microarchitectures. The environment consists of an extensible simulator for superscalar architectures and a group of utilities for performing benchmarking in parallel. The new simulator has features that are not found in other simulators reported in the literatur...
A statistical performance model of the opteron processor
ACM SIGMETRICS Performance Evaluation Review, 2011
Cycle-accurate simulation is the dominant methodology for processor design space analysis and performance prediction. However, with the prevalence of multi-core, multi-threaded architectures, this method has become highly impractical as the sole means of design because of its extreme slowdowns. We have developed a statistical technique for modeling multicore processors that is based on Monte Carlo methods. Using this method, processor models of contemporary architectures can be developed and applied to performance prediction, bottleneck detection, and limited design space analysis. To date, we have accurately modeled the IBM Cell, the Intel Itanium, and the Sun Niagara 1 and Niagara 2 processors [23, 22, 8]. In this paper, we present work in progress that applies this methodology to an out-of-order execution processor. We present the initial single-core model and results for the AMD Barcelona (Opteron) processor.
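To make the idea of a Monte Carlo processor model concrete, here is a minimal, generic sketch; it is not the authors' model, whose structure the abstract does not describe. It assumes a hypothetical instruction mix in which each class has a probability, a base cost in cycles, and a chance of incurring a fixed stall penalty. Sampling many instructions and averaging their cycle costs estimates cycles per instruction (CPI), and the estimate can be checked against the closed-form expectation. All class names, probabilities, and latencies below are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrClass:
    # All fields are hypothetical model parameters, not measured values.
    name: str
    probability: float   # share of the dynamic instruction mix
    base_cycles: float   # cycles when no stall occurs
    stall_prob: float    # chance this instruction stalls
    stall_cycles: float  # extra cycles added on a stall


# Hypothetical example mix (illustrative numbers only).
EXAMPLE_MIX = [
    InstrClass("alu", 0.5, 1.0, 0.0, 0.0),
    InstrClass("load", 0.3, 1.0, 0.1, 20.0),   # e.g. cache-miss stalls
    InstrClass("branch", 0.2, 1.0, 0.05, 15.0),  # e.g. mispredict stalls
]


def monte_carlo_cpi(mix, n_samples=100_000, seed=0):
    """Estimate CPI by sampling instructions and their stall outcomes."""
    rng = random.Random(seed)
    weights = [c.probability for c in mix]
    total = 0.0
    for cls in rng.choices(mix, weights=weights, k=n_samples):
        cycles = cls.base_cycles
        if rng.random() < cls.stall_prob:
            cycles += cls.stall_cycles
        total += cycles
    return total / n_samples


def analytic_cpi(mix):
    """Closed-form expected CPI for the same model, used as a sanity check."""
    norm = sum(c.probability for c in mix)
    return sum(
        (c.probability / norm) * (c.base_cycles + c.stall_prob * c.stall_cycles)
        for c in mix
    )


if __name__ == "__main__":
    # A tiny design-space sweep: vary the hypothetical load-miss penalty.
    for penalty in (10.0, 20.0, 40.0):
        mix = [
            InstrClass(c.name, c.probability, c.base_cycles, c.stall_prob,
                       penalty if c.name == "load" else c.stall_cycles)
            for c in EXAMPLE_MIX
        ]
        print(penalty, round(monte_carlo_cpi(mix, seed=1), 3), round(analytic_cpi(mix), 3))
```

The sweep in the `__main__` block hints at how such a model supports limited design space analysis: each configuration costs only a short sampling run rather than a full cycle-accurate simulation. For a model this simple the closed form suffices; sampling becomes worthwhile once interactions between events (for example, overlapping stalls in an out-of-order core) make an analytic expression impractical.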