System-Level Performance Analysis in SystemC (original) (raw)

High-performance timing simulation of embedded software

Proceedings of the 45th annual conference on Design automation - DAC '08, 2008

This paper presents an approach for cycle-accurate simulation of embedded software by integration in an abstract SystemC model. Compared to existing simulation-based approaches, we present a hybrid method that resolves performance issues by combining the advantages of simulation-based and analytical approaches. In a first step, cycle-accurate static execution time analysis is applied at each basic block of a cross-compiled binary program using static processor models. After that, the determined timing information is back-annotated into SystemC for fast simulation of all effects that can not be resolved statically. This allows the consideration of data dependencies during run-time and the incorporation of branch prediction and cache models by efficient source code instrumentation. The major benefit of our approach is that the generated code can be executed very efficiently on the simulation host with approximately 90% of the speed of the untimed software without any code instrumentation.

Transformation of SDL specifications for system-level timing analysis

Proceedings of the tenth international symposium on Hardware/software codesign - CODES '02, 2002

Complex embedded systems are typically specified using multiple domain-specific languages. After code-generation, the implementation is simulated and tested. Validation of non-functional properties, in particular timing, remains a problem because full test coverage cannot be achieved for realistic designs. The alternative, formal timing analysis, requires a system representation based on key application and architecture properties. These properties must first be extracted from a system specification to enable analysis. In this paper we present a suitable transformation of SDL specifications for system-level timing analysis. We show ways to vary modeling accuracy in order to apply available formal techniques. A practical approach utilizing a recently developed system model is presented.

A SW performance estimation framework for early system-level-design using fine-grained instrumentation

Proceedings of the Design Automation & Test in Europe Conference, 2006

The increasing demands of high-performance in embedded applications under shortening time-to-market has prompted system architects in recent time to opt for Multi-Processor Systems-on-Chip (MP-SoCs) employing several programmable devices. The programmable cores provide a high amount of flexibility and reusability, and can be optimized to the requirements of the application to deliver highperformance as well. Since application software forms the basis of such designs, the need to tune the underlying SoC architecture for extracting maximum performance from the software code has become imperative.

A QEMU and SystemC-Based Cycle-Accurate ISS for Performance Estimation on SoC Development

IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 2011

In this paper, we present a fast cycle-accurate instruction set simulator (CA-ISS) for system-on-chip development based on QEMU and SystemC. Even though most state-of-the-art commercial tools have tried very hard to provide all the levels of details to satisfy the different requirements of the software designer, the hardware designer, and even the system architect, the hardware/software co-simulation speed is dramatically slow when co-simulating the hardware models at the register-transfer level (RTL) with a full-fledged operating system (OS). Our experimental results show that the combination of QEMU and SystemC can make the co-simulation at the CA level much faster than the conventional RTL simulation, even with a full-fledged operating system up and running. Furthermore, the statistics indicate that with every instruction executed and every memory accessed since power-on traced at the CA level, it takes 28m15.804s on average to boot up a full-fledged Linux kernel, even on a personal computer. Compared to the kernel boot time reported by Xilinx and SiCortex, the proposed CA-ISS is about 6.09 times faster compared to “SystemC without trace” of Xilinx and about 30.32 times faster compared to “SystemC models converted from RTL” of SiCortex. The main contributions of this paper are threefold: 1) a hardware/software co-simulation environment capable of running a full-fledged OS at the early stage of the electronic system level design flow at an acceptable simulation speed is proposed; 2) a virtual platform constructed using the proposed CA-ISS as the processor model can be used to estimate the performance of a target system from system perspective, which all the previous works, such as QEMU-SystemC, do not provide; and 3) such a virtual platform also provides the modeling capability from the transaction level down to the CA level or the other way around.

A high level mixed hardware/software modeling framework for rapid performance estimation

10th IEEE International NEWCAS Conference, 2012

Multi-core processors are more and more present in the embedded and real-time world. This paper introduces a code generator software applied to the benchmarking of embedded platforms. This solution creates an application runnable on embedded multicore platform and compliant with POSIX or Xenomai interface. The execution outputs a trace of the execution of each thread of the benchmark. It is also used to verify the latency of interruption and the preemption for real-time platforms.

Software timing analysis using HW/SW cosimulation and instruction set simulator

1998

Abstract Timing analysis for checking satisfaction of constraints is a crucial problem in real-time system design. In some current approaches, the delay of software modules is precalculated by a software performance estimation method, which is not accurate enough for hard real-time systems and complicated designs. In this paper, we present an approach to integrate a clock-cycle-accurate instruction set simulator (ISS) with a fast event-based system simulator.

Automatic generation of fast timed simulation models for operating systems in SoC design

Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition

To enable fast and accurate evaluation of HW/SW implementation choices of on-chip communication, we present a method to automatically generate timed OS simulation models. The method generates the OS simulation models with the simulation environment as a virtual processor. Since the generated OS simulation models use final OS code, the presented method can mitigate the OS code equivalence problem. The generated model also simulates different types of processor exceptions. This approach provides two orders of magnitude higher simulation speedup compared to the simulation using instruction set simulators for SW simulation.

Hybrid Timing Analysis of Modern Processor Pipeline via Hardware/Software Interactions

Embedded systems are often subject to constraints that require determinism to ensure that task deadlines are met. Such systems are referred to as real-time systems. Schedulability analysis provides a firm basis to ensure that tasks meet their deadlines for which knowledge of worst-case execution time (WCET) bounds is a critical piece of information. Static timing analysis techniques are used to derive these WCET bounds. A limiting factor for designing real-time systems is the class of processors that can be used. Typically, modern, complex processor pipelines cannot be used in real-time systems design. Contemporary processors with their advanced architectural features, such as out-of-order execution, branch prediction, speculation, prefetching, etc., cannot be statically analyzed to obtain tight WCET bounds for tasks. The main reason is that these features introduce non-determinism to task execution that surfaces in full only at runtime. In this paper, we introduce a new paradigm to perform timing analysis of tasks for real-time systems running on modern processor architectures. We propose minor enhancements to the processor architecture to enable this process. These features, on interaction with software modules, are able to obtain tight, accurate timing analysis results for modern processors. We also briefly present analysis techniques that, combined with our timing analysis methods, reduce the complexity of worst-case estimations for loops. To the best of our knowledge, this method of constant interactions between hardware and software to calculate WCET bounds for out-of-order processors is the first of its kind.