Hugues Cassé - Profile on Academia.edu (original) (raw)

Papers by Hugues Cassé

arXiv (Cornell University), Jul 15, 2022

Worst-Case Execution Time (WCET) is a key component for the verification of critical real-time ap... more Worst-Case Execution Time (WCET) is a key component for the verification of critical real-time applications. Yet, even the simplest microprocessors implement pipelines with concurrently-accessed resources, such as the memory bus shared by fetch and memory stages. Although their in-order pipelines are, by nature, very deterministic, the bus can cause out-of-order accesses to the memory and, therefore, timing anomalies: local timing effects that can have global effects but that cannot be easily composed to estimate the global WCET. To cope with this situation, WCET analyses have to generate important overestimations in order to preserve safety of the computed times or have to explicitly track all possible executions. In the latter case, the presence of out-of-order behavior leads to a combinatorial blowup of the number of pipeline states for which efficient state abstractions are difficult to design. This paper proposes instead a compact and exact representation of the timings in the pipeline, using eXecution Decision Diagram (XDD) [1]. We show how XDD can be used to model pipeline states all along the execution paths by leveraging the algebraic properties of XDD. This computational model allows to compute the exact temporal behavior at control flow graph level and is amenable to efficiently and precisely support WCET calculation in presence of out-of-order bus accesses. This model is finally experimented on the TACLe benchmark suite and we observe good performance making this approach appropriate for industrial applications.

Case study: Performance and WCET analysis for parallelised avionic applications with ODC²

In a hard Real-Time (HRT) domain such as avionics, the high application performance is as importa... more In a hard Real-Time (HRT) domain such as avionics, the high application performance is as important as delivering a predictable execution time. More precisely, the performance is defined by the application Worst-Case Execution Time (WCET). A common practice to boost the application performance in general purpose computing is by parallelisation and parallel execution on a shared memory multicore processor. Hence, local caches, used for bridging the long memory latency, need to allow coherent accesses to shared data. Conventional cache coherence protocols impede a suitable timing analysis because of multiple reasons. In this paper, we introduce an avionics case study to analyse the applicability of the earlier proposed On-Demand Coherent Cache (ODC2). We experiment with a 3D Path Planning (3DPP) application executed on a multicore processor. By varying the number of cores and the level of application parallelism, we compare and analyse observed average case execution times (ACET) of the 3DPP application with ODC2, Uncached (bypassing the cache for shared data), and Cache Flush (software-triggered cache invalidation) configurations. The ACET results of the 3DPP application suggest that ODC2 significantly outperforms the Uncached configuration by 1.53 times and Cache Flush by 2.15 times. Furthermore, we study the WCET speedup of the 3DPP application by applying a static analysis OTAWA tool. In terms of worst-case performance, the ODC2 achieves a speedup of 1.63 compared to Uncached and 3.17 compared to Cache Flush configurations.

The analysis of the worst-case execution times is necessary in the design of critical real-time s... more The analysis of the worst-case execution times is necessary in the design of critical real-time systems. To get sound and precise times, the WCET analysis for these systems must be performed on binary code and based on static analysis. OTAWA, a tool providing WCET computation, uses the Sim-nML language to describe the instruction set and XML files to describe the microarchitecture. The latter information is usually inadequate to describe real architectures and, therefore, requires specific modifications, currently performed by hand, to allow correct time calculation. In this paper, we propose to extend Sim-nML in order to support the description of modern microarchitecture features along the instruction set description and to seamlessly derive the time calculation. This time computation is specified as a constraint solving problem that is automatically synthesized from the extended Sim-nML. Thanks to its declarative aspect, this approach makes easier and modular the description of complex features of microprocessors while maintaining a sound process to compute times.

Lecture Notes in Computer Science, 2010

The analysis of worst-case execution times has become mandatory in the design of hard real-time s... more The analysis of worst-case execution times has become mandatory in the design of hard real-time systems: it is absolutely necessary to know an upper bound of the execution time of each task to determine a task schedule that insures that deadlines will all be met. The OTAWA toolbox presented in this paper has been designed to host algorithms resulting from research in the domain of WCET analysis so that they can be combined to compute tight WCET estimates. It features an abstraction layer that decouples the analyses from the target hardware and from the instruction set architecture, as well as a set of functionalities that facilitate the implementation of new approaches.

Implicit Path Enumeration Technique (IPET) is currently largely used to compute Worst Case Execut... more Implicit Path Enumeration Technique (IPET) is currently largely used to compute Worst Case Execution Time (WCET) by modeling control flow and architecture using integer linear programming (ILP). As precise architecture effects requires a lot of constraints, the super-linear complexity of the ILP solver makes computation times bigger and bigger. In this paper, we propose to split the control flow of the program into smaller parts where a local WCET can be computed faster-as the resulting ILP system is smaller-and to combine these local results to get the overall WCET without loss of precision. The experimentation in our tool OTAWA with lp_solve solver has shown an average computation improvement of 6.5 times.

Worst-Case Execution Time Analysis, 2009

Validation of embedded hard real-time systems requires the computation of the Worst Case Executio... more Validation of embedded hard real-time systems requires the computation of the Worst Case Execution Time (WCET). Although these systems make more and more use of Components Off The Shelf (COTS), the current WCET computation methods are usually applied to whole programs: these analysis methods require access to the whole system code, that is incompatible with the use of COTS. In this paper, after discussing the specific cases of the loop bounds estimation and the instruction cache analysis, we show in a generic way how static analysis involved in WCET computation can be pre-computed on COTS in order to obtain component partial results. These partial results can be distributed with the COTS, in order to compute the WCET in the context of a full application. We describe also the information items to include in the partial result, and we propose an XML exchange format to represent these data. Additionally, we show that the partial analysis enables us to reduce the analysis time while introducing very little pessimism.

Following the successful WCET Tool Challenges in 2006 and 2008, the third event in this series wa... more Following the successful WCET Tool Challenges in 2006 and 2008, the third event in this series was organized in 2011, again with support from the ARTIST DESIGN Network of Excellence. Following the practice established in the previous Challenges, the WCET Tool Challenge 2011 (WCC'11) defined two kinds of problems to be solved by the Challenge participants with their tools, WCET problems, which ask for bounds on the execution time, and flow-analysis problems, which ask for bounds on the number of times certain parts of the code can be executed. The benchmarks to be used in WCC'11 were debie1, PapaBench, and an industrial-strength application from the automotive domain provided by Daimler. Two default execution platforms were suggested to the participants, the ARM7 as "simple target" and the MPC5553/5554 as a "complex target," but participants were free to use other platforms as well. Ten tools participated in WCC'11: aiT, Astrée, Bound-T, FORTAS, METAMOC, OTAWA, SWEET, TimeWeaver, TuBound and WCA.

HAL (Le Centre pour la Communication Scientifique Directe), Jan 25, 2006

In this article, we present OTAWA, a framework for computing the Worst Case Execution Time of a p... more In this article, we present OTAWA, a framework for computing the Worst Case Execution Time of a program. From its design, it provides an extensible and open architecture whose objective is the implementation of existing and future static analyses for WCET computation. Inspired by existing generic tools, it is based on an architecture abstraction layers where hooked annotations store specific analyses information. Computing the WCET is viewed as performing a chain of analyses that use and produce annotations until getting the WCET evaluation. Finally, the efficiency of the framework, in term of development productivity, is evaluated by two case studies that show some pitfalls that we are currently fixing but also the success of the approach.

The current Worst Case Execution Time (WCET) computation methods are usually applied to whole pro... more The current Worst Case Execution Time (WCET) computation methods are usually applied to whole programs, this may drive to scalability limitations as the program size becomes bigger. A solution could be to split programs into components that could support separated partial analyses to decrease the computation time. The componentization is also consistent with the more and more frequent use of Component Off The Shelf (COTS). Consequently, we need algorithms to perform analyses on component-wise applications. In this paper, we focus on the partial analysis of set-associative instruction caches, based on the categorization method described by M. Alt et al. We have first evaluated A. Rakib et al.'s approach to this problem and we have shown that, while correct, this approach can be greatly improved by a better estimation of the component effect on the cache. The version we have developed addresses the identified shortcomings and the experimentation results have been evaluated according to two criteria: (1) overestimation of the WCET and (2) computation time gain against the whole program analysis approach.

Verification of SimNML instruction set description using co-simulation (2nd RISC-V Meeting, Paris, 01/10/19-03/10/19)

(slides)International audienceThe TRACES team at IRIT has developed a description of the RISC-V i... more (slides)International audienceThe TRACES team at IRIT has developed a description of the RISC-V instruction set in SimNML, which is an Architecture Description Language (ADL). GLISS automatically convert this description into a library supporting, among others, a runnable Instruction Set Simulator. This presentation exposes the validation of our RISC-V description by parallely running and checking the generated simulator with a different source of execution implementing the RISC-V (different simulator or real microprocessor). This work contributes to the confidence we can have into static analysis tools working on program binary representation. In such tools, the instruction set support is a boring and error-prone task whose validity is hard to assert. On the opposite, the SimNML description provides a golden model that is easier to write and that can be tested to detect errors. Once a sufficient level of confidence is obtained about the description, it can be processed automatically to derive properties useful for static analyses work

IEEE Transactions on Computers, 2023

We present MINOTAuR, an open-source RISC-V core designed to be timing predictable, i.e. free of t... more We present MINOTAuR, an open-source RISC-V core designed to be timing predictable, i.e. free of timing anomalies: this property enables a compositional timing analysis in a multicore context. MINOTAuR features speculative execution: thanks to a specific design of its pipeline, we formally prove that speculation does not break timing predictability while sensibly increasing performance. We propose architectural extensions that enable the use of a return address stack and of any cache replacement policy, which we implemented in the MINOTAuR core. We show that a trade-off can be found between the efficiency of these components and the overhead they incur on the die area consumption, and that using them yields a performance equivalent to that of the baseline RISC-V Ariane core, while also enforcing timing predictability.

This article presents the results of experimenting our OTAWA tool to compute WCETs on a real auto... more This article presents the results of experimenting our OTAWA tool to compute WCETs on a real automotive embedded application. First, we analyze the application (C source generated from Simulink models) and exhibit specific properties and their implication on the WCET computation. Then, two very different embedded processor architectures are tested and in both cases we show (1) how their specific features are supported by OTAWA and (2) how to configure them to maximize both performances and determinism 1 .

We present MINOTAuR, a timing predictable open source RISC-V core based on the Ariane core [28]. ... more We present MINOTAuR, a timing predictable open source RISC-V core based on the Ariane core [28]. We first modify Ariane in order to make it timing predictable following the approach used to design the SIC processor [12]. We prove that the instruction parallelism in the Ariane core does not prevent from enforcing timing predictability. We further relax restrictions by enabling a limited amount of speculative execution and we are still able to formally prove that the core is timing predictable. Experimental results show that the performance is reduced by only 10% on average compared to the original Ariane core.

These last years, many researchers have proposed solutions to estimate the Worst-Case Execution T... more These last years, many researchers have proposed solutions to estimate the Worst-Case Execution Time of a critical application when it is run on modern hardware. Several schemes commonly implemented to improve performance have been considered so far in the context of static WCET analysis: pipelines, instruction caches, dynamic branch predictors, execution cores supporting out-of-order execution, etc. Comparatively, components that are external to the processor have received lesser attention. In particular, the latency of memory accesses is generally considered as a fixed value. Now, modern DRAM devices support the open page policy that reduces the memory latency when successive memory accesses address the same memory row. This scheme, also known as row buffer, induces variable memory latencies, depending on whether the access hits or misses in the row buffer. In this paper, we propose an algorithm to take the open page policy into account when estimating WCETs for a processor with an instruction cache. Experimental results show that WCET estimates are refined thanks to the consideration of tighter memory latencies instead of pessimistic values.

HAL (Le Centre pour la Communication Scientifique Directe), Jan 6, 2008

Le développement d'un simulateur de processeur est long et fastidieux. Découpler la partie foncti... more Le développement d'un simulateur de processeur est long et fastidieux. Découpler la partie fonctionnelle (émulation) de la partie structure (analyse des temps de traitement) permet de réutiliser plus facilement du code existant (principalement le code d'émulation, les jeux d'instructions évoluant moins vite que les architectures matérielles). Dans ce contexte, plusieurs équipes ont proposé des solutions pour une génération automatique de la partie fonctionnelle d'un simulateur à partir d'une description plus ou moins formelle du jeu d'instructions. S'il est relativement aisé de générer automatiquement un émulateur pour l'architecture DLX, il s'avère plus compliqué de réaliser un générateur supportant à la fois des architectures de type CISC, RISC ou VLIW et produisant un code efficace. Dans cet article, nous décrivons plusieurs techniques mises en oeuvre dans l'outil GLISS que nous avons développé et qui se veut aussi « polyvalent » que possible.

IEEE Transactions on Computers

2007 International Symposium on Industrial Embedded Systems, 2007

Verification of SimNML instruction set description using co-simulation

HAL (Le Centre pour la Communication Scientifique Directe), Oct 1, 2019

Improving the First-Miss Computation in Set-Associative Instruction Caches