Control flow speculation in multiscalar processors

Control-flow speculation through value prediction for superscalar processors

Abstract: In this paper, we introduce a new branch predictor that predicts the outcome of branches by predicting the value of their inputs and performing an early computation of their results according to the predicted values. The design of a hybrid predictor comprising the above branch predictor and a correlating branch predictor is presented. We also propose a new selector that chooses the most reliable prediction for each branch. This selector is based on the path followed to reach the branch. Results for immediate updates show significant misprediction rate reductions with respect to a conventional hybrid predictor for different size configurations. In addition, the proposed hybrid predictor with a size of 8 KB achieves the same accuracy as a conventional one of 64 KB. Performance evaluation for a dynamically scheduled superscalar processor, with realistic updates, shows a speedup of 8 percent despite its higher latency (up to four cycles).
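The idea of predicting a branch by predicting its input value can be illustrated with a small sketch. The abstract does not describe the predictor's internals, so everything here is an assumption for illustration: the stride-based value predictor, the class and function names, and the loop-exit branch used as the demo are all hypothetical and are not the paper's design.

```python
class StrideValuePredictor:
    """Hypothetical value predictor: next value = last value + last stride."""

    def __init__(self):
        self.table = {}  # branch pc -> (last_value, stride)

    def predict(self, pc):
        last, stride = self.table.get(pc, (0, 0))
        return last + stride

    def update(self, pc, actual):
        # On the first update the stride is unknown, so it is recorded as 0.
        last, _ = self.table.get(pc, (actual, 0))
        self.table[pc] = (actual, actual - last)


def predict_branch(predictor, pc, condition):
    """Early computation: evaluate the branch condition on the predicted input."""
    return condition(predictor.predict(pc))


def run_loop_demo(n=10):
    """Count correct predictions for a loop-closing branch 'i < n'."""
    predictor = StrideValuePredictor()
    pc = 0x40  # illustrative branch address

    def condition(i):
        return i < n

    correct = 0
    for i in range(1, n + 1):  # value of i seen by the branch each iteration
        guess = predict_branch(predictor, pc, condition)
        actual = condition(i)
        correct += guess == actual
        predictor.update(pc, i)
    return correct
```

In this toy setting the branch input grows by a constant stride, so the predicted value also anticipates the final, not-taken outcome at loop exit, a case that a history-only predictor would typically miss.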

Control speculation in multithreaded processors through dynamic loop detection

Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture, 1998

This paper presents a mechanism to dynamically detect the loops that are executed in a program. This technique detects the beginning and the termination of the iterations and executions of the loops without compiler or user intervention. We propose to apply this dynamic loop detection to the speculation of multiple threads of control dynamically obtained from a sequential program. Based on the highly predictable behavior of the loops, the history of the past executed loops is used to speculate the future instruction sequence. The overall objective is to dynamically obtain coarse-grain parallelism (at the thread level) that can be exploited by a multithreaded architecture. We show that for a 4-context multithreaded processor, the speculation mechanism provides around 2.6 concurrent threads on average.
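A minimal sketch of dynamic loop detection follows. It assumes, as a simplification that the abstract does not state, that a loop is closed by a backward branch whose target is the loop head; the trace format, the function names, and the last-value prediction policy are hypothetical.

```python
from collections import defaultdict


def detect_loops(trace):
    """Detect loop executions in a dynamic branch trace.

    trace: sequence of (branch_pc, target_pc, taken) tuples (assumed format).
    Returns {loop_head: [iterations in each completed execution]}.
    """
    current = defaultdict(int)    # loop head -> back-edges taken so far
    history = defaultdict(list)   # loop head -> completed execution lengths
    for pc, target, taken in trace:
        if target > pc:
            continue  # forward branch: not treated as loop-closing
        if taken:
            current[target] += 1  # another iteration begins
        else:
            # Falling through the back-edge ends this execution of the loop;
            # k taken back-edges mean the body ran k + 1 times.
            history[target].append(current.pop(target, 0) + 1)
    return dict(history)


def predict_iterations(history, head):
    """Speculate that the next execution repeats the last observed count."""
    counts = history.get(head)
    return counts[-1] if counts else None
```

For example, a back-edge at `0x20` targeting `0x10` that is taken three times and then falls through is recorded as one execution of four iterations, and that count becomes the speculation for the next execution of the same loop.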

Control-flow speculation through value prediction

IEEE Transactions on Computers, 2001

Abstract: In this paper, we introduce a new branch predictor that predicts the outcome of branches by predicting the value of their inputs and performing an early computation of their results according to the predicted values. The design of a hybrid predictor comprising the above branch predictor and a correlating branch predictor is presented. We also propose a new selector that chooses the most reliable prediction for each branch. This selector is based on the path followed to reach the branch. Results for immediate updates show significant misprediction rate reductions with respect to a conventional hybrid predictor for different size configurations. In addition, the proposed hybrid predictor with a size of 8 KB achieves the same accuracy as a conventional one of 64 KB. Performance evaluation for a dynamically scheduled superscalar processor, with realistic updates, shows a speedup of 8 percent despite its higher latency (up to four cycles).

Compiler analysis for trace-level speculative multithreaded architectures

… between Compilers and …, 2005

Trace-Level Speculative Multithreaded Processors exploit trace-level speculation by means of two threads working cooperatively. One thread, called the speculative thread, executes instructions ahead of the other by speculating on the result of several traces. The other thread executes speculated traces and verifies the speculation made by the first thread. In this paper, we propose a static program analysis for identifying candidate traces to be speculated. This approach identifies large regions of code whose live-output values may be successfully predicted. We present several heuristics to determine the best opportunities for dynamic speculation, based on compiler analysis and program profiling information. Simulation results show that the proposed trace recognition techniques achieve on average a speed-up close to 38% for a collection of SPEC2000 benchmarks.

Towards a compiler framework for thread-level speculation

2011

Speculative parallelization techniques make it possible to extract parallelism from fragments of code that cannot be analyzed at compile time. However, research on software-based, thread-level speculation would greatly benefit from an appropriate compiler framework for easy prototyping and further development of new techniques. This paper presents an experimental XML-based compilation framework to handle speculative parallelization of C code. The framework extends Cetus, a source-to-source C compiler, to build an XML tree based on the Cetus Internal Representation of the source code. Other modules of our framework rely on XPath and XSLT capabilities to process the generated XML tree, to analyze the use of variables, and to augment the original code for software-based, speculative parallel execution. The current version of our framework allows fast prototyping of new analysis and transformation solutions, with a reduction of around 83% in the number of code lines needed compared with the direct use of Cetus for the same purpose. To show the possibilities of this framework, we present an automatically generated classification of loops for several SPEC CPU2006 C benchmarks. This classification is useful to better understand the potential benefits derived from the use of speculative parallelization techniques. The development framework presented here is freely available upon request.

Softspec: Software-Based Speculative Parallelism via Stride Prediction

Master's thesis, MIT, 1999

Abstract: We introduce Softspec, an all-software, speculation-based approach to automatic parallelization of sequential applications. Softspec parallelizes loops containing stride-predictable memory references, without resorting to complex compiler analyses or special hardware support. By ...
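What "stride-predictable" means for a memory reference can be sketched as follows. This is only an illustration of stride prediction in general, not Softspec's implementation, which the truncated abstract does not describe; the function name and its interface are hypothetical.

```python
def predict_addresses(observed, count):
    """Extrapolate future addresses of one memory reference.

    observed: addresses touched in the first few loop iterations.
    count: number of future addresses to predict.
    Returns the predicted addresses, or None if the observed
    addresses do not follow a single constant stride.
    """
    if len(observed) < 2:
        raise ValueError("need at least two observed addresses")
    stride = observed[1] - observed[0]
    for prev, cur in zip(observed, observed[1:]):
        if cur - prev != stride:
            return None  # not stride-predictable: do not speculate
    start = observed[-1] + stride
    return [start + k * stride for k in range(count)]
```

A reference that touched addresses 100, 108, and 116 would be predicted to touch 124, 132, and 140 next, so later iterations could be checked for conflicts against those predictions before running speculatively in parallel.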

Role of Multiblocks in Control Flow Prediction using Parallel Register Sharing Architecture

International Journal of Computer Applications, 2010

In this paper we present control flow prediction (CFP) in a parallel register sharing architecture to achieve a high degree of ILP. The main idea is to go a step beyond the prediction of ordinary branches and give the architecture information about the components of the program's control flow graph (CFG), so that it can make better branch decisions for ILP. The navigation bandwidth of the prediction mechanism depends upon the degree of ILP. It can be increased by increasing control flow prediction at compile time. This increases the size of initiation, which allows the overlapped execution of multiple independent flows of control. Multiple branch instructions can also be allowed. These are intermediate steps toward increasing the size of the dynamic window in order to exploit a high degree of instruction-level parallelism.

Control flow prediction for dynamic ILP processors

1993

Abstract: We introduce a technique to enhance the ability of dynamic ILP processors to exploit (speculatively executed) parallelism. Existing branch prediction mechanisms used to establish a dynamic window from which ILP can be extracted are limited in their abilities to: (i) create a large, accurate dynamic window, (ii) initiate a large number of instructions into this window in every cycle, and (iii) traverse multiple branches of the control flow graph per prediction.

Compiler-Assisted Dynamic Predicated Execution of Complex Control-Flow Structures

2006

Even after decades of research in branch prediction, branch predictors still remain imperfect, which results in significant performance loss in aggressive processors that support large instruction windows and deep pipelines. This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor. The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them. To achieve this without incurring large hardware cost and complexity, the compiler identifies branches that are suitable for dynamic predication, called diverge branches. The compiler also selects a control-flow merge (or reconvergence) point corresponding to each diverge branch to aid dynamic predication. If a diverge branch is hard-to-predict at run-time, the microarchitecture dynamically predicates the instructions between the diverge branch and the corresponding merge point by first executing one path after the bran...

Dynamic branch prediction and control speculation

International Journal of High Performance Systems Architecture, 2007

Branch prediction schemes have become an integral part of today's superscalar processors. They are one of the key issues in enhancing the performance of processors. Pipeline stalls due to conditional branches are one of the most significant impediments to realising the performance potential of superscalar processors. Many branch prediction schemes that can effectively and accurately predict the outcome of branch instructions have been proposed. In this paper, an overview of some dynamic branch prediction schemes for superscalar processors is presented.
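As an illustration of what a dynamic branch prediction scheme looks like, the sketch below implements the classic table of two-bit saturating counters. The abstract does not say which schemes the overview covers, so this choice is an assumption; the class name, table size, and indexing function are likewise hypothetical.

```python
class TwoBitPredictor:
    """Table of two-bit saturating counters indexed by branch address.

    Counter values 0-1 predict not taken; 2-3 predict taken.
    """

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, pc):
        return (pc >> 2) % self.entries  # assumes 4-byte aligned instructions

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(outcomes, pc=0x400):
    """Replay a sequence of taken/not-taken outcomes for a single branch."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

For a loop branch that is taken nine times and then falls through, this predictor misses once while warming up and once at loop exit; on a repeat of the same loop only the exit is mispredicted, because a single not-taken outcome does not flip a saturated counter.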