Bridging the domains of high-level and logic synthesis

High-level synthesis with behavioral level multi-cycle path analysis

High-level synthesis (HLS) tools generate register transfer level (RTL) hardware descriptions through a process of resource allocation, scheduling and binding. Intuitively, RTL quality influences the logic synthesis quality. Specifically, the achievable clock rate, area, and latency in clock cycles will be determined by the RTL description. However, not all paths should receive equal logic synthesis effort: multi-cycle paths represent an opportunity to spend logic synthesis effort elsewhere to achieve better design quality. In this paper, we perform multi-cycle optimisation on chained functional operations. We couple HLS and logic synthesis synergistically so multi-cycle paths can be identified and optimised coherently across both behavioral and logic levels. In addition, we perform multi-cycle path analysis at the behavioral level efficiently. We prove that our technique examines all reachable circuit states and finds multi-cycle paths including control flow and guarding conditions that improve the flexibility and power of the technique. Compared to LegUp, we achieve on average a 55% execution time improvement, a 29% area improvement, and a 68% time-area product improvement when targeting an FPGA architecture.

Data-flow transformations for critical path time reduction in high-level DSP synthesis

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1993

Iterative, deterministic, digital signal processing algorithms can be represented by synchronous data-flow graphs. Data-flow graphs are used for scheduling and resource allocation during high-level VLSI synthesis. Every data-flow graph has an associated critical path time which limits the achievable iteration period in critical-path-based scheduling techniques. Unfolding, retiming, and pipelining transformations unravel hidden concurrency within data-flow graphs to reduce their critical path times.
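
As a rough illustration of the critical path time mentioned in this abstract, the sketch below computes the longest chain of computation times along zero-delay edges of a small data-flow graph, then applies a retiming and recomputes it. The graph, the node times, the retiming values, and the function names `critical_path_time` and `retime` are all assumptions made up for this example; they are not taken from the paper, and the sketch assumes the zero-delay subgraph is acyclic.

```python
from functools import lru_cache

def critical_path_time(node_time, edges):
    """Longest total computation time along any path of zero-delay edges.

    node_time: dict node -> computation time
    edges: list of (src, dst, delays) with delays >= 0
    Assumes the zero-delay subgraph has no cycles.
    """
    succ = {v: [] for v in node_time}
    for u, v, delays in edges:
        if delays == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(v):
        # Time of v plus the longest zero-delay continuation, if any.
        return node_time[v] + max((longest_from(s) for s in succ[v]), default=0)

    return max(longest_from(v) for v in node_time)

def retime(edges, r):
    """Standard retiming rule: w'(u -> v) = w(u -> v) + r[v] - r[u]."""
    return [(u, v, w + r[v] - r[u]) for u, v, w in edges]

# Hypothetical three-node loop: A -> B -> C -> A, with two delays on C -> A.
node_time = {"A": 1, "B": 2, "C": 2}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]

before = critical_path_time(node_time, edges)   # path A, B, C
retimed = retime(edges, {"A": 0, "B": 0, "C": 1})
after = critical_path_time(node_time, retimed)  # path A, B only

print(before, after)  # prints "5 3"
```

Here the retiming moves one delay from the edge C -> A onto B -> C, so the total number of delays in the loop is unchanged while the longest zero-delay path shrinks.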

High-level synthesis under I/O Timing and Memory constraints

2005 IEEE International Symposium on Circuits and Systems, 2005

The design of complex Systems-on-Chips requires taking communication and memory access constraints into account when integrating dedicated hardware accelerators. In this paper, we present a methodology and a tool that allow the High-Level Synthesis of DSP algorithms under both I/O timing and memory constraints. Based on formal models and a generic architecture, this tool helps the designer to find a reasonable trade-off between the required I/O timing behavior and the internal memory access parallelism of the circuit. The interest of our approach is demonstrated on the case study of an FFT algorithm.

Optimizing memory hierarchy allocation with loop transformations for high-level synthesis

2012

For the majority of computation-intensive application systems, off-chip memory bandwidth is a critical bottleneck for both performance and power consumption. The efficient utilization of limited on-chip memory resources plays a vital role in reducing off-chip memory accesses. This paper presents an efficient approach for optimizing on-chip memory allocation by loop transformations in imperfectly nested loops. We analytically model the on-chip buffer size and off-chip bandwidth after affine loop transformation, loop fusion/distribution and code motion. Branch-and-bound and knapsack reuse techniques are proposed to reduce the computational complexity of finding optimal solutions. Experimental results show that our scheme can save 40% of on-chip memory size with the same bandwidth consumption compared to previous approaches.

Loop Based Scheduling for High Level Synthesis

This paper describes a new loop-based scheduling algorithm. The algorithm aims at reducing the runtime processing complexity of path-based scheduling techniques. It partitions the control flow graph of the input specification into subgraphs before scheduling the different paths of each subgraph. Benchmark tests as well as simulation results on the scheduling algorithm indicate that the proposed algorithm results in a sizeable reduction in runtime.

Coordinated transformations for high-level synthesis of high performance microprocessor blocks

Proceedings of the 39th conference on Design automation - DAC '02, 2002

High performance microprocessor designs are partially characterized by functional blocks consisting of a large number of operations that are packed into very few cycles (often single-cycle) with little or no resource constraints but tight bounds on the cycle time. Extreme parallelization and conditional and speculative execution of operations are essential to meet the processor performance goals. However, this is a tedious task for which classical high-level synthesis (HLS) formulations are inadequate and thus rarely used. In this paper, we present a new methodology for application of HLS targeted to such microprocessor functional blocks that can potentially speed up the design space exploration for microprocessor designs. Our methodology consists of a coordinated set of source-level and fine-grain parallelizing compiler transformations that targets these behavioral descriptions, specifically loop constructs in them, and enables efficient chaining of operations and high-level synthesis of the functional blocks. As a case study in understanding the complexity and challenges in the use of HLS, we walk the reader through the detailed design of an instruction length decoder drawn from the Pentium®-family of processors. The chief contribution of this paper is the formulation of a domain-specific methodology for application of high-level synthesis techniques to a domain that rarely, if ever, finds use for it.

Timing analysis in high-level synthesis

IEEE/ACM International Conference on Computer-Aided Design, 1992

This paper presents a comprehensive timing model for behavioral-level specifications and algorithms for timing analysis in high-level synthesis. It is based on a timing network which models the data flow as well as the control flow in the behavioral input specification. The delay values for the network modules are created by invoking the same logic synthesis procedure applied after behavioral synthesis. The timing network is built only once for a given behavioral description. Several parameters are used to explore different scheduling possibilities as well as different optimization modes (area, delay), without changing the network. The use of this timing model in conjunction with a path-based scheduling algorithm is presented. Results for several benchmarks attested the accuracy of this approach.

Behavioural transformation to improve circuit performance in high-level synthesis

2005

Early scheduling algorithms usually adjusted the clock cycle duration to the execution time of the slowest operation. This resulted in large slack times wasted in those cycles executing faster operations. To reduce the wasted time, multi-cycle and chaining techniques have been employed. While these techniques have produced successful designs, their effectiveness is often limited by the area increase that may derive from chaining and the extra latency that may derive from multicycling. In this paper we present an optimization method that solves the time-constrained scheduling problem by transforming behavioural specifications into new ones whose subsequent synthesis substantially improves circuit performance. Our proposal breaks up some of the specification operations, allowing their execution during several possibly non-consecutive cycles, and also the calculation of several data-dependent operation fragments in the same cycle. To do so, it takes into account the circuit latency and the execution time of every specification operation. The experimental results show that circuits obtained from the optimized specification are on average 60% faster than those synthesized from the original specification, with only slight increments in the circuit area.
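
The slack problem and the two classical remedies named in this abstract can be made concrete with a small arithmetic sketch. It compares one operation per cycle (clock set to the slowest operation), chaining of dependent operations within a cycle, and multicycling with a shorter clock, for a single linear chain of dependent operations. The delays in nanoseconds and the helper names are invented for illustration; this is not the fragment-based transformation the paper proposes.

```python
def one_op_per_cycle(delays):
    """Clock equals the slowest operation; each operation gets its own cycle."""
    clock = max(delays)
    total_time = clock * len(delays)
    wasted_slack = sum(clock - d for d in delays)
    return clock, total_time, wasted_slack

def chained_cycles(delays, clock):
    """Greedily pack consecutive dependent operations into a cycle while they fit."""
    cycles, used = 0, clock  # 'used = clock' forces the first op to open a cycle
    for d in delays:
        if d > clock:
            raise ValueError("operation does not fit in one cycle")
        if used + d > clock:
            cycles += 1
            used = 0
        used += d
    return cycles

def multicycle_cycles(delays, clock):
    """Each operation takes ceil(delay / clock) cycles (integer arithmetic)."""
    return sum(-(-d // clock) for d in delays)

# Hypothetical chain: add, add, multiply, add (delays in ns).
delays = [10, 10, 40, 10]

clock, baseline_time, slack = one_op_per_cycle(delays)
chained_time = chained_cycles(delays, clock) * clock
multicycle_time = multicycle_cycles(delays, 10) * 10

print(baseline_time, slack, chained_time, multicycle_time)  # prints "160 90 120 70"
```

In this toy case more than half of the baseline execution time is slack, which is the inefficiency the abstract describes; the sketch ignores register and control overhead, which is where the area and latency costs mentioned above come from.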

A PIPELINE SCHEDULING ALGORITHM FOR HIGH-LEVEL SYNTHESIS

In Proc. of IFAC Workshop on Programmable Devices and Systems Conference PDS’03, Ostrava, 2003

Scheduling is the most important task in the high-level synthesis process, while pipelining is highly important for realising high-performance digital components. This paper presents a pipeline list-based scheduling algorithm, which performs forward and backward pipelining. The forward priority function incorporates information extracted from the data flow graph (DFG) structure to guide the scheduler to find near-optimal or optimal schedules quickly. The algorithm has a flexible procedure cycle, which allows designers to make efficient area-performance trade-offs by using different strategies. Designers can choose between forward and backward pipelining, with or without resource sharing, combined with clock cycle selection and pipe stage delay determination. Experimental results with standard benchmarks show the effectiveness of the proposed algorithm.
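
For readers unfamiliar with list-based scheduling, here is a minimal sketch of a generic resource-constrained list scheduler. It is not the pipelined forward/backward algorithm of this paper: it assumes unit-latency operations, a single resource type with `units` instances, an acyclic dependency graph, and a priority equal to the longest path to a sink (one common way of using DFG structure). The function name `list_schedule` and the example graph are hypothetical.

```python
def list_schedule(deps, units):
    """Schedule unit-latency operations on 'units' identical resources.

    deps: dict op -> list of predecessor ops (must be acyclic)
    Returns dict op -> cycle number (1-based).
    """
    succ = {op: [] for op in deps}
    for op, preds in deps.items():
        for p in preds:
            succ[p].append(op)

    depth = {}
    def path_len(op):
        # Number of operations on the longest path from op to any sink.
        if op not in depth:
            depth[op] = 1 + max((path_len(s) for s in succ[op]), default=0)
        return depth[op]

    schedule, cycle = {}, 0
    while len(schedule) < len(deps):
        cycle += 1
        # Ready: unscheduled ops whose predecessors finished in an earlier cycle.
        ready = [op for op in deps
                 if op not in schedule
                 and all(p in schedule and schedule[p] < cycle for p in deps[op])]
        # Highest priority first; ties broken by name for determinism.
        ready.sort(key=lambda op: (-path_len(op), op))
        for op in ready[:units]:
            schedule[op] = cycle
    return schedule

# Hypothetical DFG: c needs a and b; e needs c and d.
deps = {"a": [], "b": [], "c": ["a", "b"], "d": [], "e": ["c", "d"]}

two_units = list_schedule(deps, 2)
one_unit = list_schedule(deps, 1)
print(two_units)               # {'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3}
print(max(one_unit.values()))  # 5
```

Changing `units` shows the kind of area-performance trade-off the abstract refers to: two units finish this graph in three cycles, while one unit needs five.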