Physical Design Variation in Relative Timed Asynchronous Circuits
Related papers
Relative Placement in Timed Asynchronous Design
2016
Timed asynchronous circuits can be implemented using commercial computer-aided design (CAD) tools. The relative timing methodology translates the complex timing of asynchronous circuits into minimum and maximum timing constraints that CAD tools can understand. Typical synchronous placement algorithms neglect timing information that is critical for the asynchronous methodology. Relative timing constraints are employed during placement of the modules with the help of the relative placement methodology. This paper explicitly adopts relative placement, as supported by commercial CAD tools, to optimize a design for area, wire length, distance, power, and performance.
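As a rough illustration of how a relative timing constraint can be expressed as minimum and maximum delays, the sketch below assumes a constraint of the form "starting from a common point of divergence, the event on the fast path must arrive before the event on the slow path, with a margin". The names `PathDelay`, `rt_constraint_holds`, and `split_into_min_max` are hypothetical and are not part of any CAD tool's API; the delay values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathDelay:
    """Hypothetical min/max delay bounds (ps) of a path from a point of divergence."""
    min_ps: float
    max_ps: float


def rt_constraint_holds(fast: PathDelay, slow: PathDelay, margin_ps: float = 0.0) -> bool:
    """Return True if 'fast arrives before slow' holds under all delay variations.

    Both paths are assumed to start at the same point of divergence. The
    ordering is guaranteed only if the latest arrival on the fast path, plus
    the margin, is no later than the earliest arrival on the slow path.
    """
    return fast.max_ps + margin_ps <= slow.min_ps


def split_into_min_max(split_ps: float, margin_ps: float) -> dict:
    """Express the relative constraint as two absolute bounds at a chosen split point.

    If the fast path meets the max-delay bound and the slow path meets the
    min-delay bound, the relative ordering (with margin) follows.
    """
    return {
        "max_delay_fast_ps": split_ps,              # upper bound on the fast path
        "min_delay_slow_ps": split_ps + margin_ps,  # lower bound on the slow path
    }


if __name__ == "__main__":
    fast = PathDelay(min_ps=80.0, max_ps=120.0)
    slow = PathDelay(min_ps=150.0, max_ps=210.0)
    print(rt_constraint_holds(fast, slow, margin_ps=20.0))  # True: 140 <= 150
    print(rt_constraint_holds(fast, slow, margin_ps=40.0))  # False: 160 > 150
    print(split_into_min_max(120.0, 20.0))
```

The split point is a free design choice in this sketch: moving it trades slack on the fast path against slack on the slow path, while either bound alone can be handed to a tool that only understands absolute min/max delays.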
A Transistor-Level Placement Tool for Asynchronous Circuits
Although asynchronous circuits are accepted as low-power, low-EMI, and high-performance circuits, the roadblock to wide acceptance of the asynchronous design methodology is poor CAD support, especially for physical design. There are a few academic tools for asynchronous circuit design and synthesis, but there is neither a published tool nor a published document on the physical design of these circuits.
Power aware setup timing optimization in physical design of ASICs
Setup timing optimization is a very important and challenging step in the physical design of Application Specific Integrated Circuits (ASICs). Many techniques are available to help the designer close the design's setup timing. Although all these techniques share the same objective, which is to resolve the existing setup timing violations, each one has a different power footprint. In this paper, we measured the impact of each optimization technique on power. We ran each optimization transform at different flow stages on 100 industrial designs from different process technologies. We measured the ratio Δpower/Δsetup_timing after legalization and global routing to include not only the power added directly by the setup timing optimization, but also the power induced indirectly by placement and global routing perturbation. Experimental results showed that by taking into account the power impact of each optimization technique, including placement legalization and global routing, an average power reduction of 7.3% could be achieved with no timing impact.
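The Δpower/Δsetup_timing metric above can be read as "power spent per picosecond of slack recovered". The sketch below ranks transforms by that ratio; the transform names, units, and numbers are hypothetical placeholders, not measurements from the paper, and a real flow would obtain the deltas from post-legalization and post-global-routing analysis.

```python
from typing import List, NamedTuple


class TransformResult(NamedTuple):
    """Hypothetical measurement for one setup-timing optimization transform."""
    name: str
    delta_power_mw: float  # power change measured after legalization and global routing
    delta_slack_ps: float  # setup slack recovered; positive means timing improved


def power_per_ps(result: TransformResult) -> float:
    """Delta power / delta setup timing: power spent per picosecond of recovered slack."""
    if result.delta_slack_ps <= 0:
        return float("inf")  # the transform did not improve setup timing
    return result.delta_power_mw / result.delta_slack_ps


def rank_by_power_cost(results: List[TransformResult]) -> List[str]:
    """Order transforms from cheapest to most expensive in power per ps of slack."""
    return [r.name for r in sorted(results, key=power_per_ps)]


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    results = [
        TransformResult("buffer_insertion", 1.2, 40.0),
        TransformResult("vt_swap", 0.5, 10.0),
        TransformResult("gate_sizing", 0.6, 30.0),
    ]
    print(rank_by_power_cost(results))
    # ['gate_sizing', 'buffer_insertion', 'vt_swap']
```

A transform that saves power while recovering slack gets a negative ratio and naturally sorts first.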
CAD directions for high performance asynchronous circuits
Proceedings of the 36th ACM/IEEE Design Automation Conference (DAC '99), 1999
This paper describes a novel methodology for high-performance asynchronous design based on timed circuits and on CAD support for their synthesis using Relative Timing. This methodology was developed for a prototype iA32 instruction length decoding and steering unit called RAPPID ("Revolving Asynchronous Pentium® Processor Instruction Decoder") that was fabricated and tested successfully. Silicon results show significant advantages (in particular, a performance of 2.5-4.5 instructions per ns) with manageable risks using this design technology. Compared with a comparable 400 MHz clocked circuit, RAPPID achieves three times the performance and half the latency while dissipating only half the power, at the cost of a minor area penalty.
A network-flow approach to timing-driven incremental placement for ASICs
Proceedings of the 2006 IEEE/ACM International Conference on Computer-Aided Design (ICCAD '06), 2006
We present a novel incremental placement methodology called FlowPlace for significantly reducing critical path delays of placed standard-cell circuits. FlowPlace includes: a) a timing-driven (TD) analytical global placer TAN that uses accurate delay functions and minimizes a combination of linear and quadratic objective functions; b) a network-flow-based detailed placer TIF that has new and effective techniques for performing TD incremental placement and satisfying row-length (white space) constraints. We have obtained results on three sets of benchmarks: i) TD versions of the ibm benchmark suite that we have constructed; ii) benchmarks used in TD-Dragon; iii) the Faraday benchmarks. Results show that, starting with Dragon-placed circuits, we obtain up to 34% and an average of 18% improvement in critical path delays, at an average of 17.5% of the run time of the Dragon placer. Starting with the state-of-the-art TD placer TD-Dragon, on the TD-Dragon benchmarks we obtain up to about 10% and an average of 4.3% delay improvement with 12% of TD-Dragon's run time; this is significant because we are extracting performance improvements from a performance-optimized layout. Averaged over all benchmark suites, wirelength deterioration is less than 8%. This work was supported by NSF grant CCR-0204097. We also gratefully acknowledge the permission of Artisan Components, Inc. for the use of the cell-timing libraries of the TD-Dragon benchmarks, and also the help of the TD-Dragon authors in this regard.
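To make "a combination of linear and quadratic objective functions" concrete, here is a one-dimensional sketch that scores a placement by a weighted sum of a linear span term and a quadratic span term per net. This is a generic illustration under assumed names (`mixed_wirelength`, `alpha`, `beta`); it does not reproduce TAN's delay functions. The quadratic term is included because the delay of an unbuffered RC wire grows roughly with the square of its length, so long critical nets are penalized more heavily.

```python
from typing import Dict, List, Tuple

# A net is (names of the cells it connects, net weight).
Net = Tuple[List[str], float]


def mixed_wirelength(positions: Dict[str, float], nets: List[Net],
                     alpha: float, beta: float) -> float:
    """Weighted sum of a linear and a quadratic wirelength term (1D sketch).

    For each net, span = distance between its leftmost and rightmost cell.
    alpha scales the linear term; beta scales the quadratic term, which
    penalizes long nets more strongly.
    """
    total = 0.0
    for cells, weight in nets:
        xs = [positions[c] for c in cells]
        span = max(xs) - min(xs)
        total += weight * (alpha * span + beta * span ** 2)
    return total


if __name__ == "__main__":
    nets = [(["a", "b"], 2.0), (["b", "c"], 1.0)]  # net a-b is weighted as more critical
    print(mixed_wirelength({"a": 0, "b": 3, "c": 5}, nets, alpha=1.0, beta=0.5))  # 19.0
    print(mixed_wirelength({"a": 0, "b": 1, "c": 5}, nets, alpha=1.0, beta=0.5))  # 15.0
```

Moving cell `b` toward `a` lengthens the lighter net b-c but still lowers the total, which is the kind of trade-off a timing-driven placer resolves with net weights.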
A Bundled-Data Asynchronous Circuit Synthesis Flow Using a Commercial EDA Framework
DSD 2015
Contemporary silicon technology enables integrating billions of transistors and allows the creation of complex systems-on-chip. At the same time, strict power dissipation budgets and growing interest in high-performance battery-powered devices drive the need for energy-efficient high-performance circuits. Bundled-data asynchronous circuits are good candidates for high-performance low-power systems, as they operate with average-case delays and present reduced switching activity when compared to other asynchronous templates. The correct operation of bundled-data circuits relies on constraints that describe the timing relationships between data and control signals. However, commercial EDA frameworks do not offer comprehensive support for ensuring the closure of such constraints, making implementation challenging. This paper proposes a synthesis flow that enables the description and enforcement of relative timing constraints at both the logic and physical synthesis levels, using the Synopsys framework and a set of in-house scripts. Two case studies illustrate the flow: a pipelined multiplier and a network-on-chip input buffer FIFO, the latter comprising a non-linear pipeline and complex control circuits. Both case studies target the STMicroelectronics 28nm FDSOI technology, and validation relies on post-layout simulations. Overall, the flow provides an automatic, template-agnostic approach to meeting relative timing constraints in bundled-data circuit design.
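One of the timing relationships between data and control in a bundled-data stage is that the request, delayed by a matched delay line, must not reach the receiving latch before the data has settled. The sketch below sizes such a delay line under a simplified, hypothetical rule: identical delay cells, an absolute safety margin, and the line's minimum delay compared against the datapath's maximum delay. It is not the paper's Synopsys-based flow.

```python
import math


def delay_cells_needed(datapath_max_ps: float, cell_min_ps: float, margin_ps: float) -> int:
    """Number of identical delay cells in a matched delay line (hypothetical sizing rule).

    In a bundled-data stage the request must not reach the receiving latch
    before the data has settled, so the delay line's *minimum* delay must
    cover the datapath's *maximum* delay plus a safety margin.
    """
    if cell_min_ps <= 0:
        raise ValueError("cell_min_ps must be positive")
    required_ps = datapath_max_ps + margin_ps
    return math.ceil(required_ps / cell_min_ps)


if __name__ == "__main__":
    print(delay_cells_needed(400.0, 45.0, 50.0))  # 10 cells: 10 * 45 = 450 >= 450
    print(delay_cells_needed(400.0, 45.0, 60.0))  # 11 cells: 460 / 45 is about 10.2
```

Using the cell's minimum delay and the datapath's maximum delay makes the rule conservative under variation; after layout, both numbers would need to be re-extracted and the constraint re-checked.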
A Novel Net Weighting Algorithm for Power and Timing-Driven Placement
Many new low-power ASIC applications have emerged. This market trend has made the designer's task of meeting timing and routability requirements within the power budget more challenging. One of the major sources of power consumption in modern integrated circuits (ICs) is the interconnect. In this paper, we present a novel Power and Timing-Driven global Placement (PTDP) algorithm. Its principle is to wrap a commercial timing-driven placer with a net-weighting mechanism that calculates net weights based on timing and power consumption. The calculated weights drive the placement engine to place the cells connected by power- or timing-critical nets close to each other, which reduces the parasitic capacitance of the interconnect and, consequently, improves the timing and power consumption of the design. This approach not only reduces the design's power consumption but also improves routability, with only a minor impact on timing closure for a few designs. Experiments carried out on 40 industrial designs of different technology nodes, sizes, and complexities demonstrate that the proposed algorithm achieves significant Quality of Results (QoR) improvements compared with a commercial timing-driven placement flow. We reduce interconnect power by an average of 11.5%, which leads to a total power improvement of 5.4%, timing improvements of 9.4% in Worst Negative Slack (WNS) and 13.7% in Total Negative Slack (TNS), and a 3.2% reduction in total wirelength.
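A net-weighting mechanism of this kind can be sketched as a function that raises a net's weight with both its timing criticality and its share of switching power. The formula below (a base weight of 1 plus linearly scaled criticality and power terms) is a hypothetical example, not the PTDP weighting from the paper; `alpha` and `beta` are assumed tuning knobs.

```python
def net_weight(slack_ps: float, clock_period_ps: float,
               net_power_uw: float, max_net_power_uw: float,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical net weight combining timing criticality and power share.

    criticality: 0 for a net whose slack equals the clock period, 1 for zero
    or negative slack (clamped to [0, 1]).
    power_share: the net's switching power relative to the most power-hungry net.
    """
    criticality = min(1.0, max(0.0, 1.0 - slack_ps / clock_period_ps))
    power_share = net_power_uw / max_net_power_uw if max_net_power_uw > 0 else 0.0
    return 1.0 + alpha * criticality + beta * power_share


if __name__ == "__main__":
    print(net_weight(100.0, 1000.0, 50.0, 200.0))  # about 2.15: fairly critical, moderate power
    print(net_weight(-50.0, 1000.0, 0.0, 200.0))   # 2.0: violating net, negligible power
    print(net_weight(1000.0, 1000.0, 0.0, 200.0))  # 1.0: neither critical nor power-hungry
```

The resulting weights would be handed to the placer so that heavily weighted nets are pulled shorter; how a given commercial placer accepts net weights is tool-specific and not shown here.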
A new LP based incremental timing driven placement for high performance designs
2006 43rd ACM/IEEE Design Automation Conference, 2006
In this paper, we propose a new linear programming (LP) based timing-driven placement framework for high-performance designs. Our LP framework is mainly net-based, but it takes advantage of path-based delay sensitivity with limited-stage slew propagation, so it combines some features of net-based and path-based timing-driven placement. Our LP formulation considers, in a unified manner, not only cells on the critical paths but also cells that are logically adjacent to the critical paths (i.e., the criticality adjacency network). We further present a timing- and regularity-aware legalization method, which is important for preserving timing and regular structures in high-performance designs. Our algorithm has been tested on a set of 65nm industrial circuits from a multi-GHz microprocessor and shown to achieve much better timing (on average a 20ps worst-slack reduction, which is significant for multi-GHz designs) even on carefully hand-tuned circuits.
Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis
13th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'07), 2007
Future deep sub-micron technologies will be characterized by large parametric variations, which could make asynchronous design an attractive solution for large-scale use. However, investment in asynchronous CAD tools does not approach that in synchronous ones. Even when asynchronous tools leverage existing synchronous tool flows, they introduce large area and speed overheads. This paper proposes several heuristic and optimal algorithms, based on timing interval analysis, that improve existing asynchronous CAD solutions by optimizing area. Compared with the existing solutions, the optimized circuits are 2.4 times smaller with an optimal algorithm and 1.8 times smaller with a heuristic one. The optimized circuits are also shown to be resilient to large parametric variations, yielding better average-case latencies than their synchronous counterparts.
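Timing interval analysis, in its simplest generic form, propagates an [earliest, latest] arrival interval for every signal through an acyclic netlist and then proves orderings between signals whose intervals do not overlap. The sketch below shows only that generic step on a hypothetical three-gate netlist; the paper's specific area-optimization algorithms are not reproduced, and all names and delays are assumptions.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (earliest, latest) arrival time in ps


def propagate_intervals(order: List[str],
                        fanin: Dict[str, List[str]],
                        gate_delay: Dict[str, Interval],
                        inputs: Dict[str, Interval]) -> Dict[str, Interval]:
    """Propagate [earliest, latest] arrival intervals through an acyclic netlist.

    `order` must list the gates in topological order. The earliest time a gate
    output can change uses the earliest input and the minimum gate delay; the
    latest time it settles uses the latest input and the maximum gate delay.
    """
    arrival: Dict[str, Interval] = dict(inputs)
    for gate in order:
        d_min, d_max = gate_delay[gate]
        earliest = min(arrival[p][0] for p in fanin[gate]) + d_min
        latest = max(arrival[p][1] for p in fanin[gate]) + d_max
        arrival[gate] = (earliest, latest)
    return arrival


def always_before(a: str, b: str, arrival: Dict[str, Interval]) -> bool:
    """True if every possible transition of `a` precedes every transition of `b`."""
    return arrival[a][1] < arrival[b][0]


if __name__ == "__main__":
    arr = propagate_intervals(
        order=["g1", "g2", "g3"],
        fanin={"g1": ["x"], "g2": ["g1", "y"], "g3": ["y"]},
        gate_delay={"g1": (10, 20), "g2": (5, 10), "g3": (40, 60)},
        inputs={"x": (0, 0), "y": (0, 0)},
    )
    print(arr["g2"], arr["g3"])            # (5, 30) (40, 60)
    print(always_before("g2", "g3", arr))  # True
```

An ordering proven this way is the kind of relative-timing fact that can make some completion or acknowledgment logic redundant; deciding which logic to remove is the part the paper's heuristic and optimal algorithms address.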
An Optimized Power Performance and Area in ASIC Physical Design
International Journal of Electronics, Electrical and Computational System, 2017
For Moore's law to remain valid in practice, new process technologies must provide more than the projected increases in density, chip capacity, chip-level performance, or performance-versus-power improvement, which have become increasingly difficult to achieve. This paper studies and implements practices for achieving better power, performance, and area (PPA) in ASIC physical design that are applicable to all digital circuits, both combinational and sequential. Various place-and-route techniques are applied to achieve this using Cadence's SOCE. The general place-and-route flow involves floorplanning, power planning, placement, clock tree synthesis (CTS), routing, parasitic extraction, and timing and power analysis. Apart from these stages, there are intermediate stages that allow for timing optimization. A reference block is chosen, and multiple experiments are performed with flow variations at each place-and-route stage, targeting PPA at a frequency of 1.4GHz in a 14nm technology. The generalized flow is tweaked to achieve better PPA. All experimental data are captured and conclusions are drawn.