Power-aware placement

An Efficient Timing and Clock Tree Aware Placement Flow with Multibit Flip-Flops for Power Reduction

Communications in computer and information science, 2017

Power consumption has become a bottleneck for modern system-on-chip (SoC) designs. With the advance towards deep sub-micron technology, SoC designs contain components that lead to higher power density. In VLSI designs, the performance of an integrated circuit (IC) is governed by the frequency of the clock at which it operates, so clocking is the major source of power dissipation in a design. Designing the clock network is a critical task for high-performance circuits because it directly affects clock skew, jitter, chip power and SoC area under process variations. Multi-bit flip-flops (MBFFs) have emerged as a low-power solution for nanometer technology. Applying MBFFs reduces the number of clock sinks during clock tree synthesis (CTS). As a result, the design shows higher core utilization, better routing, lower power consumption and fewer timing violations. The clock insertion delay (CID) is another key metric of the clock network; decreasing CID results in a shorter clock network, less crosstalk impact, less impact of process variation, and reduced hold penalties. This work introduces a novel placement strategy, integrated with an electronic design automation (EDA) tool, that generates MBFFs with prior knowledge of the clock tree architecture. Unlike the traditional placement flow, the strategy generates MBFFs by iteratively replacing single-bit FFs during placement. FF merging and MBFF generation algorithms are proposed. The approach is made timing-aware through useful skew optimization. Experimental results show improvements of 44% in chip power, 11.3% in core density and 10.4% in clock power. In addition, another algorithm is proposed for minimizing the CID of the design. After a deep analysis of the clock tree structure, this algorithm moves the clock tree sinks with maximum CID to a separate pool.
It also takes into account the floorplan of the chip, pin placement and the effect of macro placement changes on the sinks. The results show that the average CID reduces by 9.2%.

Certificate: This is to certify that the thesis titled "An Efficient Timing and Clock Tree Aware Placement Flow with Multibit Flip-Flop Generation for Power Reduction" submitted by Jasmine Kaur Gulati to Indraprastha Institute of Information Technology, Delhi for the award of the Master of Technology in Electronics and Communication & Engineering is an original research work carried out by her under my guidance and supervision. The results enclosed in the thesis have not been submitted to any other university or institute for the award of any other degree.
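
The FF merging idea in the abstract above (replacing nearby single-bit FFs with an MBFF so that CTS sees fewer clock sinks) can be illustrated with a minimal sketch. This is not the thesis algorithm: the greedy pairing rule, the `max_dist` threshold, the 2-bit limit and all names below are assumptions made for illustration only.

```python
from math import inf

def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def merge_flip_flops(ffs, max_dist):
    """Greedily pair single-bit FFs on the same clock within max_dist.

    ffs: dict mapping FF name -> (x, y, clock_name)  (hypothetical format)
    Returns a list of groups; each group is a tuple of 1 or 2 FF names.
    """
    unmerged = sorted(ffs)
    groups = []
    while unmerged:
        a = unmerged.pop(0)
        best, best_d = None, inf
        for b in unmerged:
            if ffs[a][2] != ffs[b][2]:
                continue  # FFs on different clocks cannot share an MBFF
            d = manhattan(ffs[a][:2], ffs[b][:2])
            if d <= max_dist and d < best_d:
                best, best_d = b, d
        if best is None:
            groups.append((a,))        # stays a single-bit FF
        else:
            unmerged.remove(best)
            groups.append((a, best))   # becomes one 2-bit MBFF
    return groups

ffs = {
    "ff0": (0, 0, "clk"),
    "ff1": (1, 0, "clk"),
    "ff2": (10, 10, "clk"),
    "ff3": (1, 1, "clk2"),
}
groups = merge_flip_flops(ffs, max_dist=3)
sinks_before, sinks_after = len(ffs), len(groups)
print(groups, sinks_before, sinks_after)
```

Each merged pair counts as one clock sink, so the sink count drops from `sinks_before` to `sinks_after`. A real flow would also check timing slack and library availability of the MBFF cell before committing a merge.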

Navigating Register Placement for Low Power Clock Network Design

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2005

In modern VLSI designs, the increasingly severe power problem demands minimizing clock routing wirelength so that both power consumption and power supply noise can be alleviated. In contrast to most traditional works, which handle this problem only in clock routing, we propose to navigate standard cell register placement to locations that enable further reductions in clock routing wirelength and power. To minimize adverse impacts on conventional cell placement goals such as signal net wirelength and critical path delay, the register placement is carried out in the context of a quadratic placement. The proposed technique is particularly effective for the recently popular prescribed skew clock routing. Experiments on benchmark circuits show encouraging results.

A Novel Net Weighting Algorithm for Power and Timing-Driven Placement

Nowadays, many new low-power ASIC applications have emerged. This market trend has made the designer's task of meeting timing and routability requirements within the power budget more challenging. One of the major sources of power consumption in modern integrated circuits (ICs) is the interconnect. In this paper, we present a novel Power and Timing-Driven global Placement (PTDP) algorithm. Its principle is to wrap a commercial timing-driven placer with a net weighting mechanism that calculates net weights based on their timing and power consumption. The newly calculated weights drive the placement engine to place the cells connected by power-critical or timing-critical nets close to each other, which reduces the parasitic capacitance of the interconnects and consequently improves the timing and power consumption of the design. This approach not only improves the design power consumption but also facilitates routability, with only a minor impact on the timing closure of a few designs. Experiments carried out on 40 industrial designs of different nodes, sizes, and complexities demonstrate that the proposed algorithm achieves significant improvements in Quality of Results (QoR) compared with a commercial timing-driven placement flow. We reduce the interconnect power by an average of 11.5%, which leads to a total power improvement of 5.4%, improvements of 9.4% in Worst Negative Slack (WNS) and 13.7% in Total Negative Slack (TNS), and a 3.2% reduction in total wirelength.
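
A net weighting mechanism of the kind described above can be sketched as follows. The abstract does not give the weighting formula, so the linear combination, the `alpha` and `beta` coefficients, the activity-times-capacitance power proxy and the input format are all assumptions for illustration.

```python
def net_weights(nets, alpha=1.0, beta=1.0):
    """Combine timing criticality and switching power into one weight per net.

    nets: dict mapping net name -> {"slack": ns, "activity": toggles/cycle,
                                    "cap": load capacitance}  (hypothetical)
    Weight = 1 + alpha * timing_criticality + beta * normalized_power,
    where both terms lie in [0, 1]. This formula is an assumption.
    """
    worst_slack = min(n["slack"] for n in nets.values())
    max_power = max(n["activity"] * n["cap"] for n in nets.values())
    weights = {}
    for name, n in nets.items():
        # Criticality is the net's negative slack relative to the worst slack.
        if worst_slack < 0 and n["slack"] < 0:
            crit = n["slack"] / worst_slack
        else:
            crit = 0.0
        # Dynamic power is proportional to activity * capacitance.
        power = n["activity"] * n["cap"] / max_power if max_power > 0 else 0.0
        weights[name] = 1.0 + alpha * crit + beta * power
    return weights

nets = {
    "n_crit": {"slack": -0.2, "activity": 0.1, "cap": 10.0},
    "n_hot":  {"slack": 0.5,  "activity": 0.5, "cap": 20.0},
    "n_idle": {"slack": 0.3,  "activity": 0.0, "cap": 5.0},
}
w = net_weights(nets)
print(w)
```

A placer that minimizes weighted wirelength would then pull the cells on `n_crit` and `n_hot` closer together than those on `n_idle`, which keeps the default weight of 1.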

Timing-Aware Power-Noise Reduction in Placement

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2000

We describe a placement-level decoupling capacitance (decap) insertion technique whose objective is to reduce power noise, taking into account circuit timing. Our approach consists of prediction and correction steps. Before placement, we estimate the power noise of each cell considering switching frequency of cells that, after placement, will most likely be in the neighborhood. If a frequently switching cell has neighbors that switch infrequently, it is unlikely that this cell will suffer from a power-noise problem. Based on the cell power-noise estimation, we add decap padding to each cell. Then, we invoke a standard cell placement tool and perform power grid analysis. We eliminate the power grid noise by gate sizing. Our technique can allocate decaps to improve power noise, power consumption, and timing. We propose two gate-sizing algorithms. The first one uses a sequence of linear programs (SLP) formulation, and the second one uses a budgeting-based heuristic algorithm. The SLP algorithm can produce better power-noise results than the heuristic, at the expense of runtime. Experimental results show that our techniques can effectively reduce power noise and still meet timing constraints.

Low power clock buffer planning methodology in F-D placement for large scale circuit design

2008 Asia and South Pacific Design Automation Conference, 2008

Traditionally, clock network layout is performed after cell placement. This methodology faces a serious problem in nanometer IC designs, where designers tend to use huge clock buffers for robustness against variations: clock buffers are often placed far from their ideal locations to avoid overlap with logic cells. As a result, both power dissipation and timing are degraded. To solve this problem, we propose a low-power clock buffer planning methodology that is integrated with cell placement. A Bin-Divided Grouping algorithm is developed to construct a virtual buffer tree, which can explicitly model the clock buffers in placement. The virtual buffer tree is dynamically updated during placement to reflect changes in latch locations. To reduce power dissipation, latch clumping is incorporated into the clock buffer planning. The experimental results show that our method reduces clock power by 21% on average.
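
The abstract does not spell out the Bin-Divided Grouping algorithm, so the following is only a rough sketch of one way to group latches by placement bin and attach a virtual buffer to each group. The uniform grid, the centroid rule for the buffer position and every name below are assumptions, and only a single level of the buffer tree is modeled.

```python
from collections import defaultdict

def bin_grouping(latches, bin_size):
    """Group latches by grid bin and place one virtual buffer per bin.

    latches: dict mapping latch name -> (x, y)  (hypothetical format)
    Returns a dict mapping bin index (col, row) -> buffer record holding
    the centroid position and the sorted list of latches it drives.
    """
    bins = defaultdict(list)
    for name, (x, y) in latches.items():
        bins[(int(x // bin_size), int(y // bin_size))].append(name)

    buffers = {}
    for key, members in bins.items():
        # Assumed rule: the virtual buffer sits at the centroid of its sinks.
        cx = sum(latches[m][0] for m in members) / len(members)
        cy = sum(latches[m][1] for m in members) / len(members)
        buffers[key] = {"pos": (cx, cy), "sinks": sorted(members)}
    return buffers

latches = {"l0": (1.0, 1.0), "l1": (3.0, 1.0), "l2": (12.0, 2.0)}
buffers = bin_grouping(latches, bin_size=10)
print(buffers)
```

Re-running `bin_grouping` after each placement iteration is a simple stand-in for the dynamic update mentioned in the abstract: as latches move, bin membership and buffer positions follow.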

Power aware setup timing optimization in physical design of ASICs

Setup timing optimization is an important and challenging step in the physical design of Application Specific Integrated Circuits (ASICs). Many techniques are available to help the designer close the design's setup timing. Although all these techniques have the same objective, which is to resolve the existing setup timing violations, each one has a different power footprint. In this paper, we measure the impact of each optimization technique on power. We ran each optimization transform at different flow stages on 100 industrial designs from different process technologies. We measured the ratio Δpower/Δsetup_timing after legalization and global routing, so that it includes not only the power added directly by the setup timing optimization but also the power induced indirectly by placement and global routing perturbation. Experimental results show that by taking into account the power impact of each optimization technique, including placement legalization and global routing, an average power reduction of 7.3% can be achieved with no timing impact.
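
The Δpower/Δsetup_timing ratio can be computed and used to rank transforms as in the sketch below. The transform names and all numbers are invented for illustration and are not results from the paper; measuring timing gain as the improvement in worst negative slack is also an assumption.

```python
def power_cost_per_ps(before, after):
    """Return delta_power / delta_setup_timing for one transform.

    before, after: dicts with "power_mw" and "wns_ps" (hypothetical fields),
    both measured after legalization and global routing.
    A transform that does not improve slack gets an infinite cost.
    """
    d_power = after["power_mw"] - before["power_mw"]
    d_slack = after["wns_ps"] - before["wns_ps"]
    if d_slack <= 0:
        return float("inf")
    return d_power / d_slack

baseline = {"power_mw": 100.0, "wns_ps": -50.0}
# Hypothetical post-route measurements for three illustrative transforms.
results = {
    "upsizing":  {"power_mw": 101.0, "wns_ps": -30.0},
    "buffering": {"power_mw": 100.5, "wns_ps": -45.0},
    "vt_swap":   {"power_mw": 102.0, "wns_ps": -40.0},
}
ratios = {name: power_cost_per_ps(baseline, r) for name, r in results.items()}
ranked = sorted(ratios, key=ratios.get)  # cheapest power per ps first
print(ranked, ratios)
```

Preferring transforms early in `ranked` is one way a flow could fix setup violations while adding the least power per picosecond of slack recovered.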

An Optimized Power Performance and Area in ASIC Physical Design

International Journal of Electronics, Electrical and Computational System, 2017

For Moore’s law to continue to be pragmatically valid, new process technologies must provide more than the projected increases in density, chip capacity, chip-level performance, or performance-versus-power improvement, which has become increasingly difficult to achieve. This paper studies and implements practices for achieving better power, performance, and area (PPA) in ASIC physical design, applicable to all digital circuits, both combinational and sequential. Various place-and-route techniques are applied using Cadence’s SOCE. The general place-and-route flow involves floorplanning, power planning, placement, CTS, routing, parasitic extraction, and timing and power analysis. Apart from these stages, there are intermediate stages that allow for timing optimizations. A reference block is chosen and multiple experiments are performed with flow variations at each place-and-route stage, targeting PPA and a frequency of 1.4 GHz in 14 nm technology. The generalized flow is tweaked to achieve better PPA. All experimental data are captured and conclusions are drawn.

System Level Clock Tree Synthesis for Power Optimization

2007 Design, Automation & Test in Europe Conference & Exhibition, 2007

The clock tree is the interconnect net on Systems-on-Chip (SoCs) with the heaviest load and consumes up to 40% of the overall power budget. Substantial savings in overall power dissipation are possible by optimizing the clock tree. Although these savings are already relevant at system level, little effort has been made to consider the clock tree at higher levels of abstraction. This paper shows how the clock tree can be integrated into system-level power estimation and optimization. A clock tree routing algorithm is chosen, adapted to the system level and then integrated into an algorithmic-level power optimization tool. Experimental results demonstrate the importance of the clock tree for system-level power optimization.

Navigating registers in placement for clock network minimization

2005

The progress of VLSI technology is facing two limiting factors: power and variation. Minimizing clock network size can lead to reduced power consumption, less power supply noise, fewer clock buffers and therefore less vulnerability to variations. Previous works on clock network minimization mostly focus on clock routing, and the improvements are often limited by the input register placement. In this work, we propose to navigate registers in cell placement for further clock network size reduction. To resolve the conflict between clock network minimization and traditional placement goals, we suggest the following techniques in a quadratic placement framework: (1) Manhattan ring based register guidance; (2) center of gravity constraints for registers; (3) pseudo pin and net; (4) register cluster contraction. These techniques work for both zero skew and prescribed skew designs in both wirelength-driven and timing-driven placement. Experimental results show that our method can reduce clock net wirelength by 16% to 33% with no more than a 0.5% increase in signal net wirelength compared with conventional approaches.

Power-aware clock tree planning

2004

Modern processors and SoCs require the adoption of power-oriented design styles, due to the implications that power consumption may have on the reliability, cost and manufacturability of integrated circuits featuring nanometric technologies. The power problem is further exacerbated by the increasing demand for mobile, battery-operated systems, for which reduced power dissipation is mandatory. A large fraction of the power consumed by a synchronous circuit is due to the clock distribution network. This is for two reasons: first, the clock nets are long and heavily loaded; second, they are subject to high switching activity. The problem of automatically synthesizing a power-efficient clock tree has been addressed recently in a few research contributions. In this paper, we introduce a methodology in which low-power clock trees are obtained through aggressive exploitation of the clock-gating technology. Distinguishing features of the methodology are: (i) The capability of calculating powerful clock-gating conditions that go beyond the simple topological search of the RTL source code. (ii) The capability of determining the clock tree logical structure starting from an RTL description. (iii) The capability of including in the cost function that drives the generation of the clock tree structure both functional (i.e., clock activation conditions) and physical (i.e., floorplanning) information. (iv) The capability of generating a clock tree structure that can be synthesized and routed using standard, commercially available back-end tools. We illustrate the methodology for power-aware RTL clock tree planning, and we provide details on the fundamental algorithms that support it and information on how such a methodology can be integrated into an industrial design flow. The results achieved on several benchmarks, as well as on a real design case, demonstrate the feasibility and the potential of the proposed approach.