Jason (Jingsheng) Cong | University of California, Los Angeles

Papers by Jason (Jingsheng) Cong

This paper discusses the challenges and opportunities for design innovation in future IC designs, especially in the areas of interconnect design and a high degree of on-chip integration. Section 2 presents a number of design challenges in these areas, with quantitative measurements derived from the technology projection in NTRS'97. Section 3 discusses opportunities and possible directions for design technology innovation to meet the various design challenges on the road ahead.

ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005.

Heat dissipation is one of the most serious challenges in 3-D IC designs. One effective way of reducing circuit temperature is to introduce thermal through-the-silicon (TTS) vias. In this paper, we extend TTS-via planning in a multilevel routing framework as in [7], but use a much enhanced TTS-via planning algorithm. We formulate the TTS-via minimization problem with temperature constraints as a constrained nonlinear programming (NLP) problem based on the thermal resistive model and develop an efficient heuristic algorithm, named m-ADVP, which solves a sequence of simplified via planning subproblems in alternating direction in a multilevel framework. The vertical via distribution is formulated as a convex programming problem, and the horizontal via planning is based on two efficient techniques: path counting and heat propagation. Experimental results show that the m-ADVP algorithm is more than 200× faster than the direct solution to the NLP formulation for via planning with very similar solution quality (within 1% of the TTS-via count). Compared to a recent multilevel TTS-via planning algorithm based on temperature profiling [7], our algorithm reduces the total number of TTS-vias by over 68% for the same required temperature with similar runtime.

Field Programmable Logic and Application, 2004

Traditional placement algorithms for FPGAs are normally carried out on a fixed clustering solution of a circuit. The impact of clustering on wirelength and delay of the placement solutions is not well quantified. In this paper, we present an algorithm named SCPlace that performs simultaneous clustering and placement to minimize both the total wirelength and longest path delay. We also incorporate a recently proposed path counting-based net weighting scheme [16]. Our algorithm SCPlace consistently outperforms the state-of-the-art FPGA placement flow (T-VPack + VPR) with an average reduction of up to 36% in total wirelength and 31% in longest path delay.

ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486), 2003

This work studies the optimality and stability of timing-driven placement algorithms. The contributions of this work include two parts: 1) We develop an algorithm for generating synthetic examples with known optimal delay for timing-driven placement (T-PEKO). The examples generated by our algorithm can closely match the characteristics of real circuits. 2) Using these synthetic examples with known optimal solutions, we study the optimality of several timing-driven placement algorithms for FPGAs by comparing their solutions with the optimal solutions, and their stability by varying the number of longest paths in the examples. Our study shows that with a single longest path, the delay produced by these algorithms is from 10% to 18% longer than the optimum on average, and from 34% to 53% longer in the worst case. Furthermore, their solution quality deteriorates as the number of longest paths increases. For examples with more than 5 longest paths, their delay is from 23% to 35% longer than the optimum on average, and from 41% to 48% longer in the worst case.

Proceedings of the 2003 conference on Asia South Pacific design automation - ASPDAC, 2003

In this paper we study the large-scale mixed-size placement problem where there is a significant size variation between big and small placeable objects (the ratio can be as large as 10,000). We develop a multi-level optimization algorithm, MPG-MS, for this problem which can efficiently handle both large-scale designs and large size variations. Compared with the recently published work [1] on large-scale mixed macro and standard cell placement benchmarks for wirelength minimization, our method can achieve 13% wirelength reduction on average with comparable runtime.

Proceedings of the 19th international symposium on Physical design, 2010

Existing 3D placement techniques are mainly used for standard-cell circuits, while mixed-size placement is needed to support high-level functional units and intellectual property (IP) blocks. In this paper we present an analytical 3D placement method that is capable of placing mixed-size circuits. A multiple-stepsize scheme for the analytical solver is proposed to handle standard cells and macros differently for stability and efficiency. To relieve the difficulty of legalization, 3D floorplan-based initial solutions are used to guide the analytical solver. As far as we know, this is the first work that reports 3D placement results for mixed-size circuits. Our experiments show that the multiple-stepsize scheme is better than single-stepsize schemes in both quality and runtime. The experimental results on the ICCAD'04 mixed-size benchmarks show that the 4-tier 3D mixed-size placement can reduce the wirelength by 27% on average compared to 2D placement. The results also show that the 3D mixed-size placement achieves 5.3% shorter wirelength on average than the pseudo 3D placement with a similar number of through-silicon vias (TS vias).

Proceedings of ASP-DAC '97: Asia and South Pacific Design Automation Conference, 1997

This paper presents an overview of recent advances on modeling and layout optimization of devices and interconnects for high-performance VLSI circuit design under the deep submicron technology. First, we review a number of interconnect and driver/gate delay models, which are most useful to guide the layout optimization. Then, we summarize the available performance optimization techniques for VLSI device and interconnect layout, including driver and transistor sizing, transistor ordering, interconnect topology optimization, optimal wire sizing, optimal buffer placement, and simultaneous topology construction, buffer insertion, buffer and wire sizing. The efficiency and impact of these techniques will be discussed in the tutorial.

Proceedings of ASP-DAC'95/CHDL'95/VLSI'95 with EDA Technofair

Most existing placement algorithms consider only connectivity information during the placement process, and ignore other information available from the higher levels of the design process. In this paper, we exploit signal flow and logic dependency in standard cell placement by using the maximum fanout-free cone (MFFC) decomposition technique. We developed a containment-tree-based algorithm for splitting large MFFCs into smaller ones to get clusters with restricted sizes. We also developed a placement algorithm, named MFFC-TW, which first clusters the circuit based on MFFC decomposition and then feeds the clustered circuit to the Timberwolf6.0 placement package. Very promising experimental results were obtained.
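To make the MFFC notion in the abstract above concrete, here is a minimal sketch, not the paper's algorithm. It assumes the common definition that a node belongs to the MFFC of a root gate when every one of its fanouts is already inside the cone. The function name `mffc`, the `fanins` dictionary layout, and the choice to treat primary inputs like any other node are all illustrative assumptions.

```python
from collections import defaultdict

def mffc(fanins, root):
    """Return the maximum fanout-free cone of `root` as a set of node names.

    `fanins` (hypothetical layout) maps each gate to the list of nodes that
    drive it. Nodes that never appear as keys are treated as primary inputs.
    """
    # Derive fanout sets from the fanin lists.
    fanouts = defaultdict(set)
    for gate, drivers in fanins.items():
        for d in drivers:
            fanouts[d].add(gate)

    cone = {root}
    frontier = [root]
    while frontier:
        gate = frontier.pop()
        for d in fanins.get(gate, []):
            # A driver joins the cone only when all of its fanouts are
            # already inside, so its signal cannot escape around `root`.
            if d not in cone and fanouts[d] <= cone:
                cone.add(d)
                frontier.append(d)
    return cone

# Small example: g1 feeds both g2 and g3, which both feed root;
# input c also drives an unrelated gate out2, so it stays outside.
example = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g1"],
    "root": ["g2", "g3"],
    "out2": ["c"],
}
cone_of_root = mffc(example, "root")
cone_of_g2 = mffc(example, "g2")
```

Each driver is re-examined whenever one of its fanouts is popped, so the result does not depend on visiting order. Splitting oversized cones, as the containment-tree step in the abstract does, is not covered by this sketch.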

Proceedings of the 30th international on Design automation conference - DAC '93, 1993

In this paper, we study the interconnect design problem under a distributed RC delay model. We study the impact of technology factors on the interconnect designs and present general formulations of the interconnect topology design and wiresizing problems. We show that interconnect topology optimization can be achieved by computing optimal generalized rectilinear Steiner arborescences and we present an efficient algorithm which yields optimal or near-optimal solutions. We reveal several important properties of optimal wire width assignments and present a polynomial time optimal wiresizing algorithm. Extensive experimental results indicate that our approach significantly outperforms other routing methods for high-performance IC and MCM designs. Our interconnect designs reduce the interconnection delays by up to 66% as compared to those by the best known Steiner tree algorithm.
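As a rough illustration of evaluating delay on a routing tree under a distributed RC model, the sketch below computes the standard Elmore delay, with each wire segment contributing half of its own capacitance plus everything downstream. The abstract does not say which delay formula the paper uses, so Elmore delay is a swapped-in assumption here, and the function name, argument names, and data layout are hypothetical.

```python
def elmore_delays(parent, wire_r, wire_c, load_c, driver_r):
    """Elmore delay from the driver to every node of an RC routing tree.

    parent[n]  : parent of node n (the source maps to None)
    wire_r[n]  : resistance of the wire from parent[n] to n
    wire_c[n]  : capacitance of the wire from parent[n] to n
    load_c[n]  : sink loading capacitance at n (missing means 0)
    driver_r   : output resistance of the driver at the source
    """
    children = {n: [] for n in parent}
    root = None
    for n, p in parent.items():
        if p is None:
            root = n
        else:
            children[p].append(n)

    # Pre-order traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children[n])

    # Total capacitance at and below each node, computed bottom-up.
    down_c = {}
    for n in reversed(order):
        down_c[n] = load_c.get(n, 0.0) + sum(
            wire_c[c] + down_c[c] for c in children[n]
        )

    # Delay accumulates top-down along the path from the source.
    delay = {root: driver_r * down_c[root]}
    for n in order[1:]:
        delay[n] = delay[parent[n]] + wire_r[n] * (wire_c[n] / 2.0 + down_c[n])
    return delay

# One source "s" driving one sink "t" through a single wire.
single_wire = elmore_delays(
    parent={"s": None, "t": "s"},
    wire_r={"t": 2.0},
    wire_c={"t": 4.0},
    load_c={"t": 1.0},
    driver_r=3.0,
)
```

In this model, widening a wire lowers its resistance but raises its capacitance, which is the trade-off a wiresizing formulation has to balance.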

Proceedings of the 2002 international symposium on Physical design, 2002

In this paper, we shall present the progress and results of the ongoing project at UCLA on synthesis and optimization under physical hierarchy. First, we shall motivate our approach by pointing out the limitations of the existing approach to interconnect planning based on early RTL floorplanning following logic hierarchy. Then, we shall discuss the technical challenges for synthesis under the physical hierarchy, including handling high computational complexity from the flattened logic hierarchy, needs of retiming and pipelining over global interconnects, and extension of existing synthesis operations. Finally, we shall outline our approaches to overcome these technical challenges.

Proceedings of the 41st annual conference on Design automation - DAC '04, 2004

2007 Asia and South Pacific Design Automation Conference, 2007

3D IC technologies can help to improve circuit performance and lower power consumption by reducing wirelength. Also, 3D IC technology can be used to realize heterogeneous system-on-chip design, by integrating different modules together with less interference with each other. In this paper, we propose a novel thermal-aware 3D cell placement approach, named T3Place, based on transforming a 2D placement with good wirelength to a 3D placement, with the objectives of half-perimeter wirelength, through-the-silicon (TS) via number and temperature. T3Place is composed of two steps: transformation from a 2D placement to a 3D placement, and refinement of the resulting 3D placement. We proposed and compared several different transformation techniques, including local stacking transformation (LST), folding-2, folding-4 and window-based stacking/folding transformation, and concluded that (i) LST can generate 3D placements with the least wirelength, (ii) the folding-based transformations result in 3D placements with the fewest TS vias, and (iii) the window-based stacking/folding transformations provide good TS via number and wirelength tradeoffs. For example, with four device layers, LST can reduce the wirelength by over 2× compared to the initial 2D placement, while window-based stacking/folding can provide over 10× variation in terms of the TS via number, making it adaptable to different manufacturing capabilities for TS via density. Moreover, we proposed a novel relaxed conflict-net (RCN) graph-based layer assignment method to further refine the 3D placements. Compared to LST results, the thermal-aware RCN graph-based layer assignment algorithm (r = 10%) can further reduce the maximum on-chip temperature by 37%, with only a 6% increase in TS via number and an 8% increase in wirelength.
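The half-perimeter wirelength objective named in the abstract above can be sketched in a few lines. This is a generic illustration rather than code from T3Place: the function name `hpwl`, the list-of-pin-lists net format, and the `(x, y)` coordinate dictionary are assumptions, and the device layer of each cell is ignored.

```python
def hpwl(nets, positions):
    """Total half-perimeter wirelength of a placement.

    nets      : iterable of nets, each a list of cell names (hypothetical format)
    positions : dict mapping cell name -> (x, y) coordinate
    """
    total = 0.0
    for pins in nets:
        xs = [positions[p][0] for p in pins]
        ys = [positions[p][1] for p in pins]
        # Half the perimeter of the bounding box enclosing the net's pins.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

placement = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
total_wl = hpwl([["a", "b"], ["a", "b", "c"]], placement)
```

A 3D variant would typically also count layer crossings per net as a proxy for the TS via number, which this sketch leaves out.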

Proceedings of the 2005 conference on Asia South Pacific design automation - ASP-DAC '05, 2005

3-D IC has great potential for improving circuit performance and degree of integration. It is also an attractive platform for system-on-chip or system-in-package solutions. A critical issue in 3-D circuit design is heat dissipation. In this paper we propose an efficient 3-D multilevel routing approach that includes a novel through-the-silicon via (TS-via) planning algorithm. The proposed approach features an adaptive lumped resistive thermal model and a two-step multilevel TS-via planning scheme. Experimental results show that with multilevel TS-via planning, the thermal-driven approach can reduce the maximum temperature to the required temperature with a reasonable wirelength increase. Compared to a post-processing approach for dummy TS-via insertion, our approach uses 80% fewer TS-vias to achieve the same required temperature. To our knowledge, the proposed approach is the first thermal-driven 3-D routing algorithm.

Three Dimensional System Integration, 2010

Modern three-dimensional (3D) designs, in which the active devices are placed in multiple layers using 3D integration technologies, are helping to maintain the validity of Moore's law in today's nano era. However, progress in commercial 3D ICs has been slow for multiple reasons. One of them is the lack of appropriate physical design (layout) tools that take the new constraints arising from the third dimension into account. In this paper, an overview of physical design challenges in the new three-dimensional context is presented. Specifically, we investigate the physical design steps of floorplanning, placement, routability prediction and routing. New 3D-tailored design methodologies are presented that are capable of addressing 3D-specific design challenges.

Series on Integrated Circuits and Systems, 2007

Proceedings of the 2005 international symposium on Physical design, 2005

Automatic circuit placement has received renewed interest recently given the rapid increase of circuit complexity, increase of interconnect delay, and potential sub-optimality of existing placement algorithms [13]. In this paper we present a generalized force-directed algorithm embedded in mPL2's [12] multilevel framework. Our new algorithm, named mPL5, produces the shortest wirelength among all published placers with very competitive runtime on the IBM circuits used in [29]. The new contributions and enhancements are: (1) We develop a new analytical placement algorithm using a density-constrained minimization formulation which can be viewed as a generalization of the force-directed method in [16]; (2) We analyze and identify the advantages of our new algorithm over the force-directed method; (3) We successfully incorporate the generalized force-directed algorithm into a multilevel framework which significantly improves wirelength and speed. Compared to Capo9.0, our algorithm mPL5 produces 8% shorter wirelength and is 2X faster. Compared to Dragon3.01, mPL5 has 3% shorter wirelength and is 12X faster. Compared to Fengshui5.0, it has 5% shorter wirelength and is 2X faster. Compared to the ultrafast placement algorithm FastPlace, mPL5 produces 8% shorter wirelength but is 6X slower. A fast mode of mPL5 (mPL5-fast) can produce 1% shorter wirelength than FastPlace1.0 and is only 2X slower. Moreover, mPL5-fast has demonstrated better scalability than FastPlace1.0.

Proceedings of the 40th conference on Design automation - DAC '03, 2003

Multiple clock cycles are needed to cross the global interconnects for multi-gigahertz designs in nanometer technologies. For synchronous designs, this requires retiming and pipelining on global interconnects. In this paper, we present a practical solution for simultaneous retiming and multilevel global placement for performance optimization, based on the theory and algorithms of sequential timing analysis (Seq-TA). We extend the Seq-TA to handle gates/clusters with multiple outputs and integrate it into a multilevel optimization framework for simultaneous retiming and placement. We also develop two speed-up techniques which enable the Seq-TA to be efficiently integrated into a simulated annealing-based multilevel coarse placement for large-scale designs. Experimental results show that (i) retiming can improve the performance (delay) by 14% on average when it is applied after placement; (ii) our approach for simultaneous retiming and placement can outperform the two-step approach (placement followed by retiming) by 10% on average in terms of delay minimization.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1997

routing problems for delay minimization consider the connection of a single source node to a number of sink nodes, with the objective of minimizing the delay from the source to all sinks, or a set of critical sinks. In this paper, we study the problem of routing nets with multiple sources, such as those found in signal busses. This new model assumes that each node in a net may be a source, a sink, or both. The objective is to optimize the routing topology to minimize the total weighted delay between all node pairs (or a subset of critical node pairs). We present a heuristic algorithm for the multiple-source performance-driven routing tree problem based on efficient construction of minimum-diameter minimum-cost Steiner trees. Experimental results on random nets with submicrometer CMOS IC and MCM technologies show an average of 12.6% and 21% reduction in the maximum interconnect delay, when compared with conventional minimum Steiner tree based topologies. Experimental results on multisource nets extracted from an Intel processor show as much as a 16.1% reduction in the maximum interconnect delay, when compared with conventional minimum Steiner tree based topologies.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1993

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2004

Placement is an important step in the overall IC design process in DSM technologies, as it defines the on-chip interconnects, which have become the bottleneck in determining circuit performance. The rapidly increasing design complexity, combined with the demand for the capability of handling nearly flattened designs for physical hierarchy generation, poses significant challenges to existing placement algorithms. There are very few studies on understanding the optimality and scalability of placement algorithms, due to the limited sizes of existing benchmarks and limited knowledge of optimal solutions. The contribution of this paper includes two parts: 1) We implemented an algorithm for generating synthetic benchmarks that have known optimal wirelengths and can match any given net distribution vector. 2) Using benchmarks of 10K to 2M placeable modules with known optimal solutions, we studied the optimality and scalability of three state-of-the-art placers, Dragon [4], Capo [1], mPL [24] from academia, and one leading edge industrial placer, QPlace [5] from Cadence. For the first time our study reveals the gap between the results produced by these tools versus true optimal solutions. The wirelengths produced by these tools are 1.66 to 2.53 times the optimal in the worst cases, and are 1.46 to 2.38 times the optimal on the average. As for scalability, the average solution quality of each tool deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10. These results indicate significant room for improvement in existing placement algorithms.