Sheldon X.-d. Tan - Academia.edu

Papers by Sheldon X.-d. Tan

SCS 2003. International Symposium on Signals, Circuits and Systems. Proceedings (Cat. No.03EX720)

This paper presents an efficient method to analyze power distribution networks in the time domain. Instead of directly analyzing the integration-approximated power/ground networks at each time step, as previous methods did, the new method first builds equivalent models for the many series RLC-current chains based on their Norton-form companion models in the original networks, and then uses a preconditioned conjugate gradient (PCG) based iterative method to solve the reduced networks. The solutions of the original networks are then back-solved from those of the reduced networks. Our contribution is an efficient algorithm for reducing the complexity of RLC power/ground networks by exploiting the regularities in these networks. Experimental results show that the reduced networks are typically significantly smaller than the original circuits, which makes the new algorithm extremely fast. For instance, power/ground networks with more than one million branches can be solved in a few minutes on modern Sun workstations.
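As a rough illustration of the PCG step mentioned in this abstract, the sketch below solves the nodal equations of a tiny resistive chain with a Jacobi (diagonal) preconditioner. The preconditioner choice, the chain topology, and all function names are assumptions made for illustration; the abstract does not specify them, and the paper's network reduction step is not reproduced here.

```python
# Hypothetical sketch: Jacobi-preconditioned conjugate gradient on a small
# resistive chain. The preconditioner and the test network are assumptions.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    m_inv = [1.0 / A[i][i] for i in range(n)]    # Jacobi preconditioner
    z = [m_inv[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

def chain_conductance_matrix(n, g_wire, g_pad):
    """Nodal matrix of an n-node resistor chain tied to the supply at node 0."""
    G = [[0.0] * n for _ in range(n)]
    G[0][0] += g_pad
    for i in range(n - 1):
        G[i][i] += g_wire
        G[i + 1][i + 1] += g_wire
        G[i][i + 1] -= g_wire
        G[i + 1][i] -= g_wire
    return G

G = chain_conductance_matrix(4, g_wire=1.0, g_pad=1.0)
loads = [0.0, 0.0, 0.0, -1.0]    # 1 A drawn at the far end of the chain
drops = pcg(G, loads)            # node voltages relative to the supply
```

With 1 ohm segments and a 1 ohm pad connection, the voltage deviation grows by 1 V per segment toward the load, which gives a quick sanity check on the solver.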

Lecture Notes in Computer Science, 2004

IPSJ Transactions on System LSI Design Methodology, 2020

In this article, we present recent advances in VLSI reliability effects with a focus on the electromigration (EM) failure/aging effect on interconnects, which is one of the most important reliability concerns for VLSI systems, especially in the nanometer regime. One of the most important advances in EM analysis in recent years is the recognition that EM failure analysis can no longer depend on a single wire segment, as is done in the traditional methods based on Black's and Blech's models. A new generation of EM modeling and design must consider all the wire segments in an interconnect, as the hydrostatic stresses in those wire segments affect each other. This recognition brings both challenges and opportunities. We start with physics-level, stress-oriented characterization of EM failure effects and recently proposed three-phase EM models. Then we present a new EM immortality check at the circuit level that considers multi-segment interconnects and void saturation volumes. After this, we present how to accelerate EM aging effects for fast EM validation at the circuit level under normal working conditions using advanced structure-based techniques. Finally, we present a new EM sign-off analysis tool, called EMspice, at the full-chip power grid level, which considers the interplay between resistance changes from post-voiding processes and current density changes from power grids over the aging process. A number of other relevant works are reviewed and compared as well.
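For orientation, the traditional single-segment checks that this abstract contrasts against can be sketched as follows: Black's equation for median time to failure, MTTF = A · j^(-n) · exp(Ea / (k·T)), and the Blech product test j·L ≤ (jL)_crit. This is only the classical per-segment baseline, not the multi-segment method the article describes. The function names and every parameter value below are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def black_mttf(current_density, temp_k, a_const, n_exp, ea_ev):
    """Black's equation: MTTF = A * j^(-n) * exp(Ea / (k * T))."""
    return (a_const * current_density ** (-n_exp)
            * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k)))

def blech_immortal(current_density, length, jl_crit):
    """Classical Blech check for one isolated segment: j * L <= (jL)_crit."""
    return current_density * length <= jl_crit

# Hypothetical numbers, chosen only to exercise the formulas.
base = black_mttf(1.0e10, 378.0, a_const=1.0e20, n_exp=2.0, ea_ev=0.86)
doubled = black_mttf(2.0e10, 378.0, a_const=1.0e20, n_exp=2.0, ea_ev=0.86)
short_wire_ok = blech_immortal(1.0e10, 10e-6, jl_crit=2.0e5)
long_wire_ok = blech_immortal(1.0e10, 50e-6, jl_crit=2.0e5)
```

With a current exponent of 2, doubling the current density cuts the predicted lifetime to a quarter. Both checks treat each segment in isolation, which is the limitation the abstract points out.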

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019

This article proposes a new P/G network sizing technique based on a recently proposed fast EM immortality check method for general multi-segment interconnect wires and a new physics-based EM assessment technique for more accurate time-to-failure analysis. The article first shows that the new P/G optimization problem, subject to the voltage IR drop and the new EM constraints, can still be formulated as a sequence of linear programming (SLP) problem, where the optimization is carried out in two linear programming phases in each iteration. The new optimization ensures that none of the wires fail if all the constraints are satisfied. However, requiring all the wires to be EM immortal can be over-constraining. To mitigate this problem, the first improvement adds reservoir branches to the mortal wires that cannot be made immortal by wire sizing. This is a very effective approach as long as there is sufficient reservoir area. The second improvement considers the aging effects of interconnect wires in P/G networks. The idea is to allow some short-lifetime wires to fail and to optimize the rest of the wires while considering the additional resistance caused by the failed wire segments. In this way, the resulting P/G networks can be optimized such that the target lifetime of the whole P/G network is ensured, and the networks become more robust and aging-aware over the expected lifetime of the chip. Numerical results on a number of IBM and self-generated power supply networks demonstrate that the new method can effectively reduce the area of the networks while ensuring immortality or enforcing the target lifetime for all the wires, which is not the case for the existing current-density-constrained optimization methods.

ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753)

In this paper, we present an efficient method to budget on-chip decoupling capacitors (decaps) to optimize power delivery networks in an area-efficient way. Our algorithm is based on an efficient gradient-based nonlinear programming method for searching for the solution. Our contributions are an efficient gradient computation method (the time-domain merged adjoint network) and a novel equivalent circuit modeling technique to speed up the optimization process. Experimental results demonstrate that the algorithm is capable of efficiently optimizing very large scale P/G networks.

ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005., 2005

As power density increases exponentially, runtime regulation of operating temperature by dynamic thermal management becomes necessary. This paper proposes a novel approach to thermal analysis at the chip architecture level for efficient dynamic thermal management. Our new approach is based on the observation that the power consumption of architecture-level modules in microprocessors running typical workloads is strongly periodic. This feature can be exploited by fast spectrum analysis in the frequency domain to compute the steady-state response. To obtain the transient temperature changes due to the initial condition and constant power inputs, a numerically stable moment matching approach is carried out. The total transient response is the sum of the two simulation results. The resulting fast thermal analysis algorithm leads to at least a 10x-100x speedup over traditional integration-based transient analysis with a small accuracy loss.
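The frequency-domain part of this idea can be illustrated with a minimal sketch, assuming a single-pole thermal model C·dT/dt + T/R = P(t) driven by one period of sampled power. Each harmonic of the power trace is scaled by the transfer function R / (1 + jωRC) and transformed back. The single-pole model and all names are assumptions; the paper's architecture-level thermal model and its moment-matching transient part are not reproduced.

```python
import cmath
import math

def dft(samples):
    """Plain O(n^2) discrete Fourier transform (kept simple for clarity)."""
    n = len(samples)
    return [sum(samples[k] * cmath.exp(-2j * math.pi * m * k / n)
                for k in range(n))
            for m in range(n)]

def periodic_steady_state(power, period, r_th, c_th):
    """Steady-state temperature rise over one period of a periodic power trace.

    Assumes a one-pole thermal RC model: c_th * dT/dt + T / r_th = P(t).
    """
    n = len(power)
    scaled = []
    for m, p_m in enumerate(dft(power)):
        harmonic = m if m <= n // 2 else m - n    # map upper bins to negative
        omega = 2.0 * math.pi * harmonic / period
        scaled.append(p_m * r_th / (1.0 + 1j * omega * r_th * c_th))
    # Inverse DFT; the result is real for a real input trace.
    return [sum(scaled[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)).real / n
            for k in range(n)]

# Hypothetical trace: 2 W average plus a 1 W fundamental ripple.
n_samples = 16
trace = [2.0 + math.cos(2.0 * math.pi * k / n_samples) for k in range(n_samples)]
temps = periodic_steady_state(trace, period=1.0, r_th=2.0, c_th=0.5)
```

For this trace the result is the DC rise 2·R plus a ripple attenuated and phase-shifted by the thermal time constant, which matches the closed-form response of the one-pole model.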

Proceedings of the ASP-DAC 2005. Asia and South Pacific Design Automation Conference, 2005.

IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings.

This paper presents a novel approach to reducing the complexity of transient linear circuit analysis for a hybrid structured clock network. Topology reduction is first used to reduce the complexity of the circuits, and a preconditioned Krylov-subspace iterative method is then used to perform the nodal analysis on the reduced circuits. With a proper choice of the simulation time step based on the Elmore delay model, the delay of the clock signal between the clock source and a sink node and the skews between the sink nodes can be obtained efficiently and accurately. Our experimental results show that the proposed algorithm is two orders of magnitude faster than HSPICE without loss of accuracy or stability, and the maximum error is within 0.4% of the exact delay time.
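The Elmore delay model referenced here can be sketched for an RC tree as below: the delay to a node is the sum, over the edges on the path from the source, of the edge resistance times the total capacitance downstream of that edge. The tree encoding and names are assumptions for illustration, and how the paper derives its time step from these delays is not stated in the abstract.

```python
def elmore_delays(parent, resistance, capacitance):
    """Elmore delay from the source (node 0) to every node of an RC tree.

    parent[i] is the parent of node i (None for node 0), and nodes are
    numbered so that parent[i] < i. resistance[i] is the resistance of the
    edge parent[i] -> i; capacitance[i] is the grounded capacitance at node i.
    """
    n = len(parent)
    downstream = list(capacitance)
    for i in range(n - 1, 0, -1):          # accumulate capacitance bottom-up
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    for i in range(1, n):                  # accumulate delay top-down
        delay[i] = delay[parent[i]] + resistance[i] * downstream[i]
    return delay

# Hypothetical clock tree: source 0 -> node 1, which fans out to sinks 2 and 3.
parent = [None, 0, 1, 1]
resistance = [0.0, 1.0, 2.0, 3.0]
capacitance = [0.0, 1.0, 1.0, 2.0]
delays = elmore_delays(parent, resistance, capacitance)
skew = delays[3] - delays[2]
```

Two linear passes over the tree are enough, so the estimate is cheap compared with a transient simulation.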

Proceedings of the 52nd Annual Design Automation Conference, 2015

Electromigration (EM) in VLSI interconnects has become one of the major reliability issues for current and future VLSI technologies. However, existing EM modeling and analysis techniques are mainly developed for a single wire. In practical VLSI chips, interconnects such as clock and power grid networks typically consist of multi-branch metal segments representing continuously connected, highly conductive metal (Cu) lines within one layer of metallization, terminating at diffusion barriers. The EM effects in those branches are not independent and have to be considered simultaneously. In this paper, we demonstrate, for the first time, a first-principles-based analytic solution of this problem. We investigate the analytic expressions describing the hydrostatic stress evolution in several typical interconnect trees: straight-line 3-terminal wires, T-shaped 4-terminal wires, and cross-shaped 5-terminal wires. The new approach solves the stress evolution in a multi-branch tree by decoupling the individual segments through proper boundary conditions that account for the interactions between different branches. Using the Laplace transformation technique, analytical solutions are obtained for each type of interconnect tree. The analytical solutions, expressed in terms of a set of auxiliary basis functions using the complementary error function, agree well with the numerical analysis results. Our analysis further demonstrates that using the first two dominant basis functions can lead to 0.5% error, which is sufficient for practical EM analysis.

2009 Asia and South Pacific Design Automation Conference, 2009

2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2013

In this paper, we propose an efficient parallel dynamic linear solver, called GPU-GMRES, for transient analysis of large power grid networks. The new method is based on the preconditioned generalized minimum residual (GMRES) iterative method implemented on heterogeneous CPU-GPU platforms. The new solver is very robust and can be applied to power grids with different structures and to other applications such as thermal analysis. The proposed GPU-GMRES solver adopts the very general and robust incomplete LU (ILU) based preconditioner. We show that by properly selecting the amount of fill-in in the incomplete LU factors, a good trade-off between GPU efficiency and GMRES convergence rate can be achieved for the best overall performance. This tunable feature makes the algorithm very adaptive to different problems. Furthermore, we properly partition the major computing tasks in the GMRES solver to minimize the data traffic between CPU and GPU, which further boosts the performance of the proposed method. Experimental results on the set of published IBM benchmark circuits and mesh-structured power grid networks show that the GPU-GMRES solver can deliver orders-of-magnitude speedup over the direct LU solver UMFPACK. GPU-GMRES can also deliver a 3-10× speedup over the CPU implementation of the same GMRES method on transient analysis.

ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005.

This paper proposes a novel method to efficiently reduce the number of terminals of general linear interconnect circuits with a large number of input and/or output terminals, taking delay variations into account. Our new algorithm is motivated by the fact that VLSI interconnect circuits have many terminals that are similar in terms of their timing and delay metrics, either because they are structurally close or because of the mathematical approximation introduced by meshing in finite difference or finite element schemes during the extraction process. By allowing some delay tolerance or variation, we can merge many similar terminals and keep a small number of representative terminals. After terminal reduction, traditional model order reduction methods can achieve more compact models and improve simulation efficiency. The new method, TERMMERG, uses the moments of the circuits as the metrics for timing or delay. It then employs the singular value decomposition (SVD) method to determine the optimum number of clusters based on a low-rank approximation. After this, the K-means clustering algorithm is used to group the moments of the terminals into clusters. Experimental results on a number of real industry interconnect circuits demonstrate the effectiveness of the proposed method.
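The clustering step described in this abstract can be sketched with a plain K-means over per-terminal moment vectors. In this sketch the number of clusters is passed in by hand (the paper selects it via SVD, which is not reproduced), the initialization is a deterministic farthest-point rule chosen for repeatability, and the moment values are invented; all of these are assumptions.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                           # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                     # keep old centroid if cluster empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Hypothetical (0th, 1st) moments for five terminals.
moments = [[1.0, 0.50], [1.0, 0.52], [1.0, 0.49], [1.0, 2.00], [1.0, 2.10]]
labels, centroids = kmeans(moments, k=2)

# One representative terminal per cluster: the member closest to the centroid.
representatives = [
    min((i for i, lab in enumerate(labels) if lab == c),
        key=lambda i: sq_dist(moments[i], centroids[c]))
    for c in range(2)
]
```

Terminals with nearly identical moments land in the same cluster, and one representative terminal per cluster would then be kept for the subsequent model order reduction.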

2009 IEEE 8th International Conference on ASIC, 2009

The design of high-performance analog/RF ICs is challenging and requires numerically efficient design tools. This paper reviews a number of recent research advances in modeling and analysis of high-performance analog/RF ICs. Structured analysis of EM coupling, non-Monte-Carlo stochastic mismatch analysis, and nonlinear macromodeling for system-level synthesis are discussed.

2008 IEEE/ACM International Conference on Computer-Aided Design, 2008

Fast analysis of power grid networks has been a challenging problem for many years. The huge size of these networks renders circuit simulation inefficient, and the large number of inputs further limits the application of existing Krylov-subspace macromodeling algorithms. At the same time, strong locality has been observed: two nodes that are geometrically far apart have very small electrical impact on each other because of exponential attenuation. No systematic approaches, however, have been proposed to exploit such locality. In this paper, we propose a novel modeling and simulation scheme that can automatically identify the dominant inputs for a given observed node in a power grid network. This enables us to build extremely compact models by projecting the system onto the locally dominant Krylov subspace corresponding to those dominant inputs only. Simulation with the compact models can be very fast if we only need to view the responses of a few nodes under many different inputs. Experimental results show that the proposed method achieves at least a 100X speedup over SPICE-like simulations on a number of large power grid networks with up to 1M nodes.

Sixth International Symposium on Quality of Electronic Design (ISQED'05)

Adding on-chip decoupling capacitors (decaps) is an effective way to reduce voltage noise in power/ground networks and ensure robust power delivery. In this paper, we present a fast decap allocation algorithm that is able to confine the voltage fluctuations below a user-specified threshold by adding decaps in an area-efficient way. The new algorithm adopts the recently proposed time-domain adjoint network method for sensitivity calculation. To avoid the time-consuming line search at each iteration of the conjugate gradient method, we propose a simple yet efficient search step computation method to accelerate the optimization process. The experimental results show that the proposed algorithm is at least 10X faster than the fastest conjugate gradient method reported so far, with similar optimization results.

2008 Design, Automation and Test in Europe, 2008

2007 IEEE/ACM International Conference on Computer-Aided Design, 2007

In this paper, we propose a novel stochastic method for analyzing the voltage drop variations of on-chip power grid networks with log-normal leakage current variations. The new method, called StoEKS, applies Hermite polynomial chaos (PC) to represent the random variables in both the power grid networks and the input leakage currents. Unlike the existing Hermite PC based stochastic simulation method, however, the extended Krylov subspace (EKS) method is employed to compute variational responses using the augmented matrices consisting of the coefficients of the Hermite polynomials. Our contribution lies in combining, for the first time, the statistical spectrum method with the extended Krylov subspace method to quickly solve the variational circuit equations. Experimental results show that the proposed method is about two orders of magnitude faster than the existing Hermite PC based simulation method and several more orders of magnitude faster than Monte Carlo methods, with marginal errors. StoEKS can also analyze much larger circuits than the existing Hermite PC based methods.
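To make the Hermite PC representation of a log-normal quantity concrete, the sketch below uses the standard expansion exp(μ + σξ) = exp(μ + σ²/2) · Σ_n (σⁿ/n!) · He_n(ξ) for a standard normal ξ, where He_n are the probabilists' Hermite polynomials. It covers only the representation of one log-normal leakage current. The EKS solve on the augmented matrices is not reproduced, and the function names and numeric values are assumptions.

```python
import math

def hermite_he(order, x):
    """Probabilists' Hermite polynomials He_0(x) .. He_order(x)."""
    values = [1.0, x]
    for n in range(1, order):
        # Recurrence: He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x)
        values.append(x * values[n] - n * values[n - 1])
    return values[: order + 1]

def lognormal_pc_coeffs(mu, sigma, order):
    """Hermite PC coefficients of exp(mu + sigma * xi), xi ~ N(0, 1)."""
    scale = math.exp(mu + 0.5 * sigma ** 2)
    return [scale * sigma ** n / math.factorial(n) for n in range(order + 1)]

def evaluate_pc(coeffs, xi):
    """Evaluate the truncated expansion at one sample of xi."""
    basis = hermite_he(len(coeffs) - 1, xi)
    return sum(c * h for c, h in zip(coeffs, basis))

# Hypothetical leakage parameters.
coeffs = lognormal_pc_coeffs(mu=0.1, sigma=0.3, order=8)
approx = evaluate_pc(coeffs, xi=1.2)
exact = math.exp(0.1 + 0.3 * 1.2)
```

The zeroth coefficient equals the mean of the log-normal variable, and for moderate σ a low truncation order already tracks the exact value closely.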

2007 Asia and South Pacific Design Automation Conference, 2007

Proceedings of the 2006 international symposium on Physical design - ISPD '06, 2006

In this paper, we propose a more accurate power/ground network circuit model, which considers both via and ground bounce effects, to improve the accuracy of performance estimation for on-chip power distribution networks. On top of this, a new preconditioned iterative method, which exploits the geometric characteristics of power/ground networks, is developed to reduce memory usage and speed up the simulation. Experimental results show that the proposed method is about 5X faster than the incomplete LU decomposition (ILU) based preconditioned conjugate gradient iterative method and uses about half the memory when simulating large-scale multi-layer power/ground networks.

7th International Symposium on Quality Electronic Design (ISQED'06)

In this paper, we propose an efficient algorithm to reduce the voltage noise in on-chip power/ground (P/G) networks of VLSI circuits. The new method uses the sequence of linear programming (SLP) method as the optimization engine and a localized scheme based on partitioning to deal with large circuits. We show that by directly optimizing the decap area as the objective function and using the time-domain adjoint method, SLP can deliver much better quality than existing methods based on the merged time-domain adjoint method. The partitioning strategy further improves the scalability of the proposed algorithm and makes it efficient for large circuits. The resulting algorithm is general enough for any P/G network. Experimental results demonstrate the advantage of the proposed method over existing state-of-the-art methods in terms of solution quality, at the cost of a mild increase in computation.