Carl Sechen - Academia.edu

Papers by Carl Sechen

Extending the Lifetime of Coarse-Grained Runtime Reconfigurable FPGAs by Balancing Processing Element Usage

2019 International Conference on Field-Programmable Technology (ICFPT), 2019

Contemporary coarse-grained runtime reconfigurable architectures (CGRRAs) are increasingly sensitive to aging effects such as Negative Bias Temperature Instability (NBTI). To address this, we propose a reliability-aware floorplanner for CGRRAs based on a mixed Linear Programming (LP) and Integer Linear Programming (ILP) method that extends the Mean Time to Failure (MTTF) of CGRRAs by balancing processing element (PE) usage. We use this as the basis of a design space explorer that generates a variety of configurations, trading off PE displacement vs. MTTF. On average, a 2.4× improvement in MTTF was obtained for an average critical path delay increase of under 2 percent (although most benchmarks had no delay increase) compared to the default lifetime-unaware floorplan.

Generation of colour-constrained spanning trees with application in symbolic circuit analysis

Int J Circuit Theor Appl, 1996

Finding the product terms in the symbolic network function in the Laplace domain is transformed to finding the common spanning trees of a pair of graphs derived from the circuit. More precisely, the problem is as follows: given a weighted undirected graph in which each edge is either red or green, find the K_i lowest weight spanning trees, each of which has exactly i red edges, in increasing order of weight for all feasible i. This paper describes an algorithm for this problem which is an extension of one of the existing algorithms to generate spanning trees in increasing order of weight, where no edge colour is introduced. This is made possible by a property of the second lowest weight spanning tree with colour constraints, which is proved. The algorithm runs in O(E log E + K E α(E, V)) time and O(K·E) space, where K = ∑_i K_i is the total number of spanning trees generated, E is the number of edges in the graph, V is the number of vertices in the graph and α(x, y) is the very slow growing inverse of Ackermann’s function. Reviewer: A. Michalski (Warszawa)
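The problem statement in this abstract can be made concrete with a small brute-force sketch. This is not the paper's algorithm (which reaches the stated time bound by extending an ordered spanning-tree generator); it simply enumerates every spanning tree of a tiny graph and groups the trees by red-edge count. The function names, the `(u, v, weight, colour)` edge tuples, and the triangle example are all assumptions made for illustration.

```python
from itertools import combinations


def is_spanning_tree(n, edges):
    """Return True if `edges` has n-1 edges joining all n vertices with no cycle."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, _weight, _colour in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return len(edges) == n - 1


def colour_constrained_trees(n, edges, k):
    """Map each feasible red-edge count i to its k lowest-weight spanning trees."""
    by_red = {}
    for subset in combinations(edges, n - 1):
        if is_spanning_tree(n, subset):
            red = sum(1 for e in subset if e[3] == "red")
            weight = sum(e[2] for e in subset)
            by_red.setdefault(red, []).append((weight, subset))
    # Sort by weight only, so ties never compare the edge tuples themselves.
    return {i: sorted(trees, key=lambda t: t[0])[:k] for i, trees in by_red.items()}


# Hypothetical 3-vertex example: edges are (u, v, weight, colour).
triangle = [(0, 1, 1, "red"), (1, 2, 2, "green"), (0, 2, 3, "red")]
result = colour_constrained_trees(3, triangle, k=2)
weights = {i: [w for w, _ in trees] for i, trees in result.items()}
print(weights)  # {1: [3, 5], 2: [4]}
```

In this example no spanning tree has zero red edges, so i = 0 is infeasible and does not appear in the result. The enumeration is exponential in the number of edges and is only meant to pin down what "the K_i lowest weight spanning trees with exactly i red edges" asks for.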

Efficient Post-layout Power-Delay Curve Generation

Lecture Notes in Computer Science, 2005

We have developed a complete design flow from Verilog/VHDL to layout that generates what is effectively a post-layout power versus delay curve for a digital IC block. Post-layout timing convergence is rapid over the entire delay range spanned by a power versus delay tradeoff curve. The points on the gate-sizing generated power-delay curve, when actually laid out, are extremely close in transistor-level simulated power and delay, using full 3D extracted parasitics. The user can therefore confidently obtain any feasible post-layout power-delay tradeoff from the power-delay curve for a logic block. To the best of our knowledge, this is the first report of such a post-layout capability.

MSP Liberator ASIC Design Flow Produces Full Custom Performance Required for Next

Efficient and effective placement for very large circuits

Partitioning with performance optimization

Libraries (lifejacket or straitjacket)

Proceedings of the 40th Annual Design Automation Conference, Jun 2, 2003

Charge-flow structures as polymeric early-warning fire alarm devices

NASA STI/Recon Technical Report N, May 1, 1977

Method of high-performance CMOS design

Nonconvex Gate Delay Modeling and Delay Optimization

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Sep 1, 2008

Convex delay models like the Elmore model, the related Logical Effort model, posynomial, and generalized posynomial models have always been favored by researchers, as convexity has a priori guarantees of global optimum solutions. The accuracy of the model may be sacrificed in this quest to generate convex delay models. In this paper, we investigate the use of signomial delay modeling for area/delay optimization. We present a procedure to automatically generate signomial gate delay models by nonlinear least squares fitting. As opposed to posynomial models, signomial models achieve better fits to SPICE-generated data. However, signomials are not convex in general. Nevertheless, we show via duality arguments that we obtain near-optimum (within 1%) solutions. Our optimization considers beta-ratio constraints, minimum and maximum size constraints for n- and p-transistors, rise/fall delays, and edge rates. The gate sizes for the fastest delay solution for a 44000-cell design, using the IBM 130-nm process, can be achieved in about 16 min of CPU time on a PC, and the area-delay tradeoff curve for 21 points can be generated in about 2 h of CPU time. To the best of our knowledge, this is the first report of using a true signomial delay model and its application to optimum gate sizing. In addition, we give performance details for the automatic data fitting for an 11-function library of static CMOS gates.
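For readers unfamiliar with the two model classes this abstract contrasts, the standard definitions are sketched below. The symbols (x_j for sizing variables, c_k for coefficients, a_{kj} for exponents, y_j for the log-transformed variables) are chosen here for illustration and are not taken from the paper.

```latex
% Posynomial: every coefficient is strictly positive.
f(x) = \sum_{k=1}^{K} c_k \prod_{j=1}^{n} x_j^{a_{kj}},
\qquad c_k > 0,\quad x_j > 0,\quad a_{kj} \in \mathbb{R}

% Signomial: same form, but coefficients may have either sign.
g(x) = \sum_{k=1}^{K} c_k \prod_{j=1}^{n} x_j^{a_{kj}},
\qquad c_k \in \mathbb{R},\quad x_j > 0,\quad a_{kj} \in \mathbb{R}

% Under the change of variables x_j = e^{y_j}, a posynomial becomes
% a sum of exponentials of affine functions, so its logarithm is convex in y:
\log f(e^{y}) = \log \sum_{k=1}^{K} \exp\!\Big( \log c_k + \sum_{j=1}^{n} a_{kj}\, y_j \Big)
```

With negative coefficients the logarithmic change of variables no longer yields a convex function in general, which is why a signomial fit can track the data more closely but gives up the global-optimum guarantee that the abstract attributes to convex models.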

Monotonic static CMOS and dual V_T technology

Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477), 2000

AKORD: transistor level and mixed transistor/gate level placement tool for digital data paths

1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), 2000

High Speed Redundant Adder and Divider in Output Prediction Logic

IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05), 2005

A redundant bit adder (RBA) and a divider, both implemented in output prediction logic (OPL), are presented. By combining the carry-free nature of the redundant number system and the high-speed characteristics of OPL, the performance of the arithmetic blocks was tremendously improved. Fabricated in 0.18 μm/1.8 V CMOS, the adder achieves a measured delay of 211 ps (2.4 fanout-of-four inverter delays), which is significantly faster than any previously published RBAs. The divider implemented in the same technology can achieve an operating frequency of 1.25 GHz.

Large standard cell libraries and their impact on layout area and circuit performance

Proceedings International Conference on Computer Design. VLSI in Computers and Processors, 1996

We present a complete study of layout area and circuit performance as a result of utilizing a large library of standard cells. We built libraries of all possible static CMOS cells having a chain length of up to 7. We refer to a library of all possible cells having a chain length limit of n as sn. Although library s7 has billions of possible cells in it, our technology mapper only selected on the order of 100 of these cells to implement each of the MCNC logic synthesis benchmark circuits. We drew the following conclusions from this study. (1) For three or more layers of metal, using very large libraries (e.g., s7) is optimal in terms of area and delay. (2) For two layers of metal, limiting the library size to s5, but at least s4, is optimal in terms of area and delay. Given that the number of distinct combinational cells in industrial libraries today never exceeds 200 (and usually considerably fewer) and given that even library s4 has 3503 distinct cells, tremendous area savings (without increase in worst case path delay) are readily available by utilizing much larger cell libraries. Specifically, given that library s3 has 87 distinct cells (current industrial libraries typically have no more than this), we surmise that area savings of about 30% can be achieved by using library s7 for three or more metal layers versus any current industrial library.

RADAR -- reconfigurable analog and digital array for radiation-hardened circuits

2004 IEEE Aerospace Conference Proceedings (IEEE Cat. No.04TH8720), 2004

Because of their flexibility and reprogrammability, FPGAs have been proposed for many uses in avionics. Existing commercial FPGAs, being fine-grained, are programmable at the bit level, allowing them to be used across a wide range of applications. Compared to an ASIC developed for a specific application, however, such a fine-grained FPGA consumes more power and exhibits higher latency and/or lower throughput. In this paper, we describe RADAR - a coarse-grained, programmable, radiation-hardened array that exhibits power and throughput similar to ASICs, yet contains a high degree of programmability. Containing multipliers, adders, SRAMs and a programmable interconnect, RADAR is customized to the DSP domain and targets applications that require low power and high throughput. The degree of radiation hardening in RADAR is a programmable feature, allowing a tradeoff between radiation hardness, throughput, and power.

Locally-clocked dynamic logic

1998 Midwest Symposium on Circuits and Systems (Cat. No. 98CB36268), 1999

... The basic structure of the gates used in LC dynamic logic is similar to that of True Single Phase Clock (TSPC) ... [Figure labels: dynamic logic section, latching section, optional keeper.] Authors: Gregg Hoyer, Gin Yee and Carl Sechen ...

Generation of color-constrained spanning trees with application in symbolic circuit analysis

Proceedings of 4th Great Lakes Symposium on VLSI, 1994

Fully symbolic analysis of large analog integrated circuits

Proceedings of IEEE Custom Integrated Circuits Conference - CICC '94, 1994

Our fully symbolic analysis algorithms relax the circuit size limitation (due to exponential time and storage complexity) of previously known methods. Our implementation (AnalogSifter) will output simplified symbolic expressions in the desired frequency range for any desired transfer function, including input resistances, output resistances, as well as current and voltage gains. Symbolic pole and zero expressions were extracted accurately in the desired frequency range. Very compact symbolic expressions for the voltage gains of the two-stage CMOS and 741 operational amplifiers were obtained in less than 40 CPU seconds on a SUN SPARCstation 2, and the simplified results match well with the exact numerical values up to the unity gain frequencies in both the magnitude and phase versus frequency plots.

Maximum projections of don't care conditions in a Boolean network

Proceedings of 1993 International Conference on Computer Aided Design (ICCAD), 1993

Multi-layer over-the-cell routing with obstacles

Proceedings of CICC 97 - Custom Integrated Circuits Conference, 1997

In this paper we present the first multi-layer graph-based router which can handle obstacles on any layer. We show that it improves the performance of a maze router with rip-up and reroute (RR) when obstacles are present. The graph-based router combined with a maze router with RR has yielded the best reported routing results for multi-layer problems with or without obstacles on any layer.
