Carl Sechen - Profile on Academia.edu

Papers by Carl Sechen

2019 International Conference on Field-Programmable Technology (ICFPT), 2019

Contemporary coarse-grained runtime reconfigurable architectures (CGRRAs) are increasingly sensitive to aging effects such as Negative Bias Temperature Instability (NBTI). To address this, we propose a reliability-aware floorplanner for CGRRAs based on a mixed Linear Programming (LP) and Integer Linear Programming (ILP) method that extends the Mean Time to Failure (MTTF) of CGRRAs by balancing processing element (PE) usage. We use this as the basis of a design space explorer that generates a variety of configurations, trading off PE displacement vs. MTTF. On average, a 2.4× improvement in MTTF was obtained for an average critical path delay increase of under 2 percent (although most benchmarks had no delay increase) compared to the default lifetime-unaware floorplan.

Generation of colour-constrained spanning trees with application in symbolic circuit analysis

Int J Circuit Theor Appl, 1996

Finding the product terms in the symbolic network function in the Laplace domain is transformed to finding the common spanning trees of a pair of graphs derived from the circuit. More precisely, the problem is as follows: given a weighted undirected graph in which each edge is either red or green, find the K_i lowest weight spanning trees, each of which has exactly i red edges, in increasing order of weight for all feasible i. This paper describes an algorithm for this problem which is an extension of one of the existing algorithms to generate spanning trees in increasing order of weight, where no edge colour is introduced. This is made possible by a property of the second lowest weight spanning tree with colour constraints, which is proved. The algorithm runs in O(E log E + KE α(E,V)) time and O(K·E) space, where K = ∑_i K_i is the total number of spanning trees generated, E is the number of edges in the graph, V is the number of vertices in the graph and α(x,y) is the very slow growing inverse of Ackermann's function. Reviewer: A. Michalski (Warszawa)
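The problem statement above can be made concrete with a small brute-force sketch. This is not the algorithm from the paper (which extends an ordered spanning-tree generator and is far more efficient); it simply enumerates every subset of V-1 edges, keeps the spanning trees, and groups them by red-edge count. The function name, the edge tuple layout, and the use of a single limit `k` for every red-edge count i are assumptions made for illustration.

```python
from itertools import combinations


def spanning_trees_by_red_count(num_vertices, edges, k):
    """Brute-force illustration (hypothetical helper, not the paper's method).

    edges: list of (u, v, weight, colour) with colour "red" or "green".
    Returns {i: [(weight, tree_edges), ...]} with at most k lowest-weight
    spanning trees having exactly i red edges, in increasing order of weight.
    """
    def is_spanning_tree(subset):
        # Union-find cycle check; V-1 acyclic edges form a spanning tree.
        parent = list(range(num_vertices))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for u, v, _, _ in subset:
            root_u, root_v = find(u), find(v)
            if root_u == root_v:
                return False  # this edge would close a cycle
            parent[root_u] = root_v
        return True

    result = {}
    for subset in combinations(edges, num_vertices - 1):
        if is_spanning_tree(subset):
            reds = sum(1 for e in subset if e[3] == "red")
            weight = sum(e[2] for e in subset)
            result.setdefault(reds, []).append((weight, subset))
    for reds in result:
        result[reds].sort(key=lambda item: item[0])
        result[reds] = result[reds][:k]
    return result


# Hypothetical 3-vertex example: a triangle with two red edges and one green.
triangle = [(0, 1, 1, "red"), (1, 2, 2, "green"), (0, 2, 3, "red")]
trees = spanning_trees_by_red_count(3, triangle, k=2)
weights_by_red_count = {i: [w for w, _ in ts] for i, ts in trees.items()}
```

The enumeration is exponential in the number of edges, so it is only useful for checking small cases against a faster implementation.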

Efficient Post-layout Power-Delay Curve Generation

Lecture Notes in Computer Science, 2005

We have developed a complete design flow from Verilog/VHDL to layout that generates what is effectively a post-layout power versus delay curve for a digital IC block. Post-layout timing convergence is rapid over the entire delay range spanned by a power versus delay tradeoff curve. The points on the gate-sizing generated power-delay curve, when actually laid out, are extremely close in transistor-level simulated power and delay, using full 3D extracted parasitics. The user can therefore confidently obtain any feasible post-layout power-delay tradeoff from the power-delay curve for a logic block. To the best of our knowledge, this is the first report of such a post-layout capability.

Next-generation military systems and other national security applications require advanced digital signal processing electronics implemented in highly optimized ASICs with Mission Specific Processing Architectures. However, one problem currently blocks these systems from fully exploiting the potential of integrated circuit technology: aerospace ASICs are typically, of economic necessity, designed differently from, and significantly suboptimally relative to, high volume standard products. The project described in this paper, named Liberator, aims to develop and demonstrate a new ASIC design methodology that will close the gap between Full Custom and ASIC methodologies. The paper first describes the barriers that must be addressed. It then describes the technical approach we are following to overcome these barriers. Improvement results for our methods versus standard ASIC design methods are given. Finally, we describe a typical application that is being used to demonstrate the results of this project, along with the expected system benefits in performance and cost.

Partitioning with performance optimization

Libraries (lifejacket or straitjacket)

Proceedings of the 40th Annual Design Automation Conference, Jun 2, 2003

Nasa Sti Recon Technical Report N, May 1, 1977

The objective of this thesis was to investigate a new device called the charge-flow transistor (CFT) and its applications for fire detection and gas sensing. A simpler device, the lock-and-key, was first used to determine the utility of various thin-film polymers as possible sensing materials. Two polymers, PFI and PSB, were found to be particularly suitable for fire detection. One polymer, PAPA, was found to be promising as a relative humidity sensor. The charge-flow capacitor was examined next. This structure is basically a parallel-plate capacitor with a polymer-filled gap in the metallic top electrode. Its behavior was successfully modeled as an RC transmission line, and it demonstrated the basic principle of CFT operation in an easy-to-fabricate device structure. Finally, the charge-flow transistor was investigated. Prototype devices were fabricated and tested. A model of device performance was developed. Extensive fire tests performed with a PFI-coated CFT indicated very good sensitivity to smouldering fires. It was discovered that the effective threshold voltage of the CFT depended on whether surface or bulk conduction in the thin film was dominant. For the case of PFI, the surface conductivity was substantially larger than the bulk conductivity.

ACKNOWLEDGMENTS The author acknowledges the tremendous effort of Professor Stephen D. Senturia, who supervised this thesis and provided many theoretical insights throughout the course of this work. The author is deeply indebted to Tony Colozzi, whose invaluable suggestions made possible the experimental measurements. Appreciation is gratefully expressed to Debbie Samkoff for preparing the polymer solutions and to Tally Stone for typing the manuscript. Thanks are also extended to technical artist John Mara for his fine work in preparing the figures. Acknowledgement is made of the research assistant support which was made available by the NASA/Lewis Research Center under grant number NSG-3061.

Method of high-performance CMOS design

Nonconvex Gate Delay Modeling and Delay Optimization

Ieee Transactions on Computer Aided Design of Integrated Circuits and Systems, Sep 1, 2008

Convex delay models like the Elmore model, the related Logical Effort model, posynomial, and generalized posynomial models have always been favored by researchers, as convexity guarantees a priori that a global optimum solution is found. The accuracy of the model may be sacrificed in this quest to generate convex delay models. In this paper, we investigate the use of signomial delay modeling for area/delay optimization. We present a procedure to automatically generate signomial gate delay models by nonlinear least squares fitting. As opposed to posynomial models, signomial models achieve better fits to SPICE generated data. However, signomials are not convex in general. Nevertheless, we show via duality arguments that we obtain near optimum (within 1%) solutions. Our optimization considers beta-ratio constraints, minimum and maximum size constraints for n- and p-transistors, rise/fall delays, and edge rates. The gate sizes for the fastest delay solution for a 44000-cell design, using the IBM 130-nm process, can be achieved in about 16 min of CPU time on a PC, and the area-delay tradeoff curve for 21 points can be generated in about 2 h of CPU time. To the best of our knowledge, this is the first report of using a true signomial delay model and its application to optimum gate sizing. In addition, we give performance details for the automatic data fitting for an 11-function library of static CMOS gates.
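The convexity distinction drawn above can be illustrated numerically. A posynomial is a sum of monomials c·x^a with every coefficient c positive, and it becomes convex after the substitution x = exp(y); a signomial allows negative coefficients, which can break that property. The sketch below uses made-up single-variable coefficients (they are not delay models from the paper) and checks the midpoint convexity inequality in the log domain. A failed check proves nonconvexity; a passed check at one pair of points is only a necessary condition.

```python
import math


def monomial_sum(terms, x):
    """Evaluate the sum of c * x**a over (c, a) pairs, for x > 0."""
    return sum(c * x ** a for c, a in terms)


def midpoint_convex_in_log_domain(terms, x0, x1):
    """Test g((y0+y1)/2) <= (g(y0)+g(y1))/2 for g(y) = f(exp(y))."""
    y0, y1 = math.log(x0), math.log(x1)
    mid_value = monomial_sum(terms, math.exp((y0 + y1) / 2))
    chord_value = (monomial_sum(terms, x0) + monomial_sum(terms, x1)) / 2
    return mid_value <= chord_value


# Hypothetical coefficients, chosen only to show the effect of one sign flip.
posynomial = [(1.0, 0), (1.0, 1), (0.5, 2)]   # 1 + x + 0.5 x^2
signomial = [(1.0, 0), (1.0, 1), (-0.5, 2)]   # 1 + x - 0.5 x^2

posynomial_ok = midpoint_convex_in_log_domain(posynomial, 1.0, math.e)
signomial_ok = midpoint_convex_in_log_domain(signomial, 1.0, math.e)
```

Here `posynomial_ok` is true and `signomial_ok` is false, which is the loss of convexity that the duality argument in the abstract is meant to work around.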

Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477), 2000

We developed a methodology and tools for synthesizing monotonic static CMOS networks, which consist of alternating low-skewed and high-skewed static gates. When used with a dual V_T process, monotonic static CMOS can simultaneously reduce standby static power and increase performance by using low V_T devices in the evaluation networks and making all other devices high V_T. Experimental results show monotonic static CMOS to be 1.67 times faster than traditional static CMOS.

2. Monotonic Static CMOS

An example of a low-skewed (LS) monotonic static CMOS gate is shown in Fig. 1 (a). The devices are sized so that the gate has a fast fall delay at the expense of a slow rise delay. Likewise, a high-skewed (HS) gate is designed to provide a fast pull-up, as shown in Fig. 1 (b).

1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), 2000

This paper describes AKORD, a transistor level and mixed transistor/gate level placement tool. AKORD has unique layout capabilities that address the digital data path layout problem. In order to improve communication between the placement and routing steps, new post placement algorithms were developed: a device re-spacing procedure, an optimization procedure for gate contacts, and a procedure which reduces wire crossovers. AKORD dynamically supports: (1) transistor folding without the use of device libraries that contain variants of the same device; (2) device merging, including information about optimal transistor chain formation; and (3) well area minimization. Experimental results show that the automated layouts are comparable to skilled manual layouts and that the computation times are quite modest.

High Speed Redundant Adder and Divider in Output Prediction Logic

IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05), 2005

A redundant bit adder (RBA) and a divider, both implemented in output prediction logic (OPL), are presented. By combining the carry-free nature of the redundant number system and the high-speed characteristics of OPL, the performance of the arithmetic blocks was tremendously improved. Fabricated in 0.18 μm, 1.8 V CMOS, the adder achieves a measured delay of 211 ps (2.4 fanout-of-four inverter delays), which is significantly faster than any previously published RBA. The divider implemented in the same technology can achieve an operating frequency of 1.25 GHz.

Large standard cell libraries and their impact on layout area and circuit performance

Proceedings International Conference on Computer Design. VLSI in Computers and Processors, 1996

We present a complete study of layout area and circuit performance as a result of utilizing a large library of standard cells. We built libraries of all possible static CMOS cells having a chain length of up to 7. We refer to a library of all possible cells having a chain length limit of n as sn. Although library s7 has billions of possible cells in it, our technology mapper only selected on the order of 100 of these cells to implement each of the MCNC logic synthesis benchmark circuits. We drew the following conclusions from this study. (1) For three or more layers of metal, using very large libraries (e.g., s7) is optimal in terms of area and delay. (2) For two layers of metal, limiting the library size to s5, but at least s4, is optimal in terms of area and delay. Given that the number of distinct combinational cells in industrial libraries today never exceeds 200 (and usually considerably fewer) and given that even library s4 has 3503 distinct cells, tremendous area savings (without increase in worst case path delay) are readily available by utilizing much larger cell libraries. Specifically, given that library s3 has 87 distinct cells (current industrial libraries typically have no more than this), we surmise that area savings of about 30% can be achieved by using library s7 for three or more metal layers versus any current industrial library.

RADAR -- reconfigurable analog and digital array for radiation-hardened circuits

2004 IEEE Aerospace Conference Proceedings (IEEE Cat. No.04TH8720), 2004

Because of their flexibility and reprogrammability, FPGAs have been proposed for many uses in avionics. Existing commercial FPGAs, being fine-grained, are programmable at the bit level, allowing them to be used across a wide range of applications. Compared to an ASIC developed for a specific application, however, such a fine-grained FPGA consumes more power and exhibits higher latency and/or lower throughput. In this paper, we describe RADAR, a coarse-grained, programmable, radiation-hardened array that exhibits power and throughput similar to ASICs, yet contains a high degree of programmability. Containing multipliers, adders, SRAMs and a programmable interconnect, RADAR is customized to the DSP domain and targets applications that require low power and high throughput. The degree of radiation hardening in RADAR is a programmable feature, allowing a tradeoff between radiation hardness, throughput, and power.

Locally-clocked dynamic logic

1998 Midwest Symposium on Circuits and Systems (Cat. No. 98CB36268), 1999

... The basic structure of the gates used in LC dynamic logic is similar to that of True Single Phase Clock (TSPC) ...

[Figure labels: dynamic logic section, latching section, clk, inputs a and b, output a+b+..., optional keeper]

Locally-Clocked Dynamic Logic, Gregg Hoyer, Gin Yee and Carl Sechen, Department ...

Proceedings of 4th Great Lakes Symposium on VLSI, 1994

Fully symbolic analysis of large analog integrated circuits

Proceedings of IEEE Custom Integrated Circuits Conference - CICC '94, 1994

Our fully symbolic analysis algorithms relax the circuit size limitation (due to exponential time and storage complexity) of previously known methods. Our implementation (AnalogSifter) will output simplified symbolic expressions in the desired frequency range for any desired transfer function, including input resistances, output resistances, as well as current and voltage gains. Symbolic pole and zero expressions were extracted accurately in the desired frequency range. Very compact symbolic expressions for the voltage gains of the two-stage CMOS and 741 operational amplifiers were obtained in less than 40 CPU seconds on a SUN SPARCstation 2, and the simplified results match well with the exact numerical values up to the unity gain frequencies in both the magnitude and phase versus frequency plots.

Proceedings of 1993 International Conference on Computer Aided Design (ICCAD), 1993

The maximum flexibility of a function at a node in a Boolean network is described by the incompletely specified function formed using the union of the satisfiability don't care set (SDC) and observability don't care set (ODC) of the node. The normal representation of these sets depends on every variable in the network, can be quite large and may be hard to compute. Usually, we are only interested in the don't care set when restricted to a certain set of variables. We give a formulation for the don't care set of a node in terms of a pre-specified set of variables and prove that this formulation is the maximum projection of the entire don't care set onto the chosen set of variables. This formulation allows computation to start at the target node and proceed by traversing the network outwards. The computation may be stopped at any time yielding a valid subset of the don't care set.
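For readers unfamiliar with observability don't cares, the following sketch shows the idea on a toy network by brute force. It is not the projection-based formulation from the paper; it simply lists the primary-input assignments for which the value of a node cannot be observed at the output. The network (n = a AND b feeding out = n OR c) and all function names are hypothetical.

```python
from itertools import product


def observability_dont_cares(input_names, output_given_node):
    """Return the input assignments where flipping the node's value
    leaves the network output unchanged (exhaustive enumeration)."""
    dont_cares = []
    for values in product([0, 1], repeat=len(input_names)):
        assignment = dict(zip(input_names, values))
        if output_given_node(assignment, 0) == output_given_node(assignment, 1):
            dont_cares.append(assignment)
    return dont_cares


def toy_output(assignment, node_value):
    # Hypothetical network: the node n = a AND b drives out = n OR c,
    # so the node's value is masked whenever c is 1.
    return node_value | assignment["c"]


odc = observability_dont_cares(["a", "b", "c"], toy_output)
```

In this toy case `odc` contains exactly the four assignments with c = 1. Exhaustive enumeration grows exponentially with the number of inputs, which is the kind of cost that motivates restricting the don't care set to a chosen subset of variables.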

Multi-layer over-the-cell routing with obstacles

Proceedings of CICC 97 - Custom Integrated Circuits Conference, 1997

In this paper we present the first multi-layer graph-based router which can handle obstacles on any layer. We show that it improves the performance of a maze router with rip-up and reroute (RR) when obstacles are present. The graph-based router combined with a maze router with RR has yielded the best reported routing results for multi-layer problems with or without obstacles on any layer.