Kwangsoo Han | University of California, San Diego

Papers by Kwangsoo Han

Novel level-up shifters for high performance and low power mobile devices

2013 IEEE International Conference on Consumer Electronics (ICCE), 2013

Ultra low power and high speed FPGA design with CNFET

2012 International Symposium on Communications and Information Technologies (ISCIT), 2012

Improved Performance of 3DIC Implementations Through Inherent Awareness of Mix-and-Match Die Stacking

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016

Delay uncertainty and signal criticality driven routing channel optimization for advanced DRAM products

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 2016

Scalable detailed placement legalization for complex sub-14nm constraints

2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2015

ILP-based co-optimization of cut mask layout, dummy fill, and timing for sub-14nm BEOL technology

Photomask Technology 2015, 2015

A global-local optimization framework for simultaneous multi-mode multi-corner clock skew variation reduction

Proceedings of the 52nd Annual Design Automation Conference (DAC '15), 2015

As combinations of signoff corners grow in modern SoCs, minimization of clock skew variation across corners is important. Large skew variation can cause difficulties in multi-corner timing closure because fixing violations at one corner can lead to violations at other corners. Such “ping-pong” effects lead to significant power and area overheads and time to signoff. We propose a novel framework encompassing both global and local clock network optimizations to minimize the sum of skew variations across different PVT corners between all sequentially adjacent sink pairs. The global optimization uses linear programming to guide buffer insertion, buffer removal and routing detours. The local optimization is based on machine learning-based predictors of latency change; these are used for iterative optimization with tree surgery, buffer sizing and buffer displacement operators. Our optimization achieves up to 22% total skew variation reduction across multiple testcases implemented in foundry 28nm technology, as compared to a best-practices CTS solution using a leading commercial tool.
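As a rough illustration of the objective named in this abstract, the sketch below sums skew variation over sequentially adjacent sink pairs. The data layout, the function names, the corner names, and the choice of "largest minus smallest skew across corners" as the variation of a pair are all assumptions made for illustration; the abstract does not give the paper's exact (possibly normalized) definition.

```python
from typing import Dict, List, Tuple

# corner name -> sink name -> clock latency in ps (hypothetical data layout)
Latencies = Dict[str, Dict[str, float]]


def pair_skews(latency: Latencies, pair: Tuple[str, str]) -> List[float]:
    """Skew (launch latency minus capture latency) of one sink pair at each corner."""
    launch, capture = pair
    return [latency[c][launch] - latency[c][capture] for c in sorted(latency)]


def total_skew_variation(latency: Latencies, pairs: List[Tuple[str, str]]) -> float:
    """Sum over sink pairs of the spread of their skew across corners.

    Assumption: 'variation' of a pair is max skew minus min skew over corners.
    """
    total = 0.0
    for pair in pairs:
        skews = pair_skews(latency, pair)
        total += max(skews) - min(skews)
    return total


# Hypothetical two-corner example with three sinks.
latency = {
    "slow_corner": {"ff1": 520.0, "ff2": 500.0, "ff3": 540.0},
    "fast_corner": {"ff1": 260.0, "ff2": 255.0, "ff3": 262.0},
}
pairs = [("ff1", "ff2"), ("ff2", "ff3")]
print(total_skew_variation(latency, pairs))  # 15 ps + 33 ps = 48.0
```

A pair whose skew is identical at every corner contributes zero, which is why a fix at one corner no longer disturbs the others once this sum is small.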

Evaluation of BEOL Design Rule Impacts Using An Optimal ILP-based Detailed Router

Continued technology scaling with more pervasive use of multipatterning has led to complex design rules and increased difficulty of maintaining high layout densities. Intuitively, emerging constraints such as unidirectional patterning or increased via spacing will decrease achievable density of the final place-and-route solution, worsening die area and product cost. However, no methodology exists for accurate assessment of design rules’ impact on physical chip implementation. At the same time, this is a crucial need for early development of BEOL process technologies, particularly with FinFET or future vertical-device architectures where cell footprints can become much smaller than in bulk planar CMOS technologies. In this work, we study impacts of patterning technology choices and associated design rules on physical implementation density, with respect to cost-optimal design rule-correct detailed routing. A key contribution is an Integer Linear Programming (ILP) based optimal router (OptRouter) which considers complex design rules that arise in sub-20nm process technologies. Using OptRouter, we assess wirelength and via count impacts of various design rules (implicitly, patterning technology choices) by analyzing optimal routing solutions of clips (i.e., switchbox instances) extracted from post-detailed route layouts in an advanced technology.
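To make the idea of measuring a design rule's cost on a small clip concrete, here is a minimal sketch that routes one two-pin net on a tiny multi-layer grid and compares the optimal cost with and without a unidirectional-routing rule. It uses a plain shortest-path search (Dijkstra), not the ILP formulation of OptRouter, and the grid size, via cost, and function name are illustrative assumptions.

```python
import heapq
from typing import List, Optional, Tuple

Node = Tuple[int, int, int]  # (x, y, layer)


def route_cost(width: int, height: int, layer_dirs: List[str],
               src: Node, dst: Node, via_cost: int = 2) -> Optional[int]:
    """Minimum wirelength-plus-via cost of a single two-pin net.

    layer_dirs[z] is "H", "V", or "HV": the wire directions allowed on layer z.
    Each wire step costs 1; each via between adjacent layers costs via_cost.
    """
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist[node]:
            continue  # stale heap entry
        x, y, z = node
        steps = [((x, y, z - 1), via_cost), ((x, y, z + 1), via_cost)]
        if "H" in layer_dirs[z]:
            steps += [((x - 1, y, z), 1), ((x + 1, y, z), 1)]
        if "V" in layer_dirs[z]:
            steps += [((x, y - 1, z), 1), ((x, y + 1, z), 1)]
        for (nx, ny, nz), w in steps:
            if 0 <= nx < width and 0 <= ny < height and 0 <= nz < len(layer_dirs):
                nd = d + w
                if nd < dist.get((nx, ny, nz), float("inf")):
                    dist[(nx, ny, nz)] = nd
                    heapq.heappush(heap, (nd, (nx, ny, nz)))
    return None  # destination unreachable under the given rules


# Same net, same 4x4 two-layer clip, two rule sets.
bidirectional = route_cost(4, 4, ["HV", "HV"], (0, 0, 0), (3, 3, 0))
unidirectional = route_cost(4, 4, ["H", "V"], (0, 0, 0), (3, 3, 0))
print(bidirectional, unidirectional)  # 6 10
```

In this toy case the unidirectional rule forces two vias, raising the optimal cost from 6 to 10. A real study would route all nets of a clip together under the full rule set, which is where an exact method such as ILP is needed.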

Clock Clustering and IO Optimization for 3D Integration

3D interconnect between two dies can span a wide range of bandwidths and region areas, depending on the application, partitioning of the dies, die size, and floorplan. We explore the concept of dividing such an interconnect into local clusters, each with a cluster clock. We combine such clustering with a choice of three clock synchronization schemes (synchronous, source-synchronous, asynchronous) and study impacts on power, area and timing of the clock tree, data path and 3DIO. We build a model for the power, area and timing as a function of key system requirements and constraints: total bandwidth, region area, number of clusters, clock synchronization scheme, and 3DIO frequency. Such a model enables architects to perform pathfinding exploration of clocking and IO power, area and bandwidth optimization for 3D integration.

Benchmarking of Mask Fracturing Heuristics

Aggressive resolution enhancement techniques such as inverse lithography (ILT) often lead to complex, non-rectilinear mask shapes which make mask writing extremely slow and expensive. To reduce shot count of complex mask shapes, mask writers allow overlapping shots, due to which the problem of fracturing mask shapes with minimum shot count is NP-hard. The need to correct for e-beam proximity effect makes mask fracturing even more challenging. Although a number of fracturing heuristics have been proposed, there has been no systematic study to analyze the quality of their solutions. In this work, we propose a new method to generate benchmarks with known optimal solutions that can be used to evaluate the suboptimality of mask fracturing heuristics. We also propose a method to generate tight upper and lower bounds for actual ILT mask shapes by formulating mask fracturing as an integer linear program and solving it using branch and price. Our results show that a state-of-the-art prototype [version of] capability within a commercial EDA tool for e-beam mask shot decomposition can be suboptimal by as much as 3.7 for generated benchmarks, and by as much as 3.6 for actual ILT shapes.
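The effect of allowing overlapping shots can be seen on a toy example. The sketch below finds the minimum number of rectangular shots for a small pixelated shape by exhaustive search, with and without overlap. It is only a brute-force illustration for tiny shapes, not the branch-and-price method of the paper, and it ignores e-beam proximity effect and shot size limits; the function names and the pixel-set representation are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

Pixel = Tuple[int, int]


def rectangles_inside(shape: FrozenSet[Pixel]) -> List[FrozenSet[Pixel]]:
    """All axis-aligned rectangles (as pixel sets) that lie fully inside the shape."""
    xs = sorted({x for x, _ in shape})
    ys = sorted({y for _, y in shape})
    rects = set()
    for x0 in xs:
        for x1 in xs:
            for y0 in ys:
                for y1 in ys:
                    if x1 < x0 or y1 < y0:
                        continue
                    cells = frozenset((x, y)
                                      for x in range(x0, x1 + 1)
                                      for y in range(y0, y1 + 1))
                    if cells <= shape:
                        rects.add(cells)
    return list(rects)


def min_shots(pixels: Iterable[Pixel], allow_overlap: bool) -> int:
    """Smallest number of rectangles that exactly cover the shape (exhaustive)."""
    shape = frozenset(pixels)
    rects = rectangles_inside(shape)
    for k in range(1, len(shape) + 1):
        for combo in combinations(rects, k):
            if frozenset().union(*combo) != shape:
                continue
            # Without overlap, the rectangle areas must add up to the shape area.
            if allow_overlap or sum(len(r) for r in combo) == len(shape):
                return k
    raise ValueError("shape is empty")


# A plus-shaped region on a 3x3 grid.
plus = {(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)}
print(min_shots(plus, allow_overlap=True))   # 2: one horizontal bar, one vertical bar
print(min_shots(plus, allow_overlap=False))  # 3
```

Here a fracturing method restricted to non-overlapping shots would be suboptimal by a factor of 3/2 relative to the overlapping optimum, which is the kind of ratio a benchmark with a known optimal solution makes measurable.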

OCV-Aware Top-Level Clock Tree Optimization

The clock trees of high-performance synchronous circuits have many clock logic cells (e.g., clock gating cells, multiplexers and dividers) in order to achieve aggressive clock gating and required performance across a wide range of operating modes and conditions. As a result, clock tree structures have become very complex and difficult to optimize with automatic clock tree synthesis (CTS) tools. In advanced process nodes, CTS becomes even more challenging due to on-chip variation (OCV) effects. In this paper, we present a new CTS methodology that optimizes clock logic cell placements and buffer insertions in the top level of a clock tree. We formulate the top-level clock tree optimization problem as a linear program that minimizes a weighted sum of timing slacks, clock uncertainty and wirelength. Experimental results in a commercial 28nm FDSOI technology show that our method can improve post-CTS worst negative slack across all modes/corners by up to 320ps compared to a leading commercial provider’s CTS flow.
