CHENGHUA WANG - Academia.edu

Papers by CHENGHUA WANG

Horizontal Correlation Analysis without Precise Location on Schoolbook Polynomial Multiplication of Lattice-based Cryptosystem

2022 IEEE International Symposium on Circuits and Systems (ISCAS)

A Novel Combined Correlation Power Analysis (CPA) Attack on Schoolbook Polynomial Multiplication in Lattice-based Cryptosystems

2022 IEEE 35th International System-on-Chip Conference (SOCC)

TCAD Simulation of Single Event Transient in Si Bulk MOSFET at Cryogenic Temperature

IEEE Access

In this paper, the functional relationship between temperature and single event transient (SET) currents caused by heavy-ion strikes is investigated using TCAD simulation from 77 K to 300 K on a 65 nm Si bulk n-MOSFET. TCAD simulation shows that temperature has a significant influence on the trends of the heavy-ion-induced current. The peak drain current and the collected charge in the MOSFET reach their maximum values at 200 K, while the SET fall time decreases monotonically as the temperature decreases from 300 K to 77 K at four LET values. The analysis finds that the enhancement of the bipolar amplification effect is the main reason for the increased collected charge and broader pulse width at high temperatures. These conclusions provide a theoretical basis for the wide application of 65 nm devices in 77 K-300 K space environments. Index terms: heavy ion, single event transient, cryogenic temperatures, TCAD simulation.

Dynamic Reconfigurable PUFs Based on FPGA

2019 IEEE International Workshop on Signal Processing Systems (SiPS)

Physical unclonable function (PUF) is a promising security primitive. The configurable ring oscillator (CRO) PUF is an evolution of the conventional RO PUF, which improves the entropy and decreases the hardware cost by introducing configurability. Compared with other types of PUF structures, CRO PUFs are FPGA friendly. In this paper, a dynamic reconfigurable mechanism is proposed for the CRO PUF in FPGA implementation. Three different CRO PUFs are implemented using the proposed reconfigurable method, and each CRO can be implemented in a single configurable logic block (CLB) of the FPGA. Based on the partial reconfiguration functions provided by Xilinx FPGAs, the PUF can be configured to any of the three PUF structures. The experimental results show that the dynamic reconfigurable PUF structure has higher hardware efficiency, reliability and stability compared with previous works.

Theoretical Analysis of Configurable RO PUFs and Strategies to Enhance Security

2019 IEEE International Workshop on Signal Processing Systems (SiPS)

Compared to the traditional ring oscillator PUF (RO PUF), the configurable RO PUF (CRO PUF) greatly increases the number of challenge-response pairs (CRPs) and improves hardware utilization. However, in the conventional CRO PUF design, when a path is selected by the challenge to generate a response, the circuit characteristic information constituting the CRO PUF, such as the delay information of the configurable unit and the transmission model, can also be leaked. Once an adversary monitors and masters this information, they can use it to attack the CRO PUF circuits, for example through modeling attacks. This paper establishes a theoretical model of the CRO PUF and analyzes its unpredictability and security. Based on this model, a new mechanism to generate proper challenges is proposed. In the proposed mechanism, the challenge is generated and used in a specific way that delays the feature leakage of the CRO PUF, thereby improving its security.

Theoretical Analysis of Delay-Based PUFs and Design Strategies for Improvement

2019 IEEE International Symposium on Circuits and Systems (ISCAS)

Delay-based physical unclonable function (PUF) designs use the random delay differences in circuit transmission to extract a response. Few studies of existing PUF designs have investigated the link between process variation and PUF performance. Experimental data can reflect the performance of a new design to a certain extent, but without theoretical analysis they do not provide thorough information. In this paper, a theoretical model for delay-based PUF designs is proposed. The improvements that existing design strategies bring to delay-based PUFs are also analyzed. Moreover, guidance on developing and improving future delay-based PUF designs using the proposed theoretical model is given.

Radiation and Annealing Effects on GaN MOSFETs Irradiated by 1 MeV Electrons

Electronics

In this paper, 650 V N-channel GaN MOSFETs are chosen as the research object to study the radiation and annealing effects under 1 MeV electron irradiation. The output, transfer, and breakdown characteristics are measured before and after electron irradiation. The experimental results show a variation of the I-V curves after irradiation, which is related to the increased conductivity due to the generation of oxide charge in the GaN MOSFETs. However, the gradual formation of interface trapped charge offsets the effect of the oxide charge, which decreases the conductivity of the GaN MOSFETs and the drain-source current. Long-term annealing at room temperature degrades the interface trapped charges, leading to the restoration of the I-V characteristics. After room-temperature annealing, the breakdown voltage is still higher than the unirradiated level, because the displacement defects caused by electron irradiation cannot be recovered at room temperature.

Design of Approximate FFT with Bit-width Selection Algorithms

2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018

This paper presents approximate designs of a Fast Fourier Transform (FFT) circuit. The tradeoff between accuracy and hardware performance is achieved by using bit-width selection for each stage. The error rate can be tuned with bit-width selection. We propose two algorithms for bit-width selection under a given error constraint. The first algorithm targets an approximate FFT design with low hardware cost, while the second algorithm is proposed to achieve high performance. Both proposed algorithms allow the designer to trade off hardware performance and computation accuracy in each stage. The two proposed designs are implemented on FPGA. The results show that the approximate FFT design using the first algorithm can reduce hardware resource consumption by up to 30.2%. The second algorithm can increase the performance of the approximate FFT design by up to 24.0%, while also saving 25.2% of resource consumption.

An Energy Efficient Accelerator for Bidirectional Recurrent Neural Networks (BiRNNs) Using Hybrid-Iterative Compression With Error Sensitivity

IEEE Transactions on Circuits and Systems I: Regular Papers, 2021

Recurrent Neural Networks (RNNs) have been widely used in many sequential applications, such as machine translation, speech recognition and sentiment analysis. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are widely used variants of RNN due to their effectiveness in overcoming gradient vanishing and exploding problems; however, compared to conventional RNNs, their massive storage and computation requirements hinder their application. In addition, the recurrent structure of RNNs makes them prone to accumulating errors, resulting in a severe loss of accuracy. In this work, we propose a hybrid-iterative compression (HIC) algorithm for LSTM/GRU. By exploiting the error sensitivity of RNNs, the gating units are divided into error-sensitive and error-insensitive groups, which are compressed using different algorithms. With this approach, a 37.1×/32.3× compression ratio is achieved with negligible accuracy loss for LSTM/GRU. Further, an energy-efficient accelerator for bidirectional RNNs is proposed. In this accelerator, the data flow of the matrix operation unit based on the block structure matrix (MOU-S) is improved by rearranging weights; the utilization of BRAM is improved through a fine-grained parallelism configuration of matrix-vector multiplications (MVMs). Meanwhile, the timing matching strategy alleviates the load-imbalance problem between MOU-S and the matrix operation unit based on top-k pruning (MOU-P). When running at 200 MHz on a Xilinx ADM-PCIE-7V3 FPGA, the proposed design achieves an improvement in energy efficiency in the range of 5%-237% for LSTM networks, and an improvement of 58% for GRU networks, compared with state-of-the-art designs.

Ultra High-Speed Polynomial Multiplications for Lattice-Based Cryptography on FPGAs

IEEE Transactions on Emerging Topics in Computing, 2022

Design and Evaluation of a Power-Efficient Approximate Systolic Array Architecture for Matrix Multiplication

2019 IEEE International Workshop on Signal Processing Systems (SiPS), 2019

Matrix multiplication (MM) is a basic operation for many Digital Signal Processing applications. A Systolic Array (SA) is often considered one of the most favorable architectures for achieving high performance in matrix multiplication. In this paper, the design exploration of an approximate SA is pursued; three design schemes are proposed by introducing approximation in multiple sub-modules. An approximation factor α is introduced; it is related to the inexact columns in the SA and is used to explore the accuracy-efficiency trade-off present in the proposed designs. In the evaluation, an 8-bit input operand matrix multiplication is considered; the Synopsys Design Compiler at the 45 nm technology node is used to establish hardware-related metrics. The Error Rate (ER), Normalized Mean Error Distance (NMED) and Mean Relative Error Distance (MRED) are used as figures of merit for error analysis. Results show that the proposed architecture for 8-bit matrix multiplication with an approximation factor α = 7 has lower power consumption than existing inexact designs found in the technical literature with comparable NMED. In addition, a power-delay product (PDP) vs. NMED analysis shows that the proposed designs have a lower PDP and are therefore applicable to low-power applications. The practicality of the proposed architecture is established by computing the Discrete Cosine Transform.
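The three error figures of merit named in this abstract can be sketched in a few lines. This is a minimal illustration using definitions that are common in the approximate-computing literature (ER as the fraction of erroneous outputs, NMED as the mean error distance divided by the maximum exact output, MRED as the mean of per-output relative errors); the paper's exact definitions may differ, and the function name `error_metrics` and its parameters are hypothetical.

```python
def error_metrics(exact, approx, max_output):
    """Return (ER, NMED, MRED) for paired exact/approximate outputs.

    Assumed definitions (not taken from the paper):
      ER   = fraction of outputs where approx != exact
      NMED = mean(|approx - exact|) / max_output
      MRED = mean(|approx - exact| / |exact|), skipping exact == 0
    """
    if len(exact) != len(approx) or not exact:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(exact)
    distances = [abs(a - e) for e, a in zip(exact, approx)]

    er = sum(1 for d in distances if d != 0) / n
    nmed = (sum(distances) / n) / max_output

    # Relative error is undefined when the exact result is zero.
    relative = [d / abs(e) for d, e in zip(distances, exact) if e != 0]
    mred = sum(relative) / len(relative) if relative else 0.0
    return er, nmed, mred
```

For an 8-bit unsigned multiplier, `max_output` would typically be taken as 255 * 255, although other normalizations are also used.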

Security Analysis of Hardware Trojans on Approximate Circuits

Proceedings of the 2020 on Great Lakes Symposium on VLSI, 2020

Approximate computing, for error-tolerant applications, provides trade-offs for computations to achieve improved speed and power performance. Approximate circuits, in particular approximate arithmetic circuits, directly affect the performance of a computing system. Hence, approximate circuit designs have been extensively studied. However, security issues of approximate circuits have been ignored. Moreover, hardware Trojans inserted by untrusted foundries have been found in fabricated chips in the manufacturing supply chain. Hardware Trojans could affect the functionality of approximate circuits under very rare circumstances with negligible footprints. In this paper, hardware Trojan insertion methods based on signal transition probability are used to investigate and evaluate the security threats in approximate circuits. An approximate lower-part-OR adder (LOA) is used as an example and analyzed in the paper. The evaluation results show that, as the number of approximation modules increases, the approximate LOA becomes more susceptible to hardware Trojan insertion than its exact counterpart.

Dynamically Configurable Physical Unclonable Function based on RRAM Crossbar

2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), 2021

Physical unclonable function (PUF) has been an effective solution for hardware security with the popularity of the internet of things (IoT). Due to their low power consumption and high area efficiency, PUF designs based on resistive random access memory (RRAM), an emerging nonvolatile memory, have attracted much attention. Because existing RRAM PUFs are not fully compatible with the memory architecture, a dynamically configurable PUF based on the mainstream RRAM crossbar is proposed in this paper. Using the device-to-device variation of the RRAM resistance, abundant challenge-response pairs (CRPs) are generated with a flexible configuration of an RRAM crossbar. Furthermore, unlike existing RRAM-based PUF designs, the proposed RRAM PUF can be dynamically configured between a memory cell and a PUF cell without requiring additional sense circuits, leading to a minimal design overhead. The simulation results show that the proposed PUF exhibits good performance, with high uniqueness and reliability. Moreover, it achieves strong resistance against machine learning (ML) attacks.
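Uniqueness and reliability, the two PUF quality figures cited in this and several other abstracts on this page, are usually computed from Hamming distances between response bit strings. The sketch below assumes the common definitions (uniqueness as the mean pairwise inter-device fractional Hamming distance, ideally 0.5; reliability as one minus the mean intra-device fractional Hamming distance, ideally 1.0). The function names are hypothetical and the papers listed here may use slightly different formulas.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses):
    """Mean pairwise fractional Hamming distance between different devices.

    `responses` holds one response string per device, all produced
    by the same challenge. The ideal value is 0.5.
    """
    n_bits = len(responses[0])
    pairs = list(combinations(responses, 2))
    return sum(hamming(a, b) / n_bits for a, b in pairs) / len(pairs)


def reliability(reference, remeasured):
    """One minus the mean fractional distance to repeated measurements.

    `reference` is the enrolled response of one device; `remeasured`
    holds later readings of the same device. The ideal value is 1.0.
    """
    n_bits = len(reference)
    mean_intra = sum(hamming(reference, r) / n_bits for r in remeasured) / len(remeasured)
    return 1.0 - mean_intra
```

For example, three devices answering "0101", "0110" and "1100" differ pairwise in two of four bits, which gives a uniqueness of 0.5.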

A Hardware/Software Co-design Method for Approximate Semi-Supervised K-Means Clustering

2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2018

As one of the most promising energy-efficient emerging paradigms for designing digital systems, approximate computing has attracted significant attention in recent years. Applications using approximate computing can tolerate some loss of quality in the computed results in order to attain high performance. Approximate arithmetic circuits have been extensively studied; however, their application at system level has not been extensively pursued. Furthermore, when approximate arithmetic circuits are applied at system level, error-accumulation effects and a convergence problem may occur in computation. Semi-supervised learning can improve accuracy and performance by using unlabeled examples. In this paper, a hardware/software co-design method for approximate semi-supervised k-means clustering is proposed. It makes use of feature constraints to guide the approximate computation at various accuracy levels in each iteration of the learning process. Compared with a baseline design, the proposed method reduces the power-delay product by over 67% while introducing only a small loss of accuracy. A case study of image segmentation validates the effectiveness of the proposed method.

Deformation Prediction of Excavated Slopes with a Neural Network Model Based on Nonlinear Numerical Analyses

Proceedings of GeoShanghai 2018 International Conference: Advances in Soil Dynamics and Foundation Engineering, 2018

The control and prediction of deformations of excavated slopes is one of the most important problems in foundation engineering, but adequate methods for evaluating these deformations have been lacking, and the issue deserves special attention for the sake of protecting the engineering environment. To predict the deformations of excavated slopes, an artificial neural network model was set up based on nonlinear finite element analyses of excavated slopes. First, the pattern of the deformation of excavated slopes is generalized through a large number of finite element analyses. Second, a practical and fast algorithm for predicting the deformations of excavated slopes, a neural network prediction model based on numerical analysis, was set up. The neural network for predicting deformations of excavated soil slopes contains four layers of neural elements, i.e., the input layer, the first and second hidden layers, and the output layer. In total, 70 sets of data from finite element analyses were used for training the network, while deformation predictions were conducted with another 20 sets of data, which gave good accuracy, with errors within 10%, for practical applications. The predictions of the neural network model demonstrate that combining numerical methods with neural networks is a feasible way of predicting deformation.

Lightweight Configurable Ring Oscillator PUF Based on RRAM/CMOS Hybrid Circuits

IEEE Open Journal of Nanotechnology, 2020

Physical unclonable function (PUF) is a lightweight security primitive for energy-constrained digital systems. As an enhanced design of conventional ring oscillator (RO) PUFs, configurable ring oscillator (CRO) PUFs improve uniqueness and reliability compared with conventional RO PUF designs. In typical CRO PUF designs, multiplexers (MUXs) are used as configurable components. In this paper, a hybrid nano-scale CRO (hn-CRO) PUF is proposed. The configurable components of the proposed hn-CRO PUF are implemented with RRAMs. The delay elements are based on CMOS inverters. Compared with traditional CRO PUF designs, the proposed hn-CRO PUF is cost-efficient in terms of circuit density and gates per challenge-response pair (CRP) bit. To validate the proposed hn-CRO PUF, Monte Carlo simulation results of a compact RRAM model under UMC 65 nm technology are presented. The results show that the proposed hn-CRO PUF has good uniqueness and low hardware consumption compared with previous works.

A Dynamically Configurable PUF and Dynamic Matching Authentication Protocol

IEEE Transactions on Emerging Topics in Computing, 2021

A physical unclonable function (PUF) is a hardware security primitive that can be used to secure various hardware-based applications. As one type of PUF, strong PUFs have a large number of challenge-response pairs (CRPs), which can be used for authentication. At present, most strong PUF structures follow a one-to-one input/output relationship, i.e., a linear function. As such, strong PUF designs are vulnerable to machine learning (ML) based modeling attacks. To address this issue, a dynamically configurable PUF structure is proposed in this paper. A mathematical model of the proposed dynamic PUF is presented, and the design is intended to resist effective ML-based attacks, such as deep neural network (DNN), logistic regression (LR) and reliability-based covariance matrix adaptation evolution strategy (CMA-ES) attacks. Experimental results on field programmable gate arrays (FPGAs) show that the proposed dynamic structure achieves good uniqueness and reliability. It is also shown that the dynamic PUF has strong resistance to the CMA-ES attack. Due to the dynamic nature of the proposed PUF structure, an authentication protocol is also designed to generate a recognizable authentication bit string. The protocol shows strong resistance to classical machine learning attacks, including the new variant of CMA-ES.

Attacking Arbiter PUFs Using Various Modeling Attack Algorithms: A Comparative Study

2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2018

Physical Unclonable Function (PUF) is becoming popular in the era of the internet of things (IoT) due to its lightweight implementation and its unique property of being physically unclonable. However, it has been shown that PUFs can be vulnerable to modeling attacks using machine learning based algorithms. For example, logistic regression (LR) is used as an effective attack method to break the Arbiter PUF (APUF) design. In this paper, we investigate the effectiveness of three different machine learning algorithms, namely LR, Naïve Bayes, and AdaBoost, in attacking the APUF design. A comparison of experimental results between these algorithms is presented. The results show that the performance of the algorithms is related to the number of training data, the noise level involved in the APUF design, and the number of stages in the generation of each response bit. It is found that LR performs worse than the Naïve Bayes and AdaBoost algorithms when only a small number of data are available.
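The LR modeling attack mentioned in this abstract is commonly described in terms of the additive linear delay model of the arbiter PUF, in which the response is the sign of a weighted sum of parity features derived from the challenge. The sketch below illustrates that general idea on a simulated, noise-free APUF. It is not the paper's experimental setup: the stage count, training-set size, learning rate, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def parity_features(challenge):
    """Map challenge bits c_j to suffix products of (1 - 2*c_j), plus a bias term."""
    feats = []
    prod = 1
    for bit in reversed(challenge):
        prod *= 1 - 2 * bit
        feats.append(prod)
    feats.reverse()
    return feats + [1]


def apuf_response(delays, challenge):
    """Simulated response: sign of the accumulated delay difference."""
    total = sum(d * f for d, f in zip(delays, parity_features(challenge)))
    return 1 if total > 0 else 0


def train_logistic_regression(samples, n_feats, epochs=20, rate=0.05):
    """Plain stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * n_feats
    for _ in range(epochs):
        for feats, label in samples:
            z = sum(wi * fi for wi, fi in zip(w, feats))
            z = max(-30.0, min(30.0, z))  # keep exp() well inside float range
            p = 1.0 / (1.0 + math.exp(-z))
            for i, fi in enumerate(feats):
                w[i] += rate * (label - p) * fi
    return w


def demo(stages=16, n_train=2000, n_test=500, seed=0):
    """Train on simulated CRPs and return the prediction accuracy on fresh CRPs."""
    rng = random.Random(seed)
    # Hypothetical per-stage delay parameters of the simulated device.
    delays = [rng.gauss(0.0, 1.0) for _ in range(stages + 1)]

    def random_crp():
        c = [rng.randint(0, 1) for _ in range(stages)]
        return c, apuf_response(delays, c)

    train = [(parity_features(c), r) for c, r in (random_crp() for _ in range(n_train))]
    model = train_logistic_regression(train, stages + 1)

    hits = 0
    for _ in range(n_test):
        c, r = random_crp()
        hits += apuf_response(model, c) == r
    return hits / n_test
```

Calling `demo()` returns the fraction of unseen challenges whose responses the learned model predicts correctly. Because the simulated device is exactly linear in the parity features, a linear classifier can fit it well, which is the weakness such modeling attacks exploit.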

A buck-type isolated AC/DC converter for battery application

2016 IEEE 2nd Annual Southern Power Electronics Conference (SPEC), 2016

This paper introduces a bidirectional isolated AC/DC converter for storage battery applications that uses a current space vector pulse width modulation (SVPWM) method. The converter features high power factor, wide output voltage range, bidirectional power flow and galvanic isolation. Resonant reset is used to demagnetize the transformers for simplicity. The operating principle of the converter is presented together with simulation results in PLECS and experimental results based on a 2 kW prototype.

Design of Dynamic Range Approximate Logarithmic Multipliers

Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018

Approximate computing is an emerging approach for designing high-performance and low-power arithmetic circuits. The logarithmic multiplier (LM) converts multiplication into addition and has inherently approximate characteristics. A method combining Mitchell's approximation and a dynamic range operand truncation scheme is proposed in this paper to design non-iterative and iterative approximate LMs. The accuracy and the circuit requirements of these designs are assessed to select the best approximate scheme according to different metrics. Compared with conventional non-iterative and iterative 16-bit LMs with exact operands, the normalized mean error distance (NMED) of the best proposed approximate non-iterative and iterative LMs is decreased by up to 24.1% and 18.5%, respectively, while the power-delay product (PDP) is decreased by up to 51.7% and 45.3%, respectively. Case studies for two error-tolerant applications show the validity of the proposed approximate LMs.
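Mitchell's approximation, which this abstract builds on, writes each operand as 2^k (1 + x) with 0 ≤ x < 1 and approximates log2 of the operand by k + x, so that a multiplication becomes an addition of approximate logarithms. The following sketch shows only that basic non-iterative scheme for unsigned integers; it does not implement the paper's dynamic range operand truncation, and the function name is hypothetical.

```python
def mitchell_multiply(a, b):
    """Approximate a * b for non-negative integers using Mitchell's method.

    Each operand is written as 2**k * (1 + x) with 0 <= x < 1, and
    log2(operand) is approximated by k + x. With s = x1 + x2:
      s <  1: product ~ 2**(k1 + k2)     * (1 + s)
      s >= 1: product ~ 2**(k1 + k2 + 1) * s
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    if a == 0 or b == 0:
        return 0

    k1 = a.bit_length() - 1  # position of the leading one in a
    k2 = b.bit_length() - 1  # position of the leading one in b

    # Integer form of s = x1 + x2, scaled by 2**(k1 + k2).
    scale = 1 << (k1 + k2)
    s_scaled = ((a - (1 << k1)) << k2) + ((b - (1 << k2)) << k1)

    if s_scaled < scale:
        return scale + s_scaled  # 2**(k1+k2) * (1 + s)
    return 2 * s_scaled          # 2**(k1+k2+1) * s
```

The result is exact when either operand is a power of two, and otherwise underestimates the true product by at most one ninth (about 11.1%), the worst case being reached for inputs such as 3 × 3, which yields 8.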

Research paper thumbnail of Horizontal Correlation Analysis without Precise Location on Schoolbook Polynomial Multiplication of Lattice-based Cryptosystem

2022 IEEE International Symposium on Circuits and Systems (ISCAS)

Research paper thumbnail of A Novel Combined Correlation Power Analysis (CPA) Attack on Schoolbook Polynomial Multiplication in Lattice-based Cryptosystems

2022 IEEE 35th International System-on-Chip Conference (SOCC)

Research paper thumbnail of TCAD Simulation of Single Event Transient in Si Bulk MOSFET at Cryogenic Temperature

IEEE Access

In this paper, the functional relationship between temperature and single event transient current... more In this paper, the functional relationship between temperature and single event transient currents caused by heavy-ion striking using TCAD simulation is investigated from 77K to 300 K on 65nm Si bulk n MOSFET. TCAD simulation shows that temperature has a significant influence on the trends of heavyions-induced current. The peak value of drain current and collected charge in MOSFET reach the maximum value at 200 K, while the SET fall time decreases monotonously as temperature decrease from 300 K to 77 K at four LET values. The extracted analysis found that the enhancement of bipolar amplification effect is the main reason for the increase of collected charge and broader pulse width at high temperatures. The above conclusions provide a theoretical basis for the wide application of 65nm devices in 77 K-300 K space environment. INDEX TERMS Heavy ion, single event transient, cryogenic temperatures, TCAD simulation.

Research paper thumbnail of Dynamic Reconfigurable PUFs Based on FPGA

2019 IEEE International Workshop on Signal Processing Systems (SiPS)

Physical unclonable function (PUF) is a promising security primitive. Configurable ring oscillato... more Physical unclonable function (PUF) is a promising security primitive. Configurable ring oscillator (CRO) PUF is an evolvement of conventional RO PUF, which improves the entropy and decrease the hardware cost by introducing configurability. Compared with other types of PUF structures, CRO PUFs are FPGA friendly. In this paper, a dynamic reconfigurable mechanism is proposed for the CRO PUF in FPGA implementation. Three different CRO PUFs are implemented using the proposed reconfigurable method and each CRO can be implemented in a single configurable logic block (CLB) of FPGA. Based on the partial reconFigure functions provided by Xilinx FPGAs, the PUF structures can be configured to any of the three PUF structures. The experimental results show that the dynamic reconfigurable PUF structure has a higher hardware efficiency, reliability and stability compared with the previous works.

Research paper thumbnail of Theoretical Analysis of Configurable RO PUFs and Strategies to Enhance Security

2019 IEEE International Workshop on Signal Processing Systems (SiPS)

Compared to traditional ring oscillator PUF (RO PUF), configurable RO PUF (CRO PUF) greatly incre... more Compared to traditional ring oscillator PUF (RO PUF), configurable RO PUF (CRO PUF) greatly increases the number of challenge response pairs (CRPs) and improves hardware utilization. However, in the conventional CRO PUF design, when a path is selected by the challenge to generate a response, the circuit characteristic information constituting the CRO PUF, such as the delay information of the configurable unit, the transmission model, and etc., can also be leaked. Once the adversary monitors and masters this information, they can use this information to attack the CRO PUF circuits, such as modeling attacks. This paper establishes a theoretical model of CRO PUF and analyzes its unpredictability and security. Based on this model, a new mechanism to generate the proper challenges is proposed in this paper. In the proposed mechanism, the challenge is generated and utilized by a specific way, which can delay the feature leakage of the CRO PUF, thereby improving the security of the CRO PUF.

Research paper thumbnail of Theoretical Analysis of Delay-Based PUFs and Design Strategies for Improvement

2019 IEEE International Symposium on Circuits and Systems (ISCAS)

Delay-based physical unclonable function (PUF) designs use the random delay differences in circui... more Delay-based physical unclonable function (PUF) designs use the random delay differences in circuit transmission to extract response. In the existing PUF designs, there are few studies on investigating the link between process variation and PUF performance. The experimental data can reflect the performance of the new design to a certain extent, but lack of theoretical analysis to provide thorough information. In this paper, a theoretical model for delay-based PUF designs is proposed. An analysis of the delay-based PUF improvements by existing design strategies is also investigated. Moreover, a guidance to develop and improve future delay-based PUF designs using the proposed theoretical model is also given in this paper.

Research paper thumbnail of Radiation and Annealing Effects on GaN MOSFETs Irradiated by 1 MeV Electrons

Electronics

In this paper, the 650 V N-channel GaN MOSFETs are chosen as the research object to study the rad... more In this paper, the 650 V N-channel GaN MOSFETs are chosen as the research object to study the radiation and annealing effects under 1 MeV electron irradiation. The output, transfer, and breakdown characteristics are measured before and after electron irradiation. The experimental results show the variation of the I-V curves after irradiation, which is related to the increased conductivity due to the generation of an oxide charge in the GaN MOSFETs. However, the gradual formation of the interface trapped charge offsets the effect of the oxide charge, which decreases the conductivity of the GaN MOSFETs and the drain-source current. The long-term annealing at room temperature degrades the interface trapped charges, leading to the restoration of the I-V characteristics. After room temperature annealing, the breakdown voltage is still higher than the unirradiated level, and this is because the displacement defects caused by electron irradiation cannot be recovered at room temperature.

Research paper thumbnail of Design of Approximate FFT with Bit-width Selection Algorithms

2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018

This paper presents the approximate designs of Fast Fourier Transformation (FFT) circuit. The tra... more This paper presents the approximate designs of Fast Fourier Transformation (FFT) circuit. The tradeoff between accuracy and hardware performance is achieved by using bit-width selection for each stage. The error rate can be tuned with bit-width selection. We proposed two algorithms for bit-width selection under certain error restriction. The first algorithm is targeting an approximate FFT design with low hardware cost. While the second algorithm is proposed to achieve high performance. Both of proposed algorithms allow the designer to tradeoff hardware performance and computation accuracy in each stage. The proposed two designs are implemented on FPGA. The results show that the approximate FFT design using the first algorithm can reduce hardware resource consumption up to 30.2%. The second algorithm can increases the performance of the approximate FFT deisgn up to 24.0%, while it also saves 25.2% resource consumption.

Research paper thumbnail of An Energy Efficient Accelerator for Bidirectional Recurrent Neural Networks (BiRNNs) Using Hybrid-Iterative Compression With Error Sensitivity

IEEE Transactions on Circuits and Systems I: Regular Papers, 2021

Recurrent Neural Networks (RNNs) have been widely used in many sequential applications, such as m... more Recurrent Neural Networks (RNNs) have been widely used in many sequential applications, such as machine translation, speech recognition and sentiment analysis. Long Term Short Term Memory (LSTM) and Gated Recurrent Unit (GRU) are widely used variants of RNN due to their effectiveness in overcoming gradient vanishing and exploding problems; however, compared to conventional RNN, their massive storage and computation requirements hinder their application. In addition, the recurrent structure of RNNs makes them prone to accumulate errors, resulting in a severe loss of accuracy. In this work, we propose a hybrid-iterative compression (HIC) algorithm for LSTM/GRU. By exploiting the error sensitivity of RNN, the gating units are divided into error-sensitive and error-insensitive groups, that are compressed using different algorithms. By using this approach, a 37.1times/32.3times37.1\\times /32.3\\times 37.1times/32.3times compression ratio is achieved with negligible accuracy loss for LSTM/GRU. Further, an energy efficient accelerator for bidirectional RNNs is proposed. In this accelerator, the data flow of the matrix operation unit based on the block structure matrix (MOU-S) is improved through rearranging weights; the utilization of BRAM is improved through a fine-grained parallelism configuration of matrix-vector multiplications (MVMs). Meanwhile, the timing matching strategy alleviates the load-imbalance problem between MOU-S and the matrix operation unit based on top- kkk pruning (MOU-P). When running at 200MHz on Xilinx ADM-PCIE-7V3 FPGA, the proposed design achieves an improvement in energy efficiency in a range of 5%-237% for LSTM networks, and an improvement of 58% for GRU networks compared with state-of-the-art designs.

Ultra High-Speed Polynomial Multiplications for Lattice-Based Cryptography on FPGAs

IEEE Transactions on Emerging Topics in Computing, 2022

Design and Evaluation of a Power-Efficient Approximate Systolic Array Architecture for Matrix Multiplication

2019 IEEE International Workshop on Signal Processing Systems (SiPS), 2019

Matrix multiplication (MM) is a basic operation for many Digital Signal Processing applications. A Systolic Array (SA) is often considered one of the most favorable architectures for achieving high performance in matrix multiplication. In this paper, the design exploration for an approximate SA is pursued; three design schemes are proposed by introducing approximation in multiple sub-modules. An approximation factor $\alpha$ is introduced; it is related to the inexact columns in the SA to explore the accuracy-efficiency trade-off present in the proposed designs. In the evaluation, an 8-bit input operand matrix multiplication is considered; the Synopsys Design Compiler at the 45 nm technology node is used to establish hardware-related metrics. The Error Rate (ER), Normalized Mean Error Distance (NMED) and Mean Relative Error Distance (MRED) are used as figures of merit for error analysis. Results show that the proposed architecture for 8-bit matrix multiplication with an approximation factor $\alpha = 7$ has lower power consumption than existing inexact designs found in the technical literature with comparable NMED. In addition, a power-delay product (PDP) vs NMED analysis shows that the proposed designs have a lower PDP and are therefore suitable for low-power applications. The practicality of the proposed architecture is established by computing the Discrete Cosine Transform.
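The three error metrics named in the abstract above can be illustrated with a minimal Python sketch. The definitions used here are the ones common in the approximate-computing literature and are an assumption about this paper: ER is the fraction of inputs with a wrong output, NMED is the mean error distance divided by the maximum exact output, and MRED is the mean of the error distance divided by the exact output (inputs with a zero exact output are skipped). The `truncated_multiply` function is a hypothetical stand-in for an approximate design, not the architecture proposed in the paper.

```python
from itertools import product


def truncated_multiply(a, b, t=1):
    """Hypothetical approximate multiplier: clear the t low bits of each operand."""
    mask = ~((1 << t) - 1)
    return (a & mask) * (b & mask)


def exact_multiply(a, b):
    return a * b


def error_metrics(approx, exact, bits):
    """Return (ER, NMED, MRED) over all pairs of unsigned `bits`-bit operands."""
    pairs = list(product(range(1 << bits), repeat=2))
    max_exact = ((1 << bits) - 1) ** 2
    # Error distance for every input pair.
    eds = [abs(approx(a, b) - exact(a, b)) for a, b in pairs]
    er = sum(ed != 0 for ed in eds) / len(pairs)
    nmed = (sum(eds) / len(pairs)) / max_exact
    # Relative error is undefined when the exact result is zero; skip those pairs.
    rel = [ed / exact(a, b) for (a, b), ed in zip(pairs, eds) if exact(a, b) != 0]
    mred = sum(rel) / len(rel)
    return er, nmed, mred
```

Calling `error_metrics(truncated_multiply, exact_multiply, bits=8)` evaluates the stand-in multiplier exhaustively over all 8-bit operand pairs.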

Security Analysis of Hardware Trojans on Approximate Circuits

Proceedings of the 2020 on Great Lakes Symposium on VLSI, 2020

Approximate computing, for error-tolerant applications, provides trade-offs for computations to achieve improved speed and power performance. Approximate circuits, in particular approximate arithmetic circuits, directly affect the performance of a computing system. Hence, approximate circuit designs have been extensively studied. However, security issues of approximate circuits have been ignored. Moreover, hardware Trojans inserted by untrusted foundries have been found in fabricated chips in manufacturing industry chains. Hardware Trojans could affect the functionality of approximate circuits under very rare circumstances with negligible footprints. In this paper, hardware Trojan insertion methods based on signal transition probability are utilized to investigate and evaluate the security threats in approximate circuits. An approximate lower-part-OR adder (LOA) is utilized as an example and analyzed in the paper. The evaluation results show that, as the number of approximation modules increases, the approximate LOA becomes more susceptible to hardware Trojan insertion than the exact adder.
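For readers unfamiliar with the LOA mentioned in the abstract above, the following Python sketch gives a behavioral model of the commonly described lower-part-OR adder: the low bits are combined with a bitwise OR, the high bits are added exactly, and the carry into the high part is the AND of the most significant bits of the two low parts. The function name and the `lower` parameter are illustrative assumptions, and the sketch does not model the Trojan insertion analysis of the paper.

```python
def loa_add(a, b, lower=4):
    """Behavioral model of a lower-part-OR adder for non-negative integers.

    `lower` is the number of least significant bits handled by OR gates.
    With lower=0 the function degenerates to an exact adder.
    """
    low_mask = (1 << lower) - 1
    # Approximate lower part: bitwise OR instead of addition.
    low = (a | b) & low_mask
    # Carry into the exact upper part: AND of the MSBs of the two lower parts.
    carry = (a >> (lower - 1)) & (b >> (lower - 1)) & 1 if lower > 0 else 0
    # Exact upper part.
    high = (a >> lower) + (b >> lower) + carry
    return (high << lower) | low
```

For example, `loa_add(0x13, 0x21)` returns `0x33`, one less than the exact sum `0x34`, because the low nibbles `3` and `1` are ORed rather than added.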

Dynamically Configurable Physical Unclonable Function based on RRAM Crossbar

2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), 2021

Physical unclonable function (PUF) has been an effective solution for hardware security with the popularity of the internet of things (IoT). Due to their low power consumption and high area efficiency, PUF designs based on resistive random access memory (RRAM), an emerging nonvolatile memory, have attracted much attention. Because existing RRAM PUFs are not fully compatible with the memory architecture, a dynamically configurable PUF based on the mainstream RRAM crossbar is proposed in this paper. Utilizing the device-to-device variation of the RRAM resistance, abundant challenge-response pairs (CRPs) are generated with a flexible configuration of an RRAM crossbar. Furthermore, unlike existing RRAM-based PUF designs, the proposed RRAM PUF can be dynamically configured between a memory cell and a PUF cell without requiring additional sense circuits, leading to a minimal design overhead. The simulation results show that the proposed PUF exhibits good performance with high uniqueness and reliability. Moreover, it achieves strong resistance against machine learning (ML) attacks.

A Hardware/Software Co-design Method for Approximate Semi-Supervised K-Means Clustering

2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2018

As one of the most promising energy-efficient emerging paradigms for designing digital systems, approximate computing has attracted significant attention in recent years. Applications utilizing approximate computing can tolerate some loss of quality in the computed results for attaining high performance. Approximate arithmetic circuits have been extensively studied; however, their application at system level has not been extensively pursued. Furthermore, when approximate arithmetic circuits are applied at system level, error-accumulation effects and a convergence problem may occur in computation. Semi-supervised learning can improve accuracy and performance by using unlabeled examples. In this paper, a hardware/software co-design method for approximate semi-supervised k-means clustering is proposed. It makes use of feature constraints to guide the approximate computation at various accuracy levels in each iteration of the learning process. Compared with a baseline design, the proposed method reduces the power-delay product by over 67% while only a small loss of accuracy is introduced. A case study of image segmentation validates the effectiveness of the proposed method.

Deformation Prediction of Excavated Slopes with a Neural Network Model Based on Nonlinear Numerical Analyses

Proceedings of GeoShanghai 2018 International Conference: Advances in Soil Dynamics and Foundation Engineering, 2018

The control and prediction of deformations of excavated slopes is one of the most important problems in foundation engineering, but adequate methods for evaluating the deformation of excavated slopes have been lacking, and the problem deserves special attention for the sake of protecting the engineering environment. In order to predict the deformations of excavated slopes, an artificial neural network model was set up based on nonlinear finite element analyses of excavated slopes. Firstly, the pattern of the deformation of excavated slopes is generalized through a large number of finite element analyses. Secondly, a practical and fast algorithm for predicting the deformations of excavated slopes, a neural network prediction model based on numerical analysis, was set up. The neural network for predicting deformations of excavated soil slopes contains four layers of neural elements, i.e. the input layer, the first and second hidden layers and the output layer. In total, 70 sets of data from finite element analyses were used for training the network, while deformation prediction was conducted with another 20 sets of data, which gave good accuracy, with errors within 10%, for practical applications. The results of predictions with the neural network model demonstrate that the combination of numerical methods and neural networks is a feasible way of predicting deformation.

Lightweight Configurable Ring Oscillator PUF Based on RRAM/CMOS Hybrid Circuits

IEEE Open Journal of Nanotechnology, 2020

Physical unclonable function (PUF) is a lightweight security primitive for energy constrained digital systems. As an enhanced design of conventional ring oscillator (RO) PUFs, configurable ring oscillator (CRO) PUFs improve the uniqueness and reliability compared with the conventional RO PUF designs. In typical CRO PUF designs, multiplexers (MUXs) are utilized as configurable components. In this paper, a hybrid nano-scale CRO (hn-CRO) PUF is proposed. The configurable components of the proposed hn-CRO PUF are implemented by RRAMs. The delay elements are based on CMOS inverters. Compared with traditional CRO PUF designs, the proposed hn-CRO PUF is cost-efficient in terms of circuit density and gates per challenge-response pair (CRP) bit. To validate the proposed hn-CRO PUF, the Monte Carlo simulation results of a compact RRAM model under UMC 65 nm technology are presented. The results show that the proposed hn-CRO PUF has good uniqueness and low hardware consumption compared with the previous works.
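Uniqueness and reliability, the figures of merit quoted in the abstract above, are usually computed from Hamming distances between response bit strings. The sketch below assumes the standard definitions (uniqueness as the mean pairwise inter-chip Hamming distance in percent, ideally 50; reliability as 100 minus the mean intra-chip Hamming distance in percent, ideally 100); the paper may use a variant. All function names are illustrative.

```python
from itertools import combinations


def hamming(x, y):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(x, y))


def uniqueness(responses):
    """Mean pairwise inter-chip Hamming distance, in percent (ideal: 50)."""
    n_bits = len(responses[0])
    pairs = list(combinations(responses, 2))
    total = sum(hamming(x, y) for x, y in pairs)
    return 100.0 * total / (len(pairs) * n_bits)


def reliability(reference, remeasurements):
    """100 minus the mean intra-chip Hamming distance in percent (ideal: 100)."""
    n_bits = len(reference)
    total = sum(hamming(reference, r) for r in remeasurements)
    return 100.0 - 100.0 * total / (len(remeasurements) * n_bits)
```

Here `responses` holds one response string per chip for the same challenge, while `remeasurements` holds repeated readings from a single chip, for example under different temperatures or supply voltages.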

A Dynamically Configurable PUF and Dynamic Matching Authentication Protocol

IEEE Transactions on Emerging Topics in Computing, 2021

A physical unclonable function (PUF) is a hardware security primitive, which can be used to secure various hardware-based applications. As a type of PUF, strong PUFs have a large number of challenge-response pairs (CRPs), which can be used for authentication. At present, most strong PUF structures follow a one-to-one input/output relationship, i.e., a linear function. As such, strong PUF designs are vulnerable to machine learning (ML) based modeling attacks. To address the issue, a dynamically configurable PUF structure is proposed in this paper. A mathematical model of the proposed dynamic PUF is presented, and the design is proposed to counter effective ML-based attacks, such as deep neural network (DNN), logistic regression (LR) and reliability-based covariance matrix adaptation evolution strategies (CMA-ES). Experimental results on field programmable gate arrays (FPGAs) show that the proposed dynamic structure achieves good uniqueness and reliability. It is also shown that the dynamic PUF has a strong resistance to the CMA-ES attack. Due to the dynamic nature of the proposed PUF structure, an authentication protocol is also designed to generate a recognizable authentication bit string. The protocol shows strong resistance to classical machine learning attacks, including the new variant of CMA-ES.

Attacking Arbiter PUFs Using Various Modeling Attack Algorithms: A Comparative Study

2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2018

Physical Unclonable Function (PUF) is becoming popular in the era of the internet of things (IoT) due to its lightweight implementation and its unique property of being physically unclonable. However, it has been shown that PUFs can be vulnerable to modeling attacks using machine learning based algorithms. For example, logistic regression (LR) is used as an effective attack method to break the Arbiter PUF (APUF) design. In this paper, we investigate the effectiveness of three different machine learning algorithms, including LR, Naïve Bayes, and AdaBoost, on attacking the APUF design. A comparison of experimental results between these algorithms is presented. The results show that the performance of the algorithms is related to the amount of training data, the noise level involved in the APUF design and the number of stages in the generation of each response bit. It is found that the performance of LR is worse for a small amount of data compared to the Naïve Bayes and AdaBoost algorithms.
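The LR modeling attack mentioned in the abstract above can be sketched in plain Python under the widely used additive delay model, in which an APUF response is the sign of a weighted sum of parity features of the challenge. This is a noise-free simulation with illustrative names and parameters (`make_apuf`, `run_attack`, 8 stages, 1000 training CRPs); it is an assumption-laden toy, not the experimental setup of the paper.

```python
import math
import random


def parity_features(challenge):
    """Parity vector of the additive delay model, with a trailing bias term."""
    feats, prod = [], 1
    for bit in reversed(challenge):
        prod *= 1 - 2 * bit  # maps bit 0 to +1 and bit 1 to -1
        feats.append(prod)
    feats.reverse()
    feats.append(1)
    return feats


def make_apuf(n_stages, rng):
    """Simulated noise-free APUF with Gaussian stage delay differences."""
    w = [rng.gauss(0, 1) for _ in range(n_stages + 1)]

    def respond(challenge):
        s = sum(wi * fi for wi, fi in zip(w, parity_features(challenge)))
        return 1 if s > 0 else 0

    return respond


def train_lr(data, n_features, epochs=50, lr=0.1):
    """Logistic regression fitted by stochastic gradient ascent."""
    theta = [0.0] * n_features
    for _ in range(epochs):
        for x, y in data:
            z = sum(t * xi for t, xi in zip(theta, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                theta[i] += lr * (y - p) * xi
    return theta


def run_attack(n_stages=8, n_train=1000, n_test=500, seed=0):
    """Train on observed CRPs and return prediction accuracy on unseen CRPs."""
    rng = random.Random(seed)
    puf = make_apuf(n_stages, rng)

    def sample(n):
        out = []
        for _ in range(n):
            c = [rng.randint(0, 1) for _ in range(n_stages)]
            out.append((parity_features(c), puf(c)))
        return out

    train, test = sample(n_train), sample(n_test)
    theta = train_lr(train, n_stages + 1)
    hits = sum((sum(t * xi for t, xi in zip(theta, x)) > 0) == (y == 1)
               for x, y in test)
    return hits / len(test)
```

Calling `run_attack()` returns the fraction of unseen challenges whose response the learned model predicts correctly. Reducing `n_train` gives a rough feel for the dependence on training-set size discussed in the abstract.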

A buck-type isolated AC/DC converter for battery application

2016 IEEE 2nd Annual Southern Power Electronics Conference (SPEC), 2016

This paper introduces a bidirectional isolated AC/DC converter for storage battery application with a current space vector pulse width modulation (SVPWM) method. The converter features high power factor, wide output voltage range, bidirectional power flow and galvanic isolation. Resonant reset is used to demagnetize the transformers for simplicity. The operation principle of the converter is presented together with simulation results in PLECS and experimental results based on a 2 kW prototype.

Design of Dynamic Range Approximate Logarithmic Multipliers

Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018

Approximate computing is an emerging approach for designing high performance and low power arithmetic circuits. The logarithmic multiplier (LM) converts multiplication into addition and has inherent approximate characteristics. A method combining Mitchell's approximation and a dynamic range operand truncation scheme is proposed in this paper to design non-iterative and iterative approximate LMs. The accuracy and the circuit requirements of these designs are assessed to select the best approximate scheme according to different metrics. Compared with conventional non-iterative and iterative 16-bit LMs with exact operands, the normalized mean error distance (NMED) of the best proposed approximate non-iterative and iterative LMs is decreased by up to 24.1% and 18.5%, respectively, while the power-delay product (PDP) is decreased by up to 51.7% and 45.3%, respectively. Case studies for two error-tolerant applications show the validity of the proposed approximate LMs.
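Mitchell's approximation, on which the LMs in the abstract above are built, writes each operand as $2^{k}(1+x)$ with $0 \le x < 1$ and approximates $\log_2$ of it by $k + x$. A minimal integer-only Python sketch of the basic non-iterative multiplier follows; it covers only the classical Mitchell scheme and does not include the dynamic range operand truncation proposed in the paper. The function name is illustrative.

```python
def mitchell_multiply(a, b):
    """Approximate a * b for non-negative integers with Mitchell's algorithm."""
    if a == 0 or b == 0:
        return 0
    # Characteristics: positions of the leading ones.
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    # Mantissa remainders: a = 2**k1 + m1, so x1 = m1 / 2**k1 (same for b).
    m1, m2 = a - (1 << k1), b - (1 << k2)
    # (x1 + x2) scaled by 2**(k1 + k2), kept as an exact integer.
    s = (m1 << k2) + (m2 << k1)
    one = 1 << (k1 + k2)
    if s < one:
        # x1 + x2 < 1: result is 2**(k1+k2) * (1 + x1 + x2)
        return one + s
    # x1 + x2 >= 1: result is 2**(k1+k2+1) * (x1 + x2)
    return 2 * s
```

The result never exceeds the exact product, and the worst-case relative error is 1/9 (about 11.1%), reached for example at `mitchell_multiply(3, 3)`, which returns 8 instead of 9. Products of powers of two are exact.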