Kasem Khalil | University of Louisiana at Lafayette

Papers by Kasem Khalil

Embryonic Hardware (EmHW) is used in several domains because it can self-heal and reconfigure itself as needed. The main challenges of current EmHW methods are high average delay and low throughput. This paper proposes an efficient EmHW approach based on a Network-on-Chip (NoC) to improve system performance. In the proposed method, the EmHW consists of multiple cells, and the proposed cell outperforms the traditional EmHW cell. The NoC provides flexible communication between the cells. The method is implemented in VHDL and tested on an Altera Arria 10 GX FPGA. It improves throughput by up to 92% and reduces average delay by up to 80% at a small area overhead. Tests with multiple traffic patterns show the effectiveness and viability of the method in terms of both delay and throughput.

Through-silicon-via (TSV) based 3D integrated circuits (ICs) have become a popular approach to extending Moore's law. However, TSV reliability is an important issue, as a single faulty TSV can cause the entire 3D IC to fail. Most TSV faults can be detected during testing, but TSV aging faults cannot, since they appear only after the chip has been in operation. Mechanisms must therefore be deployed to maintain chip reliability in the presence of aging faults. In this paper, we propose solutions that repair TSVs suffering from moderate delay faults without rerouting through a spare TSV, while meeting the specified design constraints. Our experimental results indicate that the proposed methods efficiently reduce the adverse delay effects of an aged TSV.

Contour detection of an object is a fundamental computer vision problem in the image processing domain. The goal is to find a concrete boundary for pixel ownership between an object of interest (OOI) and its background. However, contour extraction from low signal-to-noise SEM images is very challenging, as different sources of noise obscure the underlying structural geometries. As device scaling continues to the 3 nm node and below, extracting accurate critical-dimension (CD) contour geometries from SEM images, especially after-develop inspection (ADI) images, is of utmost importance for a high-quality lithographic process and for verifying device characteristics at aggressive pitches. In this paper, we apply an unsupervised machine learning approach based on the U-Net architecture to denoise CD-SEM images. Unlike discriminative deep-learning-based denoising approaches, the proposed method does not require ground truth in the form of clean (noiseless) or synthetic noiseless images for training. We also demonstrate how denoising improves contour detection accuracy. We analyzed and validated our results using a programmable contour-extraction tool (SEMSuite™). We denoised SEM images with categorically different geometric patterns, such as line-space (L/S), tip-to-tip (T2T), and pillars with different scan types, and extracted contours from both the noisy and the denoised images. The comparative analysis shows that, with identical parameter settings for both inputs, denoised images yield higher-confidence contour metrics than their noisy counterparts. Contours extracted after the ML denoising also have higher confidence than those extracted after conventional Gaussian or median blur denoising.
The ultimate goal of this work is to establish a robust denoising method that reduces the dependence on SEM image acquisition settings and provides more accurate metrology data for OPC calibration.

Convolutional neural networks (CNNs) are deep learning models used extensively to solve complex image classification and computer vision problems. CNNs and more complex variants such as VGG-X, GoogLeNet, and ImageNet are widely used in application domains such as object detection, self-driving cars, instance segmentation, optical character recognition (OCR), and surveillance and security systems. However, CNN operations are both compute- and memory-intensive, which leads to high computational cost, area overhead, and excessive power dissipation, particularly in the higher-accuracy architectures listed above. In this paper, we propose a novel fully reversible-logic-based CNN architecture for low-power VLSI (very-large-scale integration) circuit synthesis. Ideally, reversible logic operations are lossless because no information is erased, which results in zero heat dissipation. The proposed architecture is implemented in VHDL on an Altera Arria 10 GX FPGA. The comparative analysis shows that the proposed approach reduces overall power dissipation by approximately 19.24% compared with the conventional classical approach. The proposed approach also scales better than the classical design approach.
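
The "no information is erased" property can be checked directly on the reversible gates commonly used in such designs. The sketch below, which is illustrative and not the authors' CNN circuitry, tests whether a gate maps every input pattern to a distinct output (a bijection), and shows how an AND is embedded in a Toffoli gate with a constant-0 ancilla input.

```python
from itertools import product


def toffoli(a: int, b: int, c: int) -> tuple:
    """Toffoli (CCNOT) gate: inverts c only when a and b are both 1."""
    return a, b, c ^ (a & b)


def feynman(a: int, b: int) -> tuple:
    """Feynman (CNOT) gate: the second output is a XOR b."""
    return a, a ^ b


def is_reversible(gate, n_inputs: int) -> bool:
    """True if the gate maps all 2**n input patterns to distinct outputs (a bijection)."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_inputs)}
    return len(outputs) == 2 ** n_inputs


def reversible_and(a: int, b: int) -> int:
    """AND built from a Toffoli gate with a constant-0 ancilla; a and b pass through unchanged."""
    return toffoli(a, b, 0)[2]


if __name__ == "__main__":
    print(is_reversible(toffoli, 3))                # True
    print(is_reversible(feynman, 2))                # True
    print(is_reversible(lambda a, b: (a & b,), 2))  # False: a plain 2-input AND erases information
```

The pass-through outputs a and b are the "garbage" lines that reversible designs carry along. They are why such circuits trade extra lines and gates (quantum cost) for the ideal absence of erasure.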

Spectrum awareness (SA) stretches the performance bounds of spectrum-sensing-based dynamic spectrum access by intelligently exploiting the big spectrum data (BSD) generated by a network. Hence, when analyzing the performance and scalability of large-scale cognitive radio networks (CRNs), spectrum awareness capacity takes precedence over spectrum sensing capacity. Although conventional methods use techniques such as the receiver operating characteristic (ROC) curve and the root-mean-square error (RMSE) to quantify CRN SA performance, they do not consider the impact of BSD velocity, variety, and volume. Therefore, this work proposes a novel knowledge-centric method for quantifying, analyzing, and comparing the performance and scalability of CRNs based on spectrum awareness. The proposed method considers key performance indices, including reliability, computational complexity, and latency, of the network parameters generated by spectrum data acquisition, conversion, and dissemination. The steps and applicability of the proposed method for user-level and network-level performance measurement are also analyzed.

Object detection is a fundamental process in traffic management systems and self-driving cars. The deformable part model (DPM) is a popular and competitive detector owing to its high precision. This paper presents a programmable, low-power hardware implementation of DPM-based object detection for real-time applications. Our approach employs a very fast object detection pipeline with complementary techniques, such as a fast feature pyramid, the fast Fourier transform (FFT), and early classification, to accelerate DPM with a reasonable accuracy loss. On a single-core CPU it achieves speedups of 50× and 6× over the original DPM and cascade DPM, respectively. The hardware circuit uses 65 nm CMOS technology and consumes only 36.5 mW (0.81 nJ/pixel) in post-layout simulation. The ASIC has an area of 3362 kgates and 295.5 KB of on-chip memory, and the design uses two simultaneous engines to process two independent object categories with 8 deformable parts per category.

Hardware-based machine learning is becoming increasingly popular due to its high computation speed. One desired characteristic of such hardware is low hardware and design cost. This paper proposes a neural network design approach that reduces hardware cost in terms of adders and multipliers. Adders and multipliers are among the main components of a neural network and are used in every node. The proposed technique shares multipliers and adders between two hidden layers, which halves the number of multipliers and adders in the network and thus reduces cost. The method has been tested and validated on multiple datasets. Its accuracy is similar to that of traditional methods in the literature, while it uses only half as many multipliers and adders. The design is implemented in VHDL on an Altera Arria 10 GX FPGA. Simulation results show that the proposed method retains network performance with acceptable accuracy while reducing the hardware by 63%.
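
A behavioral sketch of the sharing idea follows. It assumes one multiplier-adder pair per node, as the abstract states, with each node's multiply-accumulate applied serially over its inputs. It also assumes the two hidden layers run in successive phases on the same bank of node units. The `(weights, bias)` layer format and all function names are hypothetical, and this is not the authors' VHDL. Sharing changes *when* each layer is computed, not *what* it computes. The unit count therefore drops from `h1 + h2` to `max(h1, h2)`, which is exactly half when the layers have equal width.

```python
def relu(v: float) -> float:
    return v if v > 0 else 0.0


def mac_node(inputs, weights, bias):
    """One node's multiplier and adder, applied serially: acc += x * w per step."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc


def shared_forward(x, layer1, layer2):
    """Evaluate two hidden layers on one shared bank of node units.

    Each layer is a list of (weights, bias) pairs, one per node (hypothetical format).
    Phase 1 runs layer1 on the bank; phase 2 reuses the same bank for layer2,
    so the bank only needs max(len(layer1), len(layer2)) units.
    """
    hidden1 = [relu(mac_node(x, w, b)) for w, b in layer1]        # phase 1
    hidden2 = [relu(mac_node(hidden1, w, b)) for w, b in layer2]  # phase 2, same units
    return hidden2


def unit_counts(h1: int, h2: int) -> dict:
    """Multiplier/adder pairs needed: one per node if dedicated, one shared bank otherwise."""
    return {"dedicated": h1 + h2, "shared": max(h1, h2)}


if __name__ == "__main__":
    layer1 = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
    layer2 = [([1.0, 1.0], 0.5), ([1.0, -1.0], 0.0)]
    print(shared_forward([2.0, 3.0], layer1, layer2))  # [5.5, 0.0]
    print(unit_counts(8, 8))                           # {'dedicated': 16, 'shared': 8}
```

This count covers only the multiply-accumulate units. The 63% figure in the abstract is the authors' measured hardware result, which the sketch does not attempt to reproduce.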

Metrology, Inspection, and Process Control for Semiconductor Manufacturing XXXV, Feb 22, 2021

CD-SEM images inherently contain a significant level of noise because only a limited number of frames are averaged, which is critical to ensure throughput and minimize resist shrinkage. This noise may lead to false defect detection and erroneous metrology, so reducing noise in SEM images is of utmost importance. Both conventional noise-filtering techniques and recent discriminative deep-learning-based denoising algorithms have limitations: the former risks losing information content, and the latter mostly requires clean ground-truth or synthetic images for training. In this paper, we propose an unsupervised machine learning approach based on the U-Net architecture for denoising CD-SEM images without requiring any such ground-truth or synthetic images. We analyzed and validated our results using the MetroLER v2.2.5.0 library, comparing the power spectral density (PSD) of the original noisy images and the denoised images. As expected, the high-frequency component related to noise is clearly affected, while the low-frequency component related to the actual morphology of the feature is unaltered. This indicates that, compared with other existing approaches, the proposed denoising approach did not degrade the information content of the images.
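
The PSD check described above can be illustrated on a synthetic 1D line profile. A denoiser that preserves morphology should cut high-frequency power while leaving the low-frequency band around the feature largely intact. This sketch is an assumption-laden stand-in. It uses a circular moving-average filter instead of the paper's U-Net, a synthetic sine-plus-noise profile instead of SEM data, and a direct DFT. It does not use MetroLER.

```python
import cmath
import math
import random


def psd(signal):
    """One-sided power spectrum |X[k]|^2 / N of a mean-removed profile (direct DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        xk = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        spectrum.append(abs(xk) ** 2 / n)
    return spectrum


def band_power(spectrum, lo, hi):
    """Total power in frequency bins lo..hi-1."""
    return sum(spectrum[lo:hi])


def moving_average(signal, width=5):
    """Circular boxcar filter (odd width): a stand-in denoiser, not the U-Net from the abstract."""
    n = len(signal)
    half = width // 2
    return [sum(signal[(i + j) % n] for j in range(-half, half + 1)) / width for i in range(n)]


def synthetic_profile(n=64, period=32, noise=0.3, seed=0):
    """Hypothetical line profile: a periodic feature plus Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * t / period) + rng.gauss(0, noise) for t in range(n)]


if __name__ == "__main__":
    noisy = synthetic_profile()
    smoothed = moving_average(noisy)
    before, after = psd(noisy), psd(smoothed)
    print("low band  (k=1..4):  ", band_power(before, 1, 5), "->", band_power(after, 1, 5))
    print("high band (k=16..32):", band_power(before, 16, 33), "->", band_power(after, 16, 33))
```

Because the filter is a circular convolution, it scales each frequency bin independently. High-band power falls to at most about 6% of its original value, while low-band power keeps at least about 72%. That residual low-band loss is a small-scale instance of the information loss the abstract attributes to conventional filtering.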

2021 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Nov 28, 2021

Neural networks are used in many domains and applications. One of the main challenges is their hardware implementation: a hardware design has a fixed number of nodes and layers, which restricts the network to specific applications. This paper presents a reconfigurable neural network in which the number of layers and nodes can be changed according to the application. The proposed method is based on a Network-on-Chip (NoC) configuration, which routes data between layers and nodes. Each NoC router is connected to m nodes, which can represent part of a layer or a complete layer. During reconfiguration, the number of routers assigned to each layer can be selected, and the number of nodes per layer is then determined by the number of nodes attached to each router. The proposed method is implemented on an Altera Arria 10 GX FPGA and achieves 97% accuracy on the MNIST dataset. Its throughput and delay are better than those of the traditional method.
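
The router-to-layer mapping above can be sketched as a small allocation function. The sketch assumes each router serves up to m nodes belonging to a single layer, so a layer of n nodes needs ceil(n / m) routers, and that routers are numbered consecutively layer by layer. Both assumptions and the layer sizes in the example are hypothetical; the paper's actual allocation scheme may differ.

```python
import math


def routers_per_layer(layer_sizes, m):
    """Routers needed per layer, assuming each router serves up to m nodes of one layer only."""
    return [math.ceil(n / m) for n in layer_sizes]


def node_placement(layer_sizes, m):
    """Map (layer, node) -> (router_id, local_port), filling routers layer by layer."""
    placement = {}
    first_router = 0
    for layer, n in enumerate(layer_sizes):
        for node in range(n):
            placement[(layer, node)] = (first_router + node // m, node % m)
        first_router += math.ceil(n / m)
    return placement


if __name__ == "__main__":
    sizes = [64, 32, 10]  # hypothetical layer sizes
    print(routers_per_layer(sizes, 16))          # [4, 2, 1]
    placement = node_placement(sizes, 16)
    print(placement[(1, 0)], placement[(2, 9)])  # (4, 0) (6, 9)
```

Reconfiguring the network then amounts to changing `layer_sizes` and recomputing the placement. The last router of a layer may have unused ports, as with the 10-node layer above.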

As node technology advances beyond sub-10 nm, high-performance computing faces a great challenge in the form of excessive heat. To address this limitation, any complex digital circuit can be resynthesized using only reversible logic, which ideally dissipates zero heat. This paper proposes a novel reversible-logic-based configurable fault-tolerant embryonic hardware. We revisit the concept of self-healing for hardware systems in the context of reversible logic and circuits. The paper presents a comparative analysis of the conventional and proposed quantum approaches in terms of area overhead, power dissipation, and quantum cost, along with the limitations of conventional computing. The reliability of the proposed approach is analyzed against existing classical approaches under different failure rates. With 32 cells, the overall power dissipation of the proposed approach is almost 19% lower than that of conventional approaches using digital gates. The proposed approach is implemented for an ALU array in VHDL on an Altera Arria 10 GX FPGA.
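
Reliability comparisons across failure rates are often made with a standard textbook model. That model is assumed here and is not necessarily the paper's analysis. Each cell survives to time t with probability R(t) = e^(-λt), and an array that needs k working cells out of n (the rest acting as spares) survives with probability Σ_{i=k}^{n} C(n, i) R^i (1 − R)^(n−i). The 28-of-32 configuration and the failure rates below are hypothetical.

```python
import math


def cell_reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t): survival probability of one cell with a constant failure rate."""
    return math.exp(-failure_rate * t)


def k_out_of_n(p: float, n: int, k: int) -> float:
    """Probability that at least k of n independent cells, each surviving with probability p, work."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n_cells = 32               # cell count used in the abstract's power comparison
    needed = 28                # hypothetical: 4 of the 32 cells act as spares
    for rate in (1e-4, 1e-3):  # hypothetical failure rates per hour
        p = cell_reliability(rate, t=100.0)
        print(rate, k_out_of_n(p, n_cells, n_cells), k_out_of_n(p, n_cells, needed))
```

Setting `k = n` gives the no-spare baseline, R^n. Any spare capacity (k < n) raises system reliability, and the gap widens as the failure rate grows.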

Neural networks are commonly used in learning applications. Implementing a neural network in hardware is a complex and challenging task for hardware designers, as many hyperparameters and trade-offs must be considered. This paper presents a reconfigurable feed-forward neural network that can be used for different applications. The proposed method can change its node organization to suit an application. The network is divided into two parts: one part has a fixed number of nodes in each layer, and the other contains reconfigurable nodes that can switch from one layer to another to speed up the network. Compared with a traditional network, the proposed method improves network performance: learning speed improves by 35% with 100 neurons in a layer. The hardware implementation of the proposed method is presented using VHDL and an Altera Arria 10 GX FPGA.

Video systems are at the core of many IoT systems, and efficient processing is crucial to their operation. Little work has focused on the flexibility of current hardware systems in long-term deployment, particularly for constrained devices such as those used in IoT. This paper presents a unique case study of video processing nodes that adopt deep learning algorithms and dynamically switch models within a streaming path, to investigate the flexibility this offers and its limitations across different applications. The video processing node uses FINN, a framework that generates FPGA-compatible models. The proposed system can switch between configurations with different quantization levels, showing how accuracy and confidence vary with each option. Inference rate (inferences per second) is one of the major benefits of switching configurations: for convolutional neural networks, 1-bit weights with 1-bit activations achieve the highest inference rate and greatly reduce energy consumption.
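
One reason 1-bit weights and activations are so fast is that a dot product of two {+1, −1} vectors reduces to an XNOR followed by a popcount: dot = 2 · popcount(XNOR(a, w)) − n. Binarized layers of the kind FINN targets are typically computed this way in FPGA logic. The sketch below only checks the identity in plain Python. It uses no FINN API, and the bit-packing convention (+1 → bit 1, −1 → bit 0) is an assumption.

```python
import random


def binarize(values):
    """Map reals to {+1, -1} by sign (0 maps to +1), as for 1-bit weights/activations."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {+1, -1} vector into an int: +1 -> bit 1, -1 -> bit 0 (LSB first)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word: int, w_word: int, n: int) -> int:
    """Dot product of two packed {+1, -1} vectors: 2 * popcount(XNOR) - n."""
    mask = (1 << n) - 1
    matches = bin(~(a_word ^ w_word) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n


if __name__ == "__main__":
    rng = random.Random(1)
    a = binarize([rng.uniform(-1, 1) for _ in range(16)])
    w = binarize([rng.uniform(-1, 1) for _ in range(16)])
    # Both values printed below are equal: packed XNOR-popcount vs. ordinary dot product.
    print(xnor_popcount_dot(pack_bits(a), pack_bits(w), 16), sum(x * y for x, y in zip(a, w)))
```

Each matching bit contributes +1 and each mismatch −1, so the dot product is matches − (n − matches). No multipliers are needed, only bitwise logic and a population count.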

High-performance computing beyond sub-10 nm advanced node technology allows us to explore and use complex 2.5D/3D SoC design architectures. Node scaling, heterogeneous integration, and complex designs let us think beyond Moore's law but, at the same time, are limited by concerns over excessive power dissipation. Quantum computation and reversible logic functions have been researched in recent years in the context of low-power VLSI circuit design and nanotechnology, because reversible computation significantly reduces power dissipation in digital circuits. In this paper, we propose a novel design of an artificial neural network (ANN) using reversible logic gates. A thorough search of the relevant literature yielded only a few related articles; to the best of our knowledge, our approach is the first attempt to implement a complete feedforward neural network circuit using only reversible logic gates. The comparative analysis shows that our approach reduces overall power dissipation by approximately 16% compared with existing approaches. It also scales better than the classical design approach.

Neural networks are one of the main concepts used in machine learning applications. Realizing a neural network with many hidden layers in hardware requires a large area. This paper presents a novel neural network design that reduces hardware area. By multiplexing the input and output layers of the network, the proposed approach reduces the number of physical hidden layers from N to N/2 while maintaining full accuracy with a minimal increase in time complexity. The approach is implemented using the TensorFlow framework and a Xilinx Virtex-7 FPGA. Simulation results show that the accuracy of the proposed approach matches that of a traditional network with N layers while using only N/2 hardware layers. Hardware implementation results show that the approach saves 42% of the area.

IEEE Transactions on Biomedical Circuits and Systems, Aug 1, 2020

This paper proposes novel methods for making embryonic bio-inspired hardware resilient against faults through self-healing, fault prediction, and fault-prediction-assisted self-healing. The proposed self-healing recovers a faulty embryonic cell through innovative use of healthy cells. Experiments show that self-healing is effective, but the hardware takes considerable time to recover from a fault that occurs suddenly without warning. To overcome this delay, novel deep-learning-based formulations for fault prediction are proposed. The proposed self-healing technique is then deployed together with the proposed fault prediction methods to gauge the accuracy and delay of the embryonic hardware. The fault prediction and self-healing methods are implemented in VHDL on an FPGA. The fault predictions achieve high accuracy with low training time: up to 99.36% accuracy with a training time of 2.16 min. The area overhead of the proposed self-healing method is 34%, and the fault recovery percentage is 75%. To the best of our knowledge, this is the first such work on embryonic hardware, and it is expected to open a new frontier in fault-prediction-assisted self-healing for embryonic systems.

2021 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Nov 28, 2021

Computer vision is seeing increased use of convolutional neural network (CNN) deep learning architectures. Researchers have experimented with more complex CNN variants, such as VGG-X, GoogLeNet, and ImageNet, to correlate architecture depth, especially the number of convolutional blocks, with model accuracy. Such architectures are used extensively in complex object classification and computer vision problems such as self-driving cars, surveillance, and security systems. However, because CNN operations are both compute- and memory-intensive, these architectures incur high computational cost, area overhead, and excessive power dissipation. This work proposes a novel fully reversible-logic-based VGGNet architecture for low-power VLSI (very-large-scale integration) circuit synthesis. We implemented two VGGNet variants, RL-VGG-16 and RL-VGG-19, using only reversible logic gates and circuits. Ideally, no information is erased during reversible logic operations, so reversible circuits ideally dissipate no heat. The proposed architectures are implemented in VHDL on an Altera Arria 10 GX FPGA. The comparative analysis shows that RL-VGG-16 reduces overall power dissipation by approximately 18.08% compared with the classical VGG-16 architecture, and RL-VGG-19 reduces it by approximately 16.48% compared with the classical VGG-19 architecture. Both proposed architectures also scale better than the classical designs.

Neural networks are used in many applications, and guarding their performance against faults is a research challenge. A self-healing neural network, one that can detect and fix faults automatically, is a promising way to achieve reliability. Most current self-healing neural networks rely on replicating hardware nodes, which causes significant area overhead. The proposed self-healing approach has a modest area overhead and is suitable for complex neural networks. It is based on shared operation and a spare node in each layer that compensates for any faulty node in that layer. Each faulty node is compensated by its neighbor node, which performs the faulty node's operation as well as its own, sequentially. If the neighbor is also faulty, the spare node compensates instead. The proposed method is implemented in VHDL, and simulation results are obtained on an Altera Arria 10 GX FPGA for different numbers of nodes. The area overhead is very small for a complex network. The reliability of the proposed method is studied and compared with that of a traditional neural network.
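
The compensation rule described above can be sketched as a scheduling function that, given the set of faulty nodes, lists which logical node operations each physical unit executes, in order. Several assumptions are not from the paper. Node i's neighbor is node i+1, and the last node's neighbor is node n−2. The spare is labeled `"spare"` and never fails. If several faulty nodes fall back to the spare, it runs them all sequentially.

```python
SPARE = "spare"  # hypothetical label for the layer's spare unit


def assign_nodes(n_nodes: int, faulty: set) -> dict:
    """Map each physical unit to the logical node operations it executes, in order.

    Assumptions (not from the paper): node i's neighbor is node i+1, the last
    node's neighbor is node n-2, and the spare itself never fails.
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes so every node has a neighbor")
    schedule = {}
    for i in range(n_nodes):
        if i not in faulty:
            host = i  # healthy node runs its own operation
        else:
            neighbor = i + 1 if i + 1 < n_nodes else i - 1
            host = neighbor if neighbor not in faulty else SPARE
        schedule.setdefault(host, []).append(i)  # host runs these operations sequentially
    return schedule


if __name__ == "__main__":
    print(assign_nodes(4, {1}))     # {0: [0], 2: [1, 2], 3: [3]}
    print(assign_nodes(4, {1, 2}))  # {0: [0], 'spare': [1], 3: [2, 3]}
```

The length of each host's list is the number of sequential operations it must run. This is the time cost that replaces the area cost of full node replication.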

Neural networks are increasingly used in many applications because of their ability to solve complex problems. To increase their processing speed, hardware-based techniques are being actively researched. However, implementing a neural network with conventional hardware design methods is a complex and challenging task, as many hyperparameters and trade-offs must be examined in depth. This paper presents a novel Neural-Network-on-Chip (N2OC), a hardware implementation of a neural network based on a network-on-chip. The proposed approach is reconfigurable: the number of nodes per layer can vary with the desired performance and application, and the number and order of nodes can be controlled. The method was tested on two datasets and achieves results comparable to the state of the art. The hardware design is implemented in VHDL on an Altera Arria 10 GX FPGA (10AX115N2F45E1SG). The throughput and average delay of the network are studied, and simulation results show that the design has stable performance. On handwritten digit classification, the proposed method achieves 99.24% accuracy, compared with 98.17% for the state of the art.

NoCs are a well-established research topic, and several implementations have been proposed for fault tolerance or self-healing. Self-healing multi-core architectures, which integrate redundant cores to improve manufacturing yield or tolerate aging faults, are receiving wide attention. Self-healing is the ability of a system to detect faults or failures and fix them through healing or repair. Current research faces challenges such as area overhead, power consumption, scalability, and reliability. NoC performance decreases when faulty routers isolate processing elements (PEs) from the other routers. This paper presents a self-healing router approach that keeps the PE connected to the network and keeps the router usable as a routing path. The approach adds a self-healing block to each router to heal the network and keep it working within the expected performance. To evaluate the method, the reliability of the proposed technique is studied and compared with that of a conventional system under different failure rates. The proposed method is implemented in VHDL, and simulation results are obtained using Xilinx ISE on a Virtex-5. The area overhead of the proposed approach is 18%, much lower than that of redundancy-based approaches. The latency and throughput of the proposed approach are also investigated in the presence of faulty routers.

International Journal of Microelectronics and Computer Science, 2018

Metrology, Inspection, and Process Control for Semiconductor Manufacturing XXXV, Feb 22, 2021

2021 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Nov 28, 2021

IEEE Transactions on Biomedical Circuits and Systems, Aug 1, 2020

2021 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Nov 28, 2021

International Journal of Microelectronics and Computer Science, 2018

Assiut University, 2014

Owing to their relatively simple design and low power consumption, level-crossing analog-to-digital converters (LC-ADCs) have attracted interest in the past decade. In an LC-ADC, the analog input signal is sampled irregularly, whenever it crosses one of a set of defined threshold levels, and a timing measurement circuit measures the time between crossings. The signal can then be reconstructed from the threshold levels and the crossing-time information. The main building blocks of an LC-ADC are the comparator and the time-to-digital converter (TDC), and the characteristics of these two blocks strongly affect LC-ADC performance. This thesis presents a study of these two main components, the comparator and the TDC. It consists of six chapters and three appendices.
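
The level-crossing principle above can be simulated in a few lines. This sketch is illustrative and not a circuit model from the thesis. It scans a densely sampled signal for sign changes relative to each threshold level. It estimates each crossing time by linear interpolation, which stands in for the comparator and TDC. It then rebuilds the signal with a zero-order hold from the (time, level) events. Function names, levels, and the test ramp are assumptions.

```python
def level_crossing_sample(times, signal, levels):
    """Return (time, level) events where a densely sampled signal crosses a threshold.

    Crossing times are linearly interpolated between adjacent samples, standing in
    for the comparator + TDC. A signal that lands exactly on a level is not counted.
    """
    events = []
    for i in range(len(signal) - 1):
        t0, t1 = times[i], times[i + 1]
        x0, x1 = signal[i], signal[i + 1]
        for level in levels:
            if (x0 - level) * (x1 - level) < 0:  # strict sign change: a crossing
                tc = t0 + (level - x0) * (t1 - t0) / (x1 - x0)
                events.append((tc, level))
    events.sort()
    return events


def hold_reconstruct(events, t, initial):
    """Zero-order-hold reconstruction: the level of the latest crossing at or before t."""
    value = initial
    for tc, level in events:
        if tc > t:
            break
        value = level
    return value


if __name__ == "__main__":
    times = [i / 10 for i in range(11)]
    ramp = list(times)  # x(t) = t on [0, 1]
    events = level_crossing_sample(times, ramp, [0.25, 0.45, 0.75])
    print([(round(tc, 3), lvl) for tc, lvl in events])  # [(0.25, 0.25), (0.45, 0.45), (0.75, 0.75)]
    print(hold_reconstruct(events, 0.5, initial=0.0))   # 0.45
```

Samples are produced only when the signal moves, so a slowly varying input generates few events. This is the source of the power savings noted above. Accuracy then depends on how precisely the TDC resolves each crossing time.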
