Sek Chai - Academia.edu
Papers by Sek Chai
2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), 2018
To achieve high processing efficiency, next-generation computer architecture designs need an effective Artificial Intelligence (AI) framework that can learn large-scale processor interactions. In this short paper, we present Deep Temporal Models (DTMs) that offer effective and scalable time-series representations to address key challenges in learning from processor data: high data rate, cyclic patterns, and high dimensionality. We present our approach to using DTMs to learn and predict processor events, and we compare these learning models, with promising initial simulation results.
ArXiv, 2017
We present a novel optimization strategy for training neural networks which we call "BitNet". The parameters of neural networks are usually unconstrained and have a dynamic range dispersed over all real values. Our key idea is to limit the expressive power of the network by dynamically controlling the range and set of values that the parameters can take. We formulate this idea using a novel end-to-end approach that circumvents the discrete parameter space by optimizing a relaxed continuous and differentiable upper bound of the typical classification loss function. The approach can be interpreted as a regularization inspired by the Minimum Description Length (MDL) principle. For each layer of the network, our approach optimizes real-valued translation and scaling factors and arbitrary precision integer-valued parameters (weights). We empirically compare BitNet to an equivalent unregularized model on the MNIST and CIFAR-10 datasets. We show that BitNet converges faster to a ...
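The per-layer representation described above (a real-valued translation, a real-valued scale, and integer-valued weights) can be sketched as a simple affine quantizer. This is only an illustration of the representation, not the BitNet training procedure; the function names and the min/max choice of translation and scale are assumptions made for this example.

```python
def quantize_layer(weights, bits):
    """Map real weights to integers q in [0, 2**bits - 1].

    Returns (q, alpha, delta) so that each weight is approximated by
    alpha + delta * q. Choosing alpha and delta from the min/max of the
    layer is an assumption of this sketch, not the learned values.
    """
    levels = 2 ** bits - 1            # largest integer code
    alpha = min(weights)              # translation factor
    span = max(weights) - alpha
    delta = span / levels if span > 0 else 1.0   # scaling factor
    q = [min(levels, max(0, round((w - alpha) / delta))) for w in weights]
    return q, alpha, delta


def dequantize_layer(q, alpha, delta):
    """Reconstruct approximate real weights from integer codes."""
    return [alpha + delta * v for v in q]
```

With `bits` bits per weight, every reconstructed value lies within half a step (`delta / 2`) of the original weight, which is the trade-off between precision and storage that the abstract refers to.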
ArXiv, 2019
Our research is focused on understanding and applying biological memory transfers to new AI systems that can fundamentally improve their performance throughout their fielded lifetime experience. We leverage current understanding of biological memory transfer to arrive at AI algorithms for memory consolidation and replay. In this paper, we propose the use of generative memory that can be recalled in batch samples to train a multi-task agent in a pseudo-rehearsal manner. We show results motivating the need for task-agnostic separation of latent space for the generative memory to address issues of catastrophic forgetting in lifelong learning.
ArXiv, 2021
We propose a Quantization Guided Training (QGT) method to guide DNN training towards optimized low-bit-precision targets and reach extreme compression levels below 8-bit precision. Unlike standard quantization-aware training (QAT) approaches, QGT uses customized regularization to encourage weight values towards a distribution that maximizes accuracy while reducing quantization errors. One of the main benefits of this approach is the ability to identify compression bottlenecks. We validate QGT using state-of-the-art model architectures on vision datasets. We also demonstrate the effectiveness of QGT with an 81KB tiny model for person detection down to 2-bit precision (representing a 17.7x size reduction), while limiting the accuracy drop to only 3% compared to a floating-point baseline.
ArXiv, 2018
The use of deep neural networks in edge computing devices hinges on the balance between accuracy and complexity of computations. Ternary Connect (TC) \cite{lin2015neural} addresses this issue by restricting the parameters to three levels, −1, 0, and +1, thus eliminating multiplications in the forward pass of the network during prediction. We propose Generalized Ternary Connect (GTC), which allows an arbitrary number of levels while at the same time eliminating multiplications by restricting the parameters to integer powers of two. The primary contribution is that GTC learns the number of levels and their values for each layer, jointly with the weights of the network in an end-to-end fashion. Experiments on MNIST and CIFAR-10 show that GTC naturally converges to an "almost binary" network for deep classification networks (e.g. VGG-16) and deep variational auto-encoders, with negligible loss of classification accuracy and comparable visual quality of generated samples respectiv...
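Restricting parameters to integer powers of two removes multiplications because multiplying by 2^k is a bit shift. The sketch below illustrates that idea with a fixed nearest-power-of-two rounding; it is not the GTC method, which learns the number of levels and their values per layer. The function names and the exponent range are assumptions chosen for the example.

```python
import math


def to_power_of_two(w, min_exp=-8, max_exp=0):
    """Round a real weight to a signed power of two (or exactly zero).

    Rounding is done on the exponent (log2 domain); the exponent range
    [min_exp, max_exp] is a hypothetical choice for this sketch.
    """
    if w == 0:
        return 0.0
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))   # clip to the allowed levels
    return math.copysign(2.0 ** exp, w)


def shift_multiply(x, exp):
    """Multiply the integer x by 2**exp using only shifts.

    Negative exponents shift right, which floors the result.
    """
    return x << exp if exp >= 0 else x >> -exp
```

For instance, a weight quantized to 2^-2 turns the product with an integer activation into a right shift by two bits.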
Quantization for deep neural networks (DNNs) has enabled developers to deploy models with less memory and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular MobileNet architecture has been tuned to reduce parameter size and computational latency with separable depth-wise convolutions, but not all quantization algorithms work well on it, and accuracy can suffer relative to its floating-point versions. In this paper, we analyze several root causes of quantization loss and propose alternatives that do not rely on per-channel or training-aware approaches. We evaluate the image classification task on the ImageNet dataset, and the top-1 accuracy of our post-training quantized 8-bit inference is within 0.7% of the floating-point version.
Attacks against the control processor of a power-grid system, especially zero-day attacks, can be catastrophic. Earlier detection of the attacks can prevent further damage. However, detecting zero-day attacks can be challenging because they have no known code and have unknown behavior. To address the zero-day attack problem, we propose a data-driven defense that trains a temporal deep learning model, using only normal data from legitimate processes that run daily in these power-grid systems, to model the normal behavior of the power-grid controller. We can then quickly find malicious code running on the processor by estimating deviations from the normal behavior with a statistical test. Experimental results on a real power-grid controller show that we can detect anomalous behavior with over 99.9% accuracy and nearly zero false positives.
Complex image processing and computer vision systems often consist of a “pipeline” of “black boxes” that each solve part of the problem. We intend to replace parts or all of a target pipeline with deep neural networks to achieve benefits such as increased accuracy or reduced computational requirements. To acquire the large amounts of labeled data necessary to train the deep neural network, we propose a workflow that leverages the target pipeline to create a significantly larger labeled training set automatically, without prior domain knowledge of the target pipeline. We show experimentally that, despite the noise introduced by automated labeling and despite starting from only a very small labeled data set, the trained deep neural networks can achieve similar or even better performance than the components they replace, while in some cases also reducing computational requirements.
2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS), 2019
Quantization for deep neural networks has afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter-driven quantization approaches that can achieve as low as 3-bit precision without affecting accuracy. The post-training quantization approaches are data-free, and the resulting weight values are closely tied to the dataset distribution on which the model has converged to optimality. We show quantization results for a number of state-of-the-art deep neural networks (DNNs) using large datasets such as ImageNet. To better analyze quantization results, we describe the overall range and local sparsity of values afforded by various quantization schemes. We also show methods that lower bit precision beyond quantization limits using object class clustering.
2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), 2019
Controllers of security-critical cyber-physical systems, like the power grid, are a very important class of computer systems. Attacks against the control code of a power-grid system, especially zero-day attacks, can be catastrophic. Earlier detection of the anomalies can prevent further damage. However, detecting zero-day attacks is extremely challenging because they have no known code and have unknown behavior. Furthermore, if data collected from the controller is transferred to a server through networks for analysis and detection of anomalous behavior, this creates a very large attack surface and also delays detection. In order to address this problem, we propose Reconstruction Error Distribution (RED) of Hardware Performance Counters (HPCs), and a data-driven defense system based on it. Specifically, we first train a temporal deep learning model, using only normal HPC readings from legitimate processes that run daily in these power-grid systems, to model the normal behavior of the power-grid controller. Then, we run this model using real-time data from commonly available HPCs. We use the proposed RED to enhance the temporal deep learning detection of anomalous behavior, by estimating distribution deviations from the normal behavior with an effective statistical test. Experimental results on a real power-grid controller show that we can detect anomalous behavior with high accuracy (>99.9%), nearly zero false positives and short (<360 ms) latency.
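The detection step (compare the distribution of reconstruction errors at run time against the distribution seen on normal data) can be sketched with a one-sided z-test on the mean of a window of errors. The abstract does not name the statistical test used, so the z-test, the window-based formulation, and all function names below are assumptions for illustration only; the reconstruction errors themselves would come from the trained temporal model, which is not shown.

```python
import math
import statistics


def fit_normal_profile(normal_errors):
    """Summarize reconstruction errors observed on normal (benign) runs."""
    mu = statistics.fmean(normal_errors)
    sigma = statistics.pstdev(normal_errors)
    if sigma == 0:
        raise ValueError("normal errors have zero variance")
    return mu, sigma


def window_z_score(window_errors, mu, sigma):
    """z-score of the window mean against the normal profile."""
    standard_error = sigma / math.sqrt(len(window_errors))
    return (statistics.fmean(window_errors) - mu) / standard_error


def is_anomalous(window_errors, mu, sigma, z_threshold=3.0):
    """Flag a window whose mean error is far above the normal mean.

    The threshold of 3.0 is a hypothetical default, not a reported value.
    """
    return window_z_score(window_errors, mu, sigma) > z_threshold
```

Testing a window of readings rather than a single reading trades a small amount of latency for fewer false alarms, since one noisy sample is averaged out.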
International Journal of Computer Vision, 2017
We propose a novel hybrid model that exploits the strength of discriminative classifiers along with the representation power of generative models. Our focus is on detecting multimodal events in time-varying sequences as well as generating missing data in any of the modalities. Discriminative classifiers have been shown to achieve higher performance than the corresponding generative likelihood-based classifiers. On the other hand, generative models learn a rich informative space which allows for data generation and joint feature representation that discriminative models lack. We propose a new model that jointly optimizes the representation space using a hybrid energy function. We employ a model based on Restricted Boltzmann Machines (RBMs) to learn a shared representation across multiple modalities with time-varying data. The Conditional RBM (CRBM) is an extension of the RBM model that takes into account short term tem... Communicated by Cordelia Schmid, V. Lepetit.
Journal of Signal Processing Systems, 2017
Memory performance is a key bottleneck for deep learning systems. Binarization of both activations and weights is one promising approach that can best scale to realize the most energy-efficient system using the lowest possible precision. In this paper, we use and analyze a binarized neural network for human detection on infrared images. Our results show comparable algorithmic performance of binarized versus 32-bit floating-point networks, with the added benefit of greatly simplified computation and reduced memory overhead. In addition, we present a system architecture designed specifically for computation using binary representation that achieves at least a 4× speedup and improves energy efficiency by three orders of magnitude over a GPU.
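The "greatly simplified computation" of a binarized network comes from replacing multiply-accumulate with bitwise operations. A minimal sketch, assuming the commonly used encoding in which each weight or activation in {−1, +1} is stored as one bit: the dot product of two length-n vectors equals n minus twice the number of positions where the bits differ. The function names are hypothetical, and the abstract does not state that the described architecture uses exactly this formulation.

```python
def pack_signs(signs):
    """Pack a list of +1/-1 values into an integer, one bit per element.

    Bit i is 1 when signs[i] == +1 and 0 when signs[i] == -1.
    """
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Matching positions contribute +1 and mismatches contribute -1, so
    the result is n - 2 * popcount(a XOR b).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches
```

One machine word holds as many binary weights as it has bits, which is where the memory savings over 32-bit floating-point storage come from.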
2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016
This paper presents a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with a logic tier for efficient neural computing. The proposed architecture consists of clusters of processing engines (PEs), connected by a 2D mesh network as a processing tier, which is integrated in 3D with multiple tiers of DRAM. The PE clusters access multiple memory channels (vaults) in parallel. The operating principle, referred to as memory-centric computing, embeds specialized state machines within the vault controllers of the HMC to drive data into the PE clusters. The paper presents the basic architecture of the Neurocube and an analysis of the logic tier synthesized in 28nm and 15nm process technologies. The performance of the Neurocube is evaluated and illustrated by mapping a Convolutional Neural Network onto it and estimating the resulting power and performance for both training and inference.
2016 IEEE Winter Applications of Computer Vision Workshops (WACVW), 2016
Scientists today face the onerous task of manually annotating vast amounts of underwater video data for fish stock assessment. In this paper, we propose a robust and unsupervised deep learning algorithm to automatically detect fish and thereby ease the burden of manual annotation. The algorithm automates fish sampling in the training stage by fusion of optical flow segments and objective proposals. We auto-generate large amounts of fish samples from the detection of flow motion, and we annotate the true-false samples based on the flow-objectiveness overlap probability. We also adapt a biased training weight towards negative samples to reduce noise. In detection, in addition to fused regions, we use a Modified Non-Maximum Suppression (MNMS) algorithm to reduce the false classifications on parts of fish that result from the aggressive NMS approach. We exhaustively tested our algorithms using NOAA-provided, luminance-only underwater fish videos. Our tests show that Average Precision (AP) of detection improved by about 10% compared to a non-fusion approach, and by about another 10% with MNMS.
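For context on the detection step, the sketch below shows standard greedy Non-Maximum Suppression, the baseline that MNMS modifies. It is not the MNMS algorithm itself, whose details are not given in the abstract. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the function names and default threshold are choices made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

A low threshold suppresses aggressively, which can discard or mislabel detections that cover only part of an object; this is the failure mode the abstract attributes to the aggressive NMS approach.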
Proceedings International Conference on Computer Design VLSI in Computers and Processors
Power Constrained Design of Multiprocessor Interconnection Networks. Chirag S. Patel, Sek M. Chai, Sudhakar Yalamanchili, David E. Schimmel. School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250. Abstract ...