Rahim Entezari | Amirkabir University of Technology
Papers by Rahim Entezari
arXiv (Cornell University), Nov 15, 2022
In this paper we look into the conjecture of Entezari et al. (2021), which states that if the permutation invariance of neural networks is taken into account, then there is likely no loss barrier to the linear interpolation between SGD solutions. First, we observe that neuron alignment methods alone are insufficient to establish low-barrier linear connectivity between SGD solutions due to a phenomenon we call variance collapse: interpolated deep networks suffer a collapse in the variance of their activations, causing poor performance. Next, we propose REPAIR (REnormalizing Permuted Activations for Interpolation Repair), which mitigates variance collapse by rescaling the preactivations of such interpolated networks. We explore the interaction between our method and the choice of normalization layer, network width, and depth, and demonstrate that using REPAIR on top of neuron alignment methods leads to 60%-100% relative barrier reduction across a wide variety of architecture families and tasks. In particular, we report a 74% barrier reduction for ResNet50 on ImageNet and a 90% barrier reduction for ResNet18 on CIFAR10.
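A minimal sketch of the core REPAIR idea, assuming plain `nn.Linear` layers and endpoints that have already been permutation-aligned; the function names and the per-layer statistics-matching routine here are illustrative, not the authors' reference implementation. Each interpolated layer is rescaled so that its preactivation mean and standard deviation match the alpha-mixture of the endpoints' statistics, which is the variance-collapse fix described above.

```python
import torch

@torch.no_grad()
def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Elementwise linear interpolation of two (permutation-aligned) state dicts."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def preact_stats(model, layer_names, loader):
    """Per-neuron mean and std of each named layer's preactivations over a loader."""
    acts = {name: [] for name in layer_names}
    hooks = []
    for name, mod in model.named_modules():
        if name in layer_names:
            hooks.append(mod.register_forward_hook(
                lambda m, i, o, n=name: acts[n].append(o.detach())))
    for x, _ in loader:
        model(x)
    for h in hooks:
        h.remove()
    return {n: (torch.cat(a).mean(0), torch.cat(a).std(0)) for n, a in acts.items()}

@torch.no_grad()
def repair(model_mid, stats_a, stats_b, layer_names, loader, alpha):
    """Rescale each interpolated layer (in order) so its preactivation statistics
    match the alpha-mixture of the endpoint statistics, undoing variance collapse."""
    modules = dict(model_mid.named_modules())
    for name in layer_names:
        goal_mu = (1 - alpha) * stats_a[name][0] + alpha * stats_b[name][0]
        goal_sd = (1 - alpha) * stats_a[name][1] + alpha * stats_b[name][1]
        cur_mu, cur_sd = preact_stats(model_mid, [name], loader)[name]
        layer = modules[name]                      # assumed nn.Linear: weight [out, in]
        scale = goal_sd / (cur_sd + 1e-8)
        layer.weight.mul_(scale.unsqueeze(1))      # per-neuron (row-wise) rescaling
        layer.bias.copy_(scale * (layer.bias - cur_mu) + goal_mu)
    return model_mid
```

Repairing layers sequentially matters: fixing an earlier layer changes the input statistics of later ones, so each layer's current statistics are recomputed just before it is rescaled.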
Cornell University - arXiv, Jun 15, 2022
Recently, pruning deep neural networks (DNNs) has received a lot of attention for improving accuracy and generalization power, reducing network size, and increasing inference speed on specialized hardware. Although pruning has mainly been tested on computer vision tasks, its application in the context of medical image analysis has hardly been explored. This work investigates the impact of well-known pruning techniques, namely layer-wise and network-wide magnitude pruning, on nuclei instance segmentation performance in histological images. Our instance segmentation model consists of two main branches: (1) a semantic segmentation branch, and (2) a deep regression branch. We investigate the impact of weight pruning on the performance of both branches separately and on the final nuclei instance segmentation result. Evaluated on two publicly available datasets, our results show that layer-wise pruning delivers slightly better performance than network-wide pruning for small compression ratios (CRs), while for large CRs, network-wide pruning yields superior performance. For semantic segmentation, deep regression, and final instance segmentation, 93.75%, 95%, and 80% of the model weights, respectively, can be pruned by layer-wise pruning with less than a 2% reduction in the performance of the respective models.
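A minimal sketch of the two magnitude-pruning schemes compared in the paper, written for generic PyTorch models; the paper's exact criterion and its handling of biases and normalization layers may differ. Layer-wise pruning enforces the same compression ratio in every layer, while network-wide pruning uses one global magnitude threshold, so layers with many small weights get pruned more aggressively.

```python
import torch

@torch.no_grad()
def layerwise_magnitude_prune(model, cr):
    """Zero out the smallest-magnitude fraction `cr` of weights in each layer
    independently, so every layer ends up at the same compression ratio."""
    for p in model.parameters():
        if p.dim() > 1:                                   # skip biases/norm params
            k = int(cr * p.numel())
            if k > 0:
                thresh = p.abs().flatten().kthvalue(k).values
                p.data[p.abs() <= thresh] = 0.0

@torch.no_grad()
def networkwide_magnitude_prune(model, cr):
    """Zero out the globally smallest fraction `cr` of weights using one shared
    threshold across all layers."""
    weights = [p for p in model.parameters() if p.dim() > 1]
    all_mags = torch.cat([p.abs().flatten() for p in weights])
    k = int(cr * all_mags.numel())
    if k > 0:
        thresh = all_mags.kthvalue(k).values
        for p in weights:
            p.data[p.abs() <= thresh] = 0.0
```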
Cornell University - arXiv, Jul 1, 2022
We study the impact of different pruning techniques on the representations learned by deep neural networks trained with contrastive loss functions. We find that at high sparsity levels, contrastive learning results in a higher number of misclassified examples than models trained with traditional cross-entropy loss. To understand this pronounced difference, we use metrics such as the number of PIEs (Hooker et al., 2019), the Q-Score (Kalibhat et al., 2022), and the PD-Score (Baldock et al., 2021) to measure the impact of pruning on the quality of the learned representation. Our analysis suggests that the pruning schedule matters: the negative impact of sparsity on the quality of the learned representation is highest when pruning is introduced early in training.
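To make the first of these metrics concrete, here is a simplified sketch of counting Pruning Identified Exemplars (PIEs) in the spirit of Hooker et al. (2019): examples where the majority-vote prediction of a population of pruned models disagrees with that of a population of dense models. It assumes classification models and a non-shuffled loader so that example indices stay aligned across passes; the original metric has additional details this sketch omits.

```python
import torch

@torch.no_grad()
def pruning_identified_exemplars(dense_models, pruned_models, loader):
    """Indices of examples where the modal (majority-vote) prediction of the
    pruned-model population disagrees with that of the dense-model population."""
    def modal_preds(models):
        votes = []
        for m in models:
            m.eval()
            preds = [m(x).argmax(1) for x, _ in loader]
            votes.append(torch.cat(preds))
        return torch.stack(votes).mode(0).values          # per-example majority vote
    dense, pruned = modal_preds(dense_models), modal_preds(pruned_models)
    return (dense != pruned).nonzero(as_tuple=True)[0]    # PIE indices
```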
2022 IEEE/ACM Seventh International Conference on Internet-of-Things Design and Implementation (IoTDI)
International Conference on Machine Learning, Jul 8, 2021
This paper examines the impact of static sparsity on the robustness of a trained network to weight perturbations, data corruption, and adversarial examples. We show that, up to a certain sparsity achieved by increasing network width and depth while keeping network capacity fixed, sparsified networks consistently match and often outperform their initially dense versions. Robustness and accuracy decline simultaneously at very high sparsity due to loose connectivity between network layers. Our findings show that the rapid robustness drop under network compression observed in the literature is due to reduced network capacity rather than to sparsity itself.
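As a toy illustration of the capacity-fixed setup (wider networks pruned harder so the nonzero parameter count stays constant), here is a back-of-the-envelope calculation for an MLP with equal-width hidden layers. The function names and the no-bias simplification are assumptions for illustration, not the paper's exact scaling scheme.

```python
def mlp_params(d_in, d_hidden, d_out, n_hidden):
    """Weight count of an MLP with n_hidden equal-width hidden layers (no biases)."""
    return d_in * d_hidden + (n_hidden - 1) * d_hidden**2 + d_hidden * d_out

def density_for_fixed_capacity(d_in, d_hidden, d_out, n_hidden, width_mult):
    """Fraction of weights to keep so a width-scaled MLP matches the dense
    baseline's nonzero parameter count; the required sparsity is 1 - density."""
    dense = mlp_params(d_in, d_hidden, d_out, n_hidden)
    wide = mlp_params(d_in, int(width_mult * d_hidden), d_out, n_hidden)
    return dense / wide

# Example: doubling the width of a 784-256-256-10 MLP means keeping roughly
# 40% of its weights (60% sparsity) to hold capacity fixed.
print(density_for_fixed_capacity(784, 256, 10, 2, 2.0))   # ~0.40
```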
ArXiv, 2019
Today's deep neural networks require substantial computation resources for their training, storage, and inference, which limits their effective use on resource-constrained devices. Many recent research activities explore different options for compressing and optimizing deep models. On the one hand, many real-world applications face the data imbalance challenge, i.e., the number of labeled instances of one class considerably outweighs the number of labeled instances of the other class. On the other hand, applications may pose a class imbalance problem, i.e., a higher number of false positives produced when training a model and optimizing its performance may be tolerable, yet the number of false negatives must stay low. The problem originates from the fact that some classes are more important for the application than others, e.g., in detection problems in the medical and surveillance domains. Motivated by the success of the lottery ticket hypothesis, in this paper we propose an it...
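The abstract is truncated, but the setup it describes suggests a lottery-ticket-style iterative pruning loop combined with a class-weighted loss that penalizes false negatives on the critical class more heavily. The sketch below is a hypothetical reading along those lines, assuming binary classification; the function name, hyperparameters, and the specific use of `CrossEntropyLoss` class weights are all illustrative, not the paper's method.

```python
import copy
import torch
import torch.nn as nn

def iterative_classaware_prune(model, loader, rounds=5, prune_frac=0.2,
                               fn_weight=5.0, epochs=3, lr=1e-3):
    """Hypothetical sketch: iterative magnitude pruning with rewinding, trained
    under a loss that up-weights the positive (critical) class so compression
    sacrifices false positives before false negatives."""
    init_state = copy.deepcopy(model.state_dict())        # rewind target
    criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, fn_weight]))
    masks = {n: torch.ones_like(p)
             for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                criterion(model(x), y).backward()
                opt.step()
                for n, p in model.named_parameters():     # keep pruned weights at zero
                    if n in masks:
                        p.data.mul_(masks[n])
        for n, p in model.named_parameters():             # prune smallest survivors
            if n in masks:
                alive = p.abs()[masks[n].bool()]
                k = max(1, int(prune_frac * alive.numel()))
                thresh = alive.kthvalue(k).values
                masks[n] *= (p.abs() > thresh).float()
        model.load_state_dict(init_state)                 # rewind, then re-apply mask
        for n, p in model.named_parameters():
            if n in masks:
                p.data.mul_(masks[n])
    return model, masks
```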
2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS), 2019
Network control in microgrids is an active research area driven by a steady increase in energy demand and the necessity to minimize the environmental footprint while achieving socioeconomic benefits and ensuring sustainability. Reducing the deviation of predicted energy consumption from actual consumption, softening peaks in demand, and filling in the troughs, especially at times when power is more affordable and clean, present challenges for demand-side response. In this paper, we present a hierarchical energy system architecture with embedded control. This architecture pushes prediction models to edge devices and executes local control loops to address the challenge of managing demand-side response locally. We employ a two-step approach: at the upper level of the hierarchy, we adopt a conventional machine learning pipeline to build load prediction models using automated domain-specific feature extraction and selection. Given historical data, these models are then used to label prediction failur...
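As an illustration of the labeling step the abstract breaks off at, here is a minimal sketch of flagging prediction failures: intervals where the predicted load deviates from the actual load beyond a relative tolerance, which a local control loop could then react to. The function name and the relative-error criterion are assumptions, not the paper's definition.

```python
import numpy as np

def label_prediction_failures(y_true, y_pred, rel_tol=0.1):
    """Mark each interval whose predicted load deviates from the actual load by
    more than `rel_tol` (relative) as a prediction failure."""
    rel_err = np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), 1e-8)
    return rel_err > rel_tol                              # boolean failure labels

# Toy usage: hourly loads in kW
actual = np.array([3.2, 4.1, 5.0, 7.8])
predicted = np.array([3.0, 4.0, 6.1, 7.7])
print(label_prediction_failures(actual, predicted))       # [False False  True False]
```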
2016 2nd International Conference of Signal Processing and Intelligent Systems (ICSPIS), 2016
In this paper we introduce a general probabilistic graphical model for human everyday activity recognition. The proposed model is a discriminative graphical model with hidden variables for modeling body poses and their sequential order. We use a unified framework for the prediction task that is faster and more efficient than structured support vector machines and hidden conditional random fields. We trained and tested the model on RGB-D videos, and the results are comparable to the state of the art.
2020 IEEE Second Workshop on Machine Learning on Edge in Sensor Systems (SenSys-ML), 2020
Today's deep neural networks require substantial computation resources for their training, storage, and inference, which limits their effective use on resource-constrained devices. Many recent research activities explore different options for compressing and optimizing deep models. On the one hand, many real-world applications face the data imbalance challenge, i.e., the number of labeled instances of one class considerably outweighs the number of labeled instances of the other class. On the other hand, applications may pose a class imbalance problem, i.e., a higher number of false positives produced when training a model and optimizing its performance may be tolerable, yet the number of false negatives must stay low. The problem originates from the fact that some classes are more important for the application than others, e.g., in detection problems in the medical and surveillance domains. Motivated by the success of the lottery ticket hypothesis, in this paper we propose an itera...
Today's deep neural networks require substantial computation resources for their training, storage, and inference, which limits their effective use on resource-constrained devices. Many recent research activities explore different options for compressing and optimizing deep models for the Internet of Things (IoT). Our work so far has focused on two important aspects of deep neural network compression: class-dependent model compression and explainable compression. We briefly summarize our contributions and conclude with an outline of our future research directions.
ArXiv, 2021
In this paper, we conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them. Although it is a bold conjecture, we show how extensive empirical attempts fall short of refuting it. We further provide a preliminary theoretical result to support our conjecture. Our conjecture has implications for the lottery ticket hypothesis, distributed training, and ensemble methods.
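For concreteness, here is a minimal sketch of the quantity the conjecture is about: the loss barrier along the linear path between two solutions, i.e., how far the loss along the interpolation rises above the straight line between the endpoint losses. Under the conjecture, this barrier is near zero once one solution's neurons have been permuted to align with the other's. The function names are illustrative.

```python
import torch

@torch.no_grad()
def loss_barrier(model, sd_a, sd_b, loader, criterion, n_points=11):
    """max over alpha of L((1-a)*theta_a + a*theta_b) minus the linear
    interpolation of the endpoint losses."""
    def path_loss(alpha):
        model.load_state_dict({k: (1 - alpha) * sd_a[k] + alpha * sd_b[k]
                               for k in sd_a})
        model.eval()
        total, n = 0.0, 0
        for x, y in loader:
            total += criterion(model(x), y).item() * len(y)
            n += len(y)
        return total / n
    alphas = torch.linspace(0, 1, n_points)
    losses = [path_loss(a.item()) for a in alphas]
    base = [(1 - a.item()) * losses[0] + a.item() * losses[-1] for a in alphas]
    return max(l - b for l, b in zip(losses, base))
```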
Computer Vision – ACCV 2018
Real-time detection of irregularities in visual data is invaluable in many prospective applications, including surveillance and patient monitoring systems. With the surge of deep learning methods in recent years, researchers have tried a wide spectrum of methods for different applications. However, for irregularity or anomaly detection in videos, training an end-to-end model is still an open challenge, since irregularity is often not well defined and there are not enough irregular samples to use during training. In this paper, inspired by the success of generative adversarial networks (GANs) for training deep models in unsupervised and self-supervised settings, we propose an end-to-end deep network for detection and fine localization of irregularities in videos (and images). Our proposed architecture is composed of two networks, which are trained to compete with each other while collaborating to find the irregularity. One network works as a pixel-level irregularity Inpainter (I), and the other works as a patch-level Detector (D). After an adversarial self-supervised training, in which I tries to fool D into accepting its inpainted output as regular (normal), the two networks collaborate to detect and fine-segment the irregularity in any given test video. Our results on three different datasets show that our method can outperform the state of the art and fine-segment the irregularity.
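A minimal sketch of one adversarial self-supervised training step for the Inpainter/Detector pair, under assumed interfaces: `inpainter` reconstructs frames with randomly masked patches, `detector` outputs a probability that its input is regular, and `make_random_patch_mask` is a hypothetical masking helper. This shows the competing objectives described above, not the authors' architecture or losses.

```python
import torch

def adversarial_inpaint_step(inpainter, detector, frames, opt_i, opt_d):
    """One training step: D learns to tell real frames from inpainted ones,
    while I learns to reconstruct frames that fool D."""
    bce = torch.nn.BCELoss()
    inpainted = inpainter(frames * make_random_patch_mask(frames))  # hypothetical helper

    # Detector: accept real frames (label 1), reject inpainted ones (label 0).
    real_score = detector(frames)
    fake_score = detector(inpainted.detach())
    d_loss = bce(real_score, torch.ones_like(real_score)) + \
             bce(fake_score, torch.zeros_like(fake_score))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Inpainter: reconstruct the frame while fooling D into calling it regular.
    fool_score = detector(inpainted)
    i_loss = torch.nn.functional.mse_loss(inpainted, frames) + \
             bce(fool_score, torch.ones_like(fool_score))
    opt_i.zero_grad(); i_loss.backward(); opt_i.step()
```

At test time, the same division of labor supports localization: patches that D scores as irregular and that I fails to inpaint convincingly mark the fine-grained irregular region.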