Yash Bhalgat - University of Oxford | LinkedIn
About
PhD Researcher with the Visual Geometry Group at Oxford, working on Video Generation for…
Activity
9K followers
Experience & Education
University of Oxford
Volunteer Experience
Volunteer
ElderHelp of San Diego
Jun 2020 - Jun 2021 1 year 1 month
Social Services
Buying groceries every weekend for seniors (aged 70+), who are at high risk due to COVID-19.
Publications
Accepted at NeurIPS 2020 September 25, 2020
In this work, we tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks. We start our analysis by introducing a general definition of Composite Kernel structures that enable the execution of convolution operations in the form of efficient, scaled, sum-pooling components. As a special case, we propose Structured Convolutions and show that these allow decomposition of the convolution operation into a sum-pooling operation followed by a convolution with significantly lower complexity and fewer weights. We show how this decomposition can be applied to 2D and 3D kernels as well as to fully connected layers. Furthermore, we present a Structural Regularization loss that encourages neural network layers to adopt this structure so that, after training, they can be decomposed with negligible performance loss. By applying our method to a wide range of CNN architectures, we demonstrate "structured" versions of the ResNets that are up to 2x smaller and a new Structured-MobileNetV2 that is more efficient while staying within an accuracy loss of 1% on the ImageNet and CIFAR-10 datasets. We also show similar structured versions of EfficientNet on ImageNet and of the HRNet architecture for semantic segmentation on the Cityscapes dataset. In terms of complexity reduction, our method matches or exceeds existing tensor decomposition and channel pruning methods.
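The decomposition described in the abstract above can be illustrated with a small 1D toy example. This is only a sketch under simplifying assumptions, not the paper's implementation: it assumes a 1D kernel built as a weighted sum of shifted blocks of ones, and the helper names (`correlate_valid`, `sum_pool`, `build_structured_kernel`) are hypothetical. It checks that convolving with the full kernel gives the same result as sum-pooling first and then convolving with the smaller weight vector.

```python
def correlate_valid(signal, kernel):
    """'Valid' 1D cross-correlation (the operation CNN libraries call convolution)."""
    k = len(kernel)
    return [
        sum(kernel[i] * signal[t + i] for i in range(k))
        for t in range(len(signal) - k + 1)
    ]


def sum_pool(signal, width):
    """Stride-1 sum-pooling: each output is the sum of `width` neighbouring inputs."""
    return correlate_valid(signal, [1] * width)


def build_structured_kernel(alpha, width):
    """Build a length-(len(alpha) + width - 1) kernel as a weighted sum of
    shifted blocks of ones, one block of `width` ones per entry of `alpha`."""
    kernel = [0] * (len(alpha) + width - 1)
    for shift, weight in enumerate(alpha):
        for i in range(shift, shift + width):
            kernel[i] += weight
    return kernel


alpha = [2, -1]   # the 2 free parameters (assumed values)
width = 2         # sum-pooling window
kernel = build_structured_kernel(alpha, width)  # 3-tap structured kernel

x = [1, 4, 2, 0, 3, 5]

# Direct route: convolve with the full 3-tap kernel.
direct = correlate_valid(x, kernel)

# Decomposed route: sum-pool first, then convolve with the 2 weights.
decomposed = correlate_valid(sum_pool(x, width), alpha)

print(kernel)      # [2, 1, -1]
print(direct)      # [4, 10, 1, -2]
print(decomposed)  # [4, 10, 1, -2]
```

In this toy case the 3-tap kernel has only 2 free parameters, so the second stage needs fewer multiplications per output than the direct convolution, while the sum-pooling stage uses additions only.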
LSQ+: Improving low-bit quantization through learnable offsets and better initialization
Joint Workshop on Efficient Deep Learning in Computer Vision, CVPR 2020 Jun 2020
Unlike ReLU, newer activation functions (like Swish, H-swish, Mish) that are frequently employed in popular efficient architectures can also result in negative activation values, with skewed positive and negative ranges. Typical learnable quantization schemes [PACT, LSQ] assume unsigned quantization for activations and quantize all negative activations to zero, which leads to a significant loss in performance. Naively using signed quantization to accommodate these negative values requires an extra sign bit, which is expensive for low-bit (2-, 3-, 4-bit) quantization. To solve this problem, we propose LSQ+, a natural extension of LSQ, wherein we introduce a general asymmetric quantization scheme with trainable scale and offset parameters that can learn to accommodate the negative activations. Gradient-based learnable quantization schemes also commonly suffer from high instability or variance in the final training performance, hence requiring a great deal of hyper-parameter tuning to reach satisfactory performance. LSQ+ alleviates this problem by using an MSE-based initialization scheme for the quantization parameters. We show that this initialization leads to significantly lower variance in final performance across multiple training runs. Overall, LSQ+ shows state-of-the-art results for EfficientNet and MixNet and also significantly outperforms LSQ for low-bit quantization of neural nets with Swish activations (e.g., a 1.8% gain with W4A4 quantization and up to a 5.6% gain with W2A2 quantization of EfficientNet-B0 on the ImageNet dataset). To the best of our knowledge, ours is the first work to quantize such architectures to extremely low bit-widths.
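The difference between unsigned quantization and an asymmetric scheme with an offset can be sketched in a few lines. This is a minimal illustration, not the LSQ+ implementation: the function name `quantize` is hypothetical, and the scale and offset values are fixed by hand here, whereas the abstract describes them as trainable.

```python
def quantize(x, scale, offset, num_bits):
    """Quantize `x` onto an unsigned integer grid and map it back to a real value.

    With offset = 0 this is plain unsigned quantization; a non-zero offset
    shifts the representable range so that it can start below zero.
    """
    q_min, q_max = 0, 2 ** num_bits - 1
    q = round((x - offset) / scale)      # nearest integer level
    q = max(q_min, min(q_max, q))        # clamp to the available levels
    return q * scale + offset            # de-quantize


# Hand-picked activations with a skewed range, as a Swish-like function might produce.
activations = [-0.25, -0.1, 0.0, 0.5, 1.0]
num_bits = 2  # 4 levels

# Unsigned scheme: the grid covers [0, 1], so negatives collapse to zero.
unsigned = [quantize(a, scale=1.0 / 3, offset=0.0, num_bits=num_bits)
            for a in activations]

# Asymmetric scheme: the same 4 levels cover [-0.25, 1] instead.
asymmetric = [quantize(a, scale=1.25 / 3, offset=-0.25, num_bits=num_bits)
              for a in activations]

print(unsigned)
print(asymmetric)
```

Both schemes use the same number of integer levels; the offset only relocates them, which is why no extra sign bit is needed to represent the negative values.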
arXiv preprint arXiv:2003.00075 Feb 2020
This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they are set as input. Making thresholds trainable also makes LTP computationally efficient, hence scalable to deeper networks. For example, it takes less than 30 epochs for LTP to prune most networks on ImageNet. This is in contrast to other methods that search for per-layer thresholds via a computationally intensive iterative pruning and fine-tuning process. Additionally, with a novel differentiable L0 regularization, LTP is able to operate effectively on architectures with batch-normalization. This is important since L1 and L2 penalties lose their regularizing effect in networks with batch-normalization. Finally, LTP generates a trail of progressively sparser networks from which the desired pruned network can be picked based on sparsity and performance requirements. These features allow LTP to achieve state-of-the-art compression rates on ImageNet networks such as AlexNet (26.4× compression with 79.1% Top-5 accuracy) and ResNet50 (9.1× compression with 92.0% Top-5 accuracy). We also show that LTP effectively prunes newer architectures, such as EfficientNet, MobileNetV2 and MixNet.
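To see why a threshold can be learned by gradient descent at all, it helps to replace the hard "keep or drop" decision with a smooth gate. The sketch below uses a sigmoid gate on the squared weight, which is an assumption made for illustration and may differ from the exact formulation in the paper; the function names and the numeric values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_prune(weight, threshold, temperature):
    """Smoothly gate a weight: near 0 when weight**2 is well below the
    threshold, near the original weight when it is well above."""
    return weight * sigmoid((weight ** 2 - threshold) / temperature)


def grad_wrt_threshold(weight, threshold, temperature):
    """Analytic derivative of soft_prune with respect to the threshold.
    It is non-zero, so the threshold can receive gradient updates."""
    gate = sigmoid((weight ** 2 - threshold) / temperature)
    return -weight * gate * (1.0 - gate) / temperature


weights = [0.02, -0.3, 0.5]
pruned = [soft_prune(w, threshold=0.01, temperature=0.001) for w in weights]
print(pruned)  # the small weight is driven close to zero, the others are kept

# A weight near the threshold has a sizeable gradient with respect to it.
print(grad_wrt_threshold(0.11, threshold=0.01, temperature=0.001))
```

A hard threshold has zero gradient almost everywhere, so it cannot be trained this way; the temperature controls how closely the smooth gate approximates the hard decision.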
arXiv preprint arXiv:1911.12491 Nov 2019
Quantization and knowledge distillation (KD) methods are widely used to reduce the memory and power consumption of deep neural networks (DNNs), especially for resource-constrained edge devices. Although their combination is quite promising for meeting these requirements, it may not work as desired, mainly because the regularization effect of KD further diminishes the already reduced representation power of a quantized model. To address this shortcoming, we propose Quantization-aware Knowledge Distillation (QKD), wherein quantization and KD are carefully coordinated in three phases. First, the Self-studying (SS) phase fine-tunes a quantized low-precision student network without KD to obtain a good initialization. Second, the Co-studying (CS) phase trains a teacher to make it more quantization-friendly and powerful than a fixed teacher. Finally, the Tutoring (TU) phase transfers knowledge from the trained teacher to the student. We extensively evaluate our method on the ImageNet and CIFAR-10/100 datasets and show an ablation study on networks with both standard and depthwise-separable convolutions. The proposed QKD outperformed existing state-of-the-art methods (e.g., 1.3% improvement on ResNet-18 with W4A4, 2.6% on MobileNetV2 with W4A4). Additionally, QKD could recover the full-precision accuracy at as low as W3A3 quantization on ResNet and W6A6 quantization on MobileNetV2.
Annotation-cost Minimization for Medical Image Segmentation using Suggestive Mixed Supervision Fully Convolutional Networks
Medical Imaging meets NeurIPS 2018 Workshop December 8, 2018
For medical image segmentation, most fully convolutional networks (FCNs) need strong supervision through a large sample of high-quality dense segmentations, which is taxing in terms of the costs, time and logistics involved. This burden of annotation can be alleviated by exploiting weak, inexpensive annotations such as bounding boxes and anatomical landmarks. However, it is very difficult to estimate a priori the optimal balance between the number of annotations needed for each supervision type that leads to maximum performance with the least annotation cost. To optimize this cost-performance trade-off, we present a budget-based cost-minimization framework in a mixed-supervision setting via dense segmentations, bounding boxes, and landmarks. We propose a linear programming (LP) formulation combined with an uncertainty- and similarity-based ranking strategy to judiciously select the samples to be annotated next for optimal performance. In the results section, we show that our proposed method achieves performance comparable to state-of-the-art approaches with a significantly reduced annotation cost.
Catseyes: Categorizing Seismic Structures with Tessellated Scattering Wavelet Networks
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) January 29, 2018
As field seismic data sizes are dramatically increasing toward exabytes, automating the labeling of “structural monads” - corresponding to geological patterns and yielding subsurface interpretation - in a huge amount of available information would drastically reduce interpretation time. Since customarily designed features may not account for the gradual deformations observable in seismic data, we propose to adapt the wavelet-based scattering network methodology with a tessellation of geophysical images. Its invariances are expected to be able to thwart the effect of the tectonics. The sparse structure of the extracted feature vectors suggests resorting to dimension reduction methods before classification. The most promising one is based on a tessellated version of scattering decompositions, combined with a standard affine PCA classifier. Extensive comparative results on a four-class seismic database show the effectiveness of the proposed method in terms of seismic data labeling and object retrieval, in affordable computational time.
12th IAPR Workshop on Document Analysis Systems 2016, Santorini, Greece April 14, 2016
Document digitization is becoming increasingly crucial. In this work, we propose a shape-based approach for automatic stamp verification/detection in document images using unsupervised feature learning. Given a small set of training images, our algorithm learns an appropriate shape representation using unsupervised clustering. Experimental results demonstrate the effectiveness of our framework in challenging scenarios.
Courses
Advanced Computer Vision
Algorithms in Medical Image Processing
Computer Vision
Estimation and Identification
Information Retrieval
Machine Learning
Probability and Random Processes
Reinforcement Learning
Honors & Awards
Cargill Global Scholar 2014-15 and 2015-16
Institute of International Education
May 2015
Recipient of the Cargill Global Scholarship awarded by the Institute of International Education to the top 10 students of India for exemplary academics and leadership skills.
IIT-JEE Mains and IIT-JEE Advanced 2013
Jun 2013
All India Rank (AIR) 12 in JEE Mains and AIR 155 in JEE Advanced
International Astronomy Olympiad 2013 training camp
May 2013
Selected among India's top 30 students for the OCSC (Orientation Cum Selection Camp) 2013
Kishore Vaigyanik Protsahan Yojana (KVPY) Fellowship 2013
Feb 2013
Became a KVPY scholar with AIR 60 in the examination
Selected to appear for the prestigious INPhO, INChO and INMO examinations
Feb 2013
Ranked among the top 300 students in the nation in each of the above examinations
Awarded a “Visharad” degree in Tabla by the Akhil Bhartiya Gandharva Mahavidyalaya, Miraj in 2011
May 2011
A "Visharad" in Tabla is equivalent to a Bachelor of Music