Saksham Suri

Research

I am interested in solving problems using less supervision and uncurated as well as synthetic data. Recently, I have been working on improving recognition using generation, especially using diffusion models as synthetic data sources. I have previously explored tasks across recognition and generation, focusing on different supervision strategies and proposing modified architectures and losses to make better use of the data under different settings.

LARP: Tokenizing Videos 🎬 with a Learned Autoregressive Generative Prior 🚀 Hanyu Wang, Saksham Suri, Yixuan Ren, Hao Chen, Abhinav Shrivastava International Conference on Learning Representations (ICLR), 2025 Project Page | Paper | Code We present LARP, a novel video tokenizer designed to overcome limitations in current video tokenization methods for autoregressive (AR) generative models.
MAPS: Memory Augmented Panoptic Segmentation Vatsal Agarwal, Saksham Suri, Max Ehrlich, Abhinav Shrivastava Under Review Paper Explore the potential of VLM-generated neural memory for panoptic segmentation.
UVIS: Unsupervised Video Instance Segmentation Shuaiyi Huang, Saksham Suri, Kamal Gupta, Saketh Rambhatla, Ser-Nam Lim, Abhinav Shrivastava CVPR Workshop on Learning With Limited Labelled Data for Image and Video Understanding, 2024 Paper We propose an unsupervised approach for video instance segmentation, leveraging self-supervised learning methods to improve object instance tracking and segmentation across video frames.
Gen2Det: Generate to Detect Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava Synthetic Data for Computer Vision Workshop @ CVPR 2024 Paper Utilizing synthetic data from state-of-the-art diffusion models to improve object detection and segmentation performance.
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors Saksham Suri*, Matthew Walmer*, Kamal Gupta, Abhinav Shrivastava Under Submission Project Page | Paper | Code Self-supervised and lightweight technique to learn a feature transform for generating dense features from pre-trained ViTs.
GRIT: GAN Residuals for Image-to-Image Translation Saksham Suri*, Moustafa Meshry*, Larry S. Davis, Abhinav Shrivastava Winter Conference on Applications of Computer Vision (WACV), 2024 Project Page | Paper Decouple the optimization of reconstruction and adversarial losses by synthesizing an image as a combination of its reconstruction (low-frequency) and GAN residual (high-frequency) components.
Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava Winter Conference on Applications of Computer Vision (WACV), 2024 Project Page We propose Diff2Lip, an audio-conditioned diffusion-based model that performs lip synchronization in the wild while preserving image fidelity and identity.
SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining Saksham Suri*, Saketh Rambhatla*, Rama Chellappa, Abhinav Shrivastava IEEE/CVF International Conference on Computer Vision (ICCV), 2023 Project Page | Paper | Code Propose an end-to-end system that learns to separate proposals into labeled and unlabeled regions using pseudo-positive mining to tackle sparsely annotated object detection.
Teaching Matters: Investigating the Role of Supervision in Vision Transformers Matthew Walmer*, Saksham Suri*, Kamal Gupta, Abhinav Shrivastava IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 Project Page | Paper | Code Study the effect of supervision and losses in training ViTs through attention, feature, and downstream-task-based analysis.
Towards Discovery and Attribution of Open-world GAN Generated Images Sharath Girish*, Saksham Suri*, Saketh Rambhatla, Abhinav Shrivastava IEEE/CVF International Conference on Computer Vision (ICCV), 2021 Project Page | Paper | arXiv Propose an iterative algorithm for discovering GAN-generated images in an open-world setup. Also show applications to online, never-ending discovery.
Learned Spatial Representations for Few-shot Talking-Head Synthesis Moustafa Meshry, Saksham Suri, Larry S. Davis, Abhinav Shrivastava IEEE/CVF International Conference on Computer Vision (ICCV), 2021 Project Page | Paper | arXiv We propose a novel framework that disentangles spatial and style information for image synthesis. A latent spatial layout for the target image is generated, which is used to produce per-pixel style modulation parameters for the final synthesis.
Improving Face Recognition Performance using TeCS2 Dictionary Saksham Suri, Anush Sankaran, Mayank Vatsa, Richa Singh Pattern Recognition Letters, 2020 Paper Incorporating task-agnostic color, shape, texture, and symmetry attributes into task-specific deep learning classifiers for face recognition.
An Interpretable Generative Model for Handwritten Digits Synthesis Yao Zhu, Saksham Suri, Pranav Kulkarni, Yueru Chen, Jiali Duan, C.-C. Jay Kuo International Conference on Image Processing (ICIP), 2019 Paper Propose a non-deep-learning approach to handwritten digit synthesis that is more interpretable and does not require backpropagation.
Angel or Demon? Characterizing Variations Across Twitter Timeline of Technical Support Campaigners S. Gupta, G. S. Bhatia, Saksham Suri, D. Kuchhal, P. Gupta, M. Ahamad, M. Gupta, P. Kumaraguru The Journal of Web Science, Vol. 6, 2019 Paper Analyzing and identifying the presence of fake tech support accounts on Twitter.
On matching faces with alterations due to plastic surgery and disguise Saksham Suri, Anush Sankaran, Mayank Vatsa, Richa Singh IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), 2018 Paper A novel approach to perform face recognition in the presence of plastic surgery and disguise.