Saksham Suri

Research

I am interested in solving problems with less supervision, using uncurated as well as synthetic data. Recently, I have been working on improving recognition using generation, especially using diffusion models as synthetic data sources. I have previously explored tasks across recognition and generation, focusing on different supervision strategies and proposing modified architectures and losses to make better use of the data under different settings.

	LARP: Tokenizing Videos 🎬 with a Learned Autoregressive Generative Prior 🚀 Hanyu Wang, Saksham Suri, Yixuan Ren, Hao Chen, Abhinav Shrivastava International Conference on Learning Representations (ICLR), 2025 Project Page \| Paper \| Code We present LARP, a novel video tokenizer designed to overcome limitations in current video tokenization methods for autoregressive (AR) generative models.
	MAPS: Memory Augmented Panoptic Segmentation Vatsal Agarwal, Saksham Suri, Max Ehrlich, Abhinav Shrivastava Under Review Paper Explore the potential of VLM-generated neural memory for panoptic segmentation.
	UVIS: Unsupervised Video Instance Segmentation Shuaiyi Huang, Saksham Suri, Kamal Gupta, Saketh Rambhatla, Ser-Nam Lim, Abhinav Shrivastava CVPR Workshop on Learning With Limited Labelled Data for Image and Video Understanding, 2024 Paper We propose an unsupervised approach for video instance segmentation, leveraging self-supervised learning methods to improve object instance tracking and segmentation across video frames.
	Gen2Det: Generate to Detect Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava Synthetic Data for Computer Vision Workshop @ CVPR 2024 Paper Utilizing synthetic data from state-of-the-art diffusion models to improve object detection and segmentation performance.
	LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava Under Submission Project Page \| Paper \| Code Self-supervised and lightweight technique to learn a feature transform for generating dense features from pre-trained ViTs.
	GRIT: GAN Residuals for Image-to-Image Translation Saksham Suri, Moustafa Meshry, Larry S. Davis, Abhinav Shrivastava Winter Conference on Applications of Computer Vision (WACV), 2024 Project Page \| Paper Decouple the optimization of reconstruction and adversarial losses by synthesizing an image as a combination of its reconstruction (low-frequency) and GAN residual (high-frequency) components.
	Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava Winter Conference on Applications of Computer Vision (WACV), 2024 Project Page We propose Diff2Lip, an audio-conditioned diffusion-based model that performs lip synchronization in the wild while preserving image fidelity and identity.
	SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining Saksham Suri, Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava IEEE/CVF International Conference on Computer Vision (ICCV), 2023 Project Page \| Paper \| Code Propose an end-to-end system that learns to separate proposals into labeled and unlabeled regions using pseudo-positive mining to tackle sparsely annotated object detection.
	Teaching Matters: Investigating the Role of Supervision in Vision Transformers Matthew Walmer, Saksham Suri, Kamal Gupta, Abhinav Shrivastava IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 Project Page \| Paper \| Code Study the effect of supervision and losses in training ViTs through attention-, feature-, and downstream-task-based analysis.
	Towards Discovery and Attribution of Open-world GAN Generated Images Sharath Girish, Saksham Suri, Saketh Rambhatla, Abhinav Shrivastava IEEE/CVF International Conference on Computer Vision (ICCV), 2021 Project Page \| Paper \| arXiv Propose an iterative algorithm for discovering GAN-generated images in an open-world setup. Also show applications to online, never-ending discovery.
	Learned Spatial Representations for Few-shot Talking-Head Synthesis Moustafa Meshry, Saksham Suri, Larry S. Davis, Abhinav Shrivastava IEEE/CVF International Conference on Computer Vision (ICCV), 2021 Project Page \| Paper \| arXiv We propose a novel framework that disentangles spatial and style information for image synthesis. A latent spatial layout for the target image is generated and then used to produce per-pixel style modulation parameters for the final synthesis.
	Improving Face Recognition Performance using TeCS2 Dictionary Saksham Suri, Anush Sankaran, Mayank Vatsa, Richa Singh Pattern Recognition Letters, 2020 Paper Incorporating task-agnostic color, shape, texture, and symmetry attributes into task-specific deep learning classifiers for face recognition.
	An Interpretable Generative Model for Handwritten Digits Synthesis Yao Zhu, Saksham Suri, Pranav Kulkarni, Yueru Chen, Jiali Duan, C.-C. Jay Kuo International Conference on Image Processing (ICIP), 2019 Paper Propose a non-deep-learning approach to handwritten digit synthesis that is more interpretable and does not require backpropagation.
	Angel or Demon? Characterizing Variations Across Twitter Timeline of Technical Support Campaigners S. Gupta, G. S. Bhatia, Saksham Suri, D. Kuchhal, P. Gupta, M. Ahamad, M. Gupta, P. Kumaraguru The Journal of Web Science, Vol. 6, 2019 Paper Analyzing and identifying the presence of fake tech support accounts on Twitter.
	On matching faces with alterations due to plastic surgery and disguise Saksham Suri, Anush Sankaran, Mayank Vatsa, Richa Singh IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), 2018 Paper A novel approach to perform face recognition in the presence of plastic surgery and disguise.