Himanshu Sharad Bhatt | Indraprastha Institute of Information Technology (IIIT), Delhi

Papers by Himanshu Sharad Bhatt

2019 International Conference on Document Analysis and Recognition (ICDAR)

Duplicate invoice payment is one of the most prominent challenges encountered by accounts payable operations, and whenever it occurs, it costs the company money. Due to the large volume and variety of invoices across multiple suppliers, it is not practical to manually examine every invoice to check that it is legitimate and has not been previously financed. This paper presents Digital Auditor (DA), an automated framework for detecting duplicate invoices. It is based on two principles: 1) converting invoices into structured templates by extracting relevant information from the invoices and organizing it as key-value pairs, and 2) a machine-learning-based duplicate detection algorithm that compares corresponding fields between two invoices and identifies duplicate invoice pairs. Digital Auditor efficiently identifies duplicate pairs, and thus reduces the laborious manual effort and time spent inspecting invoices against previously paid invoices. To demonstrate the efficacy of Digital Auditor, this paper presents comprehensive experimental results and key observations from user trials by business professionals on a large sample of invoices from a non-production environment.
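The field-by-field comparison described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the field names, the use of a string-similarity ratio from Python's difflib, and the fixed threshold that stands in for the paper's learned duplicate classifier, whose features and model the abstract does not specify.

```python
from difflib import SequenceMatcher

# Hypothetical field names; the abstract does not list the extracted fields.
FIELDS = ("supplier", "invoice_number", "date", "amount")


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def compare_invoices(inv_a: dict, inv_b: dict) -> list:
    """One similarity score per field, in FIELDS order (a missing field is treated as empty)."""
    return [field_similarity(inv_a.get(f, ""), inv_b.get(f, "")) for f in FIELDS]


def is_duplicate(inv_a: dict, inv_b: dict, threshold: float = 0.9) -> bool:
    """Stand-in decision rule (assumed): mean field similarity at or above a threshold."""
    scores = compare_invoices(inv_a, inv_b)
    return sum(scores) / len(scores) >= threshold


paid = {"supplier": "Acme Ltd", "invoice_number": "INV-1001",
        "date": "2019-01-05", "amount": "1200.00"}
resubmitted = dict(paid, invoice_number="INV 1001")  # same invoice, retyped number
unrelated = {"supplier": "Globex Corp", "invoice_number": "G-77",
             "date": "2018-11-30", "amount": "89.50"}

print(is_duplicate(paid, resubmitted), is_duplicate(paid, unrelated))
```

In a learned setting, the per-field score vector returned by compare_invoices would be the input to a trained classifier rather than to a fixed threshold.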

Proceedings of the 28th ACM Conference on Hypertext and Social Media, 2017

Owing to the tremendous increase in the volume and variety of user-generated content, train-once-apply-forever models are insufficient for supervised learning tasks. The need is to develop algorithms that can adapt across domains by leveraging labeled data from source domain(s) and efficiently perform the task in the unlabeled target domain. Towards this, we present a novel two-stage neural network learning algorithm for domain adaptation which learns a multi-part hidden layer where individual parts contribute differently to the tasks in the source and target domains. The multiple parts of the representation (i.e., the hidden layer) are learned while being cognizant of what characteristics to transfer across domains and what to preserve within domains for enhanced performance. The first stage revolves around learning a two-part representation, i.e., source-specific and common representations, in such a manner that the former does not detract from the ability of the latter to represent the target domain. In the second stage, the generalized common representation is iteratively extended with discriminating target-specific characteristics to adapt to the target domain. We empirically demonstrate that the learned representations, in different arrangements, outperform existing domain adaptation algorithms in the source classification as well as the cross-domain classification tasks on user-generated content from different domains on the web.

Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD), 2021

We demonstrate AART (AI Assisted Review Tool), a novel solution for any iterative document review process, such as marketing creative review. AART presents multiple novel features, including creating a rich structured representation for marketing documents, efficiently comparing multiple documents to identify differences, interpreting reviewer’s comments (natural language text) and transforming them into pseudo-instructions, and establishing cause-effect relations between the comments and the differences. The interactive GUI of AART further provides multiple unique features that help reviewers perform their job more efficiently and with less effort.

arXiv (Cornell University), Sep 16, 2021

NLP research has focused on named entity recognition (NER) and how to efficiently extract entities from a sentence. However, generating the relevant context of entities from a sentence has remained under-explored. In this work we introduce the task CONTEXT-NER, in which the relevant context of an entity has to be generated. The extracted context may not be found exactly as a substring in the sentence. We also introduce the EDGAR10-Q dataset for this task, which is a corpus of 1,500 publicly traded companies. It is a manually created complex corpus and one of the largest in terms of number of sentences and entities (1 M and 2.8 M). We introduce a baseline approach that leverages phrase generation algorithms and uses the pre-trained BERT model to get a 33% ROUGE-L score. We also do a one-shot evaluation with GPT-3 and get a 39% score, signifying the hardness and future scope of this task. We hope that the addition of this dataset and our study will pave the way for further research in this domain.
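For readers unfamiliar with the ROUGE-L metric quoted in this abstract, the following is a minimal sentence-level sketch based on the longest common subsequence (LCS) of tokens. The whitespace tokenisation, lowercasing, the balanced F1 weighting, and the example strings are assumptions made for illustration; the abstract does not state which ROUGE-L variant or implementation produced the reported scores.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical generated context vs. hypothetical reference context.
print(rouge_l_f1("net revenue for the quarter", "revenue for the third quarter"))
```

Because the LCS need not be contiguous, the metric gives partial credit to a generated context that preserves word order but skips or inserts tokens, which suits a task where the target is not an exact substring of the sentence.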

We present a transfer learning approach for Title Detection in the FinToC 2020 challenge. Our proposed approach relies on the premise that the geometric layout and character features of titles and non-titles can be learnt separately from a large corpus, and that this learning can then be transferred to a domain-specific dataset. On a domain-specific dataset, we train a Deep Neural Net on the text of the document along with a pre-trained model for geometric and character features. We achieved an F-Score of 83.25 on the test set and secured the top rank in the title detection task in FinToC 2020 (Bentabet et al., 2020).

ArXiv, 2016

Automatic short answer grading (ASAG) techniques are designed to automatically assess short answers to questions in natural language, having a length of a few words to a few sentences. Supervised ASAG techniques have been demonstrated to be effective but suffer from a couple of key practical limitations. They are greatly reliant on instructor-provided model answers and need labeled training data in the form of graded student answers for every assessment task. To overcome these, in this paper, we introduce an ASAG technique with two novel features. First, we propose an iterative technique on an ensemble of (a) a text classifier of student answers and (b) a classifier using numeric features derived from various similarity measures with respect to model answers. Second, we employ canonical correlation analysis based transfer learning on a common feature representation to build the classifier ensemble for questions having no labelled data. The proposed technique handsomely beats all winning su...

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019

Learning representations such that the source and target distributions appear as similar as possible has benefited transfer learning tasks across several applications. Generally, it requires labeled data from the source and only unlabeled data from the target to learn such representations. While these representations act as a bridge to transfer knowledge learned in the source to the target, they may lead to negative transfer when the source-specific characteristics detract from their ability to represent the target data. We present a novel neural network architecture to simultaneously learn a two-part representation, based on the principle of segregating the source-specific representation from the common representation. The first part captures the source-specific characteristics while the second part captures the truly common representation. Our architecture optimizes an objective function which acts adversarially for the source-specific part if it contributes towards the cross-domain learning. We empirically show that the two parts of the representation, in different arrangements, outperform existing learning algorithms on the source learning as well as cross-domain tasks on multiple datasets.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018

Getting manually labeled data in each domain is always an expensive and time-consuming task. Cross-domain sentiment analysis has emerged as a demanding concept where a labeled source domain facilitates a sentiment classifier for an unlabeled target domain. However, polarity orientation (positive or negative) and the significance of a word to express an opinion often differ from one domain to another. Owing to these differences, cross-domain sentiment classification is still a challenging task. In this paper, we propose that words that do not change their polarity and significance represent the transferable (usable) information across domains for cross-domain sentiment classification. We present a novel approach based on the χ² test and cosine similarity between context vectors of words to identify polarity-preserving significant words across domains. Furthermore, we show that a weighted ensemble of the classifiers enhances the cross-domain classification performance.
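The two measures named in this abstract can be sketched as follows: a Pearson χ² statistic on a 2x2 word-versus-class contingency table, and cosine similarity between two context vectors. The counts and vectors used in the example are invented for illustration, and how the paper combines or thresholds the two measures is not stated in the abstract, so no combination rule is shown.

```python
import math


def chi_square_2x2(pos_with, pos_without, neg_with, neg_without):
    """Pearson chi-square statistic for a word vs. class 2x2 contingency table.

    pos_with / pos_without: positive documents with / without the word.
    neg_with / neg_without: negative documents with / without the word.
    """
    n = pos_with + pos_without + neg_with + neg_without
    numerator = n * (pos_with * neg_without - pos_without * neg_with) ** 2
    denominator = ((pos_with + pos_without) * (neg_with + neg_without)
                   * (pos_with + neg_with) * (pos_without + neg_without))
    return numerator / denominator if denominator else 0.0


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical counts: a word that appears mostly in positive reviews.
print(chi_square_2x2(90, 10, 10, 90))
# Hypothetical context vectors of the same word in two domains.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
```

A large χ² value indicates that the word's occurrence depends strongly on the class label, while a cosine similarity near 1 indicates that the word is used in similar contexts in both domains.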

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016

Advances in transfer learning have relaxed the dependence of traditional supervised machine learning algorithms on annotated training data for training new models for every new domain. However, several applications encounter scenarios where models need to transfer/adapt across domains when the label sets vary both in terms of the count of labels as well as their connotations. This paper presents a first-of-its-kind transfer learning algorithm for cross-domain classification with multiple source domains and disparate label sets. It starts with identifying transferable knowledge from across multiple domains that can be useful for learning the target domain task. This knowledge, in the form of selective labeled instances from different domains, is congregated to form an auxiliary training set which is used for learning the target domain task. Experimental results validate the efficacy of the proposed algorithm against strong baselines on a real-world social media dataset and the 20 Newsgroups dataset.

Proceedings of the Nineteenth Conference on Computational Natural Language Learning, 2015

Supervised machine learning classification algorithms assume both train and test data are sampled from the same domain or distribution. However, the performance of the algorithms degrades for test data from a different domain. Such cross-domain classification is arduous, as features in the test domain may be different and the absence of labeled data could further exacerbate the problem. This paper proposes an algorithm to adapt a classification model by iteratively learning domain-specific features from the unlabeled test data. Moreover, this adaptation transpires in a similarity-aware manner by integrating similarity between domains in the adaptation setting. Cross-domain classification experiments on different datasets, including a real-world dataset, demonstrate the efficacy of the proposed algorithm over the state of the art.

Encyclopedia of Biometrics, 2015

Pattern Recognition, 2015

Biometrics, the science of verifying the identity of individuals, is increasingly being used in several applications, such as assisting law enforcement agencies to control crime and fraud. Existing techniques are unable to provide significant levels of accuracy in uncontrolled noisy environments. Further, scalability is another challenge due to variations in data distribution with changing conditions. This paper presents an adaptive context switching algorithm coupled with online learning to address both these challenges. The proposed framework, termed QFuse, uses the quality of input images to dynamically select the best biometric matcher or fusion algorithm to verify the identity of an individual. The proposed algorithm continuously updates the selection process using online learning to address scalability and accommodate the variations in data distribution. The results on the WVU multimodal database and a large real-world multimodal database obtained from a law enforcement agency show the efficacy of the proposed framework.

2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013

Matching sketches with digital face images is one of the highly sought face recognition applications. An important aspect, though less explored so far, is matching age-separated sketches with digital face images. Several law enforcement agencies are progressively using composite sketches for apprehending individuals. This research proposes an algorithm for matching composite sketches with digital face images across different ages. It extracts discriminative shape, orientation, and texture features from local regions of a face using image moments and histogram of oriented gradients. The complementary information from these two features is further combined at match score level for efficiently matching composite sketches with digital face images across different ages. To study the effects of age variations, this research also presents a composite sketch database with age-separated sketches and digital face images. The results of a large gallery experiment suggest that the proposed algorithm efficiently encodes discriminative information from local facial regions useful for matching composite sketches with age-separated digital face images.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2010

Face recognition systems have attracted much attention and have been applied in various domains, primarily for surveillance, security, access control, and law enforcement. In recent years, many advances have been made in face recognition techniques to address challenges such as pose, expression, illumination, aging, and disguise. However, due to advances in technology, there are new emerging challenges for which the performance of face recognition systems degrades, and plastic/cosmetic surgery is one of them. In this paper we comment on the effect of plastic surgery on face recognition algorithms and the various social, ethical, and engineering challenges associated with it.

Emerging Techniques and Challenges for Hand-Based Biometrics, ETCHB 2010, 2010

Large-scale deployment of fingerprint recognition systems, especially in the Indian environment, involves some challenges. Along with sensor noise and poor image quality, the presence of scars, warts, and deteriorating ridge/minutiae patterns in fingerprints from the rural population affects the data distribution. In other words, the quality of fingerprint patterns, particularly those belonging to the rural Indian population, may differ from that of the standard urban or western population and may be difficult to process. Since there is no study that analyzes fingerprint images in the Indian context, this paper presents an analytical study using a standard fingerprint image quality assessment tool and fingerprint databases collected from the rural and urban Indian population. On a database of over 0.25 million images, we observe that worn and damaged patterns cause poor-quality ridges and can therefore affect performance. Region-specific causes such as manual labor and Lawsonia inermis also degrade the quality of fingerprints.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, Jan 9, 2014

Face recognition algorithms are generally trained for matching high-resolution images, and they perform well for test data of similar resolution. However, the performance of such systems degrades when low-resolution face images captured in unconstrained settings, such as videos from cameras in a surveillance scenario, are matched with high-resolution gallery images. The primary challenge here is to extract discriminating features from the limited biometric content in low-resolution images and match it to information-rich high-resolution face images. The problem of cross-resolution face matching is further exacerbated when there is limited labeled positive data for training face recognition algorithms. In this paper, the problem of cross-resolution face matching is addressed where low-resolution images are matched with a high-resolution gallery. A co-transfer learning framework is proposed which is a cross-pollination of transfer learning and co-training paradigms and is applied for cross-resolut...

2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2010

Existing face recognition systems have demonstrated success in constrained settings with limited variability in illumination, pose, and expression. However, these incremental improvements are not sufficient to transcend the challenging applications such as identifying missing ...

2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2010

This paper presents an efficient algorithm for matching sketches with digital face images. The algorithm extracts discriminating information present in local facial regions at different levels of granularity. Both sketches and digital images are decomposed into a multi-resolution pyramid to conserve high-frequency information, which forms the discriminating facial patterns. Extended uniform circular local binary pattern based descriptors use these patterns to form a unique signature of the face image. Further, for matching, a genetic optimization based approach is proposed to find the optimum weights corresponding to each facial region. The information obtained from different levels of the Laplacian pyramid is combined to improve the identification accuracy. Experimental results on sketch-digital image pairs from the CUHK and IIIT-D databases show that the proposed algorithm can provide better identification performance compared to existing algorithms.
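As background for the descriptor family mentioned in this abstract, the sketch below computes the basic 8-neighbour local binary pattern (LBP) code of a 3x3 patch and tests whether a code is "uniform". This is the textbook square-neighbourhood operator, not the extended uniform circular descriptor the paper proposes; the neighbour ordering and the example patch are assumptions made for illustration.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for a 3x3 patch given as a list of three rows."""
    center = patch[1][1]
    # Neighbours visited clockwise starting at the top-left pixel (assumed order).
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    # Set bit i when neighbour i is at least as bright as the centre pixel.
    return sum(1 << i for i, value in enumerate(neighbours) if value >= center)


def is_uniform(code):
    """True if the circular 8-bit pattern has at most two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2


# Hypothetical patch with a bright top row: an edge-like pattern.
edge_patch = [[9, 9, 9],
              [1, 5, 1],
              [1, 1, 1]]
code = lbp_code(edge_patch)
print(code, is_uniform(code))
```

A face descriptor is typically built by computing such codes at every pixel of a local region and histogramming them, with uniform codes given their own bins because they correspond to simple structures such as edges and corners.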

2011 International Joint Conference on Biometrics (IJCB), 2011

Multibiometric systems fuse the evidence (e.g., match scores) pertaining to multiple biometric modalities or classifiers. Most score-level fusion schemes discussed in the literature require the processing (i.e., feature extraction and matching) of every modality prior to invoking the fusion scheme. This paper presents a framework for dynamic classifier selection and fusion based on the quality of the gallery and probe images associated with each modality with multiple classifiers. The quality assessment algorithm for each biometric modality computes a quality vector for the gallery and probe images that is used for classifier selection. These vectors are used to train Support Vector Machines (SVMs) for decision making. In the proposed framework, the biometric modalities are arranged sequentially such that the stronger biometric modality has higher priority for being processed. Since fusion is required only when all unimodal classifiers are rejected by the SVM classifiers, the average computational time of the proposed framework is significantly reduced. Experimental results on different multimodal databases involving face and fingerprint show that the proposed quality-based classifier selection framework yields good performance even when the quality of the biometric sample is sub-optimal.
