Jishnu Ray Chowdhury - Academia.edu
Papers by Jishnu Ray Chowdhury
arXiv (Cornell University), Feb 1, 2024
In this paper, we study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism: (1) incorporating a depth-wise recurrence similar to Universal Transformers; and (2) incorporating a chunk-wise temporal recurrence like Temporal Latent Bottleneck. Furthermore, we propose and investigate novel ways to extend and combine the above methods; for example, we propose a global mean-based dynamic halting mechanism for Universal Transformer and an augmentation of Temporal Latent Bottleneck with elements from Universal Transformer. We compare the models and probe their inductive biases in several diagnostic tasks such as Long Range Arena (LRA), flip-flop language modeling, ListOps, and Logical Inference.
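The abstract does not spell out the halting mechanism, so the following is a minimal sketch of one plausible reading: at each shared-weight depth step, a halting probability is computed from the global mean of the token states, and the depth loop stops once the accumulated halting mass crosses a threshold. The module structure, threshold value, and loop form are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class GlobalMeanHaltingEncoder(nn.Module):
    """Sketch: Universal-Transformer-style depth recurrence with a
    global mean-based dynamic halting criterion (an illustrative
    reading of the abstract, not the authors' exact mechanism)."""

    def __init__(self, d_model=256, nhead=4, max_depth=8, threshold=0.99):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.halt_proj = nn.Linear(d_model, 1)   # scores the mean state
        self.max_depth = max_depth
        self.threshold = threshold

    def forward(self, x):                        # x: (batch, seq, d_model)
        halted = torch.zeros(x.size(0), device=x.device)
        for _ in range(self.max_depth):
            x = self.layer(x)                    # same weights reused at every depth
            # one halting probability per sequence, from the global mean state
            p = torch.sigmoid(self.halt_proj(x.mean(dim=1))).squeeze(-1)
            halted = halted + (1 - halted) * p   # accumulate halting mass
            if bool((halted > self.threshold).all()):
                break                            # every sequence has halted
        return x
```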
arXiv (Cornell University), Nov 6, 2023
arXiv (Cornell University), Dec 2, 2021
Keyphrase generation is the task of generating phrases (keyphrases) that summarize the main topics of a given document. Keyphrases can be either present or absent from the given document. While the extraction of present keyphrases has received much attention in the past, only recently a stronger focus has been placed on the generation of absent keyphrases. However, generating absent keyphrases is challenging; even the best methods show only a modest degree of success. In this paper, we propose a model-agnostic approach called keyphrase dropout (or KPDROP) to improve absent keyphrase generation. In this approach, we randomly drop present keyphrases from the document and turn them into artificial absent keyphrases during training. We test our approach extensively and show that it consistently improves the absent performance of strong baselines in both supervised and resource-constrained semi-supervised settings.
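The core KPDROP operation is simple enough to sketch directly from the abstract's description: present keyphrases are randomly removed from the document text during training so that they become artificial absent keyphrases. The drop rate and the plain string-replacement removal below are illustrative assumptions.

```python
import random

def kpdrop(document: str, present_keyphrases: list[str], drop_rate: float = 0.5):
    """Sketch of keyphrase dropout (KPDROP) as described in the abstract:
    randomly drop some present keyphrases from the document so they become
    artificial absent keyphrases for training. Drop rate and the naive
    string removal are assumptions for illustration."""
    dropped, kept = [], []
    for kp in present_keyphrases:
        (dropped if random.random() < drop_rate else kept).append(kp)
    for kp in dropped:
        document = document.replace(kp, "")      # naive removal, for illustration
    # training targets: kept present keyphrases + original absent ones
    # + the newly created artificial absent keyphrases (dropped)
    return document, kept, dropped
```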
arXiv (Cornell University), May 29, 2023
Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases). Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire. Very few works address the problem of keyphrase generation in low-resource settings, but they still rely on a lot of additional unlabeled data for pretraining and on automatic methods for pseudo-annotations. In this paper, we present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains. We design techniques that use the full text of the articles to improve both present and absent keyphrase generation. We test our approach comprehensively on three datasets and show that the data augmentation strategies consistently improve the state-of-the-art performance. We release our source code at https://github.com/kgarg8/kpgen-lowres-data-aug.
arXiv (Cornell University), Jul 20, 2023
Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as an extension of Gumbel Tree RvNN and was shown to achieve state-of-the-art length generalization performance in ListOps while maintaining comparable performance on other tasks. However, although better than previous approaches in terms of memory usage, BT-RvNN can still be exorbitantly expensive. In this paper, we identify the main bottleneck in BT-RvNN's memory usage to be the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by 10-16 times but also create a new state of the art in ListOps while maintaining similar performance in other tasks. In addition, we propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form $f: \mathbb{R}^{n \times d} \to \mathbb{R}^{d}$ into a token contextualizer of the form $f: \mathbb{R}^{n \times d} \to \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models. Our code is available at: https://github.com/JRC1995/BeamRecursionFamily.
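To make the stated bottleneck concrete, here is a speculative sketch of the decoupling idea: a cheap scorer ranks candidate adjacent pairs, and the expensive recursive cell composes only the selected pair, instead of running the full cell on every pair just to score it. The sketch shows a single greedy (beam size 1) step; the layer shapes, scorer design, and hard greedy selection are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class DecoupledScorerCell(nn.Module):
    """Sketch of decoupling the scorer from the recursive cell (an
    illustrative reading of the abstract, not the paper's exact design)."""

    def __init__(self, d=128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * d, d // 4), nn.GELU(),
                                    nn.Linear(d // 4, 1))   # cheap: ranks pairs
        self.cell = nn.Sequential(nn.Linear(2 * d, 4 * d), nn.GELU(),
                                  nn.Linear(4 * d, d))      # expensive: composes

    def step(self, nodes):                                  # nodes: (n, d)
        pairs = torch.cat([nodes[:-1], nodes[1:]], dim=-1)  # all adjacent pairs
        idx = int(self.scorer(pairs).squeeze(-1).argmax())  # hard greedy pick
        merged = self.cell(pairs[idx:idx + 1])              # compose only the winner
        return torch.cat([nodes[:idx], merged, nodes[idx + 2:]], dim=0)
```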
arXiv (Cornell University), May 31, 2023
We propose Beam Tree Recursive Cell (BT-Cell), a backpropagation-friendly framework to extend Recursive Neural Networks (RvNNs) with beam search for latent structure induction. We further extend this framework by proposing a relaxation of the hard top-k operators in beam search for better propagation of gradient signals. We evaluate our proposed models on different out-of-distribution splits in both synthetic and realistic data. Our experiments show that BT-Cell achieves near-perfect performance on several challenging structure-sensitive synthetic tasks like ListOps and logical inference while maintaining comparable performance on realistic data against other RvNN-based models. Additionally, we identify a previously unknown failure case for neural models in generalization to an unseen number of arguments in ListOps. The code is available at: https://github.com/JRC1995/BeamTreeRecursiveCells.
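The abstract mentions relaxing the hard top-k operators in beam search; one standard relaxation of this kind is a straight-through estimator that keeps the hard top-k selection in the forward pass while routing gradients through a softmax over all candidates. The sketch below shows that generic trick with a temperature parameter; it is not necessarily the paper's exact relaxation.

```python
import torch

def relaxed_topk(scores: torch.Tensor, k: int, temperature: float = 1.0):
    """Sketch of a relaxed top-k in the spirit of the abstract: hard top-k
    mask in the forward pass, softmax gradients in the backward pass
    (straight-through). One common relaxation, stated as an assumption."""
    soft = torch.softmax(scores / temperature, dim=-1)       # differentiable path
    hard = torch.zeros_like(scores)
    hard.scatter_(-1, scores.topk(k, dim=-1).indices, 1.0)   # hard 0/1 top-k mask
    return hard + soft - soft.detach()   # hard values forward, soft gradients back

# usage: mask = relaxed_topk(beam_scores, k=2)  # multiply onto beam candidates
```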
arXiv (Cornell University), May 31, 2023
We explore different ways to utilize position-based cross-attention in seq2seq networks to enable length generalization in algorithmic tasks. We show that a simple approach of interpolating the original and reversed encoded representations combined with relative attention allows near-perfect length generalization for both forward and reverse lookup tasks or copy tasks that had been generally hard to tackle. We also devise harder diagnostic tasks where the relative distance of the ideal attention position varies with timestep. In such settings, the simple interpolation trick with relative attention is not sufficient. We introduce novel variants of location attention building on top of Dubois et al. (2020) to address the new diagnostic tasks. We also show the benefits of our approaches for length generalization in SCAN (Lake & Baroni, 2018) and CFQ (Keysers et al., 2020). Our code is available on GitHub.
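The interpolation trick itself is compact enough to sketch from the abstract: mix each encoded sequence with its time-reversed copy before cross-attention, so the decoder can effectively attend in either direction. The gating form of alpha (a learned scalar or per-position gate in [0, 1]) is an assumption for illustration.

```python
import torch

def interpolate_bidirectional(enc: torch.Tensor, alpha: torch.Tensor):
    """Sketch of interpolating original and reversed encoded representations
    (an illustrative reading of the abstract). enc: (batch, seq, d)."""
    reversed_enc = torch.flip(enc, dims=[1])     # time-reversed copy
    return alpha * enc + (1.0 - alpha) * reversed_enc

# usage: mixed = interpolate_bidirectional(enc, torch.sigmoid(gate_logits))
```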
arXiv (Cornell University), Jan 5, 2020
Tweet hashtags have the potential to improve the search for information during disaster events. However, a large number of disaster-related tweets do not have any user-provided hashtags. Moreover, only a small number of tweets that contain actionable hashtags are useful for disaster response. To facilitate progress on automatic identification (or extraction) of disaster hashtags from Twitter data, we construct a unique dataset of disaster-related tweets annotated with hashtags useful for filtering actionable information. Using this dataset, we further investigate Long Short-Term Memory-based models within a Multi-Task Learning framework. The best-performing model achieves an F1-score as high as 92.22%. The dataset, code, and other resources are available on GitHub.
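A minimal sketch of an LSTM-based multi-task tagger in the spirit of the abstract: a shared BiLSTM encoder feeding two token-level heads, e.g. an auxiliary keyword-discovery head and a hashtag-extraction head. The dimensions, vocabulary size, and exact head split are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiTaskHashtagLSTM(nn.Module):
    """Sketch: shared BiLSTM with two task heads for multi-task learning
    (assumed structure, for illustration only)."""

    def __init__(self, vocab_size=30000, d_emb=100, d_hidden=128, n_tags=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.lstm = nn.LSTM(d_emb, d_hidden, bidirectional=True, batch_first=True)
        self.keyword_head = nn.Linear(2 * d_hidden, 2)       # keyword yes/no
        self.hashtag_head = nn.Linear(2 * d_hidden, n_tags)  # BIO hashtag tags

    def forward(self, token_ids):                # token_ids: (batch, seq)
        h, _ = self.lstm(self.emb(token_ids))    # (batch, seq, 2 * d_hidden)
        return self.keyword_head(h), self.hashtag_head(h)
```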
Findings of the Association for Computational Linguistics: ACL 2023
Findings of the Association for Computational Linguistics: EMNLP 2022
Keyphrase generation is the task of generating phrases (keyphrases) that summarize the main topics of a given document. Keyphrases can be either present or absent from the given document. While the extraction of present keyphrases has received much attention in the past, only recently a stronger focus has been placed on the generation of absent keyphrases. However, generating absent keyphrases is challenging; even the best methods show only a modest degree of success. In this paper, we propose a model-agnostic approach called keyphrase dropout (or KPDROP) to improve absent keyphrase generation. In this approach, we randomly drop present keyphrases from the document and turn them into artificial absent keyphrases during training. We test our approach extensively and show that it consistently improves the absent performance of strong baselines in both supervised and resource-constrained semi-supervised settings.
arXiv (Cornell University), Apr 26, 2023
Keyphrase generation aims at generating topical phrases from a given text either by copying from the original text (present keyphrases) or by producing new keyphrases (absent keyphrases) that capture the semantic meaning of the text. Encoder-decoder models are most widely used for this task because of their capabilities for absent keyphrase generation. However, there has been little to no analysis of the performance and behavior of such models for keyphrase generation. In this paper, we study various tendencies exhibited by three strong models: T5 (based on a pre-trained transformer), CatSeq-Transformer (a non-pretrained Transformer), and ExHiRD (based on a recurrent neural network). We analyze prediction confidence scores, model calibration, and the effect of token position on keyphrase generation. Moreover, we motivate and propose a novel metric framework, SoftKeyScore, to evaluate the similarity between two sets of keyphrases by using soft scores to account for partial matching and semantic similarity. We find that SoftKeyScore is more suitable than the standard F1 metric for evaluating two sets of given keyphrases.
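As a concrete illustration of a soft set-to-set score of the kind SoftKeyScore motivates, the sketch below matches each keyphrase to its most similar counterpart by embedding cosine similarity and combines soft precision/recall into a soft F1. The greedy max-matching and the cosine choice are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def soft_key_score(pred_vecs: np.ndarray, gold_vecs: np.ndarray) -> float:
    """Sketch of a SoftKeyScore-style metric: soft precision (best match per
    prediction) and soft recall (best match per reference) combined into a
    soft F1. Inputs are (n, d) keyphrase embedding matrices."""
    pred = pred_vecs / np.linalg.norm(pred_vecs, axis=1, keepdims=True)
    gold = gold_vecs / np.linalg.norm(gold_vecs, axis=1, keepdims=True)
    sim = pred @ gold.T                          # (n_pred, n_gold) cosine matrix
    soft_p = sim.max(axis=1).mean()              # how well predictions are covered
    soft_r = sim.max(axis=0).mean()              # how well references are covered
    return float(2 * soft_p * soft_r / (soft_p + soft_r + 1e-9))
```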
ArXiv, 2022
We study the task of predicting a set of salient questions from a given paragraph without any prior knowledge of the precise answer. We make two main contributions. First, we propose a new method to evaluate a set of predicted questions against the set of references by using the Hungarian algorithm to assign predicted questions to references before scoring the assigned pairs. We show that our proposed evaluation strategy has better theoretical and practical properties compared to prior methods because it can properly account for the coverage of references. Second, we compare different strategies to utilize a pre-trained seq2seq model to generate and select a set of questions related to a given paragraph. The code is available.
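The Hungarian-assignment evaluation described in the abstract can be sketched directly with SciPy's assignment solver: given a predictions-by-references similarity matrix, assign each predicted question to at most one reference so that total similarity is maximal, then normalize by the number of references so that unmatched references count as zero. The normalization choice and the similarity source (e.g. BLEU or BERTScore pairs) are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_set_score(sim: np.ndarray) -> float:
    """Sketch: optimal one-to-one assignment of predicted questions to
    references, averaged over references so coverage is rewarded.
    sim: (n_pred, n_ref) pairwise similarity matrix."""
    rows, cols = linear_sum_assignment(-sim)     # negate to maximize similarity
    return float(sim[rows, cols].sum() / sim.shape[1])  # unmatched refs score 0

# usage: score = hungarian_set_score(similarity_matrix)
```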
ArXiv, 2021
Keyphrase generation aims at generating phrases (keyphrases) that best describe a given document. In scholarly domains, current approaches to this task are neural approaches and have largely worked with only the title and abstract of the articles. In this work, we explore whether the integration of additional data from semantically similar articles or from the full text of the given article can be helpful for a neural keyphrase generation model. We discover that adding sentences from the full text, particularly in the form of a summary of the article, can significantly improve the generation of both types of keyphrases, whether present or absent from the title and abstract. The experimental results on the three acclaimed models, along with one of the latest transformer models suitable for longer documents, Longformer Encoder-Decoder (LED), validate the observation. We also present a new large-scale scholarly dataset, FULLTEXTKP, for keyphrase generation, which we use for our experiments.
ArXiv, 2022
Paraphrase generation is a fundamental and long-standing task in natural language processing. In this paper, we concentrate on two contributions to the task: (1) we propose Retrieval Augmented Prompt Tuning (RAPT) as a parameter-efficient method to adapt large pre-trained language models for paraphrase generation; (2) we propose Novelty Conditioned RAPT (NC-RAPT) as a simple model-agnostic method of using specialized prompt tokens for controlled paraphrase generation with varying levels of lexical novelty. By conducting extensive experiments on four datasets, we demonstrate the effectiveness of the proposed approaches for retaining the semantic content of the original text while inducing lexical novelty in the generation.
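A speculative sketch of the NC-RAPT idea as stated in the abstract: a frozen pre-trained LM adapted via trainable prompt embeddings, with a separate specialized prompt per target novelty level. The number of novelty levels, the prompt length, and the HuggingFace-style `inputs_embeds` interface are assumptions for illustration.

```python
import torch
import torch.nn as nn

class NoveltyPromptTuner(nn.Module):
    """Sketch: per-novelty-level trainable prompts prepended to a frozen LM
    (illustrative reading of NC-RAPT, not the paper's exact method)."""

    def __init__(self, lm: nn.Module, d_model: int, prompt_len: int = 20,
                 novelty_levels: int = 3):
        super().__init__()
        self.lm = lm.requires_grad_(False)       # backbone stays frozen
        self.prompts = nn.Parameter(
            torch.randn(novelty_levels, prompt_len, d_model) * 0.02)

    def forward(self, input_embeds, novelty: int):
        # select the specialized prompt for the requested novelty level
        prompt = self.prompts[novelty].expand(input_embeds.size(0), -1, -1)
        # assumes a HuggingFace-style model accepting inputs_embeds
        return self.lm(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```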
Previously, we showed that dividing 2D datasets into grid boxes could give a satisfactory estimate of the cluster count by detecting local maxima in data density relative to nearby grid boxes. The algorithm was robust for datasets with clusters of different sizes and with distributions deviating from the Gaussian to a certain degree.
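The grid-box density-peak idea is concrete enough to sketch: bin the 2D points into a grid, then count cells whose point count is a strict local maximum relative to their 8 neighbors. The grid size and the strict-maximum rule are illustrative assumptions.

```python
import numpy as np

def estimate_cluster_count(points: np.ndarray, grid: int = 10) -> int:
    """Sketch of cluster-count estimation via grid-density local maxima
    (illustrative reading of the abstract). points: (n, 2) array."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=grid)
    padded = np.pad(hist, 1)                     # zero border for edge cells
    count = 0
    for i in range(grid):
        for j in range(grid):
            window = padded[i:i + 3, j:j + 3]    # 3x3 neighborhood of cell (i, j)
            center = hist[i, j]
            if center > 0 and center == window.max() and (window == center).sum() == 1:
                count += 1                       # strict local density maximum
    return count
```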
ArXiv, 2021
Keyphrase generation is the task of generating phrases (keyphrases) that summarize the main topics of a given document. The generated keyphrases can be either present or absent from the text of the given document. While the extraction of present keyphrases has received much attention in the past, only recently a stronger focus has been placed on the generation of absent keyphrases. However, generating absent keyphrases is very challenging; even the best methods show only a modest degree of success. In this paper, we propose an approach, called keyphrase dropout (or KPDROP), to improve absent keyphrase generation. We randomly drop present keyphrases from the document and turn them into artificial absent keyphrases during training. We test our approach extensively and show that it consistently improves the absent performance of strong baselines in keyphrase generation.
Recursive Neural Networks (RvNNs), which compose sequences according to their underlying hierarchical syntactic structure, have performed well in several natural language processing tasks compared to similar models without structural biases. However, traditional RvNNs are incapable of inducing the latent structure in a plain text sequence on their own. Several extensions have been proposed to overcome this limitation. Nevertheless, these extensions tend to rely on surrogate gradients or reinforcement learning at the cost of higher bias or variance. In this work, we propose Continuous Recursive Neural Network (CRvNN) as a backpropagation-friendly alternative to address the aforementioned limitations. This is done by incorporating a continuous relaxation to the induced structure. We demonstrate that CRvNN achieves strong performance in challenging synthetic tasks such as logical inference (Bowman et al., 2015b) and ListOps (Nangia & Bowman, 2018). We also show that CRvNN performs comparably…
The World Wide Web Conference on - WWW '19, 2019
While keyphrase extraction has received considerable attention in recent years, relatively few studies exist on extracting keyphrases from social media platforms such as Twitter, and even fewer on extracting disaster-related keyphrases from such sources. During a disaster, keyphrases can be extremely useful for filtering relevant tweets that can enhance situational awareness. Previously, joint training of two different layers of a stacked Recurrent Neural Network for keyword discovery and keyphrase extraction had been shown to be effective in extracting keyphrases from general Twitter data. We improve the model's performance on both general Twitter data and disaster-related Twitter data by incorporating contextual word embeddings, POS tags, phonetics, and phonological features. Moreover, we discuss the shortcomings of the often-used F1-measure for evaluating the quality of predicted keyphrases with respect to the ground-truth annotations. Instead of the F1-measure, we propose the use of embedding-based metrics to better capture the correctness of the predicted keyphrases. In addition, we present a novel extension of an embedding-based metric that allows one to better control the penalty for the difference in the number of ground-truth and predicted keyphrases.
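As a concrete illustration of an embedding-based metric with a controllable penalty for mismatched set sizes, the sketch below averages best-match cosine similarities and scales the result down as the count difference grows. The exponential penalty form and the `beta` knob are illustrative assumptions, not the paper's exact extension.

```python
import numpy as np

def penalized_embedding_score(pred_vecs: np.ndarray, gold_vecs: np.ndarray,
                              beta: float = 1.0) -> float:
    """Sketch: best-match embedding similarity between keyphrase sets,
    penalized by the difference in set sizes; beta controls the penalty."""
    pred = pred_vecs / np.linalg.norm(pred_vecs, axis=1, keepdims=True)
    gold = gold_vecs / np.linalg.norm(gold_vecs, axis=1, keepdims=True)
    sim = pred @ gold.T                          # (n_pred, n_gold) cosine matrix
    base = (sim.max(axis=1).mean() + sim.max(axis=0).mean()) / 2
    penalty = np.exp(-beta * abs(len(pred) - len(gold)) / len(gold))
    return float(base * penalty)
```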
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop