Dr Irshad Bhat - Academia.edu
Papers by Dr Irshad Bhat
We investigate the problem of parsing conversational data of morphologically-rich languages such as Hindi where argument scrambling occurs frequently. We evaluate a state-of-the-art non-linear transition-based parsing system on a new dataset containing 506 dependency trees for sentences from Bollywood (Hindi) movie scripts and Twitter posts of Hindi monolingual speakers. We show that a dependency parser trained on a newswire treebank is strongly biased towards the canonical structures and degrades when applied to conversational data. Inspired by Transformational Generative Grammar (Chomsky, 1965), we mitigate the sampling bias by generating all theoretically possible alternative word orders of a clause from the existing (kernel) structures in the treebank. Training our parser on canonical and transformed structures improves performance on conversational data by around 9% LAS over the baseline newswire parser.
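The augmentation idea in the abstract above can be illustrated with a short sketch: permuting the argument chunks of a clause around its head verb to produce alternative word orders. This is only a simplified illustration in Python; the chunking, projectivity constraints, and the actual treebank transformation procedure of the paper are not reproduced here.

```python
# Illustrative sketch (not the paper's implementation): generate alternative
# word orders of a clause by permuting the chunks that depend on its verb.
# Chunk boundaries are given explicitly here; in practice they would come
# from the dependency tree.
from itertools import permutations

def alternative_orders(verb, dependent_chunks, max_variants=24):
    """Return alternative linear orders of a clause.

    verb             -- head word of the clause (kept clause-final, as is
                        canonical in Hindi)
    dependent_chunks -- list of chunks (each a list of words) headed by
                        dependents of the verb
    """
    variants, seen = [], set()
    for perm in permutations(dependent_chunks):
        order = tuple(w for chunk in perm for w in chunk) + (verb,)
        if order not in seen:
            seen.add(order)
            variants.append(list(order))
        if len(variants) >= max_variants:
            break
    return variants

# Toy example: "raam ne kitaab padhii" and its scrambled variant
for order in alternative_orders("padhii", [["raam", "ne"], ["kitaab"]]):
    print(" ".join(order))
```

Each generated variant would then be paired with the original dependency relations so the parser sees the same tree under different surface orders.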
In this paper we present our submission for the FIRE 2016 Shared Task on Code Mixed Entity Extraction in Indian Languages. We describe a neural network system for entity extraction in Hindi-English code-mixed text. Our method uses distributed word representations as features for the neural network and can therefore easily be replicated across languages. Our system ranked first for Hindi-English with an F1-score of 68.24%.
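As a rough illustration of a tagger built on distributed word representations alone, the sketch below defines a small bidirectional LSTM sequence tagger in PyTorch. The architecture, dimensions, and tag inventory are assumptions for the example and are not taken from the shared-task system.

```python
# Minimal sketch of an embedding-only neural sequence tagger.
import torch
import torch.nn as nn

class EntityTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # pretrained vectors can be loaded here
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)   # per-token tag scores

    def forward(self, token_ids):
        emb = self.embed(token_ids)      # (batch, seq_len, emb_dim)
        hidden, _ = self.lstm(emb)       # (batch, seq_len, 2 * hidden_dim)
        return self.out(hidden)          # (batch, seq_len, num_tags)

# Toy forward pass
model = EntityTagger(vocab_size=5000, num_tags=7)
scores = model(torch.randint(0, 5000, (1, 12)))
print(scores.shape)  # torch.Size([1, 12, 7])
```

Because the only inputs are word vectors, swapping in embeddings trained on another language pair is all that is needed to port such a tagger.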
Proceedings of the 2nd Workshop on New Frontiers in Summarization, 2019
In recent years, the task of question answering over passages, also framed as reading comprehension, has evolved into a very active research area. A reading comprehension system extracts a span of text, comprising named entities, dates, short phrases, etc., which serves as the answer to a given question. However, such spans of text result in an unnatural reading experience in a conversational system. Dialogue systems usually solve this issue with template-based language generation, which, though adequate for a domain-specific task, is too restrictive and predefined for a domain-independent system. To present the user with a more conversational experience, we propose a pointer-generator based full-length answer generator that can be used with most QA systems. Our system generates a full-length answer given a question and the extracted factoid/span answer, without relying on the passage from which the answer was extracted. We also present a dataset of 315,000 question, factoid answer and full-length answer triples. We evaluate our system using ROUGE-1, ROUGE-2, ROUGE-L and BLEU, achieving a BLEU score of 74.05 and a ROUGE-L score of 86.25.
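The core of a pointer-generator decoder is the step that mixes a vocabulary distribution with a copy distribution over the input tokens (here, the question and the span answer). The sketch below shows only that mixing step, with illustrative tensor names and shapes; it is not the paper's model.

```python
# Sketch of the pointer-generator mixing step: the final output distribution
# interpolates between generating a word from the vocabulary and copying a
# token from the source sequence. Shapes and names are illustrative.
import torch

def final_distribution(p_gen, vocab_dist, attn_weights, src_ids):
    """p_gen        -- (batch, 1) probability of generating from the vocabulary
    vocab_dist   -- (batch, vocab_size) softmax over the output vocabulary
    attn_weights -- (batch, src_len) attention weights over source tokens
    src_ids      -- (batch, src_len) vocabulary ids of the source tokens
    """
    gen_part = p_gen * vocab_dist
    copy_part = torch.zeros_like(vocab_dist)
    # add the copy probability mass onto the vocabulary ids of the source tokens
    copy_part.scatter_add_(1, src_ids, (1.0 - p_gen) * attn_weights)
    return gen_part + copy_part

batch, src_len, vocab_size = 2, 6, 50
dist = final_distribution(
    torch.full((batch, 1), 0.7),
    torch.softmax(torch.randn(batch, vocab_size), dim=-1),
    torch.softmax(torch.randn(batch, src_len), dim=-1),
    torch.randint(0, vocab_size, (batch, src_len)),
)
print(dist.sum(dim=-1))  # each row sums to ~1
```

The copy path is what lets the generator reuse rare words from the question and the span answer verbatim instead of having to produce them from a fixed vocabulary.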
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017
In this paper, we propose efficient and less resource-intensive strategies for parsing code-mixed data. These strategies are not constrained by in-domain annotations; rather, they leverage pre-existing monolingual annotated resources for training. We show that these methods can produce significantly better results than an informed baseline. In addition, we present a data set of 450 Hindi-English code-mixed tweets of Hindi multilingual speakers for evaluation. The data set is manually annotated with Universal Dependencies.
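One way to make the idea of leveraging monolingual resources concrete is token-level routing: identify the language of each token and look up features (embeddings, taggers) from the corresponding monolingual resource before parsing. The sketch below uses a naive script-based language identifier and placeholder resource names; it is an assumption for illustration, not the strategy evaluated in the paper.

```python
# Hedged sketch: route each token of a code-mixed sentence to a monolingual
# resource based on a (very naive) script-based language identifier.
def naive_lang_id(token):
    """Label a token 'hi' if it contains Devanagari characters, else 'en'."""
    return "hi" if any("\u0900" <= ch <= "\u097F" for ch in token) else "en"

def attach_resources(tokens, resources, lang_id=naive_lang_id):
    """Pair every token with the monolingual resource chosen for its language."""
    return [(tok, resources[lang_id(tok)]) for tok in tokens]

resources = {"hi": "hindi-embeddings", "en": "english-embeddings"}  # placeholder names
sentence = ["yaar", "mujhe", "यह", "movie", "pasand", "hai"]
# Note: romanized Hindi tokens ("yaar", "mujhe", ...) are mislabeled "en" by this
# naive identifier, which is why real pipelines use a trained language-ID model.
for token, resource in attach_resources(sentence, resources):
    print(f"{token}\t{resource}")
```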
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018
Instructional texts, such as articles in wikiHow, describe the actions necessary to accomplish a certain goal. In wikiHow and other resources, such instructions are subject to revision edits on a regular basis. Do these edits improve instructions only in terms of style and correctness, or do they provide clarifications necessary to follow the instructions and to accomplish the goal? We describe a resource and first studies towards answering this question. Specifically, we create wikiHowToImprove, a collection of revision histories for about 2.7 million sentences from about 246,000 wikiHow articles. We describe human annotation studies on categorizing a subset of sentence-level edits and provide baseline models for the task of automatically distinguishing “older” from “newer” revisions of a sentence.
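A simple baseline for distinguishing "older" from "newer" revisions of a sentence can be sketched as a pairwise classifier over shallow surface features, as below. The feature set, toy data, and classifier are illustrative assumptions, not the baseline models described in the paper.

```python
# Illustrative baseline sketch: decide which of two versions of a sentence is
# the newer (revised) one using shallow difference features and a linear model.
from sklearn.linear_model import LogisticRegression

def pair_features(first, second):
    """Difference features between two versions of a sentence."""
    return [
        len(second.split()) - len(first.split()),              # change in token count
        second.count(",") - first.count(","),                  # change in punctuation
        int(second[:1].isupper()) - int(first[:1].isupper()),  # capitalization added?
    ]

# Toy training pairs; label 1 means the second sentence is the newer revision.
pairs = [
    ("press button to start", "Press the power button to start the device."),
    ("Add a pinch of salt to taste.", "add salt"),
]
labels = [1, 0]
clf = LogisticRegression().fit([pair_features(a, b) for a, b in pairs], labels)

print(clf.predict([pair_features("mix flour water", "Mix the flour and the water.")]))
```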
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
wikiHow is a resource of how-to guides that describe the steps necessary to accomplish a goal. Guides in this resource are regularly edited by a community of users, who try to improve instructions in terms of style, clarity and correctness. In this work, we test whether the need for such edits can be predicted automatically. For this task, we extend an existing resource of textual edits with a complementary set of approximately 4 million sentences that remain unedited over time and report on the outcome of two revision modeling experiments.
Journal of Luminescence, 2015
We have synthesized Zn0.95Co0.05O nanoparticles using a sol-gel method at different pH values, under acidic and basic reaction conditions. The structural properties were characterized using powder X-ray diffraction (XRD) and scanning electron microscopy (SEM), while the optical properties were characterized using ultraviolet/visible (UV/Vis) spectroscopy, Fourier transform infrared (FTIR) spectroscopy and Raman spectroscopy. Both samples are found to be single phase with the wurtzite structure. SEM micrographs show spherical nanoparticles for the sample synthesized under the acidic reaction (pH = 6) and nanorods for the sample synthesized under the basic reaction (pH = 9). The band gap estimated from the UV/Vis spectra increases with increasing pH value. FTIR spectra show the typical stretching-mode peak of ZnO at 497 cm⁻¹; on Co doping, this absorption peak shifts to 446 cm⁻¹ (for the pH = 6 sample) and to 478 cm⁻¹ (for the pH = 9 sample). Raman spectra of the cobalt-doped samples reveal the presence of small amounts of ZnCo2O4 and Co3O4. Fluorescence spectra display a blue shift in the near-band-edge emission peak of ZnO with cobalt doping and pH variation.