Dr B Venkata Seshu Kumari


Papers by Dr B Venkata Seshu Kumari

Improving the Usability of Statistical Parsers by Incorporating Linguistic Constraints

Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, their usefulness improves considerably. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint: a verb should not have multiple subjects or multiple objects as its children in the dependency tree. We first describe the importance of this constraint, taking Machine Translation systems that use dependency parser output as an example application. We then show how the current state-of-the-art dependency parsers violate this constraint. We present two new methods to handle this constraint. We evaluate our methods on the state-of-the-art dependency parsers for Hindi.
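As a rough illustration of the constraint described in this abstract, the sketch below flags any head that has more than one child with the same argument label. The tuple layout and the label names `subj` and `obj` are assumptions made for illustration; they are not the treebank's label set, and this is not either of the paper's two methods. It only detects violations and, for brevity, does not check that the head is a verb.

```python
from collections import Counter

def constraint_violations(tokens, unique_labels=("subj", "obj")):
    """Return (head_id, label, count) for every head that has more than
    one child carrying the same label from `unique_labels`.

    `tokens` is a list of (token_id, head_id, label) tuples; this layout
    and the label names are hypothetical, chosen only for the example.
    """
    counts = Counter(
        (head, label)
        for _, head, label in tokens
        if label in unique_labels
    )
    return sorted(
        (head, label, n) for (head, label), n in counts.items() if n > 1
    )

# A toy parse in which token 3 is given two subjects.
parse = [(1, 3, "subj"), (2, 3, "subj"), (3, 0, "root"), (4, 3, "obj")]
print(constraint_violations(parse))
```

A parser that respects the constraint would produce an empty list for every sentence.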

Exploring Different Approaches for Parsing Telugu

Handbook of Experimental Pharmacology, 2019

In this paper we explore different approaches for parsing Telugu. We consider three popular dependency parsers, namely MaltParser, MSTParser and TurboParser. We first experiment with different parser and feature settings and show their impact. We then explore different ways of ensembling these parsers. We also provide a detailed analysis of the performance of all the approaches on major dependency labels and different distance ranges. We report our results on the test data of the Telugu dependency treebank provided in the ICON 2010 tools contest on Indian languages dependency parsing. We obtain state-of-the-art performance of 91.8% in unlabelled attachment score and 70.0% in labelled attachment score.
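For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how unlabelled and labelled attachment scores are commonly computed over tokens. The function name and the `(head, label)` input layout are assumptions for illustration, and the contest's official evaluation script may differ in details such as punctuation handling.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) as fractions in [0, 1].

    `gold` and `pred` are equal-length lists of (head_id, label) pairs,
    one per token; this layout is assumed for the example.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equal in length")
    n = len(gold)
    # UAS: the predicted head matches the gold head.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # LAS: both the head and the dependency label match.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "adv")]
pred = [(2, "subj"), (0, "root"), (2, "adv"), (3, "adv")]
print(attachment_scores(gold, pred))  # (0.75, 0.5)
```

Because a labelled match requires the head to be correct as well, LAS can never exceed UAS.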

Two approaches for incorporating linguistic constraints to improve the usability of Telugu dependency parser

International Journal of Applied Pattern Recognition, 2016

Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, their usefulness improves considerably. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint: a verb should not have multiple subjects or direct objects as its children in the dependency tree. We first describe the importance of this constraint, taking machine translation systems that use dependency parser output as an example application. We then show how the current state-of-the-art dependency parsers violate this constraint. We describe two methods to handle this constraint. We evaluate our methods on the state-of-the-art Telugu dependency parser. Our results show that we can build a statistical parser that handles linguistic constraints, and is thus more useful in real-world applications, without compromising accuracy.

Telugu dependency parsing using different statistical parsers

Journal of King Saud University - Computer and Information Sciences, 2017

In this paper we explore different statistical dependency parsers for parsing Telugu. We consider five popular dependency parsers, namely MaltParser, MSTParser, TurboParser, ZPar and Easy-First Parser. We experiment with different parser and feature settings and show their impact. We also provide a detailed analysis of the performance of all the parsers on major dependency labels. We report our results on the test data of the Telugu dependency treebank provided in the ICON 2010 tools contest on Indian languages dependency parsing. We obtain state-of-the-art performance of 91.8% in unlabeled attachment score and 70.0% in labeled attachment score. To the best of our knowledge, ours is the only work that explores all five of these dependency parsers and compares their performance under different feature settings for Telugu.

Improving Telugu Dependency Parsing using Combinatory Categorial Grammar Supertags

ACM Transactions on Asian and Low-Resource Language Information Processing, 2015

We show that Combinatory Categorial Grammar (CCG) supertags can improve Telugu dependency parsing. In this process, we first extract a CCG lexicon from the dependency treebank. Using both the CCG lexicon and the dependency treebank, we create a CCG treebank using a chart parser. Exploring different morphological features of Telugu, we develop a supertagger using maximum entropy models. We provide CCG supertags as features to the Telugu dependency parser (MST parser). We get an improvement of 1.8% in the unlabelled attachment score and 2.2% in the labelled attachment score. Our results show that CCG supertags improve the MST parser, especially on verbal arguments, for which it has weak rates of recovery.

Hindi Dependency Parsing using a combined model of Malt and MST

In this paper we present our experiments in parsing Hindi. We first explored the Malt and MST parsers. Considering the strengths of both parsers, we developed a hybrid approach that combines their output in an intuitive manner. We report our results on both the development and test data provided in the Hindi Shared Task on Parsing at the Workshop on MT and Parsing in Indian Languages, COLING 2012. Our system secured labeled attachment scores of 90.66% and 80.77% for the gold standard and automatic tracks respectively. These accuracies are the third best and fifth best for the gold standard and automatic tracks respectively.
