Dr B Venkata Seshu Kumari
Papers by Dr B Venkata Seshu Kumari
Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, their usefulness improves considerably. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint: a verb should not have multiple subjects or objects as its children in the dependency tree. We first describe the importance of this constraint, taking machine translation systems that use dependency parser output as an example application. We then show how current state-of-the-art dependency parsers violate this constraint. We present two new methods to handle this constraint and evaluate them on state-of-the-art dependency parsers for Hindi.
Handbook of Experimental Pharmacology, 2019
In this paper we explore different approaches for parsing Telugu. We consider three popular dependency parsers, namely MaltParser, MSTParser and TurboParser. We first experiment with different parser and feature settings and show their impact. We then explore different ways of ensembling these parsers. We also provide a detailed analysis of the performance of all the approaches on major dependency labels and different distance ranges. We report our results on the test data of the Telugu dependency treebank provided in the ICON 2010 tools contest on Indian languages dependency parsing. We obtain state-of-the-art performance of 91.8% in unlabelled attachment score and 70.0% in labelled attachment score.
International Journal of Applied Pattern Recognition, 2016
Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, their usefulness improves considerably. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint: a verb should not have multiple subjects or direct objects as its children in the dependency tree. We first describe the importance of this constraint, taking machine translation systems that use dependency parser output as an example application. We then show how current state-of-the-art dependency parsers violate this constraint. We describe two methods to handle this constraint and evaluate them on the state-of-the-art Telugu dependency parser. Our results show that we can build a statistical parser which handles linguistic constraints, and is thus more useful in real-world applications, without compromising accuracy.
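The constraint described above can be checked mechanically on parser output. The following is a minimal sketch, not the authors' method: it only detects violations and does not repair them. The token layout `(token_id, pos, head_id, label)`, the tag `VERB`, and the labels `subj` and `obj` are illustrative assumptions; a real treebank uses its own tag set.

```python
from collections import Counter, defaultdict

# Hypothetical label and tag names, chosen for illustration only.
UNIQUE_LABELS = {"subj", "obj"}
VERB_TAG = "VERB"


def find_violations(tokens):
    """Return {verb_id: {label: count}} for verbs with a repeated label.

    Each token is a (token_id, pos, head_id, label) tuple; head_id 0
    denotes the artificial root of the dependency tree.
    """
    pos_of = {token_id: pos for token_id, pos, _head, _label in tokens}
    counts = defaultdict(Counter)
    for _token_id, _pos, head_id, label in tokens:
        # Count only subject/object children attached to a verb.
        if label in UNIQUE_LABELS and pos_of.get(head_id) == VERB_TAG:
            counts[head_id][label] += 1
    return {
        head: {label: n for label, n in labels.items() if n > 1}
        for head, labels in counts.items()
        if any(n > 1 for n in labels.values())
    }


# Toy parse in which verb 3 has been assigned two subjects.
parse = [
    (1, "NOUN", 3, "subj"),
    (2, "NOUN", 3, "subj"),
    (3, "VERB", 0, "root"),
    (4, "NOUN", 3, "obj"),
]
print(find_violations(parse))  # {3: {'subj': 2}}
```

A check of this kind could be run over every sentence of a parser's output to measure how often the constraint is violated.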
Journal of King Saud University - Computer and Information Sciences, 2017
In this paper we explore different statistical dependency parsers for parsing Telugu. We consider five popular dependency parsers, namely MaltParser, MSTParser, TurboParser, ZPar and Easy-First Parser. We experiment with different parser and feature settings and show their impact. We also provide a detailed analysis of the performance of all the parsers on major dependency labels. We report our results on the test data of the Telugu dependency treebank provided in the ICON 2010 tools contest on Indian languages dependency parsing. We obtain state-of-the-art performance of 91.8% in unlabeled attachment score and 70.0% in labeled attachment score. To the best of our knowledge, ours is the only work that has explored all five of these popular dependency parsers and compared their performance under different feature settings for Telugu.
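For readers unfamiliar with the two metrics reported above, the sketch below computes them under the usual definitions: unlabeled attachment score is the percentage of tokens whose predicted head is correct, and labeled attachment score additionally requires the dependency label to match. The function name, the `(head_id, label)` input layout and the toy labels are assumptions made for illustration; this is not the evaluation script used in the contest.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    `gold` and `predicted` are equal-length lists of (head_id, label)
    pairs, one pair per token, in the same token order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    # Unlabeled: only the head has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # Labeled: both head and label have to match.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


# Toy four-token sentence with hypothetical labels.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "mod")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (3, "mod")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

In the toy example three of the four heads are correct, but only two tokens have both the correct head and the correct label, which is why the labeled score is lower than the unlabeled one.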
ACM Transactions on Asian and Low-Resource Language Information Processing, 2015
We show that Combinatory Categorial Grammar (CCG) supertags can improve Telugu dependency parsing. In this process, we first extract a CCG lexicon from the dependency treebank. Using both the CCG lexicon and the dependency treebank, we create a CCG treebank with a chart parser. Exploring different morphological features of Telugu, we develop a supertagger using maximum entropy models. We provide CCG supertags as features to the Telugu dependency parser (MST parser). We get an improvement of 1.8% in the unlabelled attachment score and 2.2% in the labelled attachment score. Our results show that CCG supertags improve the MST parser, especially on verbal arguments, for which it has weak rates of recovery.
In this paper we present our experiments in parsing Hindi. We first explored the Malt and MST parsers. Considering the strengths of both parsers, we developed a hybrid approach that combines their output in an intuitive manner. We report our results on both the development and test data provided in the Hindi Shared Task on Parsing at the Workshop on MT and Parsing in Indian Languages, COLING 2012. Our system secured labeled attachment scores of 90.66% and 80.77% for the gold standard and automatic tracks respectively. These accuracies are the third best and fifth best for the gold standard and automatic tracks respectively.