Two-stage Approach for Hindi Dependency Parsing Using MaltParser
Related papers
Two-stage Approach for Hindi Dependency Parsing Using
2013
In this paper, we present our approach to dependency parsing of Hindi as part of the Hindi Shared Task on Parsing, COLING 2012. We examine the effect of different settings available in MaltParser under a two-step parsing strategy, i.e. splitting the data into interChunks and intraChunks, to obtain the best possible LAS, UAS and LA accuracy. Our system achieved the best LAS of 90.99% on the Gold Standard track and the second-best LAS of 83.91% on the Automated data.
Dependency Parsing for Telugu Using Data-driven Parsers
2019
In this paper, we developed manually annotated Telugu corpora following the DS guidelines (2009) and ran experiments with our Telugu dependency treebank on data-driven parsers such as Malt (Nivre et al., 2007a) and MST (McDonald et al., 2006). In the dependency annotation, we link heads and dependents through dependency relations (drels), assigning kāraka and non-kāraka relations to them. The annotated Telugu data contains tokens with their morphological information, POS tags, chunks and drels. We used our final Telugu treebank data in CoNLL format for parsing with the Malt and MST parsers. We evaluated the labeled attachment score (LAS), unlabeled attachment score (UAS) and labeled accuracy (LA) for both parsers and also compared their scores per dependency relation. Finally, we analyzed the most frequent errors that occurred in parsing and explained them with relevant examples and appropriate linguistic analysis, so that we can improve the...
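The three evaluation metrics named in these abstracts can be illustrated with a minimal sketch. It assumes each token is represented as a (head index, dependency label) pair and that every token is scored; the function name `attachment_scores` and the toy sentence are hypothetical and are not taken from any of the listed papers or from the MaltParser or MSTParser tooling.

```python
from typing import Dict, List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 stands for the artificial root. This layout is an assumption.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Compute LAS, UAS and LA over aligned gold and predicted tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    n = len(gold)
    # UAS: the predicted head is correct (label ignored).
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # LA: the predicted label is correct (head ignored).
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    # LAS: both head and label are correct.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return {"LAS": las, "UAS": uas, "LA": la}


# Toy four-token sentence with illustrative labels (hypothetical data).
gold = [(2, "k1"), (0, "main"), (2, "k2"), (3, "nmod")]
pred = [(2, "k1"), (0, "main"), (2, "k1"), (2, "nmod")]

print(attachment_scores(gold, pred))
```

In the toy sentence, the third token has the right head but the wrong label and the fourth has the right label but the wrong head, so UAS and LA are both 0.75 while LAS is 0.5. Shared-task evaluation scripts may differ in details such as punctuation handling, which this sketch does not model.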
Tamil Dependency Parsing: Results Using Rule Based and Corpus Based Approaches
Very few attempts at dependency parsing for Tamil have been reported in the literature. In this paper, we report results obtained for Tamil dependency parsing with rule-based and corpus-based approaches. We designed an annotation scheme partially based on the Prague Dependency Treebank (PDT) and manually annotated Tamil data (about 3000 words) with dependency relations. For the corpus-based approach, we used two well-known parsers, MaltParser and MSTParser, and for the rule-based approach, we implemented a series of linguistic rules (for resolving coordination, complementation, predicate identification and so on) to build dependency structures for Tamil sentences. Our initial results show that both the rule-based and corpus-based approaches achieved an accuracy of more than 74% on the unlabeled task and more than 65% on the labeled task. Rule-based parsing accuracy dropped considerably when the input was tagged automatically.
On the role of morphosyntactic features in Hindi dependency parsing
Proceedings of the NAACL …, 2010
This paper analyzes the relative importance of different linguistic features for data-driven dependency parsing of Hindi, using a feature pool derived from two state-of-the-art parsers. The analysis shows that the greatest gain in accuracy comes from the addition of morpho-syntactic features related to case, tense, aspect and modality. Combining features from the two parsers, we achieve a labeled attachment score of 76.5%, which is 2 percentage points better than the previous state of the art. We finally provide a detailed ...
Urdu Dependency Parser: A Data-Driven Approach
cle.org.pk
In this paper, we present what we believe to be the first data-driven dependency parser for Urdu. The parser was trained and tuned using the MaltParser system, a system for data-driven dependency parsing. The Urdu dependency treebank (UDT) is used for ...