English Parsers: Some Information-based Observations

Evaluation of Two Bengali Dependency Parsers

In this paper we address two dependency parsers for a free-word-order Indian language, Bengali. One parser is grammar-driven, the other data-driven. The grammar-driven parser extends a previously developed parser, while the data-driven parser is MaltParser customized for Bengali. Both parsers are evaluated on two datasets: the ICON NLP Tool Contest data and Dataset-II (developed by us). The evaluation shows that the grammar-based parser outperforms MaltParser on the ICON data, from which the demand frames of the Bengali verbs were developed, but its performance degrades on completely unknown data, i.e. Dataset-II. MaltParser, however, performs better on Dataset-II and on the data as a whole. Evaluation and error analysis further reveal that the parsers show complementary capabilities, which indicates future scope for integrating them to improve overall parsing accuracy.
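The complementary capabilities mentioned in the abstract above suggest parser combination. As a minimal, purely illustrative sketch (the back-off strategy below is our own example, not the paper's proposal), a combiner could prefer the grammar-driven parser's attachment for each token and fall back to MaltParser's when the grammar-driven parser leaves a token unattached:

```python
# Hypothetical parser combination by per-token back-off.
# Heads are per-token integers (index of the head token);
# None marks a token the grammar-driven parser left unattached.

def combine(grammar_heads, malt_heads):
    """Prefer the grammar-driven head; back off to MaltParser's."""
    return [g if g is not None else m
            for g, m in zip(grammar_heads, malt_heads)]

# Token 1 is unattached in the grammar-driven output, so the
# data-driven head is used there.
combined = combine([2, None, 0], [2, 3, 0])
# -> [2, 3, 0]
```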

Annotation Projection-based Dependency Parser Development for Nepali

ACM Transactions on Asian and Low-Resource Language Information Processing

Building computational resources and tools for under-resourced languages is strenuous for any Natural Language Processing (NLP) task. This paper presents the first dependency parser for an under-resourced Indian language, Nepali. A prerequisite for developing a parser for a language is a corpus annotated with the desired linguistic representations, known as a treebank. With an aim of cross-lingual learning and typological research, we use a Bengali treebank to build a Bengali-Nepali parallel corpus and apply annotation projection from the Bengali treebank to build a treebank for Nepali. With the developed treebank, MaltParser (with all algorithms for projective dependency structures) and a neural network-based parser have been used to build Nepali parser models. The neural network-based parser produced state-of-the-art results with 81.2 Unlabeled Attachment Score (UAS), 73.2 Label Accuracy (LA) and 66.1 Labeled Attachment Score (LAS) on the gold test data. The pars...
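The core of annotation projection is copying each source token's head and relation onto its aligned target token. The sketch below assumes a simplified one-to-one word alignment; function names and the handling of unaligned heads are illustrative, not taken from the paper:

```python
# Hedged sketch of dependency annotation projection through
# one-to-one word alignments (a simplification; real projection
# must also handle one-to-many and unaligned tokens).

def project_tree(src_heads, src_labels, alignment):
    """src_heads[i]: 0-based head of source token i (-1 for root);
    alignment: dict mapping source index -> target index."""
    tgt_heads, tgt_labels = {}, {}
    for s, t in alignment.items():
        head = src_heads[s]
        if head == -1:
            tgt_heads[t] = -1              # root stays root
        elif head in alignment:
            tgt_heads[t] = alignment[head]  # follow the alignment
        else:
            continue                        # head unaligned: skip token
        tgt_labels[t] = src_labels[s]
    return tgt_heads, tgt_labels

# A 3-token source tree projected through a reordering alignment.
heads, labels = project_tree(
    src_heads=[1, -1, 1],
    src_labels=["nsubj", "root", "obj"],
    alignment={0: 2, 1: 0, 2: 1},
)
```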

Annotation and Issues in Building an English Dependency Treebank

2011

"The Paninian Grammar framework was given by Panini for his analysis of the Sanskrit language, and it finds extensive application to languages other than Sanskrit about two thousand five hundred years after its formulation. The work presented in this paper is one such application, extending Paninian Grammar (PG; also known as CPG: Computational Paninian Grammar) to English, a fixed-word-order language. In doing so, it presents how CPG can account for English. At present, 2000 sentences have been annotated as part of this effort, using the Hyderabad Dependency Treebank (HyDT) Annotation Scheme for Hindi and other Indian languages, which is modelled on CPG. In the course of this paper we discuss CPG and the CPG-based HyDT annotation scheme and its evolution over the years. We also describe the annotation of the English-language data per the HyDT Annotation Scheme and how its application to English differs from Hindi. Further, we discuss our handling of some constructions of English, and some anomalies in the language that pose a challenge to the application of CPG to English as is."

Thesis - Construction Grammar approach for Tamil dependency parsing

Syntactic parsing in NLP is the task of working out the grammatical structure of sentences. Purely formal approaches to parsing, such as phrase structure grammar and dependency grammar, have been successfully employed for a variety of languages. While phrase-structure-based constituent analysis is possible for fixed-order languages such as English, dependency analysis between grammatical units has been suitable for many free-word-order languages such as the Indian languages. All these parsing approaches rely on identifying linguistic units based on their formal syntactic properties and establishing relationships between such units in the form of a tree. The Dravidian languages, spoken in Southern India, are morphologically rich, agglutinative languages whose characterization in purely structural terms such as adjectives, adverbs, conjunctions and postpositions, as well as traditional interpretations of tense and finiteness, poses problems for their syntactic analysis that are well discussed in the literature. We propose that the morpho-syntactic structures of Dravidian languages are better analysed from the theoretical perspectives of “Cognitive Grammar” or “Construction Grammar”, where every grammatical structure is treated as a symbol that directly maps to meaningful conceptualizations. In other words, natural language is treated not as a formal system but as a functional system that is entirely symbolic or semiotic, from the lexicon through the grammar. Through linguistic evidence we point out that morpho-syntactic structures in Dravidian languages have their basis in meaningful discourse conceptualizations. Subsequently we hierarchically arrange all these conceptualizations into construction schemas that exhibit multiple-inheritance relationships, and we explain all concrete morpho-syntactic structures as instances of these schemas.
Based on this fresh theoretical grounding, we model parsing as the automatic identification of meaningful dependency relations between such meaningful construction units. We formulated an annotation scheme for labelling the construction units and the dependency relations that can exist between them. Our approach to full parser annotation achieves an average MALT LAS of 82.21% on a gold-annotated Tamil corpus of 935 sentences in a five-fold validation experiment. We conducted experiments varying the training-data size, the annotation scheme, sentence length in number of chunks, and the granularity of tags, and report the parser results for these scenarios. Finally, we build a pipeline with a splitter, construction labeller and grouper as intermediate layers before the MALT parser input, and release the working full parser module.
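The pipeline described above (splitter, construction labeller, grouper, then the MALT parser) can be sketched as a simple function composition. Only the stage order comes from the text; the stage functions below are placeholders we supply for illustration:

```python
# Illustrative composition of the thesis's pipeline stages.
# Each stage is passed in as a callable; only the ordering
# (split -> label -> group -> parse) reflects the text.

def run_pipeline(sentence, splitter, labeller, grouper, parser):
    units = splitter(sentence)     # split into construction units
    labelled = labeller(units)     # label each construction unit
    grouped = grouper(labelled)    # group units into chunks
    return parser(grouped)         # MALT-style dependency parsing

# Toy stages just to show the wiring; a real run would plug in
# the released splitter/labeller/grouper modules and MALT itself.
result = run_pipeline(
    "oru sila sorkal",
    splitter=str.split,
    labeller=lambda units: [(u, "UNIT") for u in units],
    grouper=lambda labelled: labelled,
    parser=len,
)
```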

Dependency Framework for Marathi Parser

2016

This paper describes a dependency grammar framework for a Marathi parser. Dependency grammar is a grammar formalism that captures direct word-to-word relations in a sentence. A parser is a tool that automatically analyses a sentence and draws its syntactic tree, and a grammar formalism is the mechanism by which a parser is developed. Today the fields of computational linguistics, natural language processing and artificial intelligence use two kinds of grammar formalism: phrase structure grammar and dependency grammar. Both formalisms have their own limitations for developing a parser. In this paper I use the Computational Paninian Grammar approach to dependency grammar. Computational Paninian Grammar has a tag-set of 37 dependency relations, and those tags have been used to annotate Indian languages such as Hindi, Telugu and Bangla. I therefore examine this dependency tag-set for Marathi and annotate a corpus that is useful for developing a Marathi parser. To annotate d...

Dependency Parsing in Bangla

Concepts, Methodologies, Tools, and Applications

Grammar-driven dependency parsing has been attempted for Bangla (Bengali). The free-word-order nature of the language makes the development of an accurate parser very difficult. The Paninian grammatical model has been used to tackle the free-word-order problem. The approach is to simplify complex and compound sentences and then to parse the simple sentences by satisfying the Karaka demands of the Demand Groups (Verb Groups). Finally, parsed structures are rejoined with appropriate links and Karaka labels. The parser has been trained on a treebank of 1000 annotated sentences and then evaluated on un-annotated test data of 150 sentences. The evaluation shows that the proposed approach achieves 90.32% and 79.81% accuracy for unlabeled and labeled attachments, respectively.
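Several abstracts in this collection report unlabeled and labeled attachment scores. As a minimal sketch of how these standard metrics are computed (the token format is illustrative): UAS counts tokens whose predicted head matches gold, LAS counts tokens whose head and relation label both match.

```python
# Minimal sketch of the attachment-score metrics (UAS/LAS).
# gold/pred: one (head, label) pair per token.

def attachment_scores(gold, pred):
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas * 100, las * 100

# Toy 4-token sentence: all heads correct, one karaka label wrong.
gold = [(2, "k1"), (0, "root"), (2, "k2"), (2, "pof")]
pred = [(2, "k1"), (0, "root"), (2, "k7"), (2, "pof")]
uas, las = attachment_scores(gold, pred)
# uas = 100.0, las = 75.0
```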

A Rule-based Dependency Parser for Telugu: An Experiment with Simple Sentences

Translation Today, 2021

This paper is an attempt at building a rule-based dependency parser for Telugu that can parse simple sentences. The study adopts the Pāṇinian Grammar (PG) tradition, i.e. the dependency model, to parse sentences. A detailed description of mapping semantic relations to vibhaktis (case suffixes and postpositions) in Telugu using PG is presented. The paper describes the algorithm and the linguistic knowledge employed while developing the parser. The research further provides results which suggest that enriching the current parser with linguistic inputs can increase accuracy and tackle ambiguity better than existing data-driven methods.
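The vibhakti-to-relation mapping at the heart of such a rule-based parser can be sketched as a lookup from case suffix to karaka label. The table below is a toy example we supply for illustration, not the paper's actual rule set; real rules must also consult verb frames and resolve competing markers:

```python
# Toy vibhakti -> karaka mapping for a rule-based Paninian parser.
# Entries are illustrative stand-ins, not the paper's rules.

VIBHAKTI_RULES = {
    "nu": "k2",   # accusative-like marker -> karma (object)
    "ki": "k4",   # dative-like marker -> sampradana (recipient)
    "tO": "k3",   # instrumental-like marker -> karana (instrument)
    "":   "k1",   # unmarked nominal defaults to karta (agent)
}

def label_argument(suffix):
    """Return the karaka label for a nominal's case suffix."""
    return VIBHAKTI_RULES.get(suffix, "unk")

# Each nominal chunk of a simple sentence is labelled by its
# suffix and attached to the main verb.
```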