Vaijayanthi Sarma - Academia.edu

Papers by Vaijayanthi Sarma

Malayalam and core Dravidian phonology: A view from early language acquisition

Mapping Commission Errors to Grammatical Development: A Case Study of Malayalam

Languages, Jan 16, 2023

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY...

Unlocking the verbal spine in Malayalam: Past tense is key

Glossa: a journal of general linguistics

The focus of the paper is the internal structure of the verbal spine. Our aim is to show, using data from Malayalam, how verbal structures can be built, allowing for a new understanding of verb alternations as well as the verbal spine. The paper provides a novel analysis of the various v/Voice features, and proposes the adjunction of a √AGENT to the functional heads v and Voice. The difference in the loci of adjunction of the √AGENT is shown to be directly correlated with the differences in the semantics of the verb with consequences for the argument structure, specifically, the external argument at Spec, VoiceP. This analysis not only unpacks the verbal spine to help build a uniform account of verb alternations, but also provides for a clear understanding of the complex past tense morphology in the language. The contextually driven allomorphy of the past tense is shown to be directly linked to the morphosyntactic features and the expression of the exponents at v/Voice. The paper al...

Acquisition of Malayalam inflections: Complexity of morphosyntactic rules and its impact on developing grammars

First Language

In this article, we present an analysis of the complexity of grammatical constraints and their impact on early language acquisition of inflectional morphemes in Malayalam. We use the natural speech production data of two monolingual children acquiring Malayalam between the ages 1;9–2;10 and 2;3–3;0 and three bilingual children acquiring Malayalam-English between the ages 1;9–2;8, 2;0–3;0 and 1;10–2;11 to recover the underlying grammatical constraints that govern the correct productions as well as errors across monolingual and bilingual contexts. We find rules that reference lexico-semantic properties to be particularly challenging to young children.

Imperative markers: A comparison of monolingual and bilingual early language acquisition

'Bare' verbs are in abundance in early acquisition data. The presence of these verbs is one of the reasons for hypothesizing the lack of agreement inflections or projections in early language acquisition theories. The affirmative, singular, imperative form is the only 'bare' form in the adult grammar and we argue that child grammar is no different (Lakshmanan, 2006). The bare forms in the data are not really bare. They provide clues to how the grammar is organized internally and how the typologies of the languages that a child is exposed to might influence early utterances. Imperatives are produced as an order, command or request. Usually, these productions in children are accompanied by gestures or a peak in intonation (Sarma, 1999). Cross-linguistically, imperatives are morphologically sparse (Schwager, 2011). The subject of an imperative is optional even in languages that ordinarily require one, for example, English. In this paper, we focus on the morphological marking of the imperative mood and not the periphrastic imperative structures. Drawing on evidence from longitudinal monolingual and bilingual first language acquisition data, we will argue that typological differences play out differently in monolingual and bilingual inflectional development, taking Malayalam and English imperatives as cases in point. We shall attempt to explain the occurrence of certain imperative forms in the language context by tracing the differences and similarities in the spontaneous acquisition data of two monolingual children and one bilingual child.

How many branches to the syntactic tree? Disagreements over agreement

North East Linguistics Society, 1995

South and Southeast Asian Psycholinguistics: Infant-directed speech: social and linguistic pathways in tonal and non-tonal languages

Mediating the medium: When language becomes an impediment to learning

The complicated language policy of India engages with education in various ways throughout the entire period of formal learning. The Three Language Formula is implemented across schools, but tertiary (university-level) education is, dominantly, through the medium of English. At this level, differences in fluency become magnified and have wide-ranging impact on both the individuals and the institutions. The specific challenges for individuals have to do with transacting all learning activities in English and ensuring access to employment or other higher educational opportunities despite this difficult transition. The challenge for the institutions is to achieve the core aim of successfully training students in the sciences, without loss of resources in replicating language skills instruction or compromising on the educational outcomes. This paper will consider the mediation carried out in a science and technology institution, the particular challenges posed by English as a medium of instruction, and some effective ways in which learning can be enabled within the constraints posed by the curriculum and the available financial, infrastructural, and human resources, using principled introspection, dynamic course modification to learner responses, and outside-the-curriculum language instruction. Specifically, the paper discusses how the passive language competence of the students was extended into more active use in the course of study with encouraging results.

IIT Bombay

We analyse Hindi complex predicates and propose linguistic tests for their detection. This analysis enables us to identify a category of V+V complex predicates called lexical compound verbs (LCpdVs) which need to be stored in the dictionary. Based on the linguistic analysis, a simple automatic method has been devised for extracting LCpdVs from corpora. We achieve an accuracy of around 98% in this task. The LCpdVs thus extracted may be used to automatically augment lexical resources like wordnets, an otherwise time-consuming and labour-intensive process.

IIT Bombay

We analyse Hindi complex predicates and propose linguistic tests for their detection. This analysis enables us to identify a category of V+V complex predicates called lexical compound verbs (LCpdVs) which need to be stored in the dictionary. Based on the linguistic analysis, a simple automatic method has been devised for extracting LCpdVs from corpora. We achieve an accuracy of around 98% in this task. The LCpdVs thus extracted may be used to automatically augment lexical resources like wordnets, an otherwise time-consuming and labour-intensive process.

Agreement and Word Order: Issues in the Syntax and Acquisition of Tamil

This dissertation focuses on the syntax of Tamil, a Dravidian language. The main issues discussed in the dissertation may be broadly classified into (a) those concerning the TP-internal structure and (b) those concerning the TP-external structure. The aim is to provide as complete an account as possible of the syntactic issues under consideration in both adult syntax and developmental syntax. With respect to the TP-internal structure, the case and agreement properties in the syntax of Tamil are indicated in a wide variety of constructions, including finite (nominative and dative subjects, imperatives) and non-finite (verbal participles, infinitivals) sentences, and the theoretical processes necessary for the assignment of case and the determining of verb agreement are established. Evidence is given for the TP-internal positions of the various argument DPs, including diagnostic tests for subjects and (especially, nominative) objects. Agreement facts and

Hindi noun inflection and distributed morphology

Proceedings of the International Conference on Head-Driven Phrase Structure Grammar, 2010

This paper primarily presents an analysis of nominal inflection in Hindi within the framework of Distributed Morphology (Halle & Marantz 1993, 1994 and Harley and Noyer 1999). Müller (2002, 2003, 2004) for German, Icelandic and Russian nouns respectively and Weisser (2006) for Croatian nouns have also used Distributed Morphology (henceforth DM) to analyze nominal inflectional morphology. This paper will discuss in detail the inflectional categories and inflectional classes, the morphological processes operating at syntax, the distribution of vocabulary items and the readjustment rules required to describe Hindi nominal inflection. Earlier studies on Hindi inflectional morphology (Guru 1920, Vajpeyi 1958, Upreti 1964, etc.) were greatly influenced by the Paninian tradition (classical Sanskrit model) and work with Paninian constructs such as root and stem. They only provide descriptive studies of Hindi nouns and verbs and their inflections without discussing the role or status of affi...

Verbal Inflection in Hindi: A Distributed Morphology Approach

In this paper, we provide a complete description of Hindi verbal inflection within the framework of Distributed Morphology. We discuss the categories that are visible on the verb itself and on associated auxiliaries. We show how both analysis and generation are possible using this model. We also discuss the implementation of such linguistically motivated analysis in a morphological analyzer for Hindi, one of the several NLP tools that we have developed for Hindi, and discuss the outcomes of such an implementation. In this paper, we present an analysis of Hindi verbal inflection in the framework of Distributed Morphology (DM) (Halle and Marantz 1993, 1994, Harley and Noyer 2003). The analysis presented in this paper demonstrates how DM may be used to provide a systematic and economical account of verbal inflection in Hindi. This DM analysis also lends itself straightforwardly to implementation in a Hindi Morphological Analyzer. We begin by detailing the different inflectional categori...

Complex Predicates in Indian Language

Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work on computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton Wordnet was rather shallow. We set to constructing a verb knowledge base for Hindi, which arranges the Hindi verbs in a hierarchy of is-a (hypernymy) relation. We realized that there are unique Indian language phenomena that bear upon the lexicalization vs. syntactically derived choice. One such example is the occurrence of conjunct and compound verbs (called Complex Predicates) which are found in all Indian languages. This paper presents our experience in the construction of lexical knowledge bases for Indian languages with special attention to Hindi. The question of storing or deriving complex predicates has been dealt with linguistic...

Plural Problems in the Nominal Morphology of Marathi

In this paper, we describe the two tests developed and designed for Marathi using non-words: (a) plural formation for non-words and (b) an intuition test for gender assignment in which subjects were asked to assign gender to non-words. We look at the distribution of nouns across noun classes and genders and discuss the congruence between the problematic classes as observed in the tests and the actual class distribution and frequency in the language.

Non-Canonical Word Order: Topic and Focus in Adult and Child Tamil

Word Order and Scrambling

... targeted. We also show that these operations are similar to independent Topic and Cleft constructions in Tamil. We ... 1997). (vii) There are independent cleft constructions in Tamil with the focused nominal to the right of the verb. (viii ...

Marking plurals: the acquisition of nominal number inflection in Marathi

South and Southeast Asian Psycholinguistics

Complex predicates in Indian languages and wordnets

Language Resources and Evaluation, 2007

Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work on computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton Wordnet was rather shallow. We set to constructing a verb knowledge base for Hindi, which arranges the Hindi verbs in a hierarchy of is-a (hypernymy) relation. We realized that there are unique Indian language phenomena that bear upon the lexicalization vs. syntactically derived choice. One such example is the occurrence of conjunct and compound verbs (called Complex Predicates) which are found in all Indian languages. This paper presents our experience in the construction of lexical knowledge bases for Indian languages with special attention to Hindi. The question of storing or deriving complex predicates has been dealt with linguistically and computationally. We have constructed empirical tests to decide if a combination of two words, the second of which is a verb, is a complex predicate or not. Such tests will provide a principled way of deciding the status of complex predicates in Indian language wordnets. An additional application of this work is the possibility of automatic augmentations to the Wordnet using corpora, a topic of great interest in current research.

Case, agreement and word order: issues in the syntax and acquisition of Tamil

Hindi Compound Verbs and Their Automatic Extraction

Computational …, 2008

We analyse Hindi complex predicates and propose linguistic tests for their detection. This analysis enables us to identify a category of V+V complex predicates called lexical compound verbs (LCpdVs) which need to be stored in the dictionary. Based on the linguistic analysis, a simple automatic method has been devised for extracting LCpdVs from corpora. We achieve an accuracy of around 98% in this task. The LCpdVs thus extracted may be used to automatically augment lexical resources like wordnets, an otherwise time-consuming and labour-intensive process.
