Amba Kulkarni | University of Hyderabad

Sanskrit Computational Linguistics by Amba Kulkarni

Pāṇini's Aṣṭādhyāyī: A Computer Scientist's Perspective

These are the presentation slides of my talk at NIAS, Bangalore

Subject in English is Abhihita

How free is 'free' word order in Sanskrit?

Because Sanskrit is inflectionally rich, the conventional wisdom is that its word order is free. The concept of sannidhi (proximity), one of the necessary factors in the process of verbal cognition, provides a constraint on the word order of Sanskrit. We study the free word order of Sanskrit in the light of the dependency framework. The weak non-projectivity condition on dependency graphs captures the sannidhi constraint. Gillon worked within the framework of phrase-structure syntax and noted that the freeness is constrained by clause boundaries. In an examination of the cases of dislocation observed by Gillon and of all verses of the Bhagavadgītā, we notice that two relations, viz. adjectival and genitive, are more frequently involved in sannidhi violation. We conclude that, barring a few exceptional cases, the relations involved in sannidhi violation correspond to utthāpya-ākāṅkṣā (expectancy which is to be raised).
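
As a rough illustration of the kind of check a dependency-based study of word order relies on, the sketch below finds crossing arcs in a dependency graph. It tests plain projectivity only; the weak non-projectivity condition used in the paper is not defined in this abstract and is not reproduced here. The function name `crossing_arcs` and the head-index encoding are assumptions made for the example.

```python
from itertools import combinations

def crossing_arcs(heads):
    """Return all pairs of dependency arcs that cross each other.

    heads[i] is the position of the head of word i, or None for the root.
    This is an ordinary projectivity test, not the paper's weak
    non-projectivity condition.
    """
    # Store every arc as (left end, right end), ignoring direction.
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h is not None]
    crossings = []
    for (a, b), (c, d) in combinations(arcs, 2):
        # Two arcs cross when exactly one end of one lies strictly inside the other.
        if a < c < b < d or c < a < d < b:
            crossings.append(((a, b), (c, d)))
    return crossings

# Hypothetical four-word sentence: word 0 modifies word 2,
# words 1 and 2 depend on the verb at position 3.
dislocated = [2, 3, 3, None]
# Same relations, but with the modifier attached to the adjacent word.
adjacent = [1, 3, 3, None]
```

With these toy inputs, `crossing_arcs(dislocated)` reports the single crossing between the arcs (0, 2) and (1, 3), while `crossing_arcs(adjacent)` returns an empty list.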

Discourse Level Tagger for Mahābhāṣya - a Sanskrit Commentary on Pāṇini's Grammar

The Mahābhāṣya, an important commentary on Pāṇini's grammar of Sanskrit, is highly structured. Traditional scholars have tagged it manually to show its underlying discourse structure, and the traditional grammar also discusses clues for discourse-level annotation. Taking these clues into account, we have developed an automatic tagger for the Mahābhāṣya. This paper describes the tagger and evaluates its performance. We have also extended the tag-set to another important text, the Śābarabhāṣya.

Computer Simulation of Aṣṭādhyāyī: Some insights

Pāṇini's Aṣṭādhyāyī is often compared to a computer program for its rigour and its coverage of the Sanskrit language prevalent at the time. The emergence of computer science has given a new dimension to Pāṇinian studies, as is evident from the recent efforts by Mishra, Hyman and Scharf. Ours is an attempt to discover the programming concepts, techniques and paradigms employed by Pāṇini. We discuss how three sūtras, pūrvatrāsiddham 8.2.1, asiddhavad atrābhāt 6.4.22 and ṣatvatukor asiddhaḥ 6.1.86, play a major role in the ordering of the sūtras, and we provide a model that is best described in terms of privacy of data spaces. For conflict resolution we use two criteria: the utsarga-apavāda relation between sūtras, and the word integrity principle. However, this needs further revision. The implementation is still in progress; the current implementation of inflectional morphology, which derives a speech form, is discussed in detail.

Panini An Information Scientist

Information Coding in a language: Some insights from Pāṇinian Grammar

Knowing how a language codes information, how much information it codes and where it codes that information is crucial for a computational linguist working in Natural Language Processing, and in Machine Translation in particular.

Importance of Accent in Pāṇinīya Dhātupāṭha

Outline: Introduction; Accents and their importance; Applicability of Accents; Conclusion.

The Knowledge Structure in Sanskrit kośas

Sanskrit kośas such as the Amarakośa and the Vaijayantīkośa have a built-in knowledge structure of their own which, apart from revealing the ontological classification, provides a holistic view of various concepts. The knowledge in these kośas concerns many non-observational, culture-specific facts. In this paper we present a few representative examples of concept clusters from the two Sanskrit kośas, the Amarakośa and the Vaijayantīkośa. These valuable resources need to be made available in a suitable e-form so that the NLP community working on Indian languages can benefit from them.

Ādidevādhyāyaḥ (supreme deity), Lokapālādhyāyaḥ (guardian deities), Yakṣādhyāyaḥ (semi-divine beings)
• Antarikṣakāṇḍaḥ (sky): Jyotiradhyāyaḥ (light), Meghādhyāyaḥ (cloud), Khagādhyāyaḥ (bird), Śabdādhyāyaḥ (sound)
• Bhūmikāṇḍaḥ (earth): Deśādhyāyaḥ (place), Śailādhyāyaḥ (hill), Vanādhyāyaḥ (forest), Paśusaṅgrahādhyāyaḥ (animals), Manuṣyādhyāyaḥ (mankind), Brāhmaṇādhyāyaḥ (priest tribe), Kṣatriyādhyāyaḥ (military tribe), Vaiśyādhyāyaḥ (business tribe), Śūdrādhyāyaḥ (mixed class)

The Knowledge Structure in Amarakośa

Sanskrit Computational Linguistics, Jan 1, 2010

Amarakośa is the most celebrated and authoritative ancient thesaurus of Sanskrit. It is one of the books that a child in the traditional Indian educational system memorizes as early as the first year of formal learning. Though it appears to be a linear list of words, close inspection shows a rich organisation expressing the various relations a word bears to other words. Thus, when a child studies the Amarakośa further, the linear list of words unfolds into a knowledge web. In this paper we describe our effort to make the implicit knowledge in the Amarakośa explicit. We discuss a model for storing this structure and describe a web tool that answers queries by dynamically reconstructing the links among words from the structured tables.

Developing network of Sanskrit words across Part-Of-Speech categories

… of National Seminar …, Jan 1, 2009

Use of Amarakosha and Hindi wordnet in building a Network of Sanskrit Words, by Akshar Bharati, Amba Kulkarni and Shivaja Nair

Sanskrit has a rich source of lexical resources in the form of various kinds of dictionaries, and a thesaurus in the form of Amarakośa.

Comparison of Pāṇinīya Dhātuvṛttis

In this paper we note the importance of positing a canonical form for a verbal root and its meaning to facilitate the comparison of various Dhātuvṛttis. We also provide some quantitative measures of the differences between the Dhātuvṛttis, obtained by correlating four Dhātuvṛttis using canonical forms of roots and meanings. Keywords: Pāṇinīya Dhātupāṭha, canonical form, quantitative analysis.

Sanskrit Morphological analyzer: Some Issues

Bh. K Festschrift volume by LSI, Jan 1, 2009

Building a Wide Coverage Morphological Analyser for Sanskrit: A Practical Approach

For an inflectionally rich language like Sanskrit, any NLP application demands a good morphological analyzer. Though Sanskrit is the best-analyzed language in the world, a morphological analyzer with good coverage is still not available for it. This paper points out the complexity involved in building a wide-coverage analyzer for Sanskrit and then describes a morphological analyzer that has been built from the available e-resources, based on ad hoc principles. The coverage of this analyzer is around 95%. Though this figure is not acceptable for practical applications, the analyzer can be used as a stepping-stone for developing other modules such as a sandhi splitter, a search engine, etc. At a later stage it may be replaced by a module based on the classic Aṣṭādhyāyī.

Clues from Aṣṭādhyāyī for compound type identification

The Aṣṭādhyāyī has a section of rules that provide conditions for compound formation. These rules are presented from the point of view of generation. We study these conditions from the point of view of compound type identification. A rule-based classifier built on these rules shows encouraging performance on some of the compound types. The conditions also suggest the type of information that lexical databases should contain for automatic language analysis, including a compound classifier.

Sanskrit Compound Paraphrase Generator

Unlike modern Indian languages, Sanskrit is very rich in compound formation. Because compound formation is productive, compounds form an open set, and it is not possible to list them all in a dictionary. Compound formation involves a mandatory sandhi. But mere sandhi splitting does not help a reader identify the meaning of a compound, since a compound typically does not code the relation between its components explicitly. To understand the meaning of a compound, it is necessary to identify its components and discover the relation between them. An expression providing the meaning of a compound is called a paraphrase.

Sanskrit Compound Processor

Sanskrit Computational Linguistics, Jan 1, 2010

Statistical Constituency parser for Sanskrit compounds, by Amba Kulkarni and Anil Kumar

Sanskrit is very rich in compound formation. Typically a compound does not code the relation between its components explicitly. To understand the meaning of a compound, it is necessary to identify its components, identify the way the components group together, discover the relations between them and finally generate a paraphrase of the compound. In this paper, we discuss our efforts in building a constituency parser for Sanskrit compounds. The average performance of this parser is 85%.
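
To make the step "identify the way the components group together" concrete, the sketch below enumerates every binary bracketing of a list of compound components. It only generates the candidate structures; the parser described in the abstract is statistical and chooses among such candidates, and that scoring is not modelled here. The function name `bracketings` and the placeholder components are assumptions made for the example.

```python
def bracketings(components):
    """Return every binary bracketing of a sequence of compound components.

    A single component is returned as is; a grouping of two parts is a
    (left, right) tuple. This enumerates candidates only and does not
    rank them the way a statistical parser would.
    """
    if len(components) == 1:
        return [components[0]]
    trees = []
    # Try every split point, then combine each left analysis with each right one.
    for split in range(1, len(components)):
        for left in bracketings(components[:split]):
            for right in bracketings(components[split:]):
                trees.append((left, right))
    return trees

# Placeholder labels standing in for the components of a three-part compound.
candidates = bracketings(["a", "b", "c"])
```

A three-component compound has two candidate groupings and a four-component compound has five, so the candidate space grows quickly with the number of components, which is one reason a ranking model is needed.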

Building Morphological Analysers and Generators for Indian Languages using FST

Semantic Processing of Compounds in Indian Languages, by Amba Kulkarni

Compounds occur very frequently in Indian languages, and modern Indian languages have no strict orthographic conventions for them. In this paper, the Sanskrit compounding system is examined thoroughly, and the insight gained from Sanskrit grammar is applied to the analysis of compounds in Hindi and Marathi. Compounding in Hindi deviates from that in Sanskrit in two respects. First, the Hindi data analysed does not contain any instance of a Bahuvrīhi (exocentric) compound. Second, many compounds in the Hindi data require a verb as well as a vibhakti (case marker) for their paraphrase. Compounds requiring a verb for paraphrasing are termed madhyama-pada-lopī in Sanskrit, where they are found to be rare.

Agreement in Hindi Conjunct Verbs

Conjunct verbs in Hindi pose a problem with respect to agreement. Shapiro has observed that when the nominal element of a conjunct verb functions as the direct object of the conjunct verb, the verb agrees with its nominal element. In this paper we give a syntactico-semantic criterion for deciding whether the nominal element of a verb is an argument of a conjunct verb or not, and give rules for agreement decisions in such cases.

LERIL: Collaborative effort for creating lexical resources

Arxiv preprint cs/ …, Jan 1, 2003

The paper reports on efforts taken to create lexical resources pertaining to Indian languages, using the collaborative model. The lexical resources being developed are: (1) Transfer lexicon and grammar from English to several Indian languages.

AnnCorra: building tree-banks in Indian languages

This paper describes a dependency-based tagging scheme for creating tree banks for Indian languages. The scheme has been designed to be comprehensive, easy to use with linear notation, and economical in typing effort. It is based on the Paninian grammatical model.

Machine translation activities in India: A survey

In the Proceedings of workshop on …, Jan 1, 2002

Urdu-Hindi-Urdu Machine Translation: Some Problems

In this paper we discuss the problems in Urdu-Hindi-Urdu Machine Translation at various levels. Because of the large common vocabulary, it may seem that transliteration alone can overcome the language barrier between Urdu and Hindi. However, the tendency of Urdu to use words of Persian and Arabic origin, and the tendency of Hindi to use words of Sanskrit origin, call for the use of a proper Machine Translation system.

Enhancing effectiveness of sentence alignment in parallel corpora: Using MT heuristics

India is a multilingual, linguistically dense and diverse country with rich resources of information. Parallel corpora have a major role in multilingual natural language processing, computational linguistics, speech and information retrieval. This paper describes a system for aligning English-Hindi texts in the Gyan-Nidhi corpus at the sentence level. The alignment criteria combine linguistic and statistical information with simple heuristics.

English from Hindi viewpoint: A Paaninian perspective

Phrase in English a Pada in Paninian Grammar, by Akshar Bharati, Sukhada, Dipti Sharma and Amba Kulkarni

In his Aṣṭādhyāyī, Pāṇini provides not only a grammar for Sanskrit but also a grammar formalism that can be applied to other languages. There is a tradition of grammars for various Indian languages written in this formalism. The use of computers as information processing devices demands a sound theory for processing the information in a language string, and the Pāṇinian way of analysing a language provides such a theory. In order to use Pāṇinian theory for the analysis of other languages, it is necessary to model these languages in terms of Pāṇinian primitives such as pada, sup, tiṅ, kṛt, vibhakti, etc. This paper presents an attempt at modelling English in the Pāṇinian framework. In an earlier effort (Bharati, forthcoming) it was shown that the notion of subject in English corresponds to the notion of an abhihita, with a few systematic exceptions.

English from Hindi viewpoint: A Paninian perspective

On the one hand, with the world wide web spreading all over the world, information is now available at the click of a mouse. However, most of the information is in English, and in India hardly 5-10% of the population can understand English. Hence, if India is to take real advantage of the new technology, this information must be made available to Indians in Indian languages. On the other hand, it is well known that Fully Automatic High Quality Machine Translation is impossible in the near future.

Subject in English is abhihita

Anusaaraka: Machine Translation in Stages

Computing Research Repository, 2003

Fully-automatic general-purpose high-quality machine translation systems (FGH-MT) are extremely difficult to build. In fact, there is no system in the world for any pair of languages which qualifies to be called FGH-MT. The reasons are not far to seek. Translation is a creative process which involves interpretation of the given text by the translator. Translation would also vary depending on the audience and the purpose for which it is meant. This would explain the difficulty of building a machine translation system. Since the machine is not at present capable of automatically interpreting a general text with sufficient accuracy, let alone re-expressing it for a given audience, it fails to perform as FGH-MT. (Footnote: The major difficulty that the machine faces in interpreting a given text is the lack of general world knowledge or common sense knowledge.) To understand the nature of the difficulty, let us consider the following sentence in Hindi:

chAvala rAma khAtA hE
rice(m.) Ram(m.) eats(m.)
Ram eats rice.

Anusaaraka: overcoming the language barrier in India

Arxiv preprint cs/ …, Jan 1, 2003

The anusaaraka system makes text in one Indian language accessible in another Indian language. In the anusaaraka approach, the load is divided between man and computer so that the language load is taken by the machine and the interpretation of the text is left to the man. The machine presents an image of the source text in a language close to the target language. In the image, some constructions of the source language (which do not have equivalents) spill over to the output. Some special notation is also devised. After some training, the user learns to read and understand the output. Because the Indian languages are close, the learning time of the output language is short, and is expected to be around 2 weeks.

Anusaaraka An approach for MT taking insights from the Indian Grammatical Tradition

Anusaaraka: A better approach to Machine Translation {A case study for English-Hindi/Telugu}

Design and Architecture of 'Anusaaraka'-An Approach to Machine Translation

Satyam Technical Review, Jan 1, 2003

Most research in machine translation is about having computers bear the entire load of translating one human language into another. This paper looks at the machine translation problem afresh and observes that there is a need to share the load between man and machine, distinguish 'reliable' knowledge from 'heuristics', provide a spectrum of outputs to serve different strata of people, and finally make use of existing resources instead of reinventing the wheel. This paper describes the architecture and design of 'Anusaaraka', based on the fundamental premise of sharing the load, resulting in "good enough" results according to the needs of the reader. The architecture differs from the conventional in three major ways:

Anusaaraka An approach to Machine Translation on 4th

Research paper thumbnail of Language Access: An Information Based Approach

Arxiv preprint cs/0308019, Jan 1, 2003

The anusaaraka system (a kind of machine translation system) makes text in one Indian language accessible through another Indian language. The machine presents an image of the source text in a language close to the target language. In the image, some constructions of the source language (which do not have equivalents in the target language) spill over to the output. Some special notation is also devised.

Research paper thumbnail of Anusaaraka An Accessor cum Machine Translator

Research paper thumbnail of WSD of To-Infinitive into Hindi An Information Based Approach

Word Sense Disambiguation (WSD) is a major problem in Machine Translation (MT). There have been several attempts to handle WSD¹. Developing WSD rules manually is laborious and time consuming. Moreover, if the rules are developed for bilingual WSD, they may be language-pair specific. If the rules are monolingual, it is difficult to decide the granularity for different senses. While the statistical method may be helpful in handling large volumes of language data, it does not give any linguistic insight into the languages involved in MT. There have been attempts to semi-automate the task of WSD². However, the rules which the machine learns run into the hundreds, and it again becomes difficult to gain any linguistic insight from these methods. Rule-based WSD, on the other hand, helps in linguistic analysis, but mostly works with the syntax of a sentence and is hence dependent on surface structure. Moreover, in this method, rules are written to suit the systems running on available technology and may have to be changed if a better technology comes about. Of course, we cannot run an MT system without writing rules for WSD, but a deeper approach to language analysis, one that deals with the language semantically and shows us where the information about a language phenomenon is available, would be of greater advantage. The results of such an analysis can always be used along with further developments in technology, as the analysis is independent of technological constraints. Such a method of WSD may be called an information-based approach. We illustrate information-based WSD with an example of sense disambiguation of the to-infinitive in English into Hindi. First we look at a few English sentences with to-infinitives and their Hindi

Research paper thumbnail of ijcai eng parser

Research paper thumbnail of English Parsers Some Information based observations

The last decade has seen the introduction of several parsers for English, ranging from rule-based to statistical. In recent years there has also been a growing trend towards producing dependency output in addition to constituency trees. The dependency format is preferred over the constituency format not only from the evaluation point of view but also because of its suitability for a wide range of NLP tasks. However, there is no consensus among dependency parser developers on the number of dependency relations or the names of these relations.

Research paper thumbnail of NLP saathii samskrit shaaswraamcaa upayoga (Use of the Sanskrit Shastras for NLP, in Marathi)

1) Introduction: In the Indian shastras, especially vyakarana (grammar), nyaya and mimamsa, there has been deep reflection on language. Recognising that the main use of language is the exchange of information, these shastras contain a rigorous study of the various signals in language (padashakti and vakyashakti, the expressive power of words and of sentences). Questions such as how language works, or how a speaker can convey the thoughts in his mind to a listener through the medium of language, are discussed in these texts. The use of these shastras was limited to philosophical debate. This tradition of nearly 2000 years is today on the way to disappearing. Now, however, with the availability of computers, we have a unique opportunity to make use of these shastras for Natural Language Processing. Computers are used as information processors. The job of an information processor is to pick out information available in one form and make it available in another form. When a computer processes information available in language, that processing is called Natural Language Processing. Naturally, grammar and the other shastras can be of use for this task, and that is why we should take advantage of this one-of-a-kind opportunity. Today a huge amount of information is available on the Internet in English. The prosperity of a society depends on the useful information held by its members. Making this information available to the general public in their mother tongue is therefore the need of the hour. Here we explain, with the help of the English-Marathi anusaaraka, how grammar and the other shastras can be used to make information in English available to Marathi speakers with the aid of a computer. A further benefit of this undertaking is that the study of these shastras will be revived and an enthusiasm for the study of language will arise among people.

2) Use of some concepts from Indian grammar and the other shastras for NLP: A speaker communicates with listeners through the medium of words. The listener, however, needs to go 'beyond the words' to work out their meaning. For example: maamjara maasalii khaate (the cat eats fish). That the cat is the karta (agent) and the fish the karma (object) in this sentence is nowhere expressed by the words themselves. The listener decides which is the karta and which is the karma on the basis of his common-sense knowledge. Going 'beyond the words' to work out meaning is, at least for the present, not possible for a computer. If we understand clearly how much information language expresses through words, word groups, sentence structure and so on, and when and how common-sense knowledge is used in interpreting a sentence, it helps us see which tasks we can get a computer to do today and which we cannot. Here are some examples of how concepts from Indian grammar helped us in understanding how much information language expresses through words, sentence structure and the like. a) Padashakti-vakyashakti: To express information, every language uses certain signals

Research paper thumbnail of Encapsulating quantifiers with the typed variables

The asymmetry in the translation of natural language sentences involving existential and universal quantifiers is well known. It is possible to get rid of this asymmetry by postulating 'quantified typed' variables. In this presentation, we define a 'quantified typed' variable and the algebra associated with these variables to prove the deductions using the method of reductio ad absurdum.

Research paper thumbnail of Navya Nyaya for Scientists and Technologists A First Step

Research paper thumbnail of Telugu Spell-Checker

Spell Checker is an application which handles spelling errors and Spelling Variations (SV). All the misspelt words are marked and offered for correction. The system can also be used as an editor in which the text is checked for spelling errors and suggestions for correction are provided. Telugu is an agglutinating language and has a very complex morphology, coupled with prolific sandhi or morphophonemics. The sandhi found in Telugu is not only internal but also external. Both consonantal and vocalic sandhi are common and well studied in Telugu [Krishnamurti, 1957, 1985]. Identifying the specific sandhi type and splitting it appropriately is a very challenging task. External sandhi is a linguistic phenomenon which refers to a set of changes that occur at word boundaries. These changes are similar to phonological processes such as substitution (modification by various means), deletion, and insertion. External sandhi is often orthographically reflected in Telugu. In such cases, external sandhi produces forms which are morphologically unanalyzable, thus posing a problem for all kinds of NLP applications. In this paper, we discuss in detail the processes of external sandhi in Telugu and the computational tool, the Spell Checker.

Research paper thumbnail of A TELUGU MORPHOLOGICAL ANALYZER

A Morphological Analyzer (MA) is a program which analyses words of a natural language into their roots and their constituent morpho-syntactic elements, along with their attributes. The present paper demonstrates a computational implementation of a Morphological Analyzer for Telugu. The algorithm used to build this MA is theoretically justified and is practically executed for Telugu in the context of the Modern Standard Written variety. The present proposal is a demonstration of the optimal organization of a linguistic database and its performance in a computational environment, ensuring high precision and coverage in the parsing of wordforms. The current MA engine's coverage may range between 95% and 97% on a variety of corpora (a corpus of 3 million words).

Research paper thumbnail of Telugu Spell Checker

Spell Checker is an application which handles spelling errors and Spelling Variations (SV). All the misspelt words are marked and offered for correction. The system can also be used as an editor in which the text is checked for spelling errors and suggestions for correction are provided. Telugu is an agglutinating language and has a very complex morphology, coupled with prolific sandhi or morphophonemics. The sandhi found in Telugu is not only internal but also external. Both consonantal and vocalic sandhi are common and well studied in Telugu [Krishnamurti, 1957, 1985]. Identifying the specific sandhi type and splitting it appropriately is a very challenging task. External sandhi is a linguistic phenomenon which refers to a set of changes that occur at word boundaries. These changes are similar to phonological processes such as substitution (modification by various means), deletion, and insertion. External sandhi is often orthographically reflected in Telugu. In such cases, external sandhi produces forms which are morphologically unanalyzable, thus posing a problem for all kinds of NLP applications. In this paper, we discuss in detail the processes of external sandhi in Telugu and the computational tool, the Spell Checker.

Research paper thumbnail of Recursion and Combinatorial Mathematics in Chandashaastra

Abstract: The contribution of Indian mathematics since the Vedic period has been recognised by historians. Pingala (200 BC), in his book on 'Chandashaastra', a text related to the description and analysis of meters in poetic work, describes algorithms which deal with combinatorial mathematics. These algorithms essentially deal with the conversion of binary numbers to decimal numbers and vice versa, finding the value of 'n choose r', evaluating 2^n, etc. All these algorithms are recursive in nature.
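
As a rough illustration of the kinds of recursion the abstract lists, here is a minimal Python sketch. It uses standard textbook recurrences for 2^n, 'n choose r', and binary/decimal conversion; it is not a reconstruction of Pingala's own procedures, and the function names (`power_of_two`, `choose`, `to_binary`, `to_decimal`) are hypothetical names chosen for this example.

```python
from functools import lru_cache


def power_of_two(n: int) -> int:
    """Evaluate 2**n recursively by halving the exponent."""
    if n == 0:
        return 1
    half = power_of_two(n // 2)
    # Square the half-power; multiply by 2 once more when n is odd.
    return half * half * (2 if n % 2 else 1)


@lru_cache(maxsize=None)
def choose(n: int, r: int) -> int:
    """'n choose r' via the recurrence C(n, r) = C(n-1, r-1) + C(n-1, r)."""
    if r < 0 or r > n:
        return 0
    if r == 0 or r == n:
        return 1
    return choose(n - 1, r - 1) + choose(n - 1, r)


def to_binary(k: int) -> str:
    """Convert a non-negative decimal integer to a string of binary digits."""
    if k < 2:
        return str(k)
    return to_binary(k // 2) + str(k % 2)


def to_decimal(bits: str) -> int:
    """Convert a string of binary digits back to a decimal integer."""
    if not bits:
        return 0
    return 2 * to_decimal(bits[:-1]) + int(bits[-1])
```

Each function reduces its input to a smaller instance of the same problem, which is the sense of "recursive" assumed in this sketch.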

Research paper thumbnail of ISCII Plugin for displaying Indian language web pages

There is chaos as far as Indian languages in electronic form are concerned. One can neither exchange notes in Indian languages as conveniently as in English, nor search texts in Indian languages available over the web. This is because the texts are stored in font-dependent glyph codes.

Research paper thumbnail of Devanagari lipi Ora sanganaka (Devanagari Script and the Computer)

Research paper thumbnail of Proposed Vedic Sanskrit Coding Scheme Some suggestions

Indian languages belong to four different families. However, as far as scripts are concerned, all of them (except for the Perso-Arabic script) are derived from the Brahmi script. Indian languages are compositionally syllabic. They have a scientific phonetic base, and all the syllables are derived compositionally from the phonemes. The scripts are flexible and can be used as alphabetic or as syllabic, as the need demands. Whereas the syllabic version is suitable for writing concisely, the alphabetic version is suitable for performing linguistic operations such as sandhi, morphological analysis, sorting, searching, etc.

Research paper thumbnail of Discourse Level Tagger for Mahaabhaa.sya - a Sanskrit Commentary on Paa.nini's Grammar

Mahaabhaa.sya is an important commentary on Paa.nini's grammar for Sanskrit and is highly structured. Traditional scholars have tagged it manually, showing its underlying discourse structure. The traditional grammar also discusses clues for discourse-level annotations. Taking these clues into account, we have developed an automatic tagger for tagging the Mahaabhaa.sya. This tagger is described in this paper, along with its performance evaluation. We have also extended this tag-set to another important text, S'abarabhaa.sya.

Research paper thumbnail of NLP sāṭhī Samskrit śāstrāṃcā upayoga (in Marathi)

Research paper thumbnail of Sanskrit Computational Linguistics, Third International Symposium, Hyderabad, India, January 15-17, 2009. Proceedings

Series Editors: Randy Goebel, University of Alberta, Edmonton, Canada; Jörg Siekmann, University of Saarland, Saarbrücken, Germany; Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany. Volume Editors: Amba Kulkarni, University of Hyderabad, ...

Research paper thumbnail of Converting Phrase Structures to Dependency Structures in Sanskrit

Two annotation schemes for presenting parsed structures are prevalent, viz. the constituency structure and the dependency structure. While constituency trees mark the relations due to positions, dependency relations mark the semantic dependencies. Free word order languages like Sanskrit pose more problems for constituency parses, since the elements within a phrase are dislocated. In this work, we show how a constituency tree enriched with displacement information can help construct the unlabelled dependency tree automatically.

Research paper thumbnail of Building a wide coverage Sanskrit morphological analyzer: A practical approach

For an inflectionally rich language like Sanskrit, any NLP application demands a good morphological analyzer. Though Sanskrit is the best-analyzed language in the world, a morphological analyzer with good coverage is still not available for it. This paper points out the complexity involved in building a wide-coverage analyzer for Sanskrit and then describes a morphological analyzer that has been built using the available e-resources, based on ad hoc principles. The coverage of this analyzer is around 95%. Though this is not an acceptable figure for practical applications, it can be used as a stepping-stone to develop other modules such as a sandhi splitter, a search engine, etc. At a later stage, it may be replaced by a module that is based on the classic Aṣṭādhyāyī.

Research paper thumbnail of Grammarians' interface for English Parsers

Research paper thumbnail of Human Understandable Machine Learning

Research paper thumbnail of Analysis and Representation of Navya-Nyaaya Expressions

In this paper we present a semi-automatic computational tool to represent Navya-Nyāya (NN) expressions through the Conceptual Graphs of Sowa. This tool consists of a domain-specific segmenter, a semi-automatic constituency parser, and a context-free parser that translates an NN expression into a Conceptual Graph.

Research paper thumbnail of Segmentation of Navya-Nyaaya Expressions

Navya-Nyaaya (NN), a school of Indian logic and philosophy, has evolved a sophisticated language to deal with verbal cognition, logic and epistemology. This language is known for its use of long compounds, productive use of secondary derivational suffixes, and a special technical vocabulary. In this paper we present a specially designed domain-specific splitter to split NN compounds into their components.

Research paper thumbnail of anuvada ke upakarana: sanganaka tathA bhaashaae (Tools of Translation: Computer and Languages)

Research paper thumbnail of Generating Converters between Fonts Semi-automatically

Research paper thumbnail of Sanskrit Morphological Analyser: Some Issues (in the Festschrift volume of Bh. Krishnamoorty)

Research paper thumbnail of Comparative Study of Pāṇinīya Dhātuvṛttis

Research paper thumbnail of Discourse Level Tagger for Mahaabhaa.sya - a Sanskrit Commentary on Paa.nini's Grammar

Mahaabhaa.sya is an important commentary on Paa.nini's grammar for Sanskrit and is highly structured. Traditional scholars have tagged it manually, showing its underlying discourse structure. The traditional grammar also discusses clues for discourse-level annotations. Taking these clues into account, we have developed an automatic tagger for tagging the Mahaabhaa.sya. This tagger is described in this paper, along with its performance evaluation. We have also extended this tag-set to another important text, S'abarabhaa.sya.

Research paper thumbnail of Geeta: Gold Standard Annotated Data, Analysis and its Application

The importance of gold standard data in the field of NLP is well established. In this paper we describe the development of one such gold standard for Sanskrit, annotated at various levels of linguistic analysis. We describe how such domain-specific gold standard data, in addition to being useful for training and evaluation, is also useful for teaching. With the help of a suitable interface of anusāraka, we demonstrate its usability for a linguist and also for a learner.

Research paper thumbnail of Geeta_gold_annotated_data_Icon2013.pdf

Research paper thumbnail of New Challenges in Automatic Translation from an Indian Perspective

Research paper thumbnail of Natural Language Modelling:Course Material for Post Graduate Diploma in Computer Applications in Indian Languages

Research paper thumbnail of Machine Translation:Course Material for Post Graduate Diploma in Computer Applications in Indian Languages

Research paper thumbnail of AnnCorra

Proceedings of the 3rd workshop on Asian language resources and international standardization - COLING '02, 2002

This paper describes a dependency-based tagging scheme for creating treebanks for Indian languages. The scheme has been designed to be comprehensive, easy to use with linear notation, and economical in typing effort. It is based on the Paninian grammatical model.