Sandhi: The Rule Based Word Formation in Hindi

Natural language processing (NLP) helps a machine understand human language. Identifying and analysing human language is a tedious task for several reasons, one of which is determining the meaning of words. To derive meaning from a sentence, NLP treats words as data, so the formation of words matters for NLP. Of the 447 languages in India, 22 are official languages. Hindi, being the most popular and widely used, became the target for computerization. Sandhi is a process through which two or more independent words are joined to produce a new meaningful word. In this paper we present an algorithm that performs Sandhi and also Sandhi-Vichchhed (splitting compound words). The algorithm has been tested on 887 unique Hindi compound words, i.e. words to which Sandhi-Vichchhed can be applied.
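
The abstract does not spell out the rules the algorithm uses, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a single, well-known rule, dirgha (long-vowel) sandhi, in which two similar vowels at a word boundary merge into the corresponding long vowel. The function names `dirgha_sandhi` and `dirgha_vichchhed` and the small lexicon are hypothetical, introduced for illustration.

```python
# Illustrative sketch only: handles dirgha (long-vowel) sandhi and nothing else.
# All names here are hypothetical and are not taken from the paper.

# For each vowel class: the endings a first word may have ("" means the
# inherent 'a' of a bare consonant), the independent vowels a second word
# may start with, and the long vowel sign (matra) produced by the merge.
FINAL_ENDINGS = {"a": ["", "ा"], "i": ["ि", "ी"], "u": ["ु", "ू"]}
INITIAL_VOWELS = {"a": ["अ", "आ"], "i": ["इ", "ई"], "u": ["उ", "ऊ"]}
LONG_MATRA = {"a": "ा", "i": "ी", "u": "ू"}


def is_consonant(ch):
    """True for the basic Devanagari consonants (U+0915 to U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def vowel_class(ch, table):
    """Return the class ('a', 'i', 'u') whose entry in `table` contains ch."""
    for cls, forms in table.items():
        if ch in forms:
            return cls
    return None


def dirgha_sandhi(first, second):
    """Join two words by dirgha sandhi, or return None if it does not apply."""
    if not first or not second:
        return None
    initial = vowel_class(second[0], INITIAL_VOWELS)
    if initial is None:
        return None  # second word does not start with a covered vowel
    last = first[-1]
    if is_consonant(last):
        final, stem = "a", first  # bare consonant carries inherent 'a'
    else:
        final, stem = vowel_class(last, FINAL_ENDINGS), first[:-1]
    if final is None or final != initial:
        return None  # vowels are not similar, so this rule does not apply
    return stem + LONG_MATRA[final] + second[1:]


def dirgha_vichchhed(word, lexicon):
    """Return every (first, second) split of `word` that undoes a dirgha
    sandhi and whose two parts are both present in `lexicon`."""
    splits = []
    for i, ch in enumerate(word):
        cls = vowel_class(ch, {c: [m] for c, m in LONG_MATRA.items()})
        if cls is None:
            continue  # not a long matra, so no candidate boundary here
        stem, rest = word[:i], word[i + 1:]
        for ending in FINAL_ENDINGS[cls]:
            if ending == "" and not (stem and is_consonant(stem[-1])):
                continue  # inherent 'a' needs a consonant to attach to
            for vowel in INITIAL_VOWELS[cls]:
                first, second = stem + ending, vowel + rest
                if first in lexicon and second in lexicon:
                    splits.append((first, second))
    return splits


if __name__ == "__main__":
    lexicon = {"हिम", "आलय", "विद्या"}  # toy lexicon, assumed for the demo
    print(dirgha_sandhi("हिम", "आलय"))
    print(dirgha_vichchhed("विद्यालय", lexicon))
```

In this sketch, splitting relies on a lexicon lookup because a long vowel inside a word does not by itself indicate a word boundary; a fuller system would need the remaining sandhi rule families (for example guna, vriddhi, and consonant sandhi) and a much larger word list.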

The paribhāṣās arthavadgrahaṇe nānarthakasya, lakṣaṇapratipadoktayoḥ pratipadoktasyaiva grahaṇam and ekadeśavikṛtam ananyavat - Studies on some Metarules in Pāṇinian system

In his grammatical treatise, the Astadhyayi, Panini includes sutras that state guiding rules for the correct interpretation and application of his other, directly grammatical sutras. These sutras are called paribhasas. In addition to these paribhasas, the various commentaries on Panini frequently invoke supplementary paribhasas which are not stated explicitly in his Astadhyayi. These paribhasas have been a subject of study since early times after Panini and have also occupied modern scholars of Panini's grammar. For most of them, it remains unsettled even today whether they are used in the Astadhyayi, where they apply, what their role is, and whether they are necessary in arriving at the desired grammatical form. Some scholars go even further and argue that none of these paribhasas were intended by Panini. This study aims to settle this question by dealing with three of these paribhasas individually, considering all the information available about them in the commentaries and examining the cases in which, according to the commentaries, the paribhasas apply. I select the paribhasas arthavadgrahane nanarthakasya, laksanapratipadoktayoh pratipadoktasyaiva grahanam and ekadesavikrtam ananyavat, which are all considered nyayasiddha or lokanyayasiddha; that is, they express logical and obvious principles which are found in daily life. On this basis, Paniniyas explain why Panini did not mention them in the Astadhyayi. I discuss each paribhasa separately, together with all the issues it involves. I present and explain the cases where the specified paribhasas are invoked in the major commentaries (the Mahabhasya, the Kasika and the Siddhantakaumudi) and the arguments found in the commentaries concerning these cases. Where available, I supply other solutions to the difficulties for which these paribhasas are invoked.
The study aims to make the issue of these paribhasas clearer, which will help us reach a solution to the key question, that is, whether Panini presupposed them in his Astadhyayi. My study shows that Panini presupposed the paribhasa ekadesavikrtam ananyavat (or a similar principle). He may also have used the paribhasa arthavadgrahane nanarthakasya (or a similar principle), as this paribhasa does not lead to undesired results. As for the paribhasa laksanapratipadoktayoh pratipadoktasyaiva grahanam (or a similar principle), the original scope of this paribhasa was clearly extended by later Paniniyas. Moreover, their interpretation of this paribhasa conflicts with Panini's procedure. If Panini used this paribhasa, he used it in a very limited way.

Hindi Word Sense Disambiguation

Machine Translation, 2000

Word Sense Disambiguation (WSD) is defined as the task of finding the correct sense of a word in a specific context. This is crucial for applications like Machine Translation and Information Extraction. While the work on automatic WSD for English is voluminous, to our knowledge this is the first attempt at automatic WSD for an Indian language. We make use