moromi gogoi | DIBRUGARH UNIVERSITY
Papers by moromi gogoi
International Journal of Recent Technology and Engineering (IJRTE), 2019
Knowledge is the most powerful weapon of a society, and in today’s world it is just a mouse click away. Knowledge and information are abundant in the form of newspapers, electronic newspapers, articles, online journals, web pages, search results, etc., and news arrives from all over the world. The choice of news, however, varies from person to person: some people prefer sports news to entertainment news, some prefer political news to sports news, and many other preferences are possible. It depends entirely on the individual’s decision. Document classification is the process of assigning a document to one of a number of predefined classes. In this paper we classify Assamese text documents using k-Nearest Neighbor. We consider only four classes: sports, politics, law and science. Our dataset consists of 200 documents collected from major Assamese newspapers. We divided our data in a 3:1 ratio. Majority of our...
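The k-Nearest Neighbor approach named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the bag-of-words representation, the cosine similarity measure, the value `k=3`, and the English toy documents are all assumptions made for illustration, and the 3:1 ratio is read here as training to testing.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Whitespace-tokenized term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Label a document by majority vote among its k most similar training documents."""
    query = bag_of_words(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(bag_of_words(item[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def split_3_to_1(items):
    """Split a list 3:1 (assumed here to mean training:testing)."""
    cut = (len(items) * 3) // 4
    return items[:cut], items[cut:]


# Hypothetical toy corpus; the paper's data is Assamese newspaper text.
toy_train = [
    ("team wins match goal", "sports"),
    ("player scores goal match", "sports"),
    ("minister election vote party", "politics"),
    ("party wins election vote", "politics"),
    ("court judge verdict law", "law"),
    ("judge court appeal law", "law"),
    ("experiment physics research lab", "science"),
    ("research lab discovery physics", "science"),
]

label = knn_predict(toy_train, "the judge read the verdict in court", k=3)
train_part, test_part = split_3_to_1(list(range(200)))
```

Under this reading of the ratio, 200 documents would yield 150 for training and 50 for testing.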
International Journal of Computer Trends and Technology, Dec 25, 2015
Document classification has become an emerging area of research owing to the abundance of documents available in digital form. It can be used to organize data into smaller, meaningful classes. Correctly assigning a document to a particular class remains a major challenge, particularly for Assamese text, as very little work has been done in this field. In this paper we perform document classification using a Naïve Bayes classifier. Among the various classification approaches, Naïve Bayes is potentially well suited as a document classification model because of its simplicity. The aim of this paper is to highlight the performance of Naïve Bayes in document classification. Each document is classified into one of four classes: sports, politics, law and science. To build and evaluate the classification model, a total of 200 documents is split into a training set and a testing set, with 60% of the documents used for training and the remaining 40% for testing. The results have been validated using the statistical measures of precision and recall and their combination, the F-measure. The results show that Naïve Bayes is a good classifier.
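The pipeline described in the abstract (train a Naïve Bayes model, then score it with precision, recall and F-measure) might be sketched as follows. This is an illustrative sketch rather than the paper's code: the multinomial variant with add-one smoothing, the whitespace tokenizer, the balanced F1 form of the F-measure, and the English toy documents are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior (add-one smoothing assumed)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(gold, predicted, label):
    """Per-class precision, recall, and balanced F-measure (F1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy corpus; the paper's data is Assamese text.
toy_docs = [
    ("team wins match goal", "sports"),
    ("player scores goal match", "sports"),
    ("minister election vote party", "politics"),
    ("party wins election vote", "politics"),
    ("court judge verdict law", "law"),
    ("judge court appeal law", "law"),
    ("experiment physics research lab", "science"),
    ("research lab discovery physics", "science"),
]

model = train_nb(toy_docs)
guess = predict_nb(model, "election vote")
scores = precision_recall_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"], "b")
```

With the 60/40 split stated in the abstract, 200 documents give 120 for training and 80 for testing.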
This paper discusses the linguistic foundations for developing a Bodo Wordnet, describing the characteristics and properties of the Bodo language that are specific to Wordnet development. The characteristics of the Bodo language in terms of its morphological and syntactic structure are outlined. Important characteristics related to building the Wordnet are discussed with examples. As the Bodo Wordnet is being developed as an expansion of the Hindi-English Wordnet, the experience gathered during the initial start-up work is very important for carrying out the whole work. Such experiences from building the core 2000 synsets are discussed in the paper, along with the challenges faced during linking.
Development of Wordnets for regional languages has been of great concern in recent years. This is mainly due to the ever-increasing demand for making those languages effective media of the digital world, including the internet. As technologies for putting regional languages into digital media are being developed, research and development work related to Wordnets in those languages is also starting. Efforts have been made at different levels, including by academic researchers and at the government level, to develop language technologies for the Assamese language, a scheduled language of India mainly spoken by the people of the state of Assam. Basic technologies such as Unicode-compliant fonts and keyboards, CLDR, corpus, spelling checker, etc. have been developed. As part of the Government of India efforts on Technology Development of Indian Languages, creation of Assamese Wordnets has been started. This paper focuses on the foundations of the Assamese Wordn...
... Dr. Shikhar Kr. Sarma, Department of Computer Science, Gauhati University, Guwahati 781014, Assam, India; sks001@gmail.com; sks@gauhati.ac.in ... 2. Chakrabarti Debasri, Narayan DipakKumar, Pandey Prabhakar, Bhattacharyya Pushpak. 2002. ...