A Survey on Text Categorization Techniques for Indian Regional Languages

The rapid development of information technology has led to large collections of documents in Indian regional languages. Classifying millions of documents manually is an expensive and time-consuming task. Automatic text classifiers are therefore constructed to sort a given set of documents into different classes, with accuracy and time efficiency much better than those of manual text classification. This paper presents a survey of text categorization techniques for Indian regional languages.

Keywords: Text categorization, Clustering, Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, Hybrid Approach.