Language Identification Using Combination of Machine Learning Algorithms and Vectorization Techniques

Language identification is the task of determining the language in which a given text or document is written. This work compares approaches to language identification that combine machine learning algorithms with vectorization methods. Three classification algorithms are used: Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM). Each is paired with two vectorization techniques: Term Frequency-Inverse Document Frequency (TF-IDF) and Count Vectorizer (Bag of Words, BoW). The work also includes a web development component.
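To make the vectorizer-plus-classifier pairing concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It builds count and TF-IDF vectors and feeds each to a multinomial Naïve Bayes classifier. Several things here are assumptions made for illustration: the tiny corpus is invented, character bigrams are used as features (the abstract does not state the feature granularity), the IDF uses a common smoothed form, and all function names are hypothetical. Logistic Regression and SVM are omitted to keep the sketch short.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
TRAIN = [
    ("the cat sat on the mat", "en"),
    ("where is the train station", "en"),
    ("le chat est sur le tapis", "fr"),
    ("où est la gare s'il vous plaît", "fr"),
    ("el gato está sobre la alfombra", "es"),
    ("dónde está la estación de tren", "es"),
]

def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams (an assumed feature choice)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def count_vector(text):
    """Count (bag-of-words style) vector: raw frequency of each n-gram."""
    return Counter(char_ngrams(text))

def idf_table(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(g for d in docs for g in set(char_ngrams(d)))
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}

def tfidf_vector(text, idf):
    """TF-IDF vector: n-gram count scaled by its IDF; unseen n-grams are dropped."""
    return {g: c * idf[g] for g, c in count_vector(text).items() if g in idf}

def train_nb(samples, vectorize):
    """Accumulate per-class feature weights for multinomial Naive Bayes."""
    totals, class_docs, vocab = {}, Counter(), set()
    for text, label in samples:
        class_docs[label] += 1
        weights = totals.setdefault(label, Counter())
        for g, w in vectorize(text).items():
            weights[g] += w
            vocab.add(g)
    return totals, class_docs, vocab

def predict_nb(model, text, vectorize):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    totals, class_docs, vocab = model
    n_samples = sum(class_docs.values())
    features = {g: w for g, w in vectorize(text).items() if g in vocab}
    best_label, best_score = None, -math.inf
    for label, weights in totals.items():
        denom = sum(weights.values()) + len(vocab)
        score = math.log(class_docs[label] / n_samples)
        for g, w in features.items():
            score += w * math.log((weights[g] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    idf = idf_table([text for text, _ in TRAIN])
    vectorizers = {
        "count": count_vector,
        "tfidf": lambda t: tfidf_vector(t, idf),
    }
    for name, vec in vectorizers.items():
        model = train_nb(TRAIN, vec)
        for sentence in ["the cat is on the train", "le chat est sur la gare"]:
            print(name, repr(sentence), "->", predict_nb(model, sentence, vec))
```

Keeping the classifier fixed while swapping only the `vectorize` function mirrors the kind of comparison described above: the same training and prediction code runs on either representation, so any difference in predictions comes from the vectorization step alone. A real comparison would need a much larger corpus and a held-out test set.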