A Study on Machine Learning and Deep Learning Methods Using Feature Extraction for Bengali News Document Classification

2021 Asian Conference on Innovation in Technology (ASIANCON), 2021

Abstract

News consists of newly received, noteworthy facts about current events, and events of all kinds happen constantly around the world. Mass media carry these facts widely to the general public. As the world modernizes and digital access becomes more convenient, Bengali mass media are also shifting toward digital platforms. In this article, we propose several supervised machine learning and deep learning approaches for classifying Bengali news documents. We use an open dataset containing more than three hundred thousand (376,211) Bengali text documents. The preprocessing steps include stop-word removal, deduplication, tokenization, and stemming. For feature extraction, we use Bag-of-Words with TF-IDF weighting and several word-embedding approaches (average Word2Vec, GloVe, and fastText). We train both supervised machine learning models and deep learning models on the text corpus. Notably, among these models, a Support Vector Machine with average Word2Vec features achieves 97% accuracy, and a Bidirectional LSTM achieves 96% accuracy.
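The preprocessing and feature-extraction steps named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the stop-word list, the punctuation set, whitespace tokenization, the unsmoothed tf·log(N/df) weighting, and the toy embedding table are all assumptions, and stemming is omitted. In practice the resulting vectors would be passed to a classifier such as an SVM, which is not shown here.

```python
import math
from collections import Counter

# Hypothetical mini stop-word list; the paper's actual list is not given.
STOP_WORDS = {"এবং", "ও", "এই"}
# Assumed punctuation set: ASCII marks plus the Bengali danda (।).
PUNCT = "।,.!?;:'\"()"


def tokenize(text):
    """Split on whitespace, strip edge punctuation, drop stop-words (no stemming)."""
    tokens = (tok.strip(PUNCT) for tok in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def deduplicate(docs):
    """Drop exact duplicate documents while keeping first-seen order."""
    return list(dict.fromkeys(docs))


def tfidf(corpus_tokens):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(corpus_tokens)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    vectors = []
    for doc in corpus_tokens:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors


def average_embedding(tokens, embeddings, dim):
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vals) / len(known) for vals in zip(*known)]


# Usage with toy data (hypothetical documents and 2-dimensional embeddings).
docs = deduplicate(["খেলা দল এবং রান।", "বাজার দাম টাকা।", "খেলা দল এবং রান।"])
corpus = [tokenize(d) for d in docs]
toy_embeddings = {"খেলা": [1.0, 0.0], "দল": [0.0, 1.0]}
features = [average_embedding(doc, toy_embeddings, dim=2) for doc in corpus]
```

Averaging word vectors gives each document a fixed-length dense vector regardless of its length, which is what lets a linear model such as an SVM consume embedding features directly. Note that library implementations differ in detail; for example, scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes rows by default, so its weights will not match this unsmoothed sketch.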
