An Ensemble of Feature Selection with Deep Learning based Automated Tamil Document Classification Models

In recent times, the exponential growth of the Internet has resulted in an enormous number of electronic documents in several regional languages apart from English. Because numerous Tamil-language documents are being generated from news, blogs, eBooks, and entertainment sources, automated classification of Tamil documents is needed. Since automated Tamil document classification has not yet been explored thoroughly, this study focuses on the development of deep learning (DL) models for the task. This paper introduces an ensemble of feature selection with DL based classification models for Tamil documents. The presented model first applies preprocessing to remove unwanted data and improve data quality. Next, the term frequency-inverse document frequency (TF-IDF) approach is used to extract features from the Tamil documents. In addition, two feature selection (FS) techniques, namely Chi Squared (CS) and Extra Tree (ET) Classifier models, are employed. The proposed method also uses deep neural network (DNN) and convolutional neural network (CNN) models for classification. A detailed experimental analysis is carried out on a Tamil document dataset collected by the authors. The experimental results show that the ETFS-CNN model achieved an effective classification outcome, with a maximum accuracy of 90%, precision of 90.57%, recall of 90%, and F-score of 89.89%.
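To make the TF-IDF extraction and Chi Squared selection steps described above more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify the TF-IDF variant, the tokenizer, or the number of retained features, so the smoothed IDF formula, the helper names (`tfidf`, `chi2_scores`, `select_top_k`), and the toy English placeholder tokens are all assumptions made for illustration. The ET selector and the DNN/CNN classifiers are omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Build a TF-IDF matrix from tokenized documents.

    Assumption: raw term counts times a smoothed IDF,
    idf(t) = ln((1 + n) / (1 + df(t))) + 1. The paper does not
    state which variant it uses.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def chi2_scores(rows, labels):
    """Chi-squared statistic per feature.

    Observed value: sum of the feature over documents of a class.
    Expected value: class prior times the feature's total, i.e. what
    we would see if the feature were independent of the class.
    """
    classes = sorted(set(labels))
    n_samples = len(labels)
    scores = []
    for j in range(len(rows[0])):
        total = sum(row[j] for row in rows)
        score = 0.0
        for c in classes:
            prior = labels.count(c) / n_samples
            observed = sum(row[j] for row, y in zip(rows, labels) if y == c)
            expected = prior * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(vocab, scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy corpus; real input would be preprocessed Tamil tokens.
    docs = [
        ["cricket", "match", "team"],
        ["team", "score", "cricket"],
        ["film", "actor", "music"],
        ["music", "film", "release"],
    ]
    labels = ["sports", "sports", "cinema", "cinema"]
    vocab, rows = tfidf(docs)
    scores = chi2_scores(rows, labels)
    print(select_top_k(vocab, scores, 4))
```

In this toy corpus, terms that appear in both documents of a single class (for example `cricket` or `film`) receive higher Chi Squared scores than terms that appear only once, so they are the ones retained. A classifier would then be trained on the reduced feature matrix.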