Text Classification: Improved Naive Bayes Approach

Text Classification: Improved Naive Bayes Approach

2017

Automatic text classification is becoming increasingly vital nowadays. The two most common approaches to text classification with Naive Bayes use either the Multivariate Bernoulli model or the Multinomial model. The Multinomial model is usually observed to give better results for a large number of features. We suggest some improvements to the Multivariate Bernoulli Naive Bayes approach. We were able to achieve an average accuracy of about 99%, as opposed to about 85% using the basic Multivariate Naive Bayes approach. We also reduced the number of features required, making the classifier space- and time-efficient.
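The two baseline event models the abstract contrasts can be sketched with scikit-learn (an assumption here; the paper does not name a library). The corpus and labels below are invented for demonstration and this is not the paper's improved method, only the standard comparison it starts from:

```python
# Illustrative sketch of the two common Naive Bayes event models for text.
# Toy corpus and labels are invented; 1 = spam, 0 = ham.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

docs = [
    "cheap pills buy now", "limited offer buy cheap",     # spam
    "meeting agenda attached", "project meeting notes",   # ham
]
labels = [1, 1, 0, 0]

# Multivariate Bernoulli model: binary presence/absence features
vec_bin = CountVectorizer(binary=True)
bern = BernoulliNB().fit(vec_bin.fit_transform(docs), labels)

# Multinomial model: raw term-count features
vec_cnt = CountVectorizer()
multi = MultinomialNB().fit(vec_cnt.fit_transform(docs), labels)

test = ["buy cheap pills"]
pred_b = int(bern.predict(vec_bin.transform(test))[0])
pred_m = int(multi.predict(vec_cnt.transform(test))[0])
print(pred_b, pred_m)  # both flag the message as spam (1)
```

The practical difference shows up as feature counts grow: the Bernoulli model only records whether a term occurs, while the Multinomial model also uses how often it occurs.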

A comparative study of Naive Bayes Classifiers with improved technique on Text Classification

The experiment was carried out on imbalanced data with positive and negative labels encoded as 0 and 1. After training, these datasets were tested with Gaussian Naive Bayes, Bernoulli Naive Bayes, and Multinomial Naive Bayes classifiers using an improved technique based on tf-idf and n-grams. The results were then compared with those of the old model, which uses Bag-of-Words. Testing shows a 2-3% improvement in the model's performance.
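The old-versus-new feature pipelines described above can be sketched as follows, assuming scikit-learn as the implementation and an invented toy corpus in place of the study's imbalanced datasets:

```python
# Sketch only: Bag-of-Words baseline vs. tf-idf + n-gram features,
# both feeding a Multinomial Naive Bayes classifier. Toy data invented.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "great film loved it", "wonderful acting great plot",
    "terrible film hated it", "awful acting boring plot",
]
labels = [1, 1, 0, 0]

# Old model: plain Bag-of-Words counts
bow = make_pipeline(CountVectorizer(), MultinomialNB()).fit(docs, labels)

# Improved technique: tf-idf weighting over unigrams and bigrams
tfidf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB()
).fit(docs, labels)

acc_bow = bow.score(docs, labels)      # resubstitution accuracy,
acc_tfidf = tfidf.score(docs, labels)  # for illustration only
print(acc_bow, acc_tfidf)
```

On real, held-out, imbalanced data the two pipelines would be compared with precision/recall rather than raw accuracy; the toy corpus here is trivially separable by both.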

A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

International Scholarly Research Notices, 2014

With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. Many classification algorithms are available, and Naïve Bayes remains one of the oldest and most popular. On the one hand, Naïve Bayes is simple to implement; on the other hand, it requires relatively little training data. The literature, however, finds that Naïve Bayes performs poorly compared to other classifiers in text classification, which makes it unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: first a univariate feature selection, and then feature clustering, where we use the univariate feature selection to reduce the search space and then apply clustering to select relatively independent feature se...
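A minimal sketch of such a two-step scheme, assuming a chi-squared univariate filter and agglomerative clustering over feature columns (the paper's exact filter and clustering method may differ), run on synthetic count data:

```python
# Two-step feature selection sketch: (1) univariate filter shrinks the
# search space, (2) cluster surviving features and keep one representative
# per cluster so the kept features are relatively independent.
# Data, k values, and cluster count are invented for demonstration.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(60, 100)).astype(float)  # fake term counts
y = (X[:, 0] + X[:, 1] > 4).astype(int)             # labels tied to two terms

# Step 1: chi-squared univariate filter, 100 -> 20 features
selector = SelectKBest(chi2, k=20).fit(X, y)
X_sel = selector.transform(X)

# Step 2: cluster the 20 feature columns into 5 groups, keep one per group
n_final = 5
feat_labels = AgglomerativeClustering(n_clusters=n_final).fit(X_sel.T).labels_
reps = [int(np.flatnonzero(feat_labels == c)[0]) for c in range(n_final)]
X_final = X_sel[:, reps]
print(X_final.shape)  # (60, 5)
```

The ordering matters: clustering all 100 features directly would be costlier, which is why the cheap univariate filter runs first to reduce the search space.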

An Overview on Implementation Using Hybrid Naïve Bayes Algorithm for Text Categorization

Automated text categorization and class prediction are important for reducing the feature size and speeding up the learning process of classifiers. Text classification is an area of growing interest in text mining research. Correctly assigning text to a particular category still presents a challenge because of the large number of features in a dataset. Among present classification approaches, Naïve Bayes is well suited to serve as a document classification model thanks to its simplicity. The aim of this project is to highlight the performance of Naïve Bayes for text categorization and class prediction.

A Survey of Naïve Bayes Machine Learning approach in Text Document Classification

Computing Research Repository, 2010

Text document classification aims at associating documents with one or more predefined categories based on the likelihood suggested by a training set of labeled documents. Many machine learning algorithms play a vital role in training the system with predefined categories, among which Naïve Bayes has some intriguing properties: it is simple, easy to implement, and achieves good accuracy on large datasets.

Multinomial Naive Bayes for Text Categorization Revisited

Lecture Notes in Computer Science, 2004

Abstract. This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it compares standard multinomial naive Bayes to the recently ...

Efficient Text Categorization using Naïve Bayes Classification

2017

Text classification is the task of automatically sorting a set of documents into categories from a predefined set. It is a data mining technique used to predict group membership for data instances within a given dataset, classifying data into different classes subject to some constraints. In place of the conventional feature selection techniques used for text document classification, we present a new model based on probability and the overall class frequency of each term. The Naive Bayesian classifier is based on Bayes' theorem with independence assumptions between predictors. A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for large datasets. The paper shows that the new probabilistic interpretation of tf×idf term weighting may lead to a better understanding of statistical ranking mechanisms.
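For reference, the plain tf×idf weight that the paper reinterprets can be computed directly. This is the textbook form; smoothing conventions vary across implementations, and the numbers below are invented:

```python
# Textbook tf-idf: term frequency times log inverse document frequency.
import math

def tf_idf(term_count, doc_len, n_docs, docs_with_term):
    """Plain tf-idf weight, without the smoothing some libraries add."""
    tf = term_count / doc_len                 # how prominent in this document
    idf = math.log(n_docs / docs_with_term)   # how rare across the collection
    return tf * idf

# A term appearing 3 times in a 100-word document, in 10 of 1000 documents:
w = tf_idf(3, 100, 1000, 10)
print(round(w, 4))  # 0.1382
```

A term that occurs in every document gets idf = log(1) = 0, so it contributes nothing to ranking, which is the behavior the probabilistic interpretation has to account for.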

Application of Naïve Bayes, Decision Tree, and K-Nearest Neighbors for Automated Text Classification

Modern Applied Science, 2019

Nowadays, many applications that use large data have been developed due to the existence of the Internet of Things. These applications are translated into different languages and require automated text classification (ATC). The ATC process assigns content to one or more predefined classes. However, this process is problematic for the Arabic translations of the data. This study aims to address this issue by investigating the performance of three classification algorithms, namely, the k-nearest neighbor (KNN), decision tree (DT), and naïve Bayes (NB) classifiers, on Saudi Press Agency datasets. Results showed that the NB algorithm outperformed the DT and KNN algorithms in terms of precision, recall, and F1 score. In future work, a new algorithm that improves the handling of the ATC problem will be developed.
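A sketch of such a three-way comparison, using scikit-learn and an invented English toy corpus in place of the Saudi Press Agency data (classifier settings here are illustrative assumptions, not the study's configuration):

```python
# Comparing NB, DT, and KNN on the same tf-idf features. Toy data invented;
# only resubstitution accuracy is computed, for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

docs = [
    "election vote parliament", "minister election speech",
    "football match goal", "tennis match final",
    "stock market shares", "bank market profit",
]
labels = ["politics", "politics", "sport", "sport", "economy", "economy"]

models = {
    "NB": MultinomialNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=1),
}
scores = {}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf).fit(docs, labels)
    scores[name] = pipe.score(docs, labels)
print(scores)
```

On real data the comparison would of course use held-out test sets and report precision, recall, and F1 per class, as the study does.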

Tackling the Poor Assumptions of Naive Bayes Text Classifiers

ICML, 2003

Naive Bayes is often used as a baseline in text classification because it is fast and easy to implement. Its severe assumptions make such efficiency possible but also adversely affect the quality of its results. In this paper we propose simple, heuristic solutions to some of the problems with Naive Bayes classifiers, addressing both systemic issues and problems that arise because text is not actually generated according to a multinomial model. We find that our simple corrections result in a fast algorithm that is competitive with state-of-the-art text classification algorithms such as the Support Vector Machine.
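One correction from this line of work is the Complement Naive Bayes variant, which scikit-learn ships as `ComplementNB`. A minimal sketch on an invented, deliberately imbalanced toy corpus:

```python
# ComplementNB estimates each class's parameters from the *complement* of
# that class's documents, which softens the bias toward majority classes.
# Toy corpus invented: five "ham" documents (0) and one "spam" document (1).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB, MultinomialNB

docs = [
    "report attached see notes", "weekly report numbers",
    "notes from the meeting", "see attached numbers",
    "meeting report attached",
    "claim your prize refund",
]
labels = [0, 0, 0, 0, 0, 1]

vec = CountVectorizer()
X = vec.fit_transform(docs)

mnb = MultinomialNB().fit(X, labels)
cnb = ComplementNB().fit(X, labels)

test = vec.transform(["prize claim refund"])
pred_mnb = int(mnb.predict(test)[0])
pred_cnb = int(cnb.predict(test)[0])
print(pred_mnb, pred_cnb)  # both recover the minority class (1) here
```

On this tiny, cleanly separated example both models agree; the complement estimate matters on realistic skewed corpora, where the majority class's abundant counts otherwise dominate the multinomial parameters.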