Idiot's Bayes: Not So Stupid After All?

The Naive Bayes Mystery: A classification detective story

Pattern Recognition Letters, 2005

Many studies have been made to compare the many different methods of supervised classification which have been developed. While conducting a large meta-analysis of such studies, we spotted some anomalous results relating to the Naive Bayes method. This paper describes our detailed investigation into these anomalies. We conclude that a very large comparative study probably mislabelled another method as Naive Bayes, and that the Statlog project used the right method, but possibly incorrectly reported its provenance. Such mistakes, while not too harmful in themselves, can become seriously misleading if blindly propagated by citations which do not examine the source material in detail.

Technical note: naive Bayes for regression

Machine Learning, 2000

Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates.

The Optimality of Naive Bayes

2004

Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of naive Bayes in classification? In this paper, we propose a novel explanation for the superb classification performance of naive Bayes. We show that, essentially, the dependence distribution plays a crucial role: that is, how the local dependence of a node distributes in each class, evenly or unevenly, and how the local dependencies of all nodes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Therefore, no matter how strong the dependences among attributes are, naive Bayes can still be optimal if the dependences distribute evenly in classes, or if the dependences cancel each other out. We propose and prove a sufficient and necessary condition for the optimality of naive Bayes. Further, we investigate the optimality of naive Bayes under the Gaussian distribution. We present and prove a sufficient condition for the optimality of naive Bayes in which dependences between attributes do exist. This provides evidence that dependences among attributes may cancel each other out. In addition, we explore when naive Bayes works well.

Alleviating naive Bayes attribute independence assumption by attribute weighting

J. Mach. Learn. Res., 2013

Despite the simplicity of the Naive Bayes classifier, it has continued to perform well against more sophisticated newcomers and has remained, therefore, of great interest to the machine learning community. Of numerous approaches to refining the naive Bayes classifier, attribute weighting has received less attention than it warrants. Most approaches, perhaps influenced by attribute weighting in other machine learning algorithms, use weighting to place more emphasis on highly predictive attributes than those that are less predictive. In this paper, we argue that for naive Bayes attribute weighting should instead be used to alleviate the conditional independence assumption. Based on this premise, we propose a weighted naive Bayes algorithm, called WANBIA, that selects weights to minimize either the negative conditional log likelihood or the mean squared error objective functions. We perform extensive evaluations and find that WANBIA is a competitive alternative to state of the art clas...

Naive Bayes for Regression

1998

Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how

A ‘non-parametric’ version of the naive Bayes classifier

Knowledge-Based Systems, 2011

Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a Normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-Normal and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be Normally distributed, but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of Normality. We found our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-Normal distributions are observed.

One Dependence Augmented Naive Bayes

2005

In real-world data mining applications, an accurate ranking is as important as an accurate classification. Naive Bayes has been widely used in data mining as a simple and effective classification and ranking algorithm. Since its conditional independence assumption is rarely true, numerous algorithms have been proposed to improve naive Bayes, for example, SBC [1] and TAN [2]. Indeed, the experimental results show that SBC and TAN achieve a significant improvement in terms of classification accuracy. Unfortunately, our experiments also show that SBC and TAN perform even worse than naive Bayes in ranking measured by AUC [3,4] (the area under the Receiver Operating Characteristics curve). This raises the question of whether we can improve Naive Bayes in both classification and ranking. In this paper, responding to this question, we present a new learning algorithm called One Dependence Augmented Naive Bayes (ODANB). Our motivation is to develop a new algorithm that improves Naive Bayes' performance not only on classification measured by accuracy but also on ranking measured by AUC. We experimentally tested our algorithm, using all 36 UCI datasets recommended by Weka [5], and compared it to Naive Bayes, SBC and TAN. The experimental results show that our algorithm significantly outperforms all the other algorithms in yielding accurate ranking, while also slightly outperforming them in terms of classification accuracy.

Improving simple bayes

1997

The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. We examine different approaches for handling unknowns and zero counts when estimating probabilities. Large scale experiments on 37 datasets were conducted to determine the effects of these approaches and several interesting insights are given, including a new variant of the Laplace estimator that outperforms other methods for dealing with zero counts. Using the bias-variance decomposition [15, 10], we show that while the SBC has performed well on common benchmark datasets, its accuracy will not scale up as the dataset sizes grow. Even with these limitations in mind, the SBC can serve as an excellent tool for initial exploratory data analysis, especially when coupled with a visualizer that makes its structure comprehensible.
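
To make the zero-count problem mentioned in this abstract concrete, the following is a minimal sketch of the standard add-one Laplace estimator next to the plain relative-frequency estimate. It is not the new variant the paper proposes, whose details the abstract does not give; the function names and the `pseudo_count` parameter are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def relative_frequency(count, class_total):
    """Unsmoothed estimate of P(attribute = value | class)."""
    return Fraction(count, class_total)


def laplace_estimate(count, class_total, num_values, pseudo_count=1):
    """Standard Laplace estimate: add pseudo_count to every value's count.

    num_values is the number of distinct values the attribute can take.
    (Assumption: this is the textbook estimator, not the paper's variant.)
    """
    return Fraction(count + pseudo_count, class_total + pseudo_count * num_values)


# A value never seen with this class among 10 training instances,
# for an attribute that has 3 possible values.
raw = relative_frequency(0, 10)
smoothed = laplace_estimate(0, 10, num_values=3)
print(raw, smoothed)
```

With the unsmoothed estimate, a single unseen attribute value forces the whole naive Bayes product for that class to zero, whatever the other attributes say; the smoothed estimate keeps it small but positive.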

On the optimality of the simple Bayesian classifier under zero-one loss

1997

The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article's results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
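
The claim about conjunctions can be checked on a toy case. The sketch below is an illustration under stated assumptions, not the article's proof: it takes Boolean attributes, a class defined as their AND, every input combination appearing exactly once, and unsmoothed probability estimates. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def fit_naive_bayes(rows):
    """Estimate P(class) and P(attribute_i = value | class) by exact counting."""
    class_counts = {}
    value_counts = {}  # (class, attribute index, value) -> count
    for attrs, label in rows:
        class_counts[label] = class_counts.get(label, 0) + 1
        for i, v in enumerate(attrs):
            key = (label, i, v)
            value_counts[key] = value_counts.get(key, 0) + 1
    priors = {c: Fraction(n, len(rows)) for c, n in class_counts.items()}

    def cond(c, i, v):
        return Fraction(value_counts.get((c, i, v), 0), class_counts[c])

    return priors, cond


def scores(priors, cond, attrs):
    """Unnormalised naive Bayes score P(c) * prod_i P(a_i | c) for each class."""
    out = {}
    for c, prior in priors.items():
        s = prior
        for i, v in enumerate(attrs):
            s *= cond(c, i, v)
        out[c] = s
    return out


def predict(priors, cond, attrs):
    s = scores(priors, cond, attrs)
    return max(s, key=s.get)


def conjunction_data(n):
    """Every Boolean input of length n, labelled with the AND of its bits."""
    return [(bits, int(all(bits))) for bits in product((0, 1), repeat=n)]


rows = conjunction_data(2)
priors, cond = fit_naive_bayes(rows)
errors = sum(predict(priors, cond, attrs) != label for attrs, label in rows)

# Posterior that naive Bayes assigns to class 1 for the input (1, 1).
s = scores(priors, cond, (1, 1))
posterior = s[1] / (s[0] + s[1])
print(errors, posterior)
```

In this setting the attributes are clearly dependent given class 0, since the input (1, 1) never occurs there. The classifier still labels every input correctly, while its posterior for (1, 1) is 3/4 rather than the true value 1: the probability estimate is off, but the zero-one decision is not.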

Edited Naive Bayes

INTELIGENCIA ARTIFICIAL, 2006

Naive Bayes is a well-known and well-studied algorithm in both statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. This paper presents a variant of the Naive Bayes method in which the original training set is augmented in the following fashion: a leave-one-out procedure is applied over the training set, and the instances that the Naive Bayes model classifies incorrectly are duplicated. The augmented dataset is then used to induce the model. The motivation behind this idea is that giving more importance to hard instances (in this case, by duplicating them) might help make the model more accurate over that subset of the instance space. We have tested this algorithm on 41 UCI datasets. The results suggest that the chance of obtaining significantly better performance than with the original Naive Bayes approach is much greater than the chance of the opposite.
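
The augmentation step described in this abstract can be sketched as follows. This is a rough illustration, not the authors' implementation: it assumes categorical attributes and a Laplace-smoothed naive Bayes (the abstract does not say which estimator is used), and every function name and the `alpha` parameter are hypothetical.

```python
import math
from collections import Counter


def train_nb(rows, alpha=1.0):
    """Count-based categorical naive Bayes; rows are (attribute_tuple, label)."""
    label_counts = Counter(label for _, label in rows)
    value_counts = Counter(
        (label, i, v) for attrs, label in rows for i, v in enumerate(attrs)
    )
    n_attrs = len(rows[0][0])
    seen_values = [{attrs[i] for attrs, _ in rows} for i in range(n_attrs)]
    return label_counts, value_counts, seen_values, alpha


def predict_nb(model, attrs):
    label_counts, value_counts, seen_values, alpha = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):  # sorted so ties break deterministically
        n_label = label_counts[label]
        score = math.log(n_label / total)
        for i, v in enumerate(attrs):
            # Count v as a possible value even if it was unseen in training.
            k = len(seen_values[i] | {v})
            score += math.log(
                (value_counts[(label, i, v)] + alpha) / (n_label + alpha * k)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def edited_training_set(rows, alpha=1.0):
    """Leave-one-out pass: append a copy of each instance that a model
    trained on all the other instances misclassifies."""
    augmented = list(rows)
    for j, (attrs, label) in enumerate(rows):
        held_out_model = train_nb(rows[:j] + rows[j + 1:], alpha)
        if predict_nb(held_out_model, attrs) != label:
            augmented.append((attrs, label))
    return augmented


# Tiny made-up dataset: the last instance goes against the majority pattern.
rows = [(("a",), 0)] * 3 + [(("b",), 1)] * 3 + [(("a",), 1)]
augmented = edited_training_set(rows)
final_model = train_nb(augmented)  # the model is induced from the augmented set
print(len(rows), len(augmented))
```

Duplicating an instance is equivalent to giving it weight two in the counts, so the same effect could be obtained with instance weights instead of a physically larger training set.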