The Optimality of Naive Bayes
Related papers
Alleviating naive Bayes attribute independence assumption by attribute weighting
J. Mach. Learn. Res., 2013
Despite the simplicity of the Naive Bayes classifier, it has continued to perform well against more sophisticated newcomers and has therefore remained of great interest to the machine learning community. Of the numerous approaches to refining the naive Bayes classifier, attribute weighting has received less attention than it warrants. Most approaches, perhaps influenced by attribute weighting in other machine learning algorithms, use weighting to place more emphasis on highly predictive attributes than on those that are less predictive. In this paper, we argue that, for naive Bayes, attribute weighting should instead be used to alleviate the conditional independence assumption. Based on this premise, we propose a weighted naive Bayes algorithm, called WANBIA, that selects weights to minimize either the negative conditional log likelihood or the mean squared error objective functions. We perform extensive evaluations and find that WANBIA is a competitive alternative to state-of-the-art classifiers…
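As an illustration of the attribute-weighting idea in the abstract above (a minimal sketch, not the authors' WANBIA implementation; the function names and the dictionary-based data layout are assumptions), per-attribute weights can act as exponents on the class-conditional probabilities, so that setting every weight to 1 recovers plain naive Bayes:

```python
import numpy as np

def weighted_nb_log_posterior(x, class_priors, cond_probs, weights):
    """Unnormalized log-posterior of an attribute-weighted naive Bayes model.

    x            : sequence of attribute value indices, one per attribute
    class_priors : dict class -> P(c)
    cond_probs   : dict (attribute index, class) -> array of P(x_i = v | c)
    weights      : per-attribute weights w_i (w_i = 1 for all i gives plain NB)
    """
    scores = {}
    for c, prior in class_priors.items():
        log_score = np.log(prior)
        for i, v in enumerate(x):
            # Multiplying the log-probability by w_i is equivalent to raising
            # P(x_i | c) to the power w_i in the product form of the model.
            log_score += weights[i] * np.log(cond_probs[(i, c)][v])
        scores[c] = log_score
    return scores

def predict(x, class_priors, cond_probs, weights):
    scores = weighted_nb_log_posterior(x, class_priors, cond_probs, weights)
    return max(scores, key=scores.get)
```

In this framing, the learning problem the paper addresses is how to choose `weights`: rather than fixing them at 1, they are optimized against an objective such as negative conditional log likelihood or mean squared error.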
One Dependence Augmented Naive Bayes
2005
In real-world data mining applications, an accurate ranking is as important as an accurate classification. Naive Bayes has been widely used in data mining as a simple and effective classification and ranking algorithm. Since its conditional independence assumption is rarely true, numerous algorithms have been proposed to improve naive Bayes, for example SBC [1] and TAN [2]. Indeed, experimental results show that SBC and TAN achieve a significant improvement in terms of classification accuracy. Unfortunately, our experiments also show that SBC and TAN perform even worse than naive Bayes in ranking, measured by AUC [3,4] (the area under the Receiver Operating Characteristic curve). This raises the question of whether naive Bayes can be improved to achieve both accurate classification and accurate ranking. In this paper, responding to this question, we present a new learning algorithm called One Dependence Augmented Naive Bayes (ODANB). Our motivation is to develop a new algorithm that improves Naive Bayes' performance not only on classification, measured by accuracy, but also on ranking, measured by AUC. We experimentally tested our algorithm on all 36 UCI datasets recommended by Weka [5] and compared it to Naive Bayes, SBC and TAN. The experimental results show that our algorithm significantly outperforms all the other algorithms in ranking, while also slightly outperforming them in classification accuracy.
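To make the "one dependence" structure concrete, here is a minimal scoring sketch under the assumption that each attribute depends on the class and on at most one parent attribute; the parent structure is taken as given, and how ODANB actually selects parents is not shown (the names and table layout below are my assumptions):

```python
def one_dependence_log_score(x, c, log_prior, log_cpt, parents):
    """Log-score of class c for instance x under a one-dependence model.

    x         : list of attribute value indices
    log_prior : dict class -> log P(c)
    log_cpt   : dict attribute i -> nested table giving log P(x_i | c, x_parent)
                when i has a parent, else log P(x_i | c)
    parents   : dict attribute i -> parent attribute index, or None
    """
    score = log_prior[c]
    for i, v in enumerate(x):
        p = parents.get(i)
        if p is None:
            score += log_cpt[i][c][v]        # plain naive Bayes term
        else:
            score += log_cpt[i][c][x[p]][v]  # term augmented by one parent
    return score

def predict(x, classes, log_prior, log_cpt, parents):
    return max(classes, key=lambda c: one_dependence_log_score(
        x, c, log_prior, log_cpt, parents))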
2003
Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome naive Bayes' primary weakness, the attribute independence assumption, and improve the performance of the algorithm. This paper presents a locally weighted version of naive Bayes that relaxes the independence assumption by learning local models at prediction time. Experimental results show that locally weighted naive Bayes rarely degrades accuracy compared to standard naive Bayes and, in many cases, improves accuracy dramatically. The main advantage of this method compared to other techniques for enhancing naive Bayes is its conceptual and computational simplicity.
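A rough sketch of the locally weighted idea, under assumptions of my own choosing (Hamming distance, a linear kernel, Laplace smoothing), not the paper's exact kernels or weighting scheme: training instances near the test instance receive larger weights, and a naive Bayes model is fitted from the weighted counts at prediction time.

```python
import numpy as np

def locally_weighted_nb_predict(x_test, X, y, k=50, alpha=1.0):
    """Predict with a naive Bayes model fitted from distance-weighted counts.

    X, y  : categorical training data (n x d array of value indices) and labels
    k     : neighborhood size used to set the kernel bandwidth
    alpha : Laplace smoothing constant
    """
    X, y, x_test = np.asarray(X), np.asarray(y), np.asarray(x_test)
    dist = (X != x_test).sum(axis=1).astype(float)           # Hamming distance
    bandwidth = np.sort(dist)[min(k, len(dist)) - 1] + 1e-9
    w = np.maximum(0.0, 1.0 - dist / bandwidth)              # instance weights

    scores = {}
    for c in np.unique(y):
        wc, Xc = w[y == c], X[y == c]
        log_score = np.log(wc.sum() + alpha)                 # weighted prior count
        for i, v in enumerate(x_test):
            num = wc[Xc[:, i] == v].sum() + alpha
            den = wc.sum() + alpha * len(np.unique(X[:, i]))
            log_score += np.log(num / den)                   # weighted, smoothed estimate
        scores[c] = log_score
    return max(scores, key=scores.get)
```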
Attribute weighted Naive Bayes classifier using a local optimization
Neural Computing and Applications, 2013
The Naive Bayes classifier is a popular classification technique for data mining and machine learning, and it has been shown to be very effective on a variety of data classification problems. However, the strong assumption that all attributes are conditionally independent given the class is often violated in real-world applications. Numerous methods have been proposed to improve the performance of the Naive Bayes classifier by alleviating the attribute independence assumption; however, violation of the independence assumption can increase the expected error. Another alternative is to assign weights to attributes. In this paper, we propose a novel attribute-weighted Naive Bayes classifier that assigns weights to the conditional probabilities. An objective function based on the structure of the Naive Bayes classifier and the attribute weights is formulated, and the optimal weights are determined by a local optimization method using the quasi-secant method. In the proposed approach, the Naive Bayes classifier is taken as a starting point. We report the results of numerical experiments on several real-world binary classification data sets, which demonstrate the efficiency of the proposed method.
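One common way to write such a weighted model and an objective over the weights is shown below; the exact objective used in this paper is not reproduced here, and the squared-error form is an illustrative assumption only.

```latex
% Weighted naive Bayes posterior with per-attribute weights w_i,
% and an illustrative squared-error objective over the weights.
\[
\hat{P}(c \mid \mathbf{x}) =
\frac{P(c)\,\prod_{i=1}^{d} P(x_i \mid c)^{w_i}}
     {\sum_{c'} P(c')\,\prod_{i=1}^{d} P(x_i \mid c')^{w_i}},
\qquad
\min_{\mathbf{w}}\;
\sum_{(\mathbf{x},y)}\sum_{c}
\bigl(\hat{P}(c \mid \mathbf{x}) - \mathbf{1}[y=c]\bigr)^{2}.
\]
```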
On the optimality of the simple Bayesian classifier under zero-one loss
1997
The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article's results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
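The key distinction can be stated compactly (notation mine, not the article's): zero-one loss only requires the predicted ranking of classes to be right at each point, not the probability estimates themselves.

```latex
% Zero-one loss optimality constrains only the arg max, not the estimates.
\[
\arg\max_{c}\, \hat{P}_{\mathrm{NB}}(c \mid \mathbf{x})
= \arg\max_{c}\, P(c \mid \mathbf{x})
\quad \text{for all } \mathbf{x}
\;\Longrightarrow\;
\text{naive Bayes is optimal under zero-one loss,}
\]
\[
\text{even when } \hat{P}_{\mathrm{NB}}(c \mid \mathbf{x}) \neq P(c \mid \mathbf{x}),
\text{ whereas quadratic-loss optimality additionally requires } \hat{P}_{\mathrm{NB}} = P.
\]
```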
Bayes Classification using an approximation to the Joint Probability Distribution of the Attributes
The Naive-Bayes classifier is widely used due to its simplicity and accuracy. However, it fails when, for at least one attribute value in a test sample, there are no corresponding training samples. This is known as the zero-frequency problem and is typically addressed using Laplace smoothing. Laplace smoothing does not take into account the statistical characteristics of the neighborhood of the attribute values of the test sample. Gaussian Naive-Bayes addresses this, but the resulting Gaussian model is formed from global information. We propose an approach that estimates conditional probabilities using information in the neighborhood of the test sample. We no longer make the assumption of independence of attributes and instead consider the joint probability distribution conditioned on the given class. We illustrate the performance on datasets taken from the University of California at Irvine Machine Learning Repository and demonstrate that the proposed approach is simple, robust and outperforms…
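For reference, the Laplace smoothing baseline mentioned above can be written as a one-line estimator (a generic sketch, not the paper's neighborhood-based approach; names are my own):

```python
def laplace_smoothed_estimate(count_xv_c, count_c, n_values, alpha=1.0):
    """P(x_i = v | c) with additive (Laplace) smoothing.

    count_xv_c : number of class-c training samples with attribute value v
    count_c    : number of class-c training samples
    n_values   : number of distinct values the attribute can take
    alpha      : smoothing constant (alpha = 1 is classic Laplace smoothing)

    Even when count_xv_c = 0, the estimate stays positive, which avoids the
    zero-frequency problem of zeroing out the whole naive Bayes product.
    """
    return (count_xv_c + alpha) / (count_c + alpha * n_values)
```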
The Knowledge Engineering Review, 2010
Current classification problems that concern data sets of large and increasing size require scalable classification algorithms. In this study, we concentrate on several scalable, linear-complexity classifiers, including one of the top 10 voted data mining methods, Naïve Bayes (NB), and several recently proposed semi-NB classifiers. These algorithms perform front-end discretization of continuous features since, by design, they work only with nominal or discrete features. We address the lack of studies that investigate the benefits and drawbacks of discretization in the context of the subsequent classification. Our comprehensive empirical study considers 12 discretizers (two unsupervised and 10 supervised), seven classifiers (two classical NB and five semi-NB), and 16 data sets. We investigate the scalability of the discretizers and show that the fastest supervised discretizers, fast class-attribute interdependency maximization (FCAIM), class-attribute interdependency maximization…
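As a concrete example of the front-end step, an unsupervised equal-width discretizer can be sketched as follows; the supervised discretizers studied in the paper, such as FCAIM, place cut points using the class labels and are more involved (this sketch is my own, not from the paper).

```python
import numpy as np

def equal_width_discretize(column, n_bins=10):
    """Map a continuous feature column to integer bin indices in [0, n_bins).

    Equal-width binning is one of the simplest unsupervised discretizers;
    supervised methods choose cut points by consulting the class labels.
    """
    column = np.asarray(column, dtype=float)
    edges = np.linspace(column.min(), column.max(), n_bins + 1)
    # Using only the interior edges makes np.digitize return bins 0 .. n_bins-1.
    return np.digitize(column, edges[1:-1])
```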
Local-Global-Learning of Naive Bayesian Classifier
Innovative Computing, Information and …, 2009
Naive Bayes (NB) models are among the simplest probabilistic classifiers. However, they often perform surprisingly well in practice, even though they are based on the strong assumption that all attributes are conditionally independent given the class variable. The vast majority of research on NB models assumes that the conditional probability tables in the model are learned by maximum likelihood or Bayesian methods, even though it is well documented that learning NB models in this way may harm the expressiveness of the models. In this paper, we focus on an alternative technique for learning the conditional probability tables from data. Instead of frequency counting (which leads to maximum likelihood parameters), we propose a learning method that we call "local-global-learning": we learn the (local) conditional probability tables under the guidance of the (global) NB model learned thus far. The conditional probabilities learned by local-global-learning are therefore geared towards maximizing the classification accuracy of the models instead of maximizing the likelihood of the training data. We show through extensive experiments that local-global-learning can significantly improve the classification accuracy of NB models when compared to traditional maximum likelihood learning.
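The paper's local-global procedure itself is not reproduced here. The sketch below shows a generic discriminative alternative to frequency counting in the same spirit: fitting naive-Bayes-shaped log-parameters by stochastic gradient ascent on the conditional log-likelihood rather than the joint likelihood. All names and the update rule are my assumptions, not the authors' algorithm.

```python
import numpy as np

def discriminative_nb_train(X, y, n_values, n_classes, lr=0.1, epochs=50):
    """Fit log-parameters of a naive-Bayes-shaped model by stochastic gradient
    ascent on the conditional log-likelihood, instead of frequency counting.

    X        : (n, d) integer array of attribute value indices
    y        : (n,) integer array of class indices
    n_values : list giving the number of distinct values of each attribute
    """
    n, d = X.shape
    log_prior = np.zeros(n_classes)
    log_cpt = [np.zeros((n_classes, v)) for v in n_values]

    for _ in range(epochs):
        for x, c in zip(X, y):
            # Class scores under the current parameters.
            scores = log_prior.copy()
            for i in range(d):
                scores += log_cpt[i][:, x[i]]
            post = np.exp(scores - scores.max())
            post /= post.sum()                 # model's P(k | x)
            grad = -post
            grad[c] += 1.0                     # gradient of log P(c | x)
            log_prior += lr * grad
            for i in range(d):
                log_cpt[i][:, x[i]] += lr * grad
    return log_prior, log_cpt
```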
2002
Bayes classifiers are widely used for recognition, identification and knowledge discovery, with applications in fields such as image processing, medicine and chemistry (QSAR). Somewhat mysteriously, the Naive Bayes classifier usually gives very good recognition performance and cannot be improved considerably by more complex Bayes classifier models. We demonstrate here a simple proof of the optimality of the Naive Bayes classifier that can explain this interesting fact. The derivation in the current paper is based on arXiv:cs/0202020v1.
Learning accurate and concise naïve Bayes classifiers from attribute value taxonomies and data
Knowledge and Information …, 2006
In many application domains, there is a need for learning algorithms that can effectively exploit attribute value taxonomies (AVT), hierarchical groupings of attribute values, to learn compact, comprehensible and accurate classifiers from data, including data that are partially specified. This paper describes AVT-NBL, a natural generalization of the naïve Bayes learner (NBL), for learning classifiers from AVT and data. Our experimental results show that AVT-NBL is able to generate classifiers that are substantially more compact and more accurate than those produced by NBL on a broad range of data sets with different percentages of partially specified values. We also show that AVT-NBL is more efficient in its use of training data: AVT-NBL produces classifiers that outperform those produced by NBL using substantially fewer training examples.