Validation of Network Classifiers

A transductive bound for the voted classifier with an application to semi-supervised learning

2009

We propose two transductive bounds on the risk of majority vote classifiers that are estimated over partially labeled training sets. The first one involves the margin distribution of the classifier and a risk bound on its associated Gibbs classifier; it is tight when the Gibbs bound is tight and when the errors of the majority vote classifier are concentrated in a low-margin zone. In semi-supervised learning, treating the margin as an indicator of confidence is the working hypothesis of algorithms that search for the decision boundary in low-density regions. Following this assumption, we propose to bound the error probability of the voted classifier on the examples whose margins are above a fixed threshold. As an application, we propose a self-learning algorithm which iteratively assigns pseudo-labels to the unlabeled training examples whose margins are above a threshold obtained from this bound. Empirical results on different datasets show the effectiveness of our approach compared to the same algorithm with a manually fixed threshold and to TSVM.
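
A minimal sketch of such a self-learning loop may help fix ideas. The code below is illustrative rather than the authors' implementation: the threshold `theta` is a plain constant standing in for the value the paper derives from its transductive bound, and `base_clf` is any scikit-learn-style probabilistic classifier.

    import numpy as np
    from sklearn.base import clone

    def self_train(base_clf, X_lab, y_lab, X_unlab, theta=0.8, max_iter=10):
        # Iteratively move high-margin unlabeled points, with their
        # predicted (pseudo-)labels, into the labeled set.
        X_l, y_l, X_u = X_lab, y_lab, X_unlab
        for _ in range(max_iter):
            clf = clone(base_clf).fit(X_l, y_l)
            if len(X_u) == 0:
                break
            proba = clf.predict_proba(X_u)              # shape (n_u, 2)
            margin = np.abs(proba[:, 1] - proba[:, 0])  # unsigned margin
            # Fixed threshold here; bound-derived in the paper.
            confident = margin >= theta
            if not confident.any():
                break
            X_l = np.vstack([X_l, X_u[confident]])
            y_l = np.concatenate([y_l, clf.predict(X_u[confident])])
            X_u = X_u[~confident]
        return clone(base_clf).fit(X_l, y_l)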

Test error bounds for classifiers: A survey of old and new results

2011

In this paper, we focus on one of the oldest problems in pattern recognition and machine learning: estimating the generalization error of a classifier from a test set. Although this problem has been addressed for several decades, the last word has not yet been written, as new proposals continue to appear in the literature. Our objective is to survey and compare old and new techniques in terms of quality of the estimation, ease of use, and rigor of the approach.
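
As a concrete instance of the kind of technique being compared, the snippet below (my own illustration, not taken from the survey) contrasts two classical upper bounds on the true error given a held-out test set: the distribution-free Hoeffding bound and the exact binomial (Clopper-Pearson) bound.

    import math
    from scipy.stats import beta

    def hoeffding_upper(err_hat, n, delta=0.05):
        # With probability >= 1 - delta over the test sample:
        # true error <= empirical error + sqrt(ln(1/delta) / (2n)).
        return err_hat + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

    def binomial_upper(k, n, delta=0.05):
        # Exact one-sided Clopper-Pearson upper confidence bound for
        # k errors observed out of n test examples.
        return 1.0 if k == n else beta.ppf(1.0 - delta, k + 1, n - k)

    # e.g. 12 errors on 400 test points:
    print(hoeffding_upper(12 / 400, 400))  # ~0.091, looser but distribution-free
    print(binomial_upper(12, 400))         # ~0.049, tighter exact bound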

Decision boundary for discrete Bayesian network classifiers

Journal of Machine Learning Research, 2015

Bayesian network classifiers are a powerful machine learning tool. In order to evaluate the expressive power of these models, we compute families of polynomials that sign-represent the decision functions induced by Bayesian network classifiers. We prove that these families are linear combinations of products of Lagrange basis polynomials. In the absence of V-structures in the predictor sub-graph, we are also able to prove that this family of polynomials characterizes the specific classifier considered. We then use this representation to bound the number of decision functions representable by Bayesian network classifiers with a given structure, and we compare these bounds to those obtained using the Vapnik-Chervonenkis dimension.
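
For intuition, consider the naive Bayes special case with a binary class variable (a sketch consistent with, but much weaker than, the paper's general result). Encoding each discrete feature X_i, with values x_i^1, ..., x_i^{r_i}, through Lagrange basis polynomials over its value set makes the log-odds a polynomial whose sign gives the classifier:

    f(x_1,\dots,x_n) = \log\frac{P(C=1)}{P(C=0)}
      + \sum_{i=1}^{n}\sum_{j=1}^{r_i}
        \log\frac{P(X_i = x_i^j \mid C=1)}{P(X_i = x_i^j \mid C=0)}\, L_{ij}(x_i),
    \qquad
    L_{ij}(x) = \prod_{k \neq j} \frac{x - x_i^k}{x_i^j - x_i^k}.

Since L_{ij}(x_i^k) equals 1 when j = k and 0 otherwise, f is a linear combination of Lagrange basis polynomials; richer structures, with parents among the predictors, yield genuine products of such polynomials.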

Transductive Bounds for the Multi-Class Majority Vote Classifier

Proceedings of the AAAI Conference on Artificial Intelligence, 2019

In this paper, we propose a transductive bound on the risk of the majority vote classifier learned with partially labeled data for multi-class classification. The bound is obtained by considering the class confusion matrix as an error indicator, and it involves the margin distribution of the classifier over each class and a bound on the risk of the associated Gibbs classifier. When this latter bound is tight and the per-class errors of the majority vote classifier are concentrated in a low-margin zone, we prove that the bound on the Bayes classifier's risk is tight. As an application, we extend the self-learning algorithm to the multi-class case. The algorithm iteratively assigns pseudo-labels to the subset of unlabeled training examples whose class margins are above a threshold obtained from the proposed transductive bound. Empirical results on different data sets show the effectiveness of our approach compared to the same algorithm where the threshold is fixed manually.
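
A sketch of the per-class selection step may clarify what changes relative to the binary case. Everything here is illustrative: the margin is taken as top-1 minus top-2 vote mass (one common definition, not necessarily the paper's), and `thresholds` stands in for the bound-derived values.

    import numpy as np

    def pseudo_label_multiclass(proba, thresholds):
        # proba: (n, K) vote distribution over K classes for n unlabeled
        # points; thresholds: one margin threshold per class.
        part = np.sort(proba, axis=1)
        margin = part[:, -1] - part[:, -2]   # top-1 minus top-2 vote mass
        labels = proba.argmax(axis=1)
        keep = margin >= np.asarray(thresholds)[labels]  # class-specific test
        return labels[keep], np.flatnonzero(keep)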

Generalization Error Bounds for Classifiers Trained With Interdependent Data

Advances in Neural Information Processing Systems, 2006

In this paper we propose a general framework for studying the generalization properties of binary classifiers trained with data that may be dependent but are deterministically generated from a sample of independent examples. The framework provides generalization bounds for binary classification and for some cases of ranking problems, and it clarifies the relationship between these learning tasks.
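
A toy example of the setting (my illustration): bipartite ranking turns an i.i.d. labeled sample into pairs that are mutually dependent, yet deterministically generated from the independent sample, which is precisely the kind of data the framework handles.

    def ranking_pairs(examples):
        # examples: list of (x, y) with y in {0, 1}, drawn i.i.d.
        # Each original example appears in many pairs, so the pairs are
        # dependent, but they are a deterministic function of the sample.
        pos = [x for x, y in examples if y == 1]
        neg = [x for x, y in examples if y == 0]
        return [(p, n) for p in pos for n in neg]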

One Node at a Time: Node-Level Network Classification

2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 2021

Network classification aims to group networks (or graphs) into distinct categories based on their structure. We study the connection between the classification of a network and that of its constituent nodes, and ask whether nodes from networks in different groups are distinguishable based on structural node characteristics such as centrality and clustering coefficient. We demonstrate, using various network datasets and random network models, that a classifier can be trained to accurately predict the network category of a given node (without seeing the whole network), implying that complex networks display distinct structural patterns even at the node level. Finally, we discuss two applications of node-level network classification: (i) whole-network classification from small samples of nodes, and (ii) network bootstrapping.
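
A minimal sketch of the node-level setup, assuming networkx and a small illustrative feature set (degree, clustering coefficient, k-core index; the paper uses richer features):

    import networkx as nx
    import numpy as np

    def node_feature_rows(G, network_label):
        # One row per node, built from purely local structural features;
        # every node inherits the category of its parent network.
        deg = dict(G.degree())
        clust = nx.clustering(G)        # local clustering coefficient
        core = nx.core_number(G)        # k-core index
        X = np.array([[deg[v], clust[v], core[v]] for v in G.nodes()])
        y = np.full(len(G), network_label)
        return X, y

    # Stacking such rows across many networks yields a training set for
    # any off-the-shelf classifier that predicts a node's network category.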

Building Bayesian network classifiers through a Bayesian complexity monitoring system

Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 2009

Practical yet efficient machine learning techniques are in high demand for engineering applications. This article presents a new learning method for building Bayesian network classifiers. The proposed method augments the naive Bayesian (NB) classifier using the Chow-Liu tree construction method, while introducing a Bayesian approach to control the accuracy and complexity of the resulting network; this yields simple structures that are not necessarily a spanning tree. Experiments on benchmark data sets show that the number of augmenting edges added by the proposed method depends on the amount of training data. The classification accuracy was better than, or at least equal to, that of the NB and tree-augmented NB models on 10 benchmark data sets. An evaluation on a real industrial application showed that the simple Bayesian network classifier outperformed the C4.5 and random forest algorithms and achieved competitive results against C5.0 and a neural network. Related applications of such models include engineering materials [4], modelling of manufacturing processes [5], prediction of the manufacturing time (in hours) of machine windings and of machine size given customer inputs [6], and inspection of industrial products.
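
For reference, the Chow-Liu step the method builds on can be sketched as follows (the paper's Bayesian complexity test, which prunes candidate edges and may break the spanning tree, is omitted here):

    import networkx as nx
    from sklearn.metrics import mutual_info_score

    def chow_liu_edges(X):
        # X: (n_samples, n_features) array of discrete features.  Weight
        # every feature pair by empirical mutual information and keep a
        # maximum-weight spanning tree over the features.
        n_features = X.shape[1]
        G = nx.Graph()
        for i in range(n_features):
            for j in range(i + 1, n_features):
                G.add_edge(i, j, weight=mutual_info_score(X[:, i], X[:, j]))
        return list(nx.maximum_spanning_tree(G).edges())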