A machine learning algorithm for classification under extremely scarce information

General solution and learning method for binary classification with performance constraints

Pattern Recognition Letters, 2008

In this paper, the problem of binary classification is studied with one or two performance constraints. When the constraints cannot be satisfied, the initial problem has no solution and an alternative problem is solved by introducing a rejection option. The optimal solution for such problems in the framework of statistical hypothesis testing is shown to be based on the likelihood ratio with one or two thresholds, depending on whether a rejection option is needed. These problems are then addressed when classes are only defined by labelled samples. To illustrate the resolution of cases with and without a rejection option, the Neyman-Pearson problem and the problem of minimizing the reject probability subject to a constraint on the error probability are studied. Solutions based on SVMs and on a kernel-based classifier are experimentally compared and discussed.
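For intuition, the decision rule described above can be sketched in a few lines: compute the likelihood ratio and compare it with one threshold (no rejection) or two thresholds (with rejection). The Gaussian class-conditional densities and the threshold values below are purely illustrative assumptions of this sketch, not quantities from the paper.

```python
# Minimal sketch of a likelihood-ratio rule with a reject option.
# Assumes 1-D Gaussian class-conditional densities purely for illustration;
# the paper derives the rule for general class distributions.
import numpy as np
from scipy.stats import norm

def lr_classify(x, t_low=0.5, t_high=2.0):
    """Return +1, -1, or 'reject' based on the likelihood ratio p1(x)/p0(x)."""
    p1 = norm.pdf(x, loc=1.0, scale=1.0)   # class +1 density (illustrative)
    p0 = norm.pdf(x, loc=-1.0, scale=1.0)  # class -1 density (illustrative)
    lr = p1 / p0
    if lr >= t_high:
        return +1
    if lr <= t_low:
        return -1
    return "reject"   # ambiguous region: withhold a decision

print([lr_classify(x) for x in (-2.0, 0.1, 2.5)])
```

With a single threshold (t_low = t_high) the reject region disappears and the rule reduces to the ordinary likelihood-ratio test.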

Theory of Classification: a Survey of Some Recent Advances

ESAIM: Probability and Statistics, 2005

The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.

On Binary Classification in Extreme Regions

2018

In pattern recognition, a random label Y is to be predicted based upon observing a random vector X valued in R^d with d > 1, by means of a classification rule with minimum probability of error. In a wide variety of applications, ranging from finance/insurance to environmental sciences through teletraffic data analysis for instance, extreme (i.e. very large) observations X are of crucial importance, while, simply because of their rarity, contributing only negligibly to the (empirical) error. As a consequence, empirical risk minimizers generally perform very poorly in extreme regions. It is the purpose of this paper to develop a general framework for classification in the extremes. Precisely, under non-parametric heavy-tail assumptions for the class distributions, we prove that a natural and asymptotic notion of risk, accounting for predictive performance in extreme regions of the input space, can be defined and show that minimizers of an empirical version of a non-asympt...
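The construction below is a rough, hedged sketch of the idea of learning only in the extreme region: keep the training points whose norm exceeds a high empirical quantile and fit an off-the-shelf classifier on their angular components. The threshold level, the logistic model, and the synthetic heavy-tailed data are illustrative assumptions; the paper's actual framework and risk notion are more involved.

```python
# Rough sketch of learning a classifier on the extreme region only.
# Assumptions (illustrative, not the paper's exact construction): the extreme
# region is {x : ||x|| > t} with t a high empirical quantile, and a standard
# classifier is fit on the angular components x / ||x|| of those points.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_t(df=2.5, size=(5000, 3))             # heavy-tailed inputs
y = (X[:, 0] + 0.5 * rng.standard_normal(5000) > 0).astype(int)

norms = np.linalg.norm(X, axis=1)
t = np.quantile(norms, 0.95)                            # extreme threshold
extreme = norms > t
angles = X[extreme] / norms[extreme, None]              # project onto the sphere

clf = LogisticRegression().fit(angles, y[extreme])      # classifier for extremes
```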

Sparse Greedy Minimax Probability Machine Classification

2003

The Minimax Probability Machine Classification (MPMC) framework builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω. The only assumption that MPMC makes is that good estimates of the means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross-validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e. kernels) one at a time, greedily selecting the next one that maximizes the accuracy bound Ω. SMPMC automatically chooses both kernel parameters and feature weights without using computationally expensive cross-validation. The SMPMC algorithm therefore simultaneously addresses the problems of kernel selection and feature selection (i.e. feature weighting), based solely on maximizing the accuracy bound Ω. Experimental results indicate that we can obtain reliable bounds Ω, as well as test-set accuracies that are comparable to state-of-the-art classification algorithms.
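For reference, the accuracy bound Ω mentioned above has a simple closed form once the class means, covariances, and a candidate direction are fixed; the sketch below computes it. The greedy basis-selection and weight-optimization machinery of SMPMC is omitted, and the moments and direction used are illustrative assumptions.

```python
# Minimal sketch: the worst-case accuracy bound Omega used by MPM/MPMC for a
# fixed linear direction w, computed from class means and covariances only.
# (The full SMPMC algorithm additionally optimizes w and greedily adds kernel
# basis functions; that machinery is not reproduced here.)
import numpy as np

def mpm_bound(w, mu_pos, cov_pos, mu_neg, cov_neg):
    """Omega = kappa^2 / (1 + kappa^2): distribution-free lower bound on the
    probability of correct classification for direction w."""
    kappa = (w @ (mu_pos - mu_neg)) / (
        np.sqrt(w @ cov_pos @ w) + np.sqrt(w @ cov_neg @ w)
    )
    kappa = max(kappa, 0.0)
    return kappa**2 / (1.0 + kappa**2)

# Illustrative two-class moments and a candidate direction.
mu_p, mu_n = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
cov = np.eye(2)
print(mpm_bound(np.array([1.0, 0.0]), mu_p, cov, mu_n, cov))  # ~0.5
```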

Design and Development of Bayes' Minimax Linear Classification Systems

2016

This paper considers the design and development of Bayes' minimax, linear classification systems using linear discriminant functions that are Bayes' equalizer rules. Bayes' equalizer rules divide two-class feature spaces into decision regions that have equal classification errors. I will formulate the problem of learning unknown linear discriminant functions from data as a locus problem, thereby formulating geometric locus methods within a statistical framework. Solving locus problems involves finding the equation of a curve or surface defined by a given property, and finding the graph or locus of a given equation. I will devise a system of locus equations that determines Bayes' equalizer rules and is based on a variant of the inequality-constrained optimization problem for linear kernel support vector machines. Thereby, I will define a class of learning machines which serve as fundamental building blocks for Bayes' minimax pattern recognition systems.

Classification with Specified Error Rates

There are many real-world datasets which have different costs for misclassification of different classes. Further, the training data is usually imbalanced in such datasets. Traditional classification methods like support vector machines (SVMs) do not deal with such problems well enough. We have studied a new maximum margin classification formulation for problems with specified false negative and false positive error rates. Given the first and second order moments of the class conditional densities, the key idea is to use the Chebyshev-Cantelli inequality to convert the probabilistic chance constraints into second order cone constraints, thus obtaining a second order cone programming (SOCP) formulation. The dual of this formulation has an elegant geometric interpretation, viz. minimizing the distance between two ellipsoids. This geometric optimization problem can be solved by a fast iterative algorithm. The formulation is extended to non-linear feature spaces using kernel methods.
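For concreteness, the chance-constraint-to-SOCP conversion can be written out; this is the standard multivariate Chebyshev-Cantelli argument, shown for a single class, with notation chosen for this sketch rather than taken from the paper.

```latex
% Chance constraint -> second-order cone constraint (standard Chebyshev-Cantelli
% argument, one class). For a class with mean \mu and covariance \Sigma,
% requiring correct classification with probability at least \eta in the worst
% case over all distributions sharing these moments:
\inf_{x \sim (\mu,\,\Sigma)} \Pr\bigl(w^{\top}x \ge b\bigr) \;\ge\; \eta
\quad\Longleftrightarrow\quad
w^{\top}\mu - b \;\ge\; \kappa(\eta)\,\sqrt{w^{\top}\Sigma\,w},
\qquad
\kappa(\eta) = \sqrt{\frac{\eta}{1-\eta}}
```

The right-hand side is a second-order cone constraint in (w, b); imposing one such constraint per class, with η set from the specified false negative and false positive rates, yields the SOCP formulation described in the abstract.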

The minimum error minimax probability machine

2005

We construct a distribution-free Bayes optimal classifier called the Minimum Error Minimax Probability Machine (MEMPM) in a worst-case setting, i.e., under all possible choices of class-conditional densities with a given mean and covariance matrix. By assuming no specific distribution for the data, our model is thus distinguished from traditional Bayes optimal approaches, where an assumption on the data distribution is a must. This model extends the recently proposed Minimax Probability Machine (MPM) and is shown to contain MPM as a special case. Moreover, it includes another special case named the Biased Minimax Probability Machine, which is appropriate for handling biased classification. One appealing feature of MEMPM is that it contains an explicit performance indicator, i.e., a lower bound on the worst-case accuracy, which is shown to be tighter than that of MPM. We provide conditions under which the worst-case Bayes optimal classifier converges to the Bayes optimal classifier. We demonstrate how to apply a more general statistical framework to estimate model input parameters robustly. We also show how to extend our model to nonlinear classification by exploiting kernelization techniques. A series of experiments on both synthetic data sets and real-world benchmark data sets validates our proposition and demonstrates the effectiveness of our model.
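A minimal numerical sketch of the per-class worst-case accuracies and a weighted MEMPM-style objective is given below; the hyperplane, the class moments, and the prior weight theta are illustrative assumptions, and the actual MEMPM optimization over the hyperplane parameters is not reproduced.

```python
# Sketch: per-class worst-case accuracies for a fixed hyperplane w^T x = b,
# and the weighted objective theta*alpha + (1-theta)*beta of MEMPM style.
# Moments, hyperplane, and theta below are illustrative assumptions.
import numpy as np

def worst_case_accuracy(signed_margin, variance):
    """Largest alpha such that every distribution with the given moments puts
    probability >= alpha on the correct side of the hyperplane; signed_margin
    is w^T mu - b (positive class) or b - w^T mu (negative class)."""
    if signed_margin <= 0:
        return 0.0
    s = signed_margin / np.sqrt(variance)
    return s**2 / (1.0 + s**2)

w, b, theta = np.array([1.0, 0.0]), 0.0, 0.5
mu_p, cov_p = np.array([1.5, 0.0]), np.eye(2)
mu_n, cov_n = np.array([-1.0, 0.0]), 2.0 * np.eye(2)

alpha = worst_case_accuracy(w @ mu_p - b, w @ cov_p @ w)
beta = worst_case_accuracy(b - w @ mu_n, w @ cov_n @ w)
print(alpha, beta, theta * alpha + (1 - theta) * beta)
```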

Mathematical Programming Approaches to Classification Problems

Discriminant Analysis (DA) is widely applied in many fields. Recent research has pointed out that standard DA assumptions, such as a normal distribution of data and equality of the variance-covariance matrices, are not always satisfied. The Mathematical Programming (MP) approach has frequently been used in DA and can be considered a valuable alternative to the classical models of DA, as it provides more flexibility for the process of analysis. The aim of this paper is to present a comparative study in which we analyze the performance of three statistical and several MP methods using linear and nonlinear discriminant functions in two-group classification problems. New classification procedures will be adapted to the context of nonlinear discriminant functions. Different applications are used to compare these methods, including the Support Vector Machines (SVM)-based approach. The findings of this study will be useful in assisting decision-makers to choose the most appropriate model for their decision-making situation.
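To make the "MP approach" concrete, here is one classical formulation often cited in this literature, the minimize-the-sum-of-deviations (MSD) linear program, shown as a generic illustration rather than a procedure taken from this paper; the normalization constraint used to rule out the trivial solution and the synthetic data are assumptions of this sketch.

```python
# Generic illustration of a mathematical-programming discriminant: the classic
# "minimize the sum of exterior deviations" (MSD) linear program.
# The normalization w^T (mean(A) - mean(B)) = 1 is one common device to rule
# out the trivial solution w = 0 and is an assumption of this sketch.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
A = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(40, 2))   # group A samples
B = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2))   # group B samples
nA, nB, p = len(A), len(B), A.shape[1]

# Decision variables: [w (p), c (1), d (nA + nB)]; minimize sum(d).
cost = np.concatenate([np.zeros(p + 1), np.ones(nA + nB)])

# Group A should satisfy w^T x >= c - d_i  ->  -x^T w + c - d_i <= 0
# Group B should satisfy w^T x <= c + d_i  ->   x^T w - c - d_i <= 0
G = np.zeros((nA + nB, p + 1 + nA + nB))
G[:nA, :p] = -A
G[:nA, p] = 1.0
G[nA:, :p] = B
G[nA:, p] = -1.0
G[np.arange(nA + nB), p + 1 + np.arange(nA + nB)] = -1.0
h = np.zeros(nA + nB)

# Normalization: w^T (mean(A) - mean(B)) = 1
Aeq = np.concatenate([A.mean(0) - B.mean(0), np.zeros(1 + nA + nB)])[None, :]
beq = np.array([1.0])

bounds = [(None, None)] * (p + 1) + [(0, None)] * (nA + nB)
res = linprog(cost, A_ub=G, b_ub=h, A_eq=Aeq, b_eq=beq, bounds=bounds)
w, c = res.x[:p], res.x[p]
print("w =", w, "cutoff c =", c)
```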

A machine learning algorithm for classification of mental tasks

Computers & Electrical Engineering, 2022

When it is difficult to obtain learning data at training time, objects have to be classified using extremely limited information about their features. It is assumed in the paper that only the mean value of every feature and the lower and upper bounds of its set of values are known. The main idea for constructing new classification models that take this information into account is to form a set of probability distributions bounded by lower and upper probability distribution functions (a p-box). A discriminant function is derived by maximising a risk measure over the set of distributions and minimising it over the set of classification parameters. The classification algorithm reduces to a parametric linear programming problem.
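A minimal sketch of the p-box idea follows, assuming the textbook Markov-type envelope built from a known mean and known support bounds for a single feature; the paper's exact p-box construction and the subsequent minimax/linear-programming step are not reproduced here.

```python
# Minimal sketch of a p-box (lower/upper CDF envelope) for one feature, built
# only from its known mean m and support bounds [a, b] via Markov-type
# inequalities. This illustrates the kind of information the method starts
# from; the paper's exact construction may differ.
import numpy as np

def p_box(t, a, b, m):
    """Return (lower, upper) bounds on F(t) = P(X <= t) for any X on [a, b]
    with mean m."""
    t = np.asarray(t, dtype=float)
    lower = np.clip((t - m) / np.maximum(t - a, 1e-12), 0.0, 1.0)   # F(t) >= (t-m)/(t-a)
    upper = np.clip((b - m) / np.maximum(b - t, 1e-12), 0.0, 1.0)   # F(t) <= (b-m)/(b-t)
    lower = np.where(t >= b, 1.0, lower)
    upper = np.where(t < a, 0.0, upper)
    return lower, upper

lo, hi = p_box(np.linspace(0, 10, 5), a=0.0, b=10.0, m=4.0)
print(np.round(lo, 3), np.round(hi, 3))
```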

On classification based on Lp depth with an adaptive choice of p

In the recent past, several depth-based classifiers have been proposed in the literature for classification of multivariate data. In this article, we use Lp depth for this purpose, where p is chosen adaptively using the training data. While other depth-based classifiers achieve Bayes risk consistency only under elliptic symmetry, the proposed classifier has this desirable property over a larger class of distributions. We analyze some simulated and real data sets to investigate its finite sample performance. Unlike other depth-based methods, this classifier can adapt itself to the underlying population structure. As a result, in many cases it significantly outperforms other depth-based classifiers, especially when the underlying distributions are not elliptic.
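As a reference point, the empirical Lp depth and the resulting maximum-depth classifier can be written compactly; the fixed value of p below is an illustrative assumption, whereas the article's contribution is precisely the adaptive, data-driven choice of p.

```python
# Minimal sketch of an empirical L_p depth and a max-depth classifier. The
# depth D(x) = 1 / (1 + mean_i ||x - X_i||_p) is the standard L_p depth; the
# adaptive choice of p proposed in the article is not reproduced here
# (p is fixed for illustration).
import numpy as np

def lp_depth(x, sample, p=2.0):
    """Empirical L_p depth of point x w.r.t. a sample (rows are observations)."""
    dists = np.linalg.norm(sample - x, ord=p, axis=1)
    return 1.0 / (1.0 + dists.mean())

def max_depth_classify(x, class_samples, p=2.0):
    """Assign x to the class in which it is deepest."""
    depths = [lp_depth(x, s, p) for s in class_samples]
    return int(np.argmax(depths))

rng = np.random.default_rng(2)
c0 = rng.normal(0.0, 1.0, size=(200, 2))
c1 = rng.normal(3.0, 1.0, size=(200, 2))
print(max_depth_classify(np.array([2.5, 2.5]), [c0, c1], p=1.5))  # expect class 1
```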