IEEE Transactions on Pattern Analysis and Machine Intelligence

Kernel density classification and boosting: an L2 analysis

Statistics and Computing, 2005

Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is "boosting", and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
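
For orientation, the following is a minimal Python sketch of plain (un-boosted) two-class kernel density classification: each class gets its own Gaussian KDE and a point is assigned to the class with the larger prior-weighted density estimate. It is not the boosting algorithm proposed in the paper, and the bandwidths are illustrative placeholders rather than the optimal smoothing parameters analysed there.

```python
# Minimal sketch (not the paper's boosting algorithm): two-class kernel density
# classification with per-class Gaussian KDEs and sample-size priors.
import numpy as np

def gaussian_kde(x, sample, h):
    """Univariate Gaussian KDE of `sample` evaluated at the points `x`."""
    u = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

def kde_classify(x, class0, class1, h0=0.3, h1=0.3):
    """Return 1 where the prior-weighted class-1 density exceeds the class-0 one."""
    p0 = gaussian_kde(x, class0, h0) * len(class0)   # density * n_k is proportional to prior * density
    p1 = gaussian_kde(x, class1, h1) * len(class1)
    return (p1 > p0).astype(int)

rng = np.random.default_rng(0)
c0, c1 = rng.normal(-1, 1, 200), rng.normal(1, 1, 150)
print(kde_classify(np.array([-2.0, 0.0, 2.0]), c0, c1))
```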

Nonlinear kernel-based statistical pattern analysis

IEEE Transactions on Neural Networks, 2001

The eigenstructure of the second-order statistics of a multivariate random population can be inferred from the matrix of pairwise combinations of inner products of the samples. Therefore, it can also be efficiently obtained in the implicit, high-dimensional feature spaces defined by kernel functions. We elaborate on this property to obtain general expressions for immediate derivation of nonlinear counterparts of a number of standard pattern analysis algorithms, including principal component analysis, data compression and denoising, and Fisher's discriminant. The connection between kernel methods and nonparametric density estimation is also illustrated. Using these results, we introduce the kernel version of the Mahalanobis distance, which gives rise to nonparametric models with unexpected and interesting properties, and we also propose a kernel version of the minimum squared error (MSE) linear discriminant function. This learning machine is particularly simple and includes a number of generalized linear models such as the potential functions method or the radial basis function (RBF) network. Our results shed some light on the relative merit of feature spaces and inductive bias in the remarkable generalization properties of the support vector machine (SVM). Although in most situations the SVM obtains the lowest error rates, exhaustive experiments with synthetic and natural data show that simple kernel machines based on pseudoinversion are competitive in problems with appreciable class overlap.
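
The kernel MSE discriminant solved by pseudoinversion, mentioned at the end of the abstract, can be sketched as kernel least squares: solve K alpha = y with the pseudoinverse of the Gram matrix and classify by the sign of the kernel expansion. The RBF kernel and its bandwidth below are illustrative assumptions, not choices taken from the paper.

```python
# Hedged sketch of a kernel minimum-squared-error discriminant via pseudoinversion.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF Gram matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_mse(X, y, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    return np.linalg.pinv(K) @ y                 # pseudoinverse solution of K alpha = y

def predict_kernel_mse(X_train, alpha, X_test, gamma=1.0):
    return np.sign(rbf_kernel(X_test, X_train, gamma) @ alpha)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (30, 2)), rng.normal(1, 1, (30, 2))])
y = np.r_[-np.ones(30), np.ones(30)]
alpha = fit_kernel_mse(X, y)
print(predict_kernel_mse(X, alpha, np.array([[-1.0, -1.0], [1.0, 1.0]])))
```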

Nonparametric kernel estimators for image classification

2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

We introduce a new discriminative learning method for image classification. We assume that the images are represented by unordered, multi-dimensional, finite sets of feature vectors, and that these sets might have different cardinality. This allows us to use consistent nonparametric divergence estimators to define new kernels over these sets, and then apply them in kernel classifiers. Our numerical results demonstrate that in many cases this approach can outperform state-of-the-art competitors on both simulated and challenging real-world datasets.

A semiparametric density estimation approach to pattern classification

Pattern Recognition, 2004

A new multivariate density estimator suitable for pattern classifier design is proposed. The data are first transformed so that the pattern vector components with the most non-Gaussian structure are separated from the Gaussian components. Nonparametric density estimation is then used to capture the non-Gaussian structure of the data while parametric Gaussian conditional density estimation is applied to the rest of the components. Both simulated and real data sets are used to demonstrate the potential usefulness of the proposed approach.
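
A much simplified sketch of the semiparametric idea follows: model the "non-Gaussian" coordinates with a KDE and the remaining coordinates with a fitted Gaussian. For brevity the two blocks are treated as independent, whereas the paper uses a Gaussian conditional model for the parametric part; the choice of which dimension is non-Gaussian is hard-coded here purely for illustration.

```python
# Simplified semiparametric log-density: KDE on chosen dimensions, Gaussian on the rest.
import numpy as np

def kde_logpdf(x, sample, h=0.3):
    u = (x[:, None] - sample[None, :]) / h
    dens = np.exp(-0.5 * u**2).sum(1) / (len(sample) * h * np.sqrt(2 * np.pi))
    return np.log(dens + 1e-300)

def gaussian_logpdf(x, sample):
    mu, sd = sample.mean(), sample.std(ddof=1)
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))

def semiparametric_logpdf(X_eval, X_train, nonparam_dims=(0,)):
    logp = np.zeros(len(X_eval))
    for d in range(X_train.shape[1]):
        if d in nonparam_dims:                   # non-Gaussian block -> nonparametric KDE
            logp += kde_logpdf(X_eval[:, d], X_train[:, d])
        else:                                    # Gaussian block -> parametric fit
            logp += gaussian_logpdf(X_eval[:, d], X_train[:, d])
    return logp

rng = np.random.default_rng(2)
X = np.column_stack([rng.exponential(1.0, 500), rng.normal(0, 1, 500)])
print(semiparametric_logpdf(X[:3], X))
```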

Classification Based on Combination of Kernel Density Estimators

Lecture Notes in Computer Science, 2009

A new classification algorithm based on a combination of kernel density estimators is introduced. The method combines estimators with different bandwidths, which can be interpreted as looking at the data with different "resolutions" and, in turn, potentially gives the algorithm an insight into the structure of the data. The bandwidths are adjusted automatically to decrease the classification error. Results of experiments using benchmark data sets show promising performance of the proposed approach when compared to classical algorithms. Meta-learning algorithms are also used for density estimation; the EM algorithm is used to maximize the training data likelihood, but the classification error is not directly optimized. In , the authors describe an algorithm that uses "Gaussian product kernel estimators" where the bandwidths are chosen independently for each class-dimension combination. Moreover, the bandwidths can vary depending on the location in the feature space.
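
As a rough illustration of the "multi-resolution" idea, the sketch below combines Gaussian KDEs with several fixed, equally weighted bandwidths per class and classifies by the larger combined density. The paper instead adjusts the bandwidths automatically to reduce the classification error; the bandwidth grid here is an arbitrary assumption.

```python
# Sketch: classify with an equally weighted combination of KDEs at several bandwidths.
import numpy as np

def gaussian_kde(x, sample, h):
    u = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).sum(1) / (len(sample) * h * np.sqrt(2 * np.pi))

def multi_bandwidth_density(x, sample, bandwidths=(0.1, 0.3, 1.0), weights=None):
    weights = np.ones(len(bandwidths)) / len(bandwidths) if weights is None else weights
    return sum(w * gaussian_kde(x, sample, h) for w, h in zip(weights, bandwidths))

def classify(x, class0, class1):
    return (multi_bandwidth_density(x, class1) * len(class1)
            > multi_bandwidth_density(x, class0) * len(class0)).astype(int)

rng = np.random.default_rng(3)
c0, c1 = rng.normal(-1, 0.5, 300), rng.normal(1, 2.0, 300)
print(classify(np.array([-1.5, 0.0, 3.0]), c0, c1))
```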

Probability density estimation by linear combinations of Gaussian kernels - generalizations and algorithmic evaluation

2011 International Conference on Multimedia Technology, 2011

This paper examines parametric density estimation using a variable weighted sum of Gaussian kernels, where the weights may take positive and negative values. Various statistical properties of the estimator are studied, as well as its extensions to multidimensional probability density estimation. The estimator parameters are identified by a modified EM algorithm, and the number of kernels is estimated by an information-theoretic approach using the Akaike Information Criterion (AIC). The paper provides an empirical evaluation of the estimator with respect to window-based estimators and the classical linear combination of Gaussians that uses only positive weights, showing its robustness (in terms of accuracy and speed) for various applications in image and signal analysis and machine learning.
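
The form of the estimator can be illustrated as follows: a density written as a signed, weighted sum of Gaussian kernels, with AIC used to compare candidate numbers of kernels. The sketch does not reproduce the paper's modified EM fitting; the weights, means and scales are hand-picked so that the weights sum to one and the resulting density stays non-negative.

```python
# Hedged sketch of the estimator's form plus an AIC score; fitting is not shown.
import numpy as np

def signed_gaussian_mixture(x, weights, means, sigmas):
    """Evaluate sum_k w_k * N(x; mu_k, sigma_k); the weights may be negative."""
    x = np.asarray(x, float)[:, None]
    comps = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return comps @ weights

def aic(x, weights, means, sigmas):
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    dens = np.clip(signed_gaussian_mixture(x, weights, means, sigmas), 1e-300, None)
    k = 3 * len(weights)                 # one weight, mean and scale per kernel
    return 2 * k - 2 * np.log(dens).sum()

w = np.array([1.2, -0.2])                # includes a negative weight; weights sum to 1
mu = np.array([0.0, 0.0])
sd = np.array([1.0, 0.4])
data = np.random.default_rng(4).normal(0, 1, 400)
print(aic(data, w, mu, sd))
```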

Is there a best kernel density estimator?

2013

For nonparametric density estimators we show that the constant c_1 in the relation bias = c_1 h^q + o(h^q) can be made arbitrarily small, while keeping the variance var = (c_2 + o(h)) / (nh), as measured by the constant c_2, bounded, provided that the kernels are of order q. We call it a free-lunch effect that c_1 can be made as small as desired without increasing the density smoothness requirement or the kernel order. Another problem we consider is testing whether a density satisfies a given differential equation; this result can be applied to check whether a density belongs to a particular family characterized by such an equation.
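
To make the role of the kernel order q concrete, the sketch below compares a standard second-order Gaussian kernel with the common fourth-order Gaussian kernel K4(u) = 0.5 * (3 - u^2) * phi(u), for which the leading bias term is of order h^4. This is a textbook construction, not the specific kernels analysed in the paper, and the bandwidth is an arbitrary choice.

```python
# Sketch: KDE with a second-order vs. a fourth-order (higher-order) Gaussian kernel.
import numpy as np

def phi(u):
    """Standard Gaussian density, used as a second-order kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, sample, h, kernel):
    u = (x[:, None] - sample[None, :]) / h
    return kernel(u).sum(axis=1) / (len(sample) * h)

fourth_order = lambda u: 0.5 * (3.0 - u**2) * phi(u)   # order-4 kernel (can take negative values)

rng = np.random.default_rng(5)
sample = rng.normal(0, 1, 1000)
x = np.array([0.0, 1.0, 2.0])
print(kde(x, sample, 0.4, phi))            # standard second-order estimate
print(kde(x, sample, 0.4, fourth_order))   # reduced-bias, order-4 estimate
```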

Kernels on Sample Sets via Nonparametric Divergence Estimates

2012

Most machine learning algorithms, such as classification or regression, treat the individual data point as the object of interest. Here we consider extending machine learning algorithms to operate on groups of data points. We suggest treating a group of data points as an i.i.d. sample set from an underlying feature distribution for that group. Our approach employs kernel machines with a kernel on i.i.d. sample sets of vectors. We define certain kernel functions on pairs of distributions, and then use a nonparametric estimator to consistently estimate those functions based on sample sets. The projection of the estimated Gram matrix to the cone of symmetric positive semi-definite matrices enables us to use kernel machines for classification, regression, anomaly detection, and low-dimensional embedding in the space of distributions. We present several numerical experiments both on real and simulated datasets to demonstrate the advantages of our new approach.
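
One step named in the abstract, projecting the estimated Gram matrix onto the cone of symmetric positive semi-definite matrices, can be sketched by symmetrizing and clipping negative eigenvalues at zero. The small indefinite Gram matrix below is a toy stand-in for the paper's divergence-based estimates.

```python
# Sketch: project a (possibly indefinite) estimated Gram matrix onto the PSD cone.
import numpy as np

def project_to_psd(G):
    G = 0.5 * (G + G.T)                      # enforce symmetry
    eigvals, eigvecs = np.linalg.eigh(G)
    eigvals = np.clip(eigvals, 0.0, None)    # drop the negative part of the spectrum
    return eigvecs @ np.diag(eigvals) @ eigvecs.T

G_est = np.array([[1.0, 0.9, 0.2],
                  [0.9, 1.0, 0.8],
                  [0.2, 0.8, 1.0]])          # estimation noise can make such a matrix indefinite
G_psd = project_to_psd(G_est)
print(np.linalg.eigvalsh(G_psd))             # all eigenvalues are now >= 0
```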

A kernel-based parametric method for conditional density estimation

Pattern Recognition, 2011

A conditional density function, which describes the relationship between response and explanatory variables, plays an important role in many analysis problems. In this paper, we propose a new kernel-based parametric method to estimate conditional density. An exponential function is employed to approximate the unknown density, and its parameters are computed from the given explanatory variable via a nonlinear mapping using kernel principal component analysis (KPCA). We develop a new kernel function, a variant of the polynomial kernel, to be used in KPCA. The proposed method is compared with the Nadaraya-Watson estimator through numerical simulation and practical data. Experimental results show that the proposed method outperforms the Nadaraya-Watson estimator in terms of the revised mean integrated squared error (RMISE), and is therefore an effective method for estimating conditional densities.
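
For reference, a common Nadaraya-Watson-type kernel conditional density estimator (the kind of baseline the paper compares against) can be sketched as a locally weighted KDE of the response, with weights given by a kernel on the explanatory variable; the bandwidths below are illustrative placeholders.

```python
# Sketch: Nadaraya-Watson-type estimate of f(y | x0) on a grid of y values.
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def nw_conditional_density(y_grid, x0, X, Y, hx=0.3, hy=0.3):
    w = gauss((x0 - X) / hx)
    w = w / w.sum()                                       # normalized local weights in x
    return (w[None, :] * gauss((y_grid[:, None] - Y[None, :]) / hy)).sum(1) / hy

rng = np.random.default_rng(6)
X = rng.uniform(-2, 2, 500)
Y = np.sin(X) + rng.normal(0, 0.2, 500)
y_grid = np.linspace(-2, 2, 5)
print(nw_conditional_density(y_grid, x0=1.0, X=X, Y=Y))
```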

Learning with idealized kernels

2003

The kernel function plays a central role in kernel methods. Existing methods typically fix the functional form of the kernel in advance and then only adapt the associated kernel parameters based on empirical data. In this paper, we consider the problem of adapting the kernel so that it becomes more similar to the so-called ideal kernel. We formulate this as a distance metric learning problem that searches for a suitable linear transform (feature weighting) in the kernel-induced feature space. This formulation is applicable even when the training set can only provide examples of similar and dissimilar pairs, but not explicit class label information. Computationally, this leads to a local-optima-free quadratic programming problem, with the number of variables independent of the number of features. Performance of this method is evaluated on classification and clustering tasks on both toy and real-world data sets.
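
The "ideal kernel" referred to in the abstract is simply K_ideal[i, j] = 1 when examples i and j share a class label and 0 otherwise. The sketch below builds it and scores an RBF kernel against it with the usual alignment measure; this is a hedged illustration of the target, not the paper's quadratic-programming formulation.

```python
# Sketch: build the ideal kernel from labels and measure kernel-target alignment.
import numpy as np

def ideal_kernel(labels):
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def kernel_alignment(K, K_target):
    """Cosine similarity between two Gram matrices (kernel-target alignment)."""
    return (K * K_target).sum() / (np.linalg.norm(K) * np.linalg.norm(K_target))

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(1, 1, (20, 2))])
y = np.r_[np.zeros(20), np.ones(20)]
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # RBF kernel, gamma = 1
print(kernel_alignment(K, ideal_kernel(y)))
```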