Discriminant Kernels derived from the optimum nonlinear discriminant analysis
Related papers
Optimum Nonlinear Discriminant Analysis and Discriminant Kernel Support Vector Machine
IEICE Transactions on Information and Systems, 2016
Kernel discriminant analysis (KDA) is the mainstream approach to nonlinear discriminant analysis (NDA). Because it relies on the kernel trick, KDA never makes its nonlinear discriminant mapping explicit. In this paper, another NDA approach is developed in which the nonlinear discriminant mapping is given analytically. The work builds on the theory of optimal nonlinear discriminant analysis (ONDA), whose nonlinear mapping is expressed exactly in terms of the Bayesian posterior probabilities. The theory indicates that various NDA methods can be derived by plugging different estimators of the Bayesian posterior probability into ONDA. ONDA also suggests a family of novel kernel functions, called discriminant kernels (DKs), which are likewise defined in terms of the posterior probabilities. In this paper, several NDA methods and DKs derived from ONDA with different posterior probability estimators are developed and evaluated. Given accurate estimates of the Bayesian posterior probability, they yield good discriminant spaces for visualization and classification.
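As a hedged sketch of the ONDA result the abstract refers to (the notation below is assumed for illustration, not taken from the paper): collect the Bayesian posterior probabilities into a vector p(x); the optimum nonlinear discriminant mapping is then an affine transform of p(x), and a discriminant kernel can be built from inner products of posterior vectors.

    % ONDA mapping and a discriminant kernel built from posterior probabilities
    % (assumed notation; the exact weighting matrix C is left generic here).
    \[
      \mathbf{p}(x) = \bigl(P(\omega_1 \mid x), \ldots, P(\omega_K \mid x)\bigr)^{\top},
      \qquad
      \mathbf{y}(x) = A^{\top}\mathbf{p}(x) + \mathbf{b},
    \]
    \[
      k_{\mathrm{DK}}(x, x') = \mathbf{p}(x)^{\top} C\, \mathbf{p}(x'),
    \]
    % A and b follow from an eigenvalue problem on the covariance matrices of
    % p(x); C is a positive semi-definite weighting matrix. Substituting a
    % concrete posterior estimator (k-NN, logistic regression, ...) for p(x)
    % yields concrete NDA variants and discriminant kernels.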
Discriminant kernels based support vector machine
The First Asian Conference on Pattern Recognition, 2011
Recently, kernel discriminant analysis (KDA) has been applied successfully in many applications. KDA is one of the nonlinear extensions of Linear Discriminant Analysis (LDA). However, the kernel function is usually fixed a priori, and it is not known what the optimum kernel function for nonlinear discriminant analysis is.
Nonlinear discriminant analysis using kernel functions
1999
Linear Discriminant Analysis (LDA) has been widely used for linear dimension reduction. However, LDA has limitations: one of the scatter matrices is required to be nonsingular, and nonlinearly clustered structure is not easily captured. To overcome the problems caused by the singularity of the scatter matrices, a generalization of LDA based on the generalized singular value decomposition (GSVD) has recently been developed. In this paper, we propose a nonlinear discriminant analysis based on the kernel method and the generalized singular value decomposition. The GSVD is applied to solve the generalized eigenvalue problem, which is formulated in the feature space defined by a nonlinear mapping through kernel functions. Our GSVD-based kernel discriminant analysis is compared theoretically with other kernel-based nonlinear discriminant analysis algorithms. The experimental results show that our method is an effective nonlinear dimension reduction method.
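As a minimal sketch of the kernelized discriminant step described above, assuming an RBF kernel: the generalized eigenvalue problem is posed on scatter matrices built in the space of kernel-expansion coefficients. The paper handles singular scatter matrices with the GSVD; this illustration substitutes a small ridge term so that a standard generalized eigensolver applies, and all names and parameters below are illustrative rather than the paper's.

    # Minimal multiclass kernel discriminant analysis sketch (illustrative only).
    # The GSVD treatment of singular scatter matrices is replaced by a ridge term.
    import numpy as np
    from scipy.linalg import eigh
    from sklearn.metrics.pairwise import rbf_kernel

    def kernel_lda(X, y, gamma=1.0, ridge=1e-6, n_components=2):
        K = rbf_kernel(X, X, gamma=gamma)          # n x n Gram matrix
        n = K.shape[0]
        classes = np.unique(y)
        m = K.mean(axis=1)                         # overall mean of kernel columns
        B = np.zeros((n, n))                       # between-class scatter (coefficient space)
        W = np.zeros((n, n))                       # within-class scatter  (coefficient space)
        for c in classes:
            idx = np.where(y == c)[0]
            m_c = K[:, idx].mean(axis=1)
            d = (m_c - m)[:, None]
            B += len(idx) * d @ d.T
            D = K[:, idx] - m_c[:, None]
            W += D @ D.T
        # Generalized eigenproblem  B a = lambda (W + ridge*I) a
        evals, evecs = eigh(B, W + ridge * np.eye(n))
        A = evecs[:, ::-1][:, :n_components]       # leading discriminant coefficients
        return A, lambda Xnew: rbf_kernel(Xnew, X, gamma=gamma) @ A

    # usage: A, project = kernel_lda(X_train, y_train); Z = project(X_test)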
Optimising Kernel Parameters and Regularisation Coefficients for Non-Linear Discriminant Analysis
The Journal of Machine Learning …, 2006
In this paper we consider a novel Bayesian interpretation of Fisher's discriminant analysis. We relate Rayleigh's coefficient to a noise model that minimises a cost based on the most probable class centres and that abandons the 'regression to the labels' assumption used by other algorithms. Optimisation of the noise model yields a direction of discrimination equivalent to Fisher's discriminant, and with the incorporation of a prior we can apply Bayes' rule to infer the posterior distribution of the direction of discrimination. Nonetheless, we argue that an additional constraining distribution has to be included if sensible results are to be obtained. Going further, with the use of a Gaussian process prior we show the equivalence of our model to a regularised kernel Fisher's discriminant. A key advantage of our approach is the facility to determine kernel parameters and the regularisation coefficient through the optimisation of the marginal log-likelihood of the data. An added bonus of the new formulation is that it enables us to link the regularisation coefficient with the generalisation error.
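For reference, a standard two-class regularised kernel Fisher discriminant is sketched below in the usual Mika-style notation (assumed here; the paper's Bayesian derivation arrives at an equivalent regularised form and then selects the kernel parameters and the coefficient lambda by maximising the marginal log-likelihood rather than by cross-validation).

    % Regularised kernel Fisher discriminant with the expansion
    % w = \sum_i \alpha_i \phi(x_i)  (assumed Mika-style notation).
    \[
      (M_c)_j = \frac{1}{n_c} \sum_{i \in \mathcal{C}_c} k(x_j, x_i), \qquad
      N = \sum_{c=1}^{2} K_c \Bigl(I - \tfrac{1}{n_c}\mathbf{1}\mathbf{1}^{\top}\Bigr) K_c^{\top},
    \]
    \[
      \boldsymbol{\alpha} \;\propto\; (N + \lambda I)^{-1}\,(M_1 - M_2),
    \]
    % K_c holds the kernel columns of class c, and \lambda is the regularisation
    % coefficient; the paper's contribution is to choose \lambda and the kernel
    % parameters by optimising the marginal log-likelihood and to relate \lambda
    % to the generalisation error.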
Nonlinear multiclass discriminant analysis
IEEE Signal Processing Letters, 2000
An alternative nonlinear multiclass discriminant algorithm is presented. This algorithm is based on the use of kernel functions and is designed to optimize a general linear discriminant analysis criterion based on scatter matrices. By reformulating these matrices in a specific form, a straightforward derivation allows the kernel function to be introduced in a simple and direct way. Moreover, we propose a method to determine the value of the regularization parameter based on this derivation.
Multiclass probabilistic kernel discriminant analysis
2009
Kernel discriminant analysis (KDA) is an effective approach for supervised nonlinear dimensionality reduction. Probabilistic models can be used with KDA to improve its robustness. However, state-of-the-art models of this kind can only handle binary class problems, which limits their application in many real-world problems. To overcome this limitation, we propose a novel nonparametric probabilistic model based on Gaussian processes for KDA to handle multiclass problems.
Bayesian predictive kernel discriminant analysis
Pattern Recognition Letters, 2013
Discriminant analysis using the Kernel Density Estimator (KDE) is a common tool for classification, but it depends on the choice of the bandwidth, or smoothing parameter, of the kernel. In this paper, we introduce Bayesian Predictive Kernel Discriminant Analysis (BPKDA), which eliminates this dependence by integrating the KDE with respect to an appropriate prior probability distribution for the bandwidth. Key points of the method are: (1) the formulation of the classification rule in terms of mixture predictive densities obtained by integrating the KDE; (2) the use of Independent Component Analysis (ICA) to choose a transform matrix so that the transformed components are as independent as possible; and (3) nonparametric estimation of the predictive density by KDE for each independent component. Results on benchmark data sets and simulations show that the performance of BPKDA is competitive with, and in some cases significantly better than, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and naive Bayes discriminant analysis with normal distributions (NNBDA).
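A hedged sketch of the pipeline listed as key points (1)-(3): an ICA transform followed by one univariate KDE per class and per component, with classification by the product of component densities and the class prior. The Bayesian integration over the bandwidth prior is not reproduced here; each KDE simply uses SciPy's default rule-of-thumb bandwidth, and the class and function names are illustrative.

    # Simplified stand-in for BPKDA: ICA + per-component KDE discriminant.
    import numpy as np
    from scipy.stats import gaussian_kde
    from sklearn.decomposition import FastICA

    class ICAKDEClassifier:
        def fit(self, X, y):
            self.ica = FastICA(whiten="unit-variance", random_state=0).fit(X)
            S = self.ica.transform(X)                    # independent components
            self.classes_ = np.unique(y)
            self.priors_ = {c: np.mean(y == c) for c in self.classes_}
            # one univariate KDE per class and per component (fixed bandwidth rule)
            self.kdes_ = {c: [gaussian_kde(S[y == c, j]) for j in range(S.shape[1])]
                          for c in self.classes_}
            return self

        def predict(self, X):
            S = self.ica.transform(X)
            scores = []
            for c in self.classes_:
                log_p = np.log(self.priors_[c])
                for j, kde in enumerate(self.kdes_[c]):
                    log_p = log_p + np.log(kde(S[:, j]) + 1e-300)  # product over components
                scores.append(log_p)
            return self.classes_[np.argmax(np.stack(scores), axis=0)]

    # usage: clf = ICAKDEClassifier().fit(X_train, y_train); y_hat = clf.predict(X_test)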
A fast kernel-based nonlinear discriminant analysis for multi-class problems
Pattern Recognition, 2006
Nonlinear discriminant analysis may be transformed into the form of kernel-based discriminant analysis, so the corresponding discriminant direction can be obtained by solving linear equations. From the viewpoint of the feature space, nonlinear discriminant analysis is still a linear method, and it is provable that in the feature space the method is equivalent to Fisher discriminant analysis. We observe that a linear combination of a subset of the training samples, called "significant nodes", can to some extent replace the full training set in expressing the corresponding discriminant vector in the feature space. In this paper, an efficient algorithm is proposed to determine the "significant nodes" one by one. The principle for determining "significant nodes" is simple and reasonable, and the resulting algorithm can be carried out at acceptable computational cost. Classification is then implemented using only the kernel functions between test samples and the "significant nodes". The proposed method is called the fast kernel-based nonlinear method (FKNM). Notably, the number of "significant nodes" may be much smaller than the total number of training samples. As a result, for two-class classification problems, the FKNM is much more efficient than the naive kernel-based nonlinear method (NKNM). The FKNM can also be applied to multi-class problems via two approaches: one-against-the-rest and one-against-one. Although one-against-one is often considered superior to one-against-the-rest in classification efficiency, for the FKNM one-against-the-rest appears to be the more efficient of the two. Experiments on benchmark and real datasets illustrate that, for two-class and multi-class classification, the FKNM is effective, feasible, and highly efficient.
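The "significant node" idea can be written compactly (notation assumed here for illustration): the discriminant vector, which a naive kernel method expands over all n training samples, is approximated over a small node set S, so projecting a test sample needs only |S| kernel evaluations.

    % Full expansion vs. significant-node approximation of the discriminant vector.
    \[
      \mathbf{w} = \sum_{i=1}^{n} \alpha_i \,\phi(x_i)
      \;\approx\;
      \tilde{\mathbf{w}} = \sum_{j \in S} \beta_j \,\phi(x_j),
      \qquad |S| \ll n,
    \]
    \[
      \tilde{\mathbf{w}}^{\top}\phi(x) = \sum_{j \in S} \beta_j \, k(x_j, x),
    \]
    % so classifying a test point x needs only the kernels between x and the
    % significant nodes, which is the source of the speed-up claimed for the FKNM.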
Journal of Pattern Recognition Research, 2009
Kernel Discriminant Analysis (KDA) is the usual extension of Fisher Linear Discriminant Analysis (FLDA) to a high-dimensional feature space via a kernel mapping. KDA has recently become a popular classification technique in machine learning and data mining. The performance of KDA depends heavily on the choice of the best kernel function for a given data set and on the optimal choice of the kernel parameters. In this paper, we develop a novel, data-adaptive approach for simultaneous parameter and kernel selection in KDA using information complexity (ICOMP) type criteria. We achieve this by reducing the multivariate input data to one dimension in order to find a range of possible values for the kernel parameters directly from the data rather than by trial and error. We tune the parameters of the kernel functions by utilizing the Mahalanobis distance of each point from the multivariate mean (centroid), the Jackknife Mahalanobis distance Data Depth (JMDD), and the Smoothed Complexity Mahalanobis distance (SCMD). This approach gives the researcher a new method for simultaneously choosing the optimal tuning parameters of the kernel functions and the optimal kernel function itself, and for assessing their effect on the KDA classifier using ICOMP. We show numerical examples on real benchmark data sets to illustrate the efficiency and performance of our new approach in terms of reducing the misclassification error rate.
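A loose, illustrative sketch of reading a kernel-parameter range off the data via Mahalanobis distances to the centroid (the centroid-based variant only). ICOMP itself and the paper's KDA pipeline are not reproduced: cross-validated accuracy of a stand-in kernel classifier serves as the selection criterion below, and every name and parameter is assumed rather than taken from the paper.

    # Illustrative only: candidate RBF widths from Mahalanobis distances to the
    # centroid, selected by cross-validated accuracy (standing in for ICOMP).
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC  # stand-in kernel classifier, not the paper's KDA

    def mahalanobis_sigma_grid(X, n_values=10):
        mu = X.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
        d = np.sqrt(np.einsum("ij,jk,ik->i", X - mu, cov_inv, X - mu))
        return np.linspace(d.min() + 1e-3, d.max(), n_values)   # candidate widths

    def select_sigma(X, y):
        best = None
        for sigma in mahalanobis_sigma_grid(X):
            clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2))
            score = cross_val_score(clf, X, y, cv=5).mean()
            if best is None or score > best[1]:
                best = (sigma, score)
        return best  # (sigma, cross-validated accuracy)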
Logistic discriminant analysis
2009
Linear discriminant analysis (LDA) is one of the well-known methods for extracting the best features for multiclass discrimination. Otsu derived the optimal nonlinear discriminant analysis (ONDA) by assuming the underlying probability distributions and showed that ONDA is closely related to Bayesian decision theory (the posterior probabilities). Otsu also pointed out that LDA can be regarded as a linear approximation of ONDA obtained through linear approximations of the Bayesian posterior probabilities. Based on this theory, we propose a novel nonlinear discriminant analysis, named logistic discriminant analysis (LgDA), in which the posterior probabilities are estimated by multinomial logistic regression (MLR). Experimental results compare the discriminant spaces constructed by LgDA and LDA on standard repository datasets.
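A minimal sketch consistent with the LgDA recipe described above, under the assumption that the MLR posterior estimates are fed into the usual LDA eigenproblem computed on the posterior vectors (not necessarily the paper's exact formulation; the function name and the small ridge term are illustrative).

    # Logistic discriminant analysis sketch: estimate posteriors with multinomial
    # logistic regression, then run the LDA eigenproblem on the posterior vectors.
    import numpy as np
    from scipy.linalg import eigh
    from sklearn.linear_model import LogisticRegression

    def lgda(X, y, n_components=2, ridge=1e-6):
        mlr = LogisticRegression(max_iter=1000).fit(X, y)  # multinomial for multiclass data under the default lbfgs solver
        P = mlr.predict_proba(X)                            # n x K posterior estimates
        classes = np.unique(y)
        mean_all = P.mean(axis=0)
        K = P.shape[1]
        Sb = np.zeros((K, K))                               # between-class scatter of posteriors
        Sw = np.zeros((K, K))                               # within-class scatter of posteriors
        for c in classes:
            Pc = P[y == c]
            d = (Pc.mean(axis=0) - mean_all)[:, None]
            Sb += len(Pc) * d @ d.T
            Dc = Pc - Pc.mean(axis=0)
            Sw += Dc.T @ Dc
        evals, evecs = eigh(Sb, Sw + ridge * np.eye(K))     # posteriors sum to 1, so Sw needs the ridge
        A = evecs[:, ::-1][:, :n_components]
        return lambda Xnew: mlr.predict_proba(Xnew) @ A     # nonlinear discriminant map

    # usage: project = lgda(X_train, y_train); Z = project(X_test)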