Efficient model selection for kernel logistic regression

Efficient approximate leave-one-out cross-validation for kernel logistic regression

Machine Learning, 2008

Kernel logistic regression (KLR) is the kernel learning method best suited to binary pattern recognition problems where estimates of a-posteriori probability of class membership are required. Such problems occur frequently in practical applications, for instance because the operational prior class probabilities or equivalently the relative misclassification costs are variable or unknown at the time of training the model. The model parameters are given by the solution of a convex optimization problem, which may be found via an efficient iteratively re-weighted least squares (IRWLS) procedure. The generalization properties of a kernel logistic regression machine are however governed by a small number of hyper-parameters, the values of which must be determined during the process of model selection. In this paper, we propose a novel model selection strategy for KLR, based on a computationally efficient closed-form approximation of the leave-one-out cross-validation procedure. Results obtained on a variety of synthetic and real-world benchmark datasets are given, demonstrating that the proposed model selection procedure is competitive with a more conventional k-fold cross-validation based approach and also with Gaussian process (GP) classifiers implemented using the Laplace approximation and via the Expectation Propagation (EP) algorithm.
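As a rough illustration of the two ingredients in this abstract, the sketch below fits binary KLR by IRWLS (each iteration solves a weighted kernel ridge system) and then evaluates a closed-form approximate leave-one-out criterion by applying the usual kernel-ridge LOO identity to the converged weighted least-squares system. This is a minimal sketch under assumed conventions (precomputed kernel matrix `K`, labels in {0, 1}, no bias term), not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def klr_irwls(K, y, lam, n_iter=50, tol=1e-8):
    """Fit binary KLR by IRWLS: each step solves a weighted kernel ridge system."""
    n = K.shape[0]
    alpha, eta = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        p = sigmoid(eta)
        w = np.clip(p * (1.0 - p), 1e-10, None)      # IRWLS weights
        z = eta + (y - p) / w                        # working response
        C = K + lam * np.diag(1.0 / w)
        alpha = np.linalg.solve(C, z)                # weighted kernel ridge step
        eta_new = K @ alpha
        converged = np.max(np.abs(eta_new - eta)) < tol
        eta = eta_new
        if converged:
            break
    return alpha, w, z

def approx_loo_nll(K, y, lam):
    """Closed-form approximate LOO negative log-likelihood, obtained by applying
    the kernel-ridge LOO identity to the converged weighted least-squares system."""
    alpha, w, z = klr_irwls(K, y, lam)
    C_inv = np.linalg.inv(K + lam * np.diag(1.0 / w))
    eta_loo = z - alpha / np.diag(C_inv)             # held-out working responses
    p_loo = sigmoid(eta_loo)
    eps = 1e-12
    return -np.mean(y * np.log(p_loo + eps) + (1 - y) * np.log(1 - p_loo + eps))
```

For model selection, such a criterion would be minimised over the regularisation parameter (and any kernel parameters), e.g. by grid search.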

Parameter estimation of kernel logistic regression

Logistic regression (LR) is a nonlinear classification method, often used for binary data sets. Overfitting of the training data may arise in LR, especially when the data are high-dimensional. One approach to reducing overfitting is regularized LR, which is defined by adding a regularization term to the log-likelihood function of LR. The resulting optimization problem is itself nonlinear, because the loss function (deviance) of regularized LR is nonlinear. To handle this, regularized LR can be expressed as a linear combination of kernel functions, which is known as kernel logistic regression (KLR). KLR is a nonlinear classifier and provides higher classification accuracy than LR on small- to medium-sized data sets. With a truncated Newton method, parameter estimation of KLR by maximum likelihood estimation (MLE) can be carried out to optimality.
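For concreteness, the penalized objective described above is commonly written as follows; the notation (labels y_i in {0, 1}, regularization parameter lambda, kernel matrix K) is assumed here rather than quoted from the paper.

```latex
% Ridge-penalized logistic regression (primal form)
\min_{\beta}\; -\sum_{i=1}^{n}\Big[y_i\,\beta^{\top}x_i - \log\!\big(1 + e^{\beta^{\top}x_i}\big)\Big]
  + \frac{\lambda}{2}\,\lVert\beta\rVert^{2}

% Kernelized form (KLR), with decision function f = K\alpha
\min_{\alpha}\; -\sum_{i=1}^{n}\Big[y_i\,(K\alpha)_i - \log\!\big(1 + e^{(K\alpha)_i}\big)\Big]
  + \frac{\lambda}{2}\,\alpha^{\top}K\alpha
```

The second (kernelized) problem is the one that methods such as IRWLS or truncated Newton solve in practice.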

Least-Squares Probabilistic Classifier: A Computationally Efficient Alternative to Kernel Logistic Regression

The least-squares probabilistic classifier (LSPC) is a computationally efficient alternative to kernel logistic regression (KLR). A key idea for the speedup is that, unlike KLR, which uses maximum likelihood estimation for a log-linear model, LSPC uses least-squares estimation for a linear model. This allows us to obtain a global solution analytically in a classwise manner. In exchange for the speedup, however, this linear least-squares formulation does not necessarily produce a non-negative estimate. Nevertheless, consistency of LSPC is guaranteed in the large-sample limit, and rounding a negative estimate up to zero in finite-sample cases was demonstrated not to degrade classification performance in experiments. Thus, LSPC is a practically useful probabilistic classifier. In this paper, we give an overview of LSPC and its extensions to covariate shift, multi-task, and multi-label scenarios. A MATLAB implementation of LSPC is publicly available.
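For illustration, a bare-bones version of the class-wise least-squares idea might look like the sketch below, which assumes a Gaussian kernel basis centred on all training points, squared-loss fitting of class-indicator targets, and clipping of negative estimates followed by normalisation; the published LSPC includes refinements (such as class-wise basis choices) not shown here.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Gaussian kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lspc_fit(X, y, lam=0.1, sigma=1.0):
    """Class-wise regularized least-squares fit of q_c(x) = sum_l alpha_{c,l} k(x, x_l)."""
    Phi = gaussian_kernel(X, X, sigma)              # basis centred on training points
    classes = np.unique(y)
    A = Phi.T @ Phi + lam * len(X) * np.eye(len(X)) # ridge-regularized normal equations
    alphas = {}
    for c in classes:
        t = (y == c).astype(float)                  # class-indicator targets
        alphas[c] = np.linalg.solve(A, Phi.T @ t)   # analytic class-wise solution
    return alphas, X, sigma, classes

def lspc_predict_proba(model, Xtest):
    """Clip negative estimates to zero and normalise to obtain class posteriors."""
    alphas, Xtrain, sigma, classes = model
    Phi = gaussian_kernel(Xtest, Xtrain, sigma)
    Q = np.stack([np.maximum(Phi @ alphas[c], 0.0) for c in classes], axis=1)
    Q_sum = Q.sum(axis=1, keepdims=True)
    Q_sum[Q_sum == 0.0] = 1.0                       # avoid division by zero
    return Q / Q_sum
```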

Multi-class kernel logistic regression: a fixed-size implementation

2007 International Joint Conference on Neural Networks, 2007

This research studies a practical iterative algorithm for multi-class kernel logistic regression (KLR). Starting from the negative penalized log-likelihood criterion, we show that the optimization problem in each iteration can be solved by a weighted version of Least Squares Support Vector Machines (LS-SVMs). In this derivation it turns out that the global regularization term appears as an ordinary regularization term in each separate step. In the LS-SVM framework, fixed-size LS-SVM is known to perform well on large data sets. We therefore implement this model to solve large-scale multi-class KLR problems with estimation in the primal space. To reduce the size of the Hessian, an alternating-descent version of Newton's method is used, which has the additional advantage that it can easily be used in a distributed computing environment. It is also investigated how a multi-class kernel logistic regression model compares to a one-versus-all coding scheme.
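The flavour of this approach can be sketched for the binary case as follows: an explicit Nyström-style feature map gives a fixed-size primal representation, and each IRLS/Newton iteration then reduces to a weighted ridge (LS-SVM-type) linear system. This is a simplified sketch under assumed conventions (Gaussian kernel, labels in {0, 1}); the paper's multi-class, alternating-descent implementation differs in detail.

```python
import numpy as np

def nystrom_features(X, prototypes, sigma):
    """Approximate explicit feature map via the Nystrom method (fixed-size idea)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    Kmm = k(prototypes, prototypes)
    vals, vecs = np.linalg.eigh(Kmm + 1e-10 * np.eye(len(prototypes)))
    vals = np.clip(vals, 1e-10, None)
    return k(X, prototypes) @ vecs / np.sqrt(vals)   # (n, m) primal features

def klr_primal_irls(Phi, y, lam, n_iter=30):
    """Binary KLR in the primal: each Newton step is a weighted ridge regression."""
    n, m = Phi.shape
    w_vec = np.zeros(m)
    for _ in range(n_iter):
        eta = Phi @ w_vec
        p = 1.0 / (1.0 + np.exp(-eta))
        s = np.clip(p * (1.0 - p), 1e-10, None)      # IRLS weights
        z = eta + (y - p) / s                        # working response
        A = Phi.T @ (Phi * s[:, None]) + lam * np.eye(m)
        w_vec = np.linalg.solve(A, Phi.T @ (s * z))  # weighted LS-SVM-style system
    return w_vec
```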

Selection of Import Vectors via Binary Particle Swarm Optimization and Cross-Validation for Kernel Logistic Regression

2007 International Joint Conference on Neural Networks, 2007

Kernel logistic regression (KLR) is a powerful discriminative algorithm. Its loss function and algorithmic structure are similar to those of the kernel support vector machine (SVM). Recently, Zhu and Hastie proposed the import vector machine (IVM), in which a subset of the input vectors of KLR is selected by minimizing the regularized negative log-likelihood, in order to improve generalization performance and reduce computational cost. In this paper, two modifications of the original IVM are proposed. A cross-validation-based criterion is used to select import vectors instead of the likelihood-based criterion, and binary particle swarm optimization is used to select a good subset instead of the greedy stepwise algorithm of the original IVM. Comparison experiments confirmed the improved generalization performance of the proposed algorithm.
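A minimal sketch of the binary PSO component is given below; each particle encodes a 0/1 mask over the candidate import vectors, and the `fitness` callable stands in for the paper's cross-validation criterion (e.g. CV accuracy of a KLR model restricted to the selected vectors). The function name and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def binary_pso(n_bits, fitness, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, rng=None):
    """Minimal binary PSO: each particle is a 0/1 mask over candidate import vectors;
    fitness(mask) should return a score to maximise (e.g. cross-validated accuracy)."""
    rng = np.random.default_rng() if rng is None else rng
    X = (rng.random((n_particles, n_bits)) < 0.5).astype(int)   # positions (masks)
    V = rng.normal(0.0, 1.0, (n_particles, n_bits))             # velocities
    pbest, pbest_f = X.copy(), np.array([fitness(x) for x in X])
    g = pbest[np.argmax(pbest_f)].copy()                        # global best mask
    for _ in range(n_iter):
        r1, r2 = rng.random(V.shape), rng.random(V.shape)
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (g - X)
        prob = 1.0 / (1.0 + np.exp(-V))                         # sigmoid transfer
        X = (rng.random(V.shape) < prob).astype(int)
        f = np.array([fitness(x) for x in X])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = X[improved], f[improved]
        g = pbest[np.argmax(pbest_f)].copy()
    return g, pbest_f.max()
```

A typical `fitness` would fit a KLR model using only the kernel expansion terms selected by the mask and return its k-fold cross-validation accuracy.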

Kernel logistic regression using truncated Newton method

Computational Management Science, 2011

Kernel logistic regression (KLR) is a powerful nonlinear classifier. The combination of KLR and the truncated-regularized iteratively re-weighted least-squares (TR-IRLS) algorithm has led to a powerful classification method for small-to-medium-sized data sets. This method is called truncated-regularized kernel logistic regression (TR-KLR). Compared with support vector machines (SVM) and TR-IRLS on twelve publicly available benchmark data sets, the proposed TR-KLR algorithm is as accurate as, and much faster than, SVM, and more accurate than TR-IRLS. The TR-KLR algorithm also has the advantage of providing direct prediction probabilities. M. Maalouf et al. (2005) were the first to show that the truncated-regularized iteratively re-weighted least-squares (TR-IRLS) algorithm can be effectively implemented on LR to classify large data sets, and that it can outperform the support vector machine (SVM) algorithm. Later on, the trust-region Newton method, which is a type of truncated Newton method, and truncated Newton interior-point methods were applied to large-scale LR problems. SVM (Vapnik 1995) is considered a state-of-the-art algorithm for classifying binary data through its implementation of kernels. Kernel logistic regression (KLR) (Jaakkola and Haussler 1999), which is a kernel version of LR, has also proven to be a powerful classifier (Zhu and Hastie 2005). Like LR, KLR can naturally provide probabilities and extend to multi-class classification problems.
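The core of a truncated Newton solver for regularized LR can be sketched as follows: each Newton system is solved only approximately by a small number of conjugate-gradient steps using Hessian-vector products. This is a minimal illustration under assumed conventions (dense design matrix `X`, labels in {0, 1}, no line search); TR-IRLS/TR-KLR add further safeguards, and the kernel variant works with the kernel matrix in place of `X`.

```python
import numpy as np

def truncated_newton_lr(X, y, lam, outer_iters=20, cg_iters=30, cg_tol=1e-4):
    """Regularized logistic regression via truncated Newton:
    each Newton system H d = -g is solved approximately by a few CG steps."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(outer_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        g = X.T @ (p - y) + lam * w                         # gradient
        s = np.clip(p * (1.0 - p), 1e-10, None)
        hessvec = lambda v: X.T @ (s * (X @ v)) + lam * v   # Hessian-vector product
        # conjugate gradient, truncated after cg_iters iterations
        step, r = np.zeros(d), -g
        q = r.copy()
        rs = r @ r
        for _ in range(cg_iters):
            Hq = hessvec(q)
            a = rs / (q @ Hq)
            step += a * q
            r -= a * Hq
            rs_new = r @ r
            if np.sqrt(rs_new) < cg_tol * np.linalg.norm(g):
                break
            q = r + (rs_new / rs) * q
            rs = rs_new
        w += step                                           # (no line search in this sketch)
    return w
```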

Variable selection for kernel methods with application to binary classification

The problem of variable selection in binary kernel classification is addressed in this thesis. Kernel methods are fairly recent additions to the statistical toolbox, having originated approximately two decades ago in machine learning and artificial intelligence. These methods are growing in popularity and are already frequently applied in regression and classification problems. [The remainder of the excerpt is front matter: acknowledgements and a table of contents covering an overview of kernel methods, variable selection for kernel methods, kernel variable selection in input space (with a Monte Carlo simulation study), and algorithm-independent and algorithm-dependent selection in feature space, including support vector machines and kernel Fisher discriminant analysis.]

Model selection in logistic regression

arXiv (Cornell University), 2015

This paper is devoted to model selection in logistic regression. We extend the model selection principle introduced by Birgé and Massart (2001) to the logistic regression model. Selection is performed using penalized maximum likelihood criteria. We propose in this context a completely data-driven criterion based on the slope heuristics. We prove non-asymptotic oracle inequalities for the selected estimators. The theoretical results are illustrated through simulation studies.
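Schematically, penalized maximum likelihood model selection of this kind takes the following form (the notation is assumed for illustration rather than quoted from the paper), with the slope heuristics used to calibrate the penalty constant from the data.

```latex
% Penalized maximum likelihood model selection (schematic)
\widehat{m} \;=\; \operatorname*{arg\,min}_{m \in \mathcal{M}}
  \Big\{ -\tfrac{1}{n} \sum_{i=1}^{n} \log f_{\widehat{\theta}_m}(y_i \mid x_i)
         \;+\; \operatorname{pen}(m) \Big\},
\qquad
\operatorname{pen}(m) \;=\; \kappa \,\operatorname{pen}_{\mathrm{shape}}(m)
```

In the slope heuristics, pen_shape(m) typically grows with the model dimension, a minimal constant is estimated from the slope of the maximized log-likelihood against pen_shape(m) over the most complex models, and the final penalty uses roughly twice that constant.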

An empirical comparison of V-fold penalisation and cross-validation for model selection in distribution-free regression

Pattern Analysis and Applications, 2014

Model selection is a crucial issue in machine learning, and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However, their empirical performance is generally not well documented in the literature. The goal of this paper is to investigate to what extent such recent techniques can be successfully used to tune both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via V-fold cross-validation (VFCV), which gives efficient results for a reasonable computational cost. A disadvantage of VFCV, however, is that the procedure is known to provide an asymptotically suboptimal risk estimate as the number of examples tends to infinity. Recently, a penalisation procedure called V-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here we report on an extensive set of experiments comparing V-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and V-fold penalisation, respectively, provide poor estimates of the risk, and introduce a modified penalisation technique to reduce the estimation error.
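As a point of reference, plain VFCV tuning of SVR hyperparameters can be sketched with standard tools as below (the data and the parameter grid are illustrative assumptions); the V-fold penalisation procedure compared in the paper replaces the CV risk estimate with a penalised empirical risk, which is not part of standard libraries and is not shown here.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, KFold

# Hypothetical data; in the paper this role is played by the benchmark datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

param_grid = {
    "C": np.logspace(-1, 3, 5),        # regularisation parameter
    "gamma": np.logspace(-3, 1, 5),    # RBF kernel width
}
vfcv = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # V = 5 folds
    scoring="neg_mean_squared_error",
)
vfcv.fit(X, y)
print(vfcv.best_params_, -vfcv.best_score_)
```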