Additive Regularization Trade-Off: Fusion of Training and Validation Levels in Kernel Methods

Multi-kernel regularized classifiers

Journal of Complexity, 2007

A family of classification algorithms generated from Tikhonov regularization schemes is considered. They involve multi-kernel spaces and general convex loss functions. Our main purpose is to provide satisfactory estimates for the excess misclassification error of these multi-kernel regularized classifiers. The error analysis consists of two parts: regularization error and sample error. Allowing multi-kernels in the algorithm improves the regularization error and the approximation error, which is one advantage of the multi-kernel setting. For a general loss function, we show how to bound the regularization error by the approximation error in some weighted L^q spaces. For the sample error, we use a projection operator. The projection, in connection with the decay of the regularization error, enables us to improve convergence rates in the literature even for one-kernel schemes and special loss functions: the least squares loss and the hinge loss for support vector machine soft margin classifiers. Existence of a solution to the optimization problem for the regularization scheme associated with multi-kernels is verified when the kernel functions are continuous with respect to the index set. Gaussian kernels with flexible variances and probability distributions satisfying certain noise conditions are used to illustrate the general theory.
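
For concreteness, the Tikhonov scheme discussed here can be written as follows (in our notation, not necessarily the paper's: phi is the convex loss, lambda the regularization parameter, and the set of admissible kernels, e.g. Gaussians with flexible variances, is denoted by a calligraphic K):

    \[
        f_z \;=\; \arg\min_{K \in \mathcal{K}} \; \min_{f \in \mathcal{H}_K} \;
        \frac{1}{m} \sum_{i=1}^{m} \phi\bigl(y_i f(x_i)\bigr) \;+\; \lambda \, \|f\|_{K}^{2} .
    \]

The multi-kernel aspect enters through the extra minimization over K, which is what improves the attainable regularization and approximation error.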

Morozov, Ivanov and Tikhonov Regularization Based LS-SVMs

Lecture Notes in Computer Science, 2004

This paper contrasts three related regularization schemes for kernel machines using a least squares criterion, namely Tikhonov and Ivanov regularization and Morozov's discrepancy principle. We derive the conditions for optimality in a least squares support vector machine (LS-SVM) context, where the schemes differ in the role of the regularization parameter. In particular, the Ivanov and Morozov schemes express the trade-off between data fitting and smoothness in terms of a trust region on the parameters and of the noise level, respectively; both can be transformed uniquely into an appropriate regularization constant for a standard LS-SVM. This insight is employed to automatically tune the regularization constant in an LS-SVM framework based on the estimated noise level, which can be obtained, e.g., by a nonparametric differogram technique.
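
A minimal sketch (ours, not the paper's code) of how a Morozov-style discrepancy principle can be turned into a regularization constant for a standard LS-SVM: the dual linear system is solved for a trial gamma, and gamma is adjusted by bisection until the training residual norm matches an externally estimated noise level. The kernel choice, the bisection range, and the noise estimate passed in as noise_std are assumptions.

    import numpy as np

    def rbf_kernel(X1, X2, sigma=1.0):
        # Gaussian RBF kernel matrix between the rows of X1 and X2
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def lssvm_fit(K, y, gamma):
        # Solve the LS-SVM regression dual system for a given gamma:
        #   [[0       1^T      ]  [b    ]   [0]
        #    [1   K + I/gamma  ]] [alpha] = [y]
        m = len(y)
        A = np.zeros((m + 1, m + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.eye(m) / gamma
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        return sol[1:], sol[0]          # alpha, b

    def tune_gamma_morozov(K, y, noise_std, lo=1e-4, hi=1e4, iters=60):
        # Bisect in log-gamma until the training residual norm matches the
        # estimated noise level, i.e. ||e(gamma)||^2 ~ m * noise_std^2.
        # For an LS-SVM fit the residuals are e = alpha / gamma.
        target = len(y) * noise_std ** 2
        for _ in range(iters):
            gamma = np.sqrt(lo * hi)                  # geometric midpoint
            alpha, _ = lssvm_fit(K, y, gamma)
            if np.sum((alpha / gamma) ** 2) > target:
                lo = gamma                            # underfitting: increase gamma
            else:
                hi = gamma                            # overfitting: decrease gamma
        return np.sqrt(lo * hi)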

A practical use of regularization for supervised learning with kernel methods

Pattern Recognition Letters, 2013

In several supervised learning applications, reconstruction methods have to be applied repeatedly before the final solution is reached. In these situations, the availability of learning algorithms able to provide effective predictors in a very short time may lead to remarkable improvements in the overall computational requirements. In this paper we consider the kernel ridge regression problem and we look for solutions given by a linear combination of kernel functions plus a constant term. In particular, we show that the unknown coefficients of the linear combination and the constant term can be obtained very quickly by applying specific regularization algorithms directly to the linear system arising from the Empirical Risk Minimization problem. From numerical experiments carried out on benchmark datasets, we observed that in some cases the same results achieved after hours of calculation can be obtained in a few seconds, showing that these strategies are well suited for time-consuming applications.
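
One way to make the speed argument concrete is to apply an iterative method directly to the kernel linear system and let early stopping act as the regularization, so that no separate system has to be solved for each candidate regularization parameter. The sketch below is ours, not the paper's code: it uses a plain conjugate-gradient loop and handles the constant term simply by centering the targets, which is an assumption rather than the paper's exact formulation.

    import numpy as np

    def early_stopped_cg(K, y, n_iter=20):
        # Early-stopped conjugate gradient on K c = y - mean(y): the number
        # of iterations n_iter plays the role of the regularization parameter.
        # Predictions are f(x) = sum_i c_i K(x, x_i) + b with b = mean(y).
        b = y.mean()
        c = np.zeros(len(y))
        r = y - b                      # residual at the starting point c = 0
        p = r.copy()
        rs_old = r @ r
        for _ in range(n_iter):
            Kp = K @ p
            step = rs_old / (p @ Kp)
            c += step * p
            r -= step * Kp
            rs_new = r @ r
            if np.sqrt(rs_new) < 1e-12:
                break                  # already (numerically) interpolating
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return c, b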

Sparse LS-SVMs using additive regularization with a penalized validation criterion

2004

This paper builds on a new way of determining the regularization trade-off in least squares support vector machines (LS-SVMs) via a mechanism of additive regularization which has recently been introduced in [6]. This framework enables computational fusion of the training and validation levels, and allows the model to be trained, together with the regularization constants, by solving a single linear system at once. In this paper we show that the framework admits a penalized validation criterion that leads to sparse LS-SVMs. In this case the model, the regularization constants and the sparseness all follow from a convex quadratic program.
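
A hedged sketch of how such a fused, penalized validation problem can be posed as a convex program (our formulation of the idea, not the paper's exact program; the penalty weight nu and the use of an L1 penalty on alpha to induce sparseness are assumptions):

    import numpy as np
    import cvxpy as cp

    def sparse_areg_lssvm(K_tr, y_tr, K_val, y_val, nu=1.0):
        # Training conditions under additive regularization, with c the
        # per-sample regularization trade-off:
        #   (K_tr + I) alpha + b * 1 + c = y_tr,   1^T alpha = 0
        # Fused objective: validation squared error plus an L1 penalty on
        # alpha (the L1 term is what encourages sparseness).
        m = len(y_tr)
        alpha = cp.Variable(m)
        b = cp.Variable()
        c = cp.Variable(m)
        train_conditions = [
            (K_tr + np.eye(m)) @ alpha + b * np.ones(m) + c == y_tr,
            cp.sum(alpha) == 0,
        ]
        # K_val holds kernel evaluations between validation and training points
        val_residual = y_val - (K_val @ alpha + b)
        objective = cp.Minimize(cp.sum_squares(val_residual) + nu * cp.norm1(alpha))
        cp.Problem(objective, train_conditions).solve()
        return alpha.value, b.value, c.value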

Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods

Journal of Machine Learning Research, 2009

We introduce a new technique for the analysis of kernel-based regression problems. The basic tools are sampling inequalities which apply to all machine learning problems involving penalty terms induced by kernels related to Sobolev spaces. They lead to explicit deterministic results concerning the worst case behaviour of ε- and ν-SVRs. Using these, we show how to adjust regularization parameters to ...

Alpha and beta stability for additively regularized LS-SVMs via convex optimization

This paper considers the design of an algorithm that explicitly maximizes its own stability. The stability criterion, as often used for the construction of bounds on the generalization error of a learning algorithm, is proposed to compensate for overfitting. The primal-dual formulation characterizing Least Squares Support Vector Machines (LS-SVMs) and the additive regularization framework [13] are employed to derive a computational and practical approach combined with convex optimization. The method is elaborated for non-linear regression as well as classification. The proposed stable kernel machines also lead to a new notion of Lα and Lβ curves, instead of the traditional L-curves defined on training data.

Generalized Kernel Classification and Regression

We discuss kernel-based classification and regression in a general context, emphasizing the role of convex duality in the problem formulation. We give conditions for the existence of the dual problem, and derive general globally convergent classification and regression algorithms for solving the true (i.e. hard-margin or rigorous) dual problem without resorting to approximations. Kernel methods perform optimization in Hilbert space by means of a finite-dimensional dual problem. The conditions for the formulation of the dual problem essentially determine what we can "do in feature space", i.e. which optimization problems involving vectors in Hilbert space can be solved. Thus convex analysis plays a major role in the theory of kernel methods. The primary purpose of this paper is to derive general algorithms for kernel-based classification and regression by considering the problem from the viewpoint of convex analysis as represented by (8). We give a summary of t...

Adaptation for Regularization Operators in Learning Theory

2006

We consider learning algorithms induced by regularization methods in the regression setting. We show that error bounds previously obtained for these algorithms with a-priori choices of the regularization parameter can also be attained using a suitable a-posteriori choice based on validation. In particular, these results prove adaptation of the rate of convergence of the estimators to the minimax rate induced by the "effective dimension" of the problem. We also show universal consistency for this class of methods.
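
A minimal sketch of the a-posteriori choice in its simplest hold-out form, with Tikhonov regularization (kernel ridge regression) standing in for the general family of regularization operators; the grid of candidate values and the lambda*m normalization are our assumptions:

    import numpy as np

    def choose_lambda_by_validation(K_tr, y_tr, K_val, y_val, lambdas):
        # Compute the regularized estimator for each candidate lambda and
        # keep the one with the smallest validation error.
        m = len(y_tr)
        best = None
        for lam in lambdas:
            coef = np.linalg.solve(K_tr + lam * m * np.eye(m), y_tr)
            val_err = np.mean((K_val @ coef - y_val) ** 2)
            if best is None or val_err < best[0]:
                best = (val_err, lam, coef)
        return best[1], best[2]

    # usage sketch:
    # lam, coef = choose_lambda_by_validation(K_tr, y_tr, K_val, y_val,
    #                                         lambdas=np.logspace(-8, 0, 25))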

A Convex Approach to Validation-Based Learning of the Regularization Constant

IEEE Transactions on Neural Networks, 2000

This letter investigates a tight convex relaxation of the problem of tuning the regularization constant with respect to a validation-based criterion. A number of algorithms are covered, including ridge regression, regularization networks, smoothing splines, and least squares support vector machines (LS-SVMs) for regression. This convex approach allows the application of reliable and efficient tools, thereby reducing the computational cost and improving the automation of the learning method. It is shown that all solutions of the relaxation allow an interpretation in terms of a solution to a weighted LS-SVM.

On the robustness of regularized pairwise learning methods based on kernels

Journal of Complexity

Regularized empirical risk minimization, including support vector machines, plays an important role in machine learning theory. In this paper, regularized pairwise learning (RPL) methods based on kernels are investigated. One example is regularized minimization of the error entropy loss, which has recently attracted quite some interest from the viewpoint of consistency and learning rates. This paper shows that such RPL methods additionally have good statistical robustness properties, provided the loss function and the kernel are chosen appropriately. We treat two cases of particular interest: (i) a bounded, non-convex loss function and (ii) an unbounded convex loss function satisfying a certain Lipschitz-type condition.
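
In our notation (an assumption, not the paper's exact display), a regularized pairwise learning estimator minimizes an empirical risk built on pairs of observations plus a kernel penalty,

    \[
        f_{D,\lambda} \;=\; \arg\min_{f \in \mathcal{H}} \;
        \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m}
        L\bigl(x_i, y_i, x_j, y_j, f(x_i), f(x_j)\bigr) \;+\; \lambda \, \|f\|_{\mathcal{H}}^{2},
    \]

and the error entropy loss mentioned above is one instance of such a pairwise loss L.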