Selection of kernel parameters

Selection of kernel parameters is usually done by cross-validation on a grid of parameter values. The code provided here does it by gradient descent instead. It is mostly based on the following paper:

O. Chapelle, V. Vapnik, O. Bousquet and S. Mukherjee, Choosing Multiple Parameters for Support Vector Machines, Machine Learning, 46, 2002.

The code implements the following kernel methods:
• classification: SVM with an L2 penalization of the training errors,
• regression: kernel ridge regression / Gaussian process.
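As a minimal illustration of the regression case, kernel ridge regression with an RBF kernel has a closed-form solution. This is a Python/NumPy sketch for illustration only (the actual code described on this page is Matlab, and all function names here are made up):

```python
import numpy as np

def rbf_kernel(X1, X2, scale):
    # Gaussian (RBF) kernel: k(x, x') = exp(-||x - x'||^2 / (2 * scale^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * scale ** 2))

def krr_fit(X, y, scale, ridge):
    # Kernel ridge regression: alpha = (K + ridge * I)^{-1} y
    K = rbf_kernel(X, X, scale)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)

def krr_predict(alpha, X_train, X_test, scale):
    # Prediction: f(x) = sum_i alpha_i k(x, x_i)
    return rbf_kernel(X_test, X_train, scale) @ alpha
```

With a small ridge, the fitted function interpolates the training targets almost exactly, which is a quick sanity check for the implementation.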
It uses an RBF kernel, but can easily be modified to work with any kernel that is differentiable with respect to its parameters. The kernel parameters in this case are the scale and the ridge; the code can also handle one scaling parameter per input component. The kernel parameters are found by gradient descent, minimizing one of the following criteria:
• leave-one-out error
• radius/margin bound (the radius is approximated by the variance)
• validation error (a subset of the training set is used for validation)
• negative evidence.
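Whatever criterion J is chosen, the kernel-specific ingredient of the gradient is the same: by the chain rule, dJ/dθ = Σ_ij (∂J/∂K_ij)(∂K_ij/∂θ), so one only needs the derivative of the kernel matrix. For an RBF kernel it is convenient to optimize in log(σ), which keeps the scale positive. A NumPy sketch of this derivative (illustrative, not the original Matlab code):

```python
import numpy as np

def rbf_and_grad(X, log_sigma):
    # RBF kernel matrix and its derivative with respect to log(sigma).
    sigma = np.exp(log_sigma)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # dK/dsigma = K * d2 / sigma^3, hence dK/dlog(sigma) = K * d2 / sigma^2
    dK = K * d2 / sigma ** 2
    return K, dK
```

A finite-difference check on `dK` is a cheap way to validate such a gradient before plugging it into an optimizer.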
The code is not meant for large-scale training (typically a couple of thousand training points at most). If you have a larger dataset, just select a random subset for kernel parameter selection; this usually gives reasonable results.
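The subsampling step is a one-liner; a small illustrative helper (the name and default size are made up, not part of the original code):

```python
import numpy as np

def subsample_for_selection(X, y, n_max=2000, seed=0):
    # Keep at most n_max randomly chosen points for kernel parameter selection
    if len(X) <= n_max:
        return X, y
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_max, replace=False)
    return X[idx], y[idx]
```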
Here is the Matlab code along with an example. It uses a modified version of Carl Rasmussen's conjugate gradient optimizer minimize.m.

Learning a linear combination of kernels

A special case of this general framework is when the kernel parameters are the coefficients of a convex combination of base kernels. Here is some code to select these coefficients by Newton optimization of the variance/margin estimate. It makes use of svqp, a quadratic solver written by Leon Bottou. Details about the algorithm can be found in this paper. One can show that this is a convex problem.
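Concretely, the combined kernel is K(d) = Σ_k d_k K_k with d_k ≥ 0 and Σ_k d_k = 1, and each optimization step must keep d on that simplex. The sketch below (Python/NumPy, illustrative only; it is not the svqp-based Matlab code, and the simplex projection shown is a generic substitute for the constrained Newton step) shows the two building blocks:

```python
import numpy as np

def combine_kernels(kernels, d):
    # K(d) = sum_k d_k K_k, with coefficients d on the probability simplex
    return sum(dk * Kk for dk, Kk in zip(d, kernels))

def project_simplex(v):
    # Euclidean projection onto {d : d_k >= 0, sum_k d_k = 1},
    # used here to keep the coefficients feasible after an update step
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)
```

Because each K_k is positive semi-definite and the d_k are non-negative, K(d) is itself a valid kernel for any feasible d.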

Fast leave-one-out error estimate

Matlab code for estimating the generalization performance of an SVM, as described in Chapter 3 of my PhD thesis.
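The idea behind such fast estimates is that leave-one-out quantities can often be obtained from a single fit instead of n retrainings. The SVM estimate from the thesis is more involved, but the regression analogue is exact and fits in a few lines: for kernel ridge regression with C = K + ridge·I and α = C⁻¹y, the leave-one-out residual on point i is α_i / (C⁻¹)_ii. A NumPy sketch of this classical shortcut (not the thesis code):

```python
import numpy as np

def loo_residuals_krr(K, y, ridge):
    # Closed-form leave-one-out residuals for kernel ridge regression:
    # e_i = alpha_i / (C^{-1})_{ii}, C = K + ridge*I, alpha = C^{-1} y.
    # One matrix inversion replaces n separate retrainings.
    C = K + ridge * np.eye(len(y))
    C_inv = np.linalg.inv(C)
    alpha = C_inv @ y
    return alpha / np.diag(C_inv)
```

The mean squared LOO residual is then a generalization estimate that can itself be differentiated with respect to the kernel parameters.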