Selection of kernel parameters is usually done by
cross-validation on a grid of parameters. We provide here some code to do it by
gradient descent. It is mostly based on the following paper:
It uses an RBF kernel, but can easily be modified to work with any kernel
which is differentiable (with respect to its parameters). The kernel parameters
in this case are the scale and the ridge; the code can also handle one scaling
parameter per component.
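To make the parametrization concrete, here is a minimal Python sketch (not the Matlab code provided here) of an RBF kernel with one scale per component, a ridge added to the diagonal of the training Gram matrix, and the derivative of the kernel with respect to each log-scale. The function names, the use of log-parameters, and the exact form k(x, y) = exp(-0.5 * sum_d ((x_d - y_d) / sigma_d)^2) are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, log_scales):
    """RBF kernel with one scale per input component (hypothetical helper).

    k(x, y) = exp(-0.5 * sum_d ((x_d - y_d) / sigma_d)**2), sigma = exp(log_scales).
    """
    sigma = np.exp(log_scales)
    Xs, Ys = X / sigma, Y / sigma
    sq = (Xs**2).sum(1)[:, None] + (Ys**2).sum(1)[None, :] - 2.0 * Xs @ Ys.T
    # Clip tiny negative values caused by round-off.
    return np.exp(-0.5 * np.maximum(sq, 0.0))

def rbf_kernel_grad(X, Y, log_scales):
    """Derivative of the kernel matrix w.r.t. each log-scale, shape (D, n, m)."""
    sigma = np.exp(log_scales)
    K = rbf_kernel(X, Y, log_scales)
    diff2 = (X[:, None, :] - Y[None, :, :])**2 / sigma**2  # shape (n, m, D)
    # d k / d log(sigma_d) = k * (x_d - y_d)**2 / sigma_d**2
    return np.moveaxis(diff2, -1, 0) * K

def train_gram(X, log_scales, log_ridge):
    """Training Gram matrix with the ridge added on the diagonal."""
    return rbf_kernel(X, X, log_scales) + np.exp(log_ridge) * np.eye(len(X))
```

Any other kernel can be substituted as long as a matching derivative function is supplied; a finite-difference comparison is a cheap way to check the two agree.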
The kernel parameters are found by gradient descent, minimizing one of the
following criteria: the leave-one-out error; the radius/margin bound (the
radius is approximated by the variance); the validation error (a subset of the
training set is used for validation); or the negative evidence.
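As an illustration of the validation-error criterion, here is a minimal Python sketch. It is not the Matlab code provided here: it assumes kernel ridge regression instead of an SVM, a single RBF scale, a fixed ridge, and plain gradient descent with step halving instead of conjugate gradient. All function names are hypothetical.

```python
import numpy as np

def rbf(A, B, log_sigma):
    """RBF kernel matrix and the squared distances it is built from."""
    sq = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-0.5 * sq * np.exp(-2.0 * log_sigma)), sq

def val_loss_and_grad(log_sigma, Xt, yt, Xv, yv, ridge=1e-2):
    """Half mean squared validation error and its derivative in log_sigma."""
    Ktt, sq_tt = rbf(Xt, Xt, log_sigma)
    Kvt, sq_vt = rbf(Xv, Xt, log_sigma)
    A = Ktt + ridge * np.eye(len(Xt))
    alpha = np.linalg.solve(A, yt)           # kernel ridge solution
    resid = Kvt @ alpha - yv
    loss = 0.5 * np.mean(resid**2)
    s = np.exp(-2.0 * log_sigma)
    dKtt = Ktt * sq_tt * s                   # dK / d log_sigma
    dKvt = Kvt * sq_vt * s
    dalpha = -np.linalg.solve(A, dKtt @ alpha)
    grad = np.mean(resid * (dKvt @ alpha + Kvt @ dalpha))
    return loss, grad

def select_scale(Xt, yt, Xv, yv, log_sigma=0.0, n_iter=50):
    """Gradient descent on the validation error, halving the step until it decreases."""
    loss, grad = val_loss_and_grad(log_sigma, Xt, yt, Xv, yv)
    history = [loss]
    for _ in range(n_iter):
        step = 1.0
        while step > 1e-8:
            # Bound the move in log-scale for numerical safety.
            cand = log_sigma - step * np.clip(grad, -10.0, 10.0)
            new_loss, new_grad = val_loss_and_grad(cand, Xt, yt, Xv, yv)
            if new_loss < loss:
                break
            step *= 0.5
        else:
            break  # no improving step found
        log_sigma, loss, grad = cand, new_loss, new_grad
        history.append(loss)
    return log_sigma, history
```

The other criteria fit the same loop: only the function returning the objective and its gradient changes.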
The code is not meant for large-scale training (typically a couple of thousand
training points at most). If you have a larger dataset, just select a random
subset for kernel parameter selection. This usually gives reasonable results.
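Drawing such a subset takes a few lines; here is a minimal Python sketch. The helper name and the default of 2000 points are assumptions for illustration, not part of the Matlab code.

```python
import numpy as np

def random_subset(X, y, max_points=2000, seed=0):
    """Return a random subset of at most max_points training examples."""
    n = len(X)
    if n <= max_points:
        return X, y
    # Sample indices without replacement so no example is duplicated.
    idx = np.random.default_rng(seed).choice(n, size=max_points, replace=False)
    return X[idx], y[idx]
```

The selected kernel parameters can then be used to train on the full dataset.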
Here is the Matlab code along with an
example. It uses a modified version of
Carl Rasmussen's conjugate
gradient optimizer minimize.m.
Learning a linear combination of kernels
A special case of this general framework is when the kernel parameters
correspond to the coefficients in a convex combination of base kernels.
Here is some code to select
these coefficients by Newton optimization of the variance/margin estimate. It makes use of svqp, a quadratic solver written by Leon Bottou. Details about the algorithm can be found in this paper. One can show that this is a convex problem.
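To fix ideas on the parametrization only, here is a minimal Python sketch of a convex combination of precomputed base Gram matrices. It does not implement the Newton optimization or call svqp, and the function name is hypothetical.

```python
import numpy as np

def combine_kernels(base_grams, weights):
    """Return sum_i weights[i] * base_grams[i] for convex weights."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    # Contract the weight vector with the stack of Gram matrices (M, n, n).
    return np.tensordot(weights, np.asarray(base_grams), axes=1)
```

Since the combined kernel is linear in the coefficients, its derivative with respect to the i-th coefficient is simply the i-th base Gram matrix.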
Fast leave-one-out error estimate
Matlab code for estimating the generalization performance of an SVM, as described in Chapter 3 of my
PhD thesis.
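To illustrate why a leave-one-out estimate can be obtained without retraining n times, here is a minimal Python sketch for a simpler model. It is not the SVM estimate from the thesis: it assumes kernel ridge regression without a bias term, for which the leave-one-out predictions have an exact closed form. The function names are hypothetical.

```python
import numpy as np

def loo_predictions(K, y, ridge):
    """Exact leave-one-out predictions of kernel ridge regression.

    With A = K + ridge * I and alpha = A^{-1} y, the prediction for point i
    when it is left out of training is y_i - alpha_i / (A^{-1})_ii.
    """
    A_inv = np.linalg.inv(K + ridge * np.eye(len(y)))
    alpha = A_inv @ y
    return y - alpha / np.diag(A_inv)

def loo_error(K, y, ridge):
    """Fraction of points misclassified when left out (labels in {-1, +1})."""
    return float(np.mean(y * loo_predictions(K, y, ridge) <= 0))
```

A single matrix inversion replaces n separate trainings; comparing against the brute-force loop on a small dataset is a useful sanity check.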