A Convex Approach to Learning the Ridge based on CV
K. Pelckmans, J.A.K. Suykens, B. De Moor
This paper advances results in model selection by
relaxing the task of optimally tuning the regularization parameter
in a number of algorithms, with respect to the classical
cross-validation performance criterion, into a convex optimization
problem. The proposed strategy differs in scope from
e.g. generalized cross-validation (GCV): it concerns the
efficient optimization of the model selection criterion,
not its individual evaluation.
[PDF]
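For context, a minimal sketch (not the paper's convex relaxation) of the classical approach the abstract contrasts against: tuning the ridge parameter by grid search over a k-fold cross-validation criterion. All names and data here are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: w = (X'X + lam*I)^{-1} X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_score(X, y, lam, k=5, seed=0):
    # Mean squared validation error over k folds for one lambda.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        mask = np.ones(len(y), bool)
        mask[fold] = False
        w = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return np.mean(errs)

# Grid search over lambda -- the discrete outer loop that the
# paper proposes to replace by a single convex program.
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 10))
y = X @ rng.standard_normal(10) + 0.5 * rng.standard_normal(80)
grid = np.logspace(-3, 2, 20)
best = min(grid, key=lambda lam: cv_score(X, y, lam))
```

The grid search evaluates the CV criterion pointwise; the abstract's point is that optimizing it directly, as a convex problem, avoids this.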
Basis Pursuit Learning and Multi-Objective Optimization
Martin Brown, Nick Costen, Georgios Papadopoulos
Computation of the entire regularization path for SVM in practice
Alexandre Belloni and Katya Scheinberg
We will discuss an implementation of a parametric active set method
which computes the entire regularization path for SVM. The method is
similar to that described in the paper by Hastie, Rosset, Tibshirani
and Zhu. We will compare this method to a dual active set method which
computes just one solution on a regularization path. In theory, this
is as hard as computing the entire regularization path, but in
practice this is not so. We will describe the challenges of the
parametric active set method, present a computational comparison of
the two methods on large-scale classification problems, and discuss a
possible approach to reducing the computational time by computing an
approximate regularization path.
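To make the notion of an approximate regularization path concrete, here is a crude numpy-only sketch, assuming a linear SVM trained by subgradient descent on a grid of regularization values with warm starts. This is a naive stand-in for the exact parametric active-set path the abstract describes; all names and data are illustrative.

```python
import numpy as np

def svm_path(X, y, lams, iters=500, lr=0.01):
    # Approximate path: solve
    #   min_w  lam/2 * ||w||^2 + mean(max(0, 1 - y * Xw))
    # at each lam by subgradient descent, warm-starting each
    # solve from the previous one along the grid.
    n, d = X.shape
    w = np.zeros(d)
    path = []
    for lam in lams:
        for _ in range(iters):
            margins = y * (X @ w)
            active = margins < 1              # points inside the hinge
            grad = lam * w - (y[active] @ X[active]) / n
            w = w - lr * grad
        path.append(w.copy())
    return np.array(path)

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 4))
y = np.sign(X @ np.array([1.0, -1.0, 0.5, 0.0])
            + 0.1 * rng.standard_normal(60))
lams = np.logspace(1, -3, 10)      # strong -> weak regularization
path = svm_path(X, y, lams)
```

Warm-starting is what makes tracing the whole grid cheap relative to independent solves; the exact parametric method pushes this idea to its limit by following breakpoints of the piecewise-linear path.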
Exploring the regularization path for adaptive Gaussian kernel SVMs
Roland Memisevic, Nathan Srebro, Sam Roweis
Training-set method for choosing a regularization/accuracy tradeoff
Saharon Rosset, Nathan Srebro
The standard, and often most appropriate, approach to choosing a
regularization/accuracy tradeoff (i.e., regularization parameter) is
to evaluate models along the frontier on held-out data, using
validation sets or cross-validation. Nevertheless, it is interesting
to consider methods that use only the regularization/accuracy frontier
as calculated on the training set, without access to held-out
data. The main question we ask is, can a good model be chosen from
along the frontier based only on the frontier itself? This may be
practically useful when data is scarce and/or computational complexity
is an issue. It may also lead to interesting insights about the
frontier and its properties: What is a good measure of the true
complexity of models on the frontier? What are the "correct"
parameterizations we should use for the regularization and accuracy
measures when visualizing the frontier? Etc.
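A small sketch of the object the abstract studies: the regularization/accuracy frontier computed on the training set alone, here for ridge regression, with complexity measured by the squared weight norm. This is only an illustration of the frontier, not a proposed selection rule; all names and data are assumptions.

```python
import numpy as np

def ridge_frontier(X, y, lams):
    # Training-set frontier for ridge regression: for each lambda,
    # record (model complexity ||w||^2, training error).
    # No held-out data is involved at any point.
    d = X.shape[1]
    pts = []
    for lam in lams:
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        pts.append((w @ w, np.mean((X @ w - y) ** 2)))
    return np.array(pts)

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 8))
y = X @ rng.standard_normal(8) + rng.standard_normal(50)
lams = np.logspace(2, -2, 25)      # strong -> weak regularization
frontier = ridge_frontier(X, y, lams)
```

Along this curve complexity rises and training error falls monotonically; the abstract's question is whether a good point can be chosen from the curve's shape alone.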
The Kernel LARS algorithm
Stéphane Canu, Vincent Guigue, Alain Rakotomamonjy and Gilles Gasso
Computing regularization paths for learning multiple kernels
Francis R. Bach, Romain Thibaux, Michael I. Jordan
Learning a sparse conic combination of kernel
functions or kernel matrices for classification or regression
can be achieved via regularization with a block 1-norm (Bach et
al., 2004). In this paper, we present an algorithm that computes the
entire regularization path for these problems. The path
is obtained by using numerical continuation techniques,
and involves a running time complexity that is a constant
times the complexity of solving the problem for one value
of the regularization parameter. Working in the setting
of kernel linear regression and kernel logistic regression,
we show empirically that the effect of the block 1-norm
regularization differs notably from the (non-block)
1-norm regularization commonly used for variable selection,
and that the regularization path is of particular value in
the block case.
[PDF]
http://cmm.ensmp.fr/~bach/path/
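To illustrate what block 1-norm regularization does, here is a minimal group-lasso sketch on linear features (a stand-in for the kernel combination setting of the paper), solved by proximal gradient descent. The data and parameter values are assumptions for illustration.

```python
import numpy as np

def group_prox(w, groups, t):
    # Proximal operator of t * sum_g ||w_g||_2 (block 1-norm):
    # shrinks each group's norm, zeroing entire groups at once.
    out = w.copy()
    for g in groups:
        nrm = np.linalg.norm(w[g])
        out[g] = 0.0 if nrm <= t else (1 - t / nrm) * w[g]
    return out

def group_lasso(X, y, groups, lam, iters=2000):
    # Proximal gradient on 0.5/n ||Xw - y||^2 + lam * block 1-norm.
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1/L for the smooth part
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = group_prox(w - step * grad, groups, step * lam)
    return w

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 6))
groups = [np.arange(0, 3), np.arange(3, 6)]    # two feature blocks
y = X[:, :3] @ np.array([1.0, -2.0, 1.5])      # only block 0 matters
w = group_lasso(X, y, groups, lam=0.5)
```

The block penalty discards the irrelevant block as a unit, whereas the plain 1-norm zeros coordinates individually, which is the contrast the abstract draws empirically.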
The support vector decomposition machine
Francisco Pereira and Geoff Gordon
In machine learning problems with tens of thousands of features and
only dozens or hundreds of independent training examples,
dimensionality reduction is essential for good learning performance.
In previous work, many researchers have treated the learning problem
in two separate phases: first use an algorithm such as singular value
decomposition to reduce the dimensionality of the data set, and then
use a classification algorithm such as naïve Bayes or support vector
machines to learn a classifier. We demonstrate that it is possible to
combine the two goals of dimensionality reduction and classification
into a single learning objective, and present a novel and efficient
algorithm which optimizes this objective directly. We present
experimental results in two domains, fMRI analysis and cell image
processing, which show that we can achieve better learning performance
and lower-dimensional representations than two-phase approaches can.
[PDF]
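For contrast with the joint objective the abstract proposes, here is a sketch of the two-phase baseline it argues against: unsupervised SVD for dimensionality reduction, followed by a simple classifier on the reduced features (a least-squares stand-in for naïve Bayes or an SVM). All names and data are illustrative.

```python
import numpy as np

def two_phase_classifier(X, y, k):
    # Phase 1: SVD-based reduction to k components.
    # Note the decomposition never sees the labels y -- exactly
    # the shortcoming the joint objective addresses.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:k].T                  # reduced representation
    # Phase 2: least-squares classifier on the reduced features.
    w = np.linalg.lstsq(Z, y, rcond=None)[0]
    return Vt[:k], w

rng = np.random.default_rng(4)
# Many features, few examples -- the regime the abstract targets.
X = rng.standard_normal((40, 500))
y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(40))
V, w = two_phase_classifier(X, y, k=10)
preds = np.sign((X @ V.T) @ w)
```

Because phase 1 maximizes explained variance rather than class separation, the retained components need not be the discriminative ones, which motivates folding both phases into a single learning objective.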