NIPS 2005 Workshop on the

Accuracy-Regularization Frontier

Friday, December 9th, 2005
Westin Resort and Spa, Whistler, BC, Canada

The program has been finalized --- this outdated 'call for contributions' is no longer relevant

A prevalent approach in machine learning for achieving good generalization performance is to seek a predictor that, on the one hand, attains low empirical error and, on the other hand, is "simple" as measured by some regularizer, and is therefore guaranteed to generalize well. Consider, for example, support vector machines, where one seeks a linear classifier with low empirical error and low L2-norm (corresponding to a large geometrical margin). The precise trade-off between the empirical error and the regularizer (e.g. the L2-norm) is not known, but since we would like to minimize both, we can limit our attention to extreme solutions, i.e. classifiers for which one cannot reduce both the empirical error and the regularizer (norm). Considering the set of attainable (error, norm) combinations, we are interested only in the extreme "frontier" (or "regularization path") of this set. The typical approach is to evaluate the classifiers along the frontier on held-out validation data (or to cross-validate) and choose the classifier minimizing the validation error.
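To make the notion concrete, here is a small illustrative sketch (ours, not part of the workshop program; the function name and the numbers are made up) that extracts the frontier from a finite set of candidate classifiers, each summarized by its (empirical error, regularizer) pair, by discarding every candidate that some other candidate beats, or ties, in both coordinates while being strictly better in at least one:

```python
import numpy as np

def pareto_frontier(points):
    """points: array of shape (n, 2) with columns (empirical error, regularizer).
    Returns the indices of the non-dominated (frontier) points."""
    points = np.asarray(points, dtype=float)
    keep = []
    for i, (err_i, reg_i) in enumerate(points):
        dominated = np.any(
            (points[:, 0] <= err_i) & (points[:, 1] <= reg_i)
            & ((points[:, 0] < err_i) | (points[:, 1] < reg_i))
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical candidate classifiers summarized by (error, norm) pairs.
candidates = [(0.30, 1.0), (0.20, 2.0), (0.25, 3.0), (0.10, 5.0)]
print(pareto_frontier(candidates))  # prints [0, 1, 3]: (0.25, 3.0) is dominated by (0.20, 2.0)
```

In practice the candidate set is not enumerated but generated by an optimization procedure, as discussed next.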

Classifiers along the frontier are typically found by minimizing some parametric combination of the empirical error and the regularizer, e.g. norm² + C × err for varying C in the case of SVMs. Different values of C yield different classifiers along the frontier, and C can be thought of as parameterizing the frontier. This particular parametric function of the empirical error and the regularizer is chosen because it leads to a convenient optimization problem, but minimizing any other monotone function of the empirical error and the regularizer (in this case, the L2-norm) would also lead to classifiers on the frontier.
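For instance, the following rough sketch (our illustration, under assumptions not taken from the workshop material: scikit-learn's LinearSVC as the solver, a synthetic dataset, and an arbitrary logarithmic grid of C values) traces frontier points by sweeping C, recording the empirical error and squared L2-norm of each solution, and then applies the hold-out selection described above:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

frontier = []
for C in np.logspace(-3, 3, 13):                 # C parameterizes the frontier
    clf = LinearSVC(C=C, max_iter=10000).fit(X_tr, y_tr)
    emp_err = np.mean(clf.predict(X_tr) != y_tr)  # empirical error
    sq_norm = float(np.sum(clf.coef_ ** 2))       # regularizer: squared L2-norm
    val_err = np.mean(clf.predict(X_val) != y_val)  # hold-out error
    frontier.append((C, emp_err, sq_norm, val_err))

best = min(frontier, key=lambda t: t[3])          # pick C by validation error
print("chosen C = %.3g, validation error = %.3f" % (best[0], best[3]))
```

Note that each grid value here requires a full solve from scratch; the path methods discussed below aim to avoid exactly this cost.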

Recently, methods have been proposed for obtaining the entire frontier in computation time that is comparable to obtaining a single classifier along the frontier.

The workshop is concerned with optimization and statistical issues related to viewing the entire frontier, rather than a single predictor along it, as an object of interest in machine learning.

Specific issues to be addressed include:

  1. Characterizing the "frontier" in a way that is independent of any specific trade-off, and studying its properties as such, e.g. convexity, smoothness, and piecewise-linear or piecewise-polynomial behavior.

  2. What parametric trade-offs capture the entire frontier? Minimizing any monotone trade-off leads to a predictor on the frontier, but what conditions must be met to ensure all predictors along the frontier are obtained when the regularization parameter is varied? Study of this question is motivated by scenarios in which minimizing a non-standard parametric trade-off leads to a more convenient optimization problem.

  3. Methods for obtaining the frontier:
    1. Direct methods relying on a characterization, e.g. Hastie et al.'s (2004) work on the entire regularization path of Support Vector Machines.
    2. Warm-restart continuation methods (slightly changing the regularization parameter and initializing the optimizer at the solution for the previous value of the parameter). How should one vary the regularization parameter in order to guarantee never being too far from the true frontier? In a standard optimization problem, one ensures a solution within some desired distance of the optimal solution; analogously, when recovering the entire frontier, one would like a recovered frontier that is always within some desired distance, in the (error, regularizer) space, of the true frontier. (See the sketch after this list for a simple illustration of direct versus warm-restart path computation.)
    3. Predictor-corrector methods: when the frontier is a differentiable manifold, warm-restart methods can be improved by using a first-order approximation of the manifold to predict where the frontier should be for an updated value of the frontier parameter.

  4. Interesting generalizations or uses of the frontier.

  5. Formalizing and providing guarantees for the standard practice of picking a classifier along the frontier using a hold-out set (this is especially important when there are more than two objectives). In some regression settings, detailed inference can be carried out along the frontier: for ridge regression this is well established, and for the Lasso, Efron et al. (2004) and, more recently, Zou et al. (2004) establish the degrees of freedom along the frontier, yielding estimates of the generalization error.
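As a concrete point of comparison for items 3.1 and 3.2 above, the following sketch (our illustration, assuming scikit-learn's lars_path and Lasso with warm_start, a synthetic regression problem, and an arbitrary grid of regularization values; none of this is taken from the workshop material) contrasts a direct path method for the Lasso, which exploits the piecewise-linear characterization underlying Efron et al. (2004), with warm-restart continuation over a decreasing grid:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, lars_path

X, y = make_regression(n_samples=200, n_features=50, noise=1.0, random_state=0)

# (a) Direct method: the whole piecewise-linear Lasso path in one call.
alphas, _, coefs = lars_path(X, y, method="lasso")
print("LARS path has %d breakpoints" % len(alphas))

# (b) Warm-restart continuation over a decreasing grid of regularization values,
#     initializing each solve at the solution for the previous value.
model = Lasso(alpha=1.0, warm_start=True, max_iter=10000)
frontier = []
for alpha in np.logspace(0, -3, 20):
    model.set_params(alpha=alpha)
    model.fit(X, y)                                   # starts from previous coef_
    emp_err = np.mean((model.predict(X) - y) ** 2)    # empirical (squared) error
    reg = float(np.sum(np.abs(model.coef_)))          # regularizer: L1-norm
    frontier.append((alpha, emp_err, reg))
print("warm-start frontier point at smallest alpha:", frontier[-1])
```

In this toy setting the warm-started grid only approximates the exact piecewise-linear path; how finely the grid must be chosen to stay within a prescribed distance of the true frontier is precisely the question raised in item 3.2.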

The main goal of the workshop is to open up research in these directions, establishing the important questions and issues to be addressed, and introducing the NIPS community to relevant approaches from multi-objective optimization.

Call for Contributions (DEADLINE HAS PASSED AND CONTRIBUTIONS HAVE ALREADY BEEN SELECTED)

We invite presentations addressing any of the above issues, or other related issues. We welcome presentations of completed work or work-in-progress, as well as position statements, papers discussing potential research directions and surveys of recent developments.

If you would like to present in the workshop, please send an abstract in plain text (preferred), PostScript, or PDF (Microsoft Word documents will not be opened) to frontier@cs.toronto.edu as soon as possible, and no later than October 23rd, 2005.

The final program will be posted in early November.

Format:

The workshop will be held as part of the NIPS (Neural Information Processing Systems) workshop program, in conjunction with the main NIPS conference, December 5th-8th in Vancouver. The workshop will meet in two sessions on Friday, December 9th, 7:30AM-10:30AM and 3:30PM-6:30PM, with a daytime break for informal exchange and/or other activities. The program will consist of:

Organizing committee: