Introduction to Machine Learning: A Statistical Learning Theory Approach
Tuesdays, 9:00-11:00, Ziskind Room 1
This half-course will provide a basic introduction to the fundamental
concepts of Machine Learning, through the eyes of Statistical Learning
Theory. The purpose is both to gain an appreciation and understanding
of Machine Learning, and to introduce Statistical Learning Theory and
the PAC framework as theoretical tools for rigorously studying
Machine Learning.
In this brief half-course, we will focus almost exclusively on
supervised classification, although the concepts we will discuss are
relevant in a much wider context.
Specific Topics:
- What is Machine Learning? When and why is it relevant?
- Formalization of Learning through the PAC Framework.
- Absolute, post-hoc and relative (regret) learning guarantees.
- The trade-off between prior knowledge and data; The No-Free-Lunch Theorem.
- Error decomposition and the model complexity (bias vs variance) trade-off.
- Model selection, Occam's Razor and Structural Risk Minimization.
- Model complexity in the finite-dimensional case: The VC
dimension.
The following topics will be covered at the beginning of the second
semester as a continuation to the material covered in the course:
- Scale-sensitive model complexity: Covering numbers and fat-shattering.
- An example of infinite dimensional learning: the Support Vector Machine.
Assignments:
There will be 2 homework assignments, each counting towards 20% of the
final grade.
Note that two corrections have been made to the original version.
Required code and data file (code and data updated March 1st).
References:
There is no set book for the course, nor is there any book covering
precisely the material in the form presented here. A good reference
covering much of the theoretical material is the following survey:
Some other books you might find relevant include:
- L. Devroye, L. Gyorfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer, 1996.
- M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, 1999.
- V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2nd edition.
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer.
- B. Scholkopf and A. Smola, Learning with Kernels, MIT Press.
- M. Kearns and U. Vazirani, An Introduction to Computational Learning Theory, MIT Press.
You might also find lecture notes in the following courses useful:
Lectures:
- Tuesday December 23rd (Boaz):
- Introduction
- Tuesday December 30th (Boaz):
- Validating a Hypothesis and Concentration Bounds for the Binomial.
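  A rough sketch of the kind of bound covered in this lecture (standard
  form; the constants and notation used in lecture may differ): for a
  single fixed hypothesis h with true error L(h), validated on m fresh
  i.i.d. examples with empirical error \hat{L}(h), Hoeffding's inequality
  for the Binomial gives

      \Pr\bigl[\, |\hat{L}(h) - L(h)| \ge \epsilon \,\bigr] \le 2 e^{-2 m \epsilon^2},

  i.e., with probability at least 1 - \delta,

      |\hat{L}(h) - L(h)| \le \sqrt{\log(2/\delta) / (2m)}.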
- Tuesday January 6th:
- Basic setup and concepts
- An example of a learning guarantee: Noiseless learning using axis-aligned rectangles
- No Free Lunch theorems
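  The rectangle example above admits a very short consistent learner in
  the noiseless (realizable) case: output the tightest axis-aligned
  rectangle containing the positive examples. A minimal sketch in Python
  (the function names and 0/1 label encoding are illustrative, not part
  of the course code):

      import numpy as np

      def learn_rectangle(X, y):
          # X: (m, d) array of points; y: (m,) array of 0/1 labels.
          # Return the tightest axis-aligned rectangle (lower and upper
          # corners) containing all positive examples, or None if there
          # are no positive examples.
          pos = X[y == 1]
          if len(pos) == 0:
              return None
          return pos.min(axis=0), pos.max(axis=0)

      def predict(rect, X):
          # Label 1 inside the learned rectangle, 0 outside.
          if rect is None:
              return np.zeros(len(X), dtype=int)
          lower, upper = rect
          return np.all((X >= lower) & (X <= upper), axis=1).astype(int)

      # Example: with consistent data this hypothesis has zero empirical
      # error, e.g.
      #   X = np.array([[0.2, 0.3], [0.5, 0.6], [0.9, 0.1]])
      #   y = np.array([1, 1, 0])
      #   predict(learn_rectangle(X, y), X)   # -> array([1, 1, 0])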
- Tuesday January 13th:
- Learning with noise: The Empirical Risk Minimization (ERM) Learning Rule.
- Uniform Convergence of Empirical Risk for finite hypothesis classes.
- Post-hoc generalization guarantee for finite hypothesis classes.
- Regret-type (relative) guarantee for finite hypothesis classes.
- Approximation and estimation errors and the effect of model complexity.
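  A sketch of the finite-class guarantees from this lecture (standard
  form, up to the exact constants used in lecture): for a finite
  hypothesis class H and m i.i.d. training examples, a union bound over
  H combined with Hoeffding's inequality gives, with probability at
  least 1 - \delta,

      \sup_{h \in H} |\hat{L}(h) - L(h)| \le \sqrt{\log(2|H|/\delta) / (2m)},

  and hence the ERM hypothesis \hat{h} = \arg\min_{h \in H} \hat{L}(h)
  satisfies the regret-type guarantee

      L(\hat{h}) \le \min_{h \in H} L(h) + 2 \sqrt{\log(2|H|/\delta) / (2m)}.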
- Tuesday January 20th:
- Model Order Selection and Structural Risk Minimization.
- Use of Validation.
- Priors over hypotheses; Description Length Bounds.
- Minimum Description Length learning rule.
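  A sketch of the description-length bound behind the MDL rule (standard
  form, in illustrative notation): if each hypothesis h is assigned a
  prior weight p(h), or a description of |h| bits so that p(h) = 2^{-|h|},
  then with probability at least 1 - \delta, simultaneously for all h,

      L(h) \le \hat{L}(h) + \sqrt{\bigl(\log(1/p(h)) + \log(1/\delta)\bigr) / (2m)},

  and the Minimum Description Length rule selects the hypothesis
  minimizing this right-hand side rather than the empirical error alone.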
- Tuesday January 27th:
- Lower bound for classes of finite cardinality.
- The Growth Function.
- Vapnik-Chervonenkis (VC) dimension and bounds in terms of the VC
dimension.
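  A sketch of the quantities from this lecture (standard statements): the
  growth function \Pi_H(m) counts the number of distinct labelings H can
  realize on m points. If the VC dimension of H is d, Sauer's lemma gives

      \Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i} \le (e m / d)^d   for m \ge d,

  which yields uniform-convergence bounds of the form, with probability
  at least 1 - \delta (up to constants),

      \sup_{h \in H} |\hat{L}(h) - L(h)| \le O\Bigl(\sqrt{\bigl(d \log(m/d) + \log(1/\delta)\bigr) / m}\Bigr).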
- Tuesday February 3rd:
- Tightness of the VC bound and near optimality of the ERM learning
rule.
- Another look at Structural Risk Minimization.
- Other loss functions and the pseudo-shattering dimension.
- Computational considerations and surrogate loss functions.
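  As one standard illustration of a surrogate loss (not necessarily the
  example emphasized in lecture): for a real-valued predictor h and
  labels y \in \{-1, +1\}, the hinge loss

      \ell(h(x), y) = \max(0, 1 - y\, h(x))

  is a convex upper bound on the 0-1 loss, so it can be minimized
  efficiently while still upper-bounding the classification error; this
  is the loss underlying the Support Vector Machine discussed next
  semester.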
Last modified: Sun Mar 01 13:06:42 Jerusalem Standard Time 2009