Introduction to Machine Learning: A Statistical Learning Theory Approach
Tuesdays, 9:00–11:00, Ziskind Room 1
This half-course will provide a basic introduction to the fundamental
concepts of Machine Learning, through the eyes of Statistical Learning
Theory. The purpose is both to gain an appreciation and understanding
of Machine Learning, and to introduce Statistical Learning Theory and
the PAC framework as a theoretical tool for rigorously studying
Machine Learning.
In this brief half-course, we will focus almost exclusively on
supervised classification, although the concepts we will discuss are
relevant in a much wider context.
Specific Topics:
- What is Machine Learning? When and why is it relevant?
- Formalization of Learning through the PAC Framework.
- Absolute, post-hoc and relative (regret) learning guarantees.
- The trade-off between prior knowledge and data; the No-Free-Lunch Theorem.
- Error decomposition and the model complexity (bias vs. variance) trade-off.
- Model selection, Occam's Razor and Structural Risk Minimization.
- Model complexity in the finite-dimensional case: the VC dimension.
The following topics will be covered at the beginning of the second
semester as a continuation to the material covered in the course:
- Scale-sensitive model complexity: covering numbers and fat-shattering.
- An example of infinite-dimensional learning: the Support Vector Machine.
Assignments:
There will be two homework assignments, each counting towards 20% of the
final grade.
Note two corrections to the original version.
Required code and data file (code and data updated March 1st: details).
References:
There is no set book for the course, nor is there any book covering
precisely the material in the form presented here. A good
reference covering much of the theoretical material is the following
survey:
Some other books you might find relevant include:
- L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer, 1996.
- M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, 1999.
- V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2nd edition.
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer.
- B. Schölkopf and A. Smola, Learning with Kernels, MIT Press.
- M. Kearns and U. Vazirani, An Introduction to Computational Learning Theory, MIT Press.
You might also find lecture notes in the following courses useful:
Lectures:
- Tuesday, December 23rd (Boaz):
  - Introduction
- Tuesday, December 30th (Boaz):
  - Validating a hypothesis and concentration bounds for the Binomial.
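As an illustrative sketch (not part of the course materials; all names here are mine), here is how a concentration bound validates a single hypothesis: by Hoeffding's inequality, for m i.i.d. 0/1 errors the empirical error deviates from the true error by more than ε with probability at most 2·exp(−2mε²), so the error measured on a held-out set is trustworthy up to a computable ε:

```python
import math
import random

def hoeffding_epsilon(m, delta):
    """Deviation epsilon such that, with probability >= 1 - delta,
    |empirical error - true error| <= epsilon over m i.i.d. samples.
    Solves 2 * exp(-2 * m * eps^2) = delta for eps."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))

def validate(true_error, m, delta, rng):
    """Simulate m i.i.d. held-out predictions, each wrong with
    probability true_error, and return (empirical error, epsilon)."""
    mistakes = sum(rng.random() < true_error for _ in range(m))
    return mistakes / m, hoeffding_epsilon(m, delta)

rng = random.Random(0)
emp, eps = validate(true_error=0.1, m=10_000, delta=0.05, rng=rng)
# With probability >= 0.95, the true error lies in [emp - eps, emp + eps].
```

Note that ε shrinks like 1/√m: quadrupling the validation set halves the width of the confidence interval.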
- Tuesday, January 6th:
  - Basic setup and concepts.
  - An example of a learning guarantee: noiseless learning using axis-aligned rectangles.
  - No-Free-Lunch theorems.
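A minimal sketch of the rectangle example above (my own illustration, not the course's code): in the noiseless (realizable) setting, the tightest axis-aligned rectangle enclosing the positively labelled points is a consistent hypothesis, and this is the classic learning rule analyzed for this class:

```python
def learn_rectangle(points, labels):
    """Return the tightest axis-aligned rectangle (x1, x2, y1, y2)
    enclosing every positively labelled 2-D point, or None if there
    are no positives (predict negative everywhere)."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    if not pos:
        return None
    xs = [p[0] for p in pos]
    ys = [p[1] for p in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, p):
    """Label 1 inside the learned rectangle, 0 outside."""
    if rect is None:
        return 0
    x1, x2, y1, y2 = rect
    return int(x1 <= p[0] <= x2 and y1 <= p[1] <= y2)

rect = learn_rectangle([(1, 1), (3, 2), (2, 5), (0, 0)], [1, 1, 1, 0])
# rect == (1, 3, 1, 5); predict(rect, (2, 2)) == 1, predict(rect, (0, 0)) == 0
```

Because the learned rectangle is always contained in the true one, it can only err on positives near the boundary, which is what makes the sample-complexity analysis of this rule tractable.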
- Tuesday, January 13th:
  - Learning with noise: the Empirical Risk Minimization (ERM) learning rule.
  - Uniform convergence of empirical risk for finite hypothesis classes.
  - Post-hoc generalization guarantee for finite hypothesis classes.
  - Regret-type (relative) guarantee for finite hypothesis classes.
  - Approximation and estimation errors and the effect of model complexity.
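As a sketch of the ERM rule and the finite-class uniform convergence bound (an illustration of mine, not the course's material): ERM picks the hypothesis with the smallest empirical risk, and Hoeffding plus a union bound over the |H| hypotheses gives a deviation that grows only logarithmically in |H|:

```python
import math

def erm(hypotheses, sample):
    """Empirical Risk Minimization over a finite hypothesis class:
    return a hypothesis with the fewest mistakes on the sample."""
    def emp_risk(h):
        return sum(h(x) != y for x, y in sample) / len(sample)
    return min(hypotheses, key=emp_risk)

def uniform_convergence_epsilon(class_size, m, delta):
    """With probability >= 1 - delta, every h in a class of this size
    satisfies |empirical risk - true risk| <= epsilon (Hoeffding's
    inequality combined with a union bound over the class)."""
    return math.sqrt(math.log(2.0 * class_size / delta) / (2.0 * m))

# A finite class of threshold classifiers h_t(x) = 1[x >= t].
thresholds = [lambda x, t=t: int(x >= t) for t in range(11)]
sample = [(x, int(x >= 4)) for x in range(11)]
best = erm(thresholds, sample)
# best is h_4 and makes zero mistakes on this (realizable) sample.
```

The log |H| dependence is why the regret-type guarantee for ERM is only an additive O(√(log|H| / m)) worse than the best hypothesis in the class.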
- Tuesday, January 20th:
  - Model order selection and Structural Risk Minimization.
  - Use of validation.
  - Priors over hypotheses; description length bounds.
  - The Minimum Description Length learning rule.
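A sketch of the MDL learning rule (my own illustration; the penalty below is one common form of the description-length bound, under the prior p(h) = 2^(−|h|) with |h| the description length in bits): among candidate hypotheses, pick the one minimizing empirical risk plus a complexity penalty, so a short hypothesis with slightly higher training error can beat a long one that overfits:

```python
import math

def mdl_select(candidates, m, delta):
    """Minimum Description Length rule (sketch). Each candidate is a
    pair (empirical risk, description length in bits); pick the one
    minimizing empirical risk plus the penalty
    sqrt((dl * ln 2 + ln(2/delta)) / (2m))."""
    def penalized(c):
        emp, dl = c
        return emp + math.sqrt((dl * math.log(2) + math.log(2 / delta)) / (2 * m))
    return min(candidates, key=penalized)

# A 10-bit hypothesis with 10% training error beats a 500-bit
# hypothesis with zero training error when m = 200.
chosen = mdl_select([(0.10, 10), (0.0, 500)], m=200, delta=0.05)
```

This is Occam's Razor made quantitative: the penalty is a high-probability upper bound on how much each hypothesis's true risk can exceed its empirical risk.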
- Tuesday, January 27th:
  - Lower bound for finite-cardinality classes.
  - The growth function.
  - Vapnik–Chervonenkis (VC) dimension and bounds in terms of the VC dimension.
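A small sketch of shattering, the notion underlying the growth function and the VC dimension (my own illustration): a class shatters a point set if it realizes all 2^n labellings of it. The class of intervals on the line shatters any two points but no three, so its VC dimension is 2:

```python
def shatters(hypotheses, points):
    """Check whether the hypothesis class realizes every one of the
    2^n possible labellings of the points, i.e. shatters them."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Intervals on the line: h_{a,b}(x) = 1[a <= x <= b], with endpoints
# restricted to a grid (enough to realize all labellings of grid-separated points).
grid = [i / 10 for i in range(11)]
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in grid for b in grid if a <= b]

print(shatters(intervals, [0.3, 0.7]))       # True: two points are shattered
print(shatters(intervals, [0.2, 0.5, 0.8]))  # False: labelling (1, 0, 1) is impossible
```

For n points beyond the VC dimension, the growth function stops being 2^n and is polynomial in n (the Sauer–Shelah lemma), which is what drives the VC generalization bound.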
- Tuesday, February 3rd:
  - Tightness of the VC bound and near-optimality of the ERM learning rule.
  - Another look at Structural Risk Minimization.
  - Other loss functions and the pseudo-shattering dimension.
  - Computational considerations and surrogate loss functions.
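A minimal sketch of the surrogate-loss idea (my own illustration): minimizing the 0-1 loss directly is computationally hard, so one minimizes a convex upper bound on it instead, such as the hinge loss, written here as a function of the margin y·f(x):

```python
def zero_one_loss(margin):
    """0-1 loss as a function of the margin y * f(x):
    a mistake whenever the margin is not positive."""
    return 0.0 if margin > 0 else 1.0

def hinge_loss(margin):
    """Convex surrogate for the 0-1 loss: max(0, 1 - margin).
    It upper-bounds the 0-1 loss everywhere, so driving it down
    drives down the 0-1 loss, while remaining convex to optimize."""
    return max(0.0, 1.0 - margin)

for m in (-1.0, 0.5, 2.0):
    assert hinge_loss(m) >= zero_one_loss(m)  # surrogate upper-bounds 0-1 loss
```

The hinge loss is the surrogate behind the Support Vector Machine, which opens the second-semester continuation above.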
Last modified: Sun Mar 01 13:06:42 Jerusalem Standard Time 2009