Introduction to Machine Learning: A Statistical Learning Theory Approach
Tuesdays, 9:00-11:00, Ziskind Room 1
This half-course will provide a basic introduction to the fundamental
concepts of Machine Learning, through the eyes of Statistical Learning
Theory. The purpose is both to gain an appreciation and understanding
of Machine Learning, and to introduce Statistical Learning Theory and
the PAC framework as theoretical tools for rigorously studying
Machine Learning.
In this brief half-course, we will focus almost exclusively on
supervised classification, although the concepts we will discuss are
relevant in a much wider context.
Specific Topics:
- What is Machine Learning? When and why is it relevant?
- Formalization of Learning through the PAC Framework.
- Absolute, post-hoc and relative (regret) learning guarantees.
- The trade-off between prior knowledge and data; The No-Free-Lunch Theorem.
- Error decomposition and the model complexity (bias vs variance) trade-off.
- Model selection, Occam's Razor and Structural Risk Minimization.
- Model complexity in the finite-dimensional case: The VC
dimension.
The following topics will be covered at the beginning of the second
semester as a continuation to the material covered in the course:
- Scale-sensitive model complexity: Covering numbers and fat-shattering.
- An example of infinite dimensional learning: the Support Vector Machine.
Assignments:
There will be 2 homework assignments, each counting towards 20% of the
final grade.
Note that two corrections have been made to the original version.
Required code and data file (code and data updated March 1st).
References:
There is no set book for the course, nor is there any book covering
precisely the material in the form presented here. A good reference
covering much of the theoretical material is the following survey:
Some other books you might find relevant include:
- L. Devroye, L. Gyorfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer, 1996.
- M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, 1999.
- V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2nd edition.
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer.
- B. Scholkopf and A. Smola, Learning with Kernels, MIT Press.
- M. Kearns and U. Vazirani, An Introduction to Computational Learning Theory, MIT Press.
You might also find lecture notes in the following courses useful:
Lectures:
- Tuesday December 23rd (Boaz):
- Introduction
- Tuesday December 30th (Boaz):
- Validating a Hypothesis and Concentration Bounds for the Binomial.
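  A rough sketch of the kind of bound covered in this lecture (standard
  form; the constants and notation used in lecture may differ): for a
  single fixed hypothesis h with true error L(h), validated on m fresh
  i.i.d. examples with empirical error \hat{L}(h), Hoeffding's inequality
  for the Binomial gives

      \Pr\bigl[\, |\hat{L}(h) - L(h)| \ge \epsilon \,\bigr] \le 2 e^{-2 m \epsilon^2},

  i.e., with probability at least 1 - \delta,

      |\hat{L}(h) - L(h)| \le \sqrt{\log(2/\delta) / (2m)}.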
- Tuesday January 6th:
- Basic setup and concepts
- An example of a learning guarantee: Noiseless learning using axis-aligned rectangles
- No Free Lunch theorems
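  The rectangle example above admits a very short consistent learner in
  the noiseless (realizable) case: output the tightest axis-aligned
  rectangle containing the positive examples. A minimal sketch in Python
  (the function names and 0/1 label encoding are illustrative, not part
  of the course code):

      import numpy as np

      def learn_rectangle(X, y):
          # X: (m, d) array of points; y: (m,) array of 0/1 labels.
          # Return the tightest axis-aligned rectangle (lower and upper
          # corners) containing all positive examples, or None if there
          # are no positive examples.
          pos = X[y == 1]
          if len(pos) == 0:
              return None
          return pos.min(axis=0), pos.max(axis=0)

      def predict(rect, X):
          # Label 1 inside the learned rectangle, 0 outside.
          if rect is None:
              return np.zeros(len(X), dtype=int)
          lower, upper = rect
          return np.all((X >= lower) & (X <= upper), axis=1).astype(int)

      # Example: with consistent data this hypothesis has zero empirical
      # error, e.g.
      #   X = np.array([[0.2, 0.3], [0.5, 0.6], [0.9, 0.1]])
      #   y = np.array([1, 1, 0])
      #   predict(learn_rectangle(X, y), X)   # -> array([1, 1, 0])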
- Tuesday January 13th:
- Learning with noise: The Empirical Risk Minimization (ERM) Learning Rule.
- Uniform Convergence of Empirical Risk for finite hypothesis classes.
- Post-hoc generalization guarantee for finite hypothesis classes.
- Regret-type (relative) guarantee for finite hypothesis classes.
- Approximation and estimation errors and the effect of model complexity.
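  A sketch of the finite-class guarantees from this lecture (standard
  form, up to the exact constants used in lecture): for a finite
  hypothesis class H and m i.i.d. training examples, a union bound over
  H combined with Hoeffding's inequality gives, with probability at
  least 1 - \delta,

      \sup_{h \in H} |\hat{L}(h) - L(h)| \le \sqrt{\log(2|H|/\delta) / (2m)},

  and hence the ERM hypothesis \hat{h} = \arg\min_{h \in H} \hat{L}(h)
  satisfies the regret-type guarantee

      L(\hat{h}) \le \min_{h \in H} L(h) + 2 \sqrt{\log(2|H|/\delta) / (2m)}.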
- Tuesday January 20th:
- Model Order Selection and Structural Risk Minimization.
- Use of Validation.
- Priors over hypotheses; Description Length Bounds.
- Minimum Description Length learning rule.
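  A sketch of the description-length bound behind the MDL rule (standard
  form, in illustrative notation): if each hypothesis h is assigned a
  prior weight p(h), or a description of |h| bits so that p(h) = 2^{-|h|},
  then with probability at least 1 - \delta, simultaneously for all h,

      L(h) \le \hat{L}(h) + \sqrt{\bigl(\log(1/p(h)) + \log(1/\delta)\bigr) / (2m)},

  and the Minimum Description Length rule selects the hypothesis
  minimizing this right-hand side rather than the empirical error alone.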
- Tuesday January 27th:
- Lower bound for classes of finite cardinality.
- The Growth Function.
- Vapnik-Chervonenkis (VC) dimension and bounds in terms of the VC
dimension.
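  A sketch of the quantities from this lecture (standard statements): the
  growth function \Pi_H(m) counts the number of distinct labelings H can
  realize on m points. If the VC dimension of H is d, Sauer's lemma gives

      \Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i} \le (e m / d)^d   for m \ge d,

  which yields uniform-convergence bounds of the form, with probability
  at least 1 - \delta (up to constants),

      \sup_{h \in H} |\hat{L}(h) - L(h)| \le O\Bigl(\sqrt{\bigl(d \log(m/d) + \log(1/\delta)\bigr) / m}\Bigr).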
- Tuesday February 3rd:
- Tightness of the VC bound and near optimality of the ERM learning
rule.
- Another look at Structural Risk Minimization.
- Other loss functions and the pseudo-shattering dimension.
- Computational considerations and surrogate loss functions.
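  As one standard illustration of a surrogate loss (not necessarily the
  example emphasized in lecture): for a real-valued predictor h and
  labels y \in \{-1, +1\}, the hinge loss

      \ell(h(x), y) = \max(0, 1 - y\, h(x))

  is a convex upper bound on the 0-1 loss, so it can be minimized
  efficiently while still upper-bounding the classification error; this
  is the loss underlying the Support Vector Machine discussed next
  semester.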
Last modified: Sun Mar 01 13:06:42 Jerusalem Standard Time 2009