TTIC 31120: Computational and Statistical Learning Theory
This is a webpage for the Winter 2011 course "Computational and Statistical Learning Theory", taught at TTIC, and also open to all University of Chicago students.
Lectures: Monday and Wednesday 3-4:30pm at TTIC 530 (located at 6045
S. Kenwood Ave, fifth floor).
Instructor: Nati Srebro.
TA: Karthik Sridharan.
Course Description
We will discuss classic results and recent advances in statistical
learning theory (mostly under the agnostic PAC model), touch on
computational learning theory, and also explore the relationship with
stochastic optimization, the oracle model of optimization, and online
regret analysis. Our emphasis will be on developing a rigorous
quantitative understanding of machine learning and acquiring techniques
for analyzing and proving performance guarantees for learning methods.
Pre-Requisites
- Mathematical maturity, as obtain, e.g., in a rigorous analysis course.
- Discrete Math (specifically combinatorics and asymptotic notation)
- Probability Theory
- Introduction to Machine Learning
- Algorithms; Basic Complexity Theory (NP-Hardness)
Familiarity with Convex Optimization, Computational Complexity and
background in Statistics can be helpful, but is not required.
Specific Topics:
We will try to touch
- The Statistical Model (Learning Based on an IID Sample):
- The PAC (Probably Approximately Correct) and Agnostic PAC models.
- Stochastic Optimization
- Hoeffding and Bernstien Inequalities
- Cardinality Bounds
- Description Length Bounds
- PAC-Bayes
- Compression Bounds
- The Growth Function and VC Dimension
- VC Subgraph Dimension and Fat Shattering Dimension
- Tight Characterization of Learning in terms of the VC and Fat Shattering Dimensions
- Covering Numbers
- Radamacher Averages, including Local Radamacher Analysis
- Uniform Learning and No-Free Lunch Theorems
- Online Learning, Online Optimization and Online Regret
- The Perceptron Rule and Online Gradient Descent
- Experts and the Winnow Rule
- Bergman Divergence and Online Mirror Descent
- Online to Batch Conversion
- Computational Lower Bounds:
- Computational Hardness of Proper Learning
- Cryptographic Hardness of Learning
- Lower Bounds in the Oracle Model of Optimization
- Additional Topics
- Stability Based Analysis
- Boosting: Weak Learning and the Margin Interpretation of Boosting.
- Comparison of assumptions and results with typical statistical
study of regression.
Requirements and Grading:
- Required Reading: Some material will be covered through assigned
reading, rather then in-class.
- Scribing: students are required to scribe (prepare lecture notes)
one lecture. This is required in order to receive a
grade in the class, and the quality of the notes will be
taken into consideration in determining the grade.
- Exercises and Problems: every 1-2 weeks three types of problems will be
posted:
- Exercises: These are relatively easy review exercises, often
covering required reading not covered in class. A random subset of
these will be graded.
- Challenge Problems: These are harder problems that might be a challenge to
solve. Successfully completing at least 1-2 of
these is required for receiving an "A" in the
class. Students will also be asked to present
and discuss (with the instructor) the solutions
to 1-2 of these.
- Research Problems: I don't know how to solve these (though I might
have an idea). Solving one of these will
likely earn an "A+" in the class and might
lead to a publication.
- There will be a short final in-class quiz on basic concepts
covered in the class.
In order to receive a "B" a student is expected to complete
the exercises, do a good job at scribing a lecture and
display reasonable understanding on the quiz.
In order to receive an "A" a student is expected to complete the
exercises, do a very good job scribing a lecture,
successfully answer at least 1-2 Challenge Problems and
present them to the instructor satisfactorily.
Lectures and Required Reading:
- Lecture 1: Monday January 3rd
-
- Course Overview
- Basic Setup: Generalization Error, Learning Rule, Learning Algorithm
- Analysis of the "Minimum Consistent Rectangle" learning rule in
the realizable case.
- No Free Lunch Theorem.
Required Reading: Section 1 of: S. Boucheron, O. Bousquet, and G. Lugosi, (2004), Concentration inequalities, Advanced Lectures in Machine Learning, Springer, pp. 208--240, 2004 Alternative reading : Lecture notes
Homework 1 out (due 5th Jan, 2010 (preferable), ultimate deadline 10th Jan, 2010)
- Lecture 2: Wednesday January 5th
-
- Empirical Risk Minimization
- Finite cardinality generalization bound and learning gurantee.
- The growth function.
- Lecture 4: Monday January 10th
- VC-dimension and the Shatter Lemma.
- Lecture 5: Wednesday January 13th
- PAC Learnability and the VC dimension.
- Description Length based learning rules and gurantees, Structural Risk Minimization.
- No Lecture on Monday January 17th (MLK Day)
- Lecture 6: Wednesday January 19th, given by Karthik
- PAC-Bayes bound.
- Compression scheme bounds.
Homework 2 out (due January 31st)
- Lecture 7: Monday January 24th
- Universal learning.
- Computational considerations: efficient PAC learning and proper PAC learning.
- Lecture 8: Wednesday January 26th
- Efficient proper PAC learning implies Consistency is in RP.
- Hardness of efficiently PAC learning 3-TERM-DNFs, even though they can be learned improperly.
- Lecture 9: Monday January 31st
- Inherent hardness of efficient PAC learning.
- Existence of hypothesis classes that are easy to learn statistically but hard to learn computationally, based on counting arguments.
- Overview of cryptographic hardness of efficient PAC learning.
- Hardness of efficient PAC learning vs efficient agnostic PAC learning.
Homework 3 out (due 16th Feb, 2011)
Homework 4 out (due 2nd March, 2011)
Homework 5 out (due 16th March, 2011)
Assignments:
- Homework 1 (due 5th Jan, 2011 (preferable), ultimate deadline 10th Jan, 2010) Note: updated January 4th afternoon(solutions to earlier posted version will also be accepted)
Last modified: Thu Jan 27 15:09:06 Central Standard Time 2011