TTIC 31120: Computational and Statistical Learning Theory
This is a webpage for the Fall 2012 course "Computational and Statistical Learning Theory", taught at TTIC, and also open to all University of Chicago students.
MWF 9:10-10:30AM in TTIC 530.
Instructor: Nati Srebro.
TA: Behnam Neyshabur.
Course Description
We will discuss classic results and recent advances in statistical
learning theory (mostly under the agnostic PAC model), touch on
computational learning theory, and also explore the relationship with
stochastic optimization and online regret analysis. Our emphasis will
be on developing a rigorous quantitative understanding of machine
learning and acquiring techniques for analyzing and proving
performance guarantees for learning methods.
Pre-Requisites
- Mathematical maturity, as obtained, e.g., in a rigorous analysis course.
- Discrete Math (specifically combinatorics and asymptotic notation)
- Probability Theory
- Introduction to Machine Learning
- Algorithms; Basic Complexity Theory (NP-Hardness)
Familiarity with Convex Optimization, Computational Complexity and
background in Statistics can be helpful, but is not required.
Specific Topics:
We will try to touch on:
- The Statistical Model (Learning Based on an IID Sample):
- The PAC (Probably Approximately Correct) and Agnostic PAC models.
- Stochastic Optimization
- Cardinality Bounds
- Description Length Bounds
- PAC-Bayes
- Compression Bounds
- The Growth Function and VC Dimension
- VC Subgraph Dimension and Fat Shattering Dimension
- Tight Characterization of Learning in terms of the VC and Fat Shattering Dimensions
- Covering Numbers
- Rademacher Averages, including Local Rademacher Analysis
- Uniform Learning and No-Free Lunch Theorems
- Online Learning, Online Optimization and Online Regret
- The Perceptron Rule and Online Gradient Descent
- Experts and the Winnow Rule
- Bregman Divergence and Online Mirror Descent
- Online to Batch Conversion
- Computational Lower Bounds:
- Computational Hardness of Proper Learning
- Cryptographic Hardness of Learning
- Additional Topics
- Stability Based Analysis
- Boosting: Weak Learning and the Margin Interpretation of Boosting.
Requirements and Grading:
- Required Reading: Some material will be covered through assigned
reading, rather than in class.
- Problem Sets: about 5-6 problem sets, some of which introduce additional material beyond what is covered in class. Students are required to turn in the problem sets, but they will not be graded.
- Scribing: each student is required to scribe (prepare lecture notes)
for two lectures. These notes will be graded.
- Challenge Problems: These are harder problems that might be a challenge to solve and will be included in the problem sets. Successfully completing at least 1-2 of these is required for receiving an "A" in the class. Students will also be asked to present and discuss (with the instructor) the solutions to 1-2 of these.
- Research Problems: I don't know how to solve these (though I might have an idea). Solving one of these will likely earn an "A+" in the class and might lead to a publication.
- Final Exam
In order to receive a "B" a student is expected to complete
the exercises, do a serious job at scribing a lecture and
display reasonable understanding on the exam.
In order to receive an "A" a student is expected to complete the
exercises, do an excellent job scribing a lecture,
successfully answer at least 1-2 Challenge Problems, and display a good understanding on the exam.
Recommended Texts
We will not be following a specific text, but much of the material is covered in the readings linked from the individual lectures below, e.g., Kearns and Vazirani, An Introduction to Computational Learning Theory, and Bousquet, Boucheron and Lugosi, Introduction to Statistical Learning Theory.
Lectures and Required Reading:
- Lecture 1: Wednesday October 3rd
- Course Overview
- Basic Setup: Generalization Error, Learning Rule, Learning Algorithm
- Warmup: the "Minimum Consistent Rectangle" learning rule (see the sketch below).
Required Reading: Section 1 of S. Boucheron, O. Bousquet, and G. Lugosi, "Concentration Inequalities", Advanced Lectures in Machine Learning, Springer, pp. 208-240, 2004. Alternative reading: Lecture notes.
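A minimal illustrative sketch of the "Minimum Consistent Rectangle" rule (assuming two-dimensional real inputs and +/-1 labels; the function and variable names are ours, not from the lecture):

    import numpy as np

    def min_consistent_rectangle(X, y):
        """Tightest axis-aligned rectangle containing all positive points.

        X: (m, 2) array of inputs; y: (m,) array of +/-1 labels.
        The returned predictor labels a point +1 iff it lies in the rectangle.
        """
        pos = X[y == 1]
        if len(pos) == 0:
            return lambda x: -1  # no positive examples: the empty rectangle
        lo, hi = pos.min(axis=0), pos.max(axis=0)
        return lambda x: 1 if np.all(x >= lo) and np.all(x <= hi) else -1

On a sample consistent with some rectangle, this returns the smallest consistent rectangle, which is the rule analyzed in Lecture 2.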
Problem Set 1 (due October 10th)
- Lecture 2: Friday October 5th
- Analysis of the "Minimum Consistent Rectangle" learning rule in
the realizable case. (Kearns and Vazirani Section 1.1)
- No Free Lunch Theorem.
- Empirical Risk Minimization
- Finite cardinality generalization bound and learning guarantee (the bound is recalled below).
Bousquet et al, Introduction to statistical learning theory, Sections 1-3.
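For reference, the finite-cardinality bound covered here has the standard form (Hoeffding's inequality plus a union bound): for a finite class H and an i.i.d. sample of size m, with probability at least 1 - \delta, for every h \in H,

    L(h) \le \hat{L}(h) + \sqrt{ (\log|H| + \log(2/\delta)) / (2m) },

and hence the ERM \hat{h} satisfies L(\hat{h}) \le \inf_{h \in H} L(h) + 2 \sqrt{ (\log|H| + \log(2/\delta)) / (2m) }.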
- Lecture 3: Wednesday October 10th
- Realizable, Agnostic and Optimistic Rates (details in HW)
- VC Learning Guarantee (the form of the resulting bound is recalled below):
- Growth Function for Binary Classification
- Symmetrization (proof deferred)
- The VC-dimension and Shatter Lemma (proof in HW)
Bousquet et al, Introduction to statistical learning theory, Section 4
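For orientation, the resulting VC learning guarantee has the standard form: with probability at least 1 - \delta, uniformly over h \in H,

    L(h) \le \hat{L}(h) + O( \sqrt{ (VCdim(H) \log m + \log(1/\delta)) / m } ),

obtained by bounding the growth function through the Shatter (Sauer) Lemma and applying symmetrization.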
Problem Set 2 (due October 17th)
- Lecture 4: Friday October 12th
- The VC-dimension as a tight characterization of PAC learnability
- Compression Schemes and Compression-Based Learning Guarantee
- Lecture 5: Monday October 15th
- Description Length Bounds, the Minimum Description Length and Structural Risk Minimization learning rules.
- PAC-Bayes Bounds (proof in HW)
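One standard form of the PAC-Bayes bound (in the style of McAllester): fix a prior P over hypotheses; with probability at least 1 - \delta, for every posterior Q,

    L(Q) \le \hat{L}(Q) + \sqrt{ (KL(Q||P) + \log(m/\delta)) / (2(m-1)) },

where L(Q) and \hat{L}(Q) are the population and empirical errors of a random hypothesis drawn from Q.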
- Lecture 6: Wednesday October 17th
- Universal Learning
- Introduction of the computational model; definitions of proper and improper efficient PAC learnability.
- Reducing the consistency problem to Proper PAC Learnability.
Kearns and Vazirani Chapter 1
Problem Set 3 (due October 24th)
- Lecture 7: Friday October 19th
- 3-TERM-DNF is not efficiently properly PAC learnable, but is efficiently (improperly) PAC learnable.
- On the efficient PAC learnability of poly-time functions.
Kearns and Vazirani Chapter 1
- Lecture 8: Monday October 22nd
- Learning, Cryptography, and One-Way Functions
- Hardness of Discrete Cube Root as an assumption, and the Discrete Cube Root as a learning problem.
- Cryptographic hardness of PAC Learning poly-time functions and log-depth networks.
- For further reading on hardness of learning depth-two networks: Cryptographic Hardness for Learning Intersections of Halfspaces, Klivans and Sherstov, FOCS 2006.
- Efficient Agnostic Learning:
- Reducing AGREEMENT to agnostic proper learning, and the hardness of agnostic proper learning half-spaces. (proofs in HW)
- Cryptographic hardness of agnostic improper learning of half-spaces. Further reading: Learning Kernel Based Halfspaces with the 0-1 Loss, Shai Shalev-Shwartz, Karthik Sridharan and Ohad Shamir, SIAM Journal on Computing, 2011.
- Non-convexity of 0/1-loss and introduction of convex surrogate loss functions.
Kearns and Vazirani Chapter 6
- Lecture 9: Wednesday October 24th
- Introduction to statistical learning (optimization) of real-valued objectives.
- Empirical Covering Numbers as generalization of the Growth Function
- Rademacher Complexity and Symmetrization (the definition is recalled below)
- McDiarmid's Inequality
- Massart's Finite Class Lemma (proof in HW)
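For reference, the empirical Rademacher complexity used throughout is

    R_S(F) = E_\sigma [ \sup_{f \in F} (1/m) \sum_{i=1}^m \sigma_i f(z_i) ],

where S = (z_1, ..., z_m) is the sample and \sigma_1, ..., \sigma_m are i.i.d. uniform \pm 1 signs; symmetrization bounds the expected uniform deviation between empirical and population means by twice E_S[R_S(F)].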
Problem Set 4 (due October 31st)
- Lecture 10: Friday October 26th
- Pollard's and Dudley's bounds on the Rademacher complexity in terms of covering numbers (proof of Dudley's refined bound in HW).
- Learning guarantee in terms of the VC-subgraph dimension:
- Pseudo-shattering and the VC-subgraph dimension.
- Bounding covering numbers in terms of the VC-subgraph dimension (proof deferred).
- The class of linear predictors.
- Scale Sensitive Learning
- The fat shattering dimension.
- Bounding covering numbers in terms of the fat shattering dimension---the fat shatter lemma (proof in HW).
- Lipschitz Composition Lemma.
- Conclusion: Learning guarantee in terms of fat-shattering dimension.
- Lecture 11: Monday October 29th
- The class of l_2-bounded linear predictors
- Bounding the fat-shattering dimension
- Bounding the Rademacher complexity directly
- Fat shattering as necessary and sufficient for learnability.
- Margin-based learning guarantees: the margin loss, ramp loss and hinge-loss.
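In standard form, for a real-valued predictor f, a label y \in {\pm 1} and margin \gamma > 0:

    margin loss: 1[ y f(x) < \gamma ]
    ramp loss:   min(1, max(0, 1 - y f(x)/\gamma))
    hinge loss:  max(0, 1 - y f(x)/\gamma)

so that the zero-one loss 1[y f(x) \le 0] is at most the ramp loss, which in turn is at most both the margin loss and the hinge loss; this sandwiching is what drives margin-based guarantees.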
- Lecture 12: Wednesday October 31st
- Aggregate/voting predictors
- Rademacher complexity of the convex hull.
- l_1-bounded linear predictors.
- Introduction to Online Learning
- Online learning rule, regret, mistake bound.
- Mistake bound for finite cardinality classes: HALVING
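A minimal sketch of HALVING (assuming a finite hypothesis class given explicitly as a list; the names here are illustrative):

    def halving(hypotheses, stream):
        """Predict with the majority vote of the current version space.

        hypotheses: list of functions h(x) -> {0, 1}.
        stream: iterable of (x, y) pairs; yields one prediction per round.
        Every mistake eliminates at least half of the version space, so at
        most log2(len(hypotheses)) mistakes are made in the realizable case.
        """
        version_space = list(hypotheses)
        for x, y in stream:
            votes = sum(h(x) for h in version_space)
            yield 1 if 2 * votes >= len(version_space) else 0
            # Keep only the hypotheses consistent with the revealed label.
            version_space = [h for h in version_space if h(x) == y]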
Problem Set 5&6 (due November 11th)
- Lecture 13, Friday November 2nd
- Halfspaces are not online learnable
- The Online Perceptron and its mistake bound (see the sketch below).
- Online Gradient Descent
- Relating regret and linearized regret
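A minimal sketch of the Online Perceptron (a conservative, mistake-driven update; the names here are illustrative):

    import numpy as np

    def perceptron(stream, dim):
        """Online Perceptron: update only on mistakes.

        stream: iterable of (x, y) with x a length-dim array and y in {-1, +1}.
        On a sequence separable with margin gamma by a unit-norm w*, with
        ||x|| <= R, it makes at most (R/gamma)^2 mistakes.
        """
        w = np.zeros(dim)
        for x, y in stream:
            pred = 1 if w @ x >= 0 else -1
            yield pred
            if pred != y:
                w = w + y * x  # mistake-driven update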
- Lecture 14: Monday November 5th
- Analysis of Online Gradient Descent (see the sketch below)
- Interpreting Perceptron as Online Gradient Descent
- Conservative vs aggressive updates.
- Non-separable perceptron.
- Hypothesis classes and function classes; Linear and Lipschitz classes.
- Online vs. Statistical Learning; Online to Batch Conversion.
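A minimal sketch of projected Online Gradient Descent with iterate averaging, the simplest online-to-batch conversion (grad, project and the fixed step size eta are our illustrative interface, not a prescribed one):

    import numpy as np

    def ogd_average(grad, project, x0, eta, T):
        """Online (sub)gradient descent, returning the average iterate.

        grad(t, x): a subgradient of the round-t loss at x.
        project(x): Euclidean projection onto the feasible set.
        When the losses are i.i.d., the regret bound for OGD transfers to
        an expected-risk bound for the returned average (online-to-batch).
        """
        x = np.array(x0, dtype=float)
        avg = np.zeros_like(x)
        for t in range(T):
            avg += x / T
            x = project(x - eta * grad(t, x))
        return avg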
- Lecture 15: Wednesday November 7th
- Online to Batch Conversions and the Stochastic Approximation approach.
- Stochastic Gradient Descent as computationally optimal for PAC Learning.
- Understanding Online Gradient Descent in terms of regularization.
- Beyond the l_2 norm: Strongly convex distance generating functions, Bregman divergence, strong convexity and dual norms.
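For reference, the Bregman divergence of a differentiable, strictly convex R is

    D_R(x, y) = R(x) - R(y) - <\nabla R(y), x - y>;

e.g., R(x) = (1/2)||x||_2^2 gives D_R(x, y) = (1/2)||x - y||_2^2, while the negative entropy R(p) = \sum_i p_i \log p_i on the simplex gives the KL divergence.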
- Lecture 16: Monday November 12th
- Background Reading: Sections 5.4.1-5.4.2 of Nemirovski's "Lectures on Modern Convex Optimization"
- Online Mirror Descent
- l_p norm bounded classes; using a squared l_p norm for an l_1 constrained problem.
- The entropy regularized, exponentiated gradient, and the multiplicative weights algorithm.
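A minimal sketch of multiplicative weights / exponentiated gradient over the simplex (assuming loss vectors in [0,1]^n; the names here are illustrative):

    import numpy as np

    def multiplicative_weights(loss_stream, n, eta):
        """Maintain a distribution over n experts via exponentiated updates.

        loss_stream: iterable of length-n loss vectors with entries in [0, 1].
        For a suitable eta this achieves regret O(sqrt(T log n)), matching
        the entropy-regularized mirror descent analysis from class.
        """
        w = np.ones(n)
        for loss in loss_stream:
            p = w / w.sum()
            yield p  # play the current distribution over experts
            w = w * np.exp(-eta * np.asarray(loss))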
- Lecture 17: Wednesday November 14th
- Plan:
- Mirror Descent, Dual Averaging, Passive-Aggressive and Follow-The-Regularized-Leader
- Using a strongly convex regularizer to bound the Rademacher complexity.
- Stability based learning guarantees
- Bounding stability in terms of strong convexity
Problem Set 7 (due November 21st)
- Lecture 18: Monday November 19th
- Plan:
- Lecture 19: Wednesday November 21st
- Plan:
Problem Set 8 (due December 1st)
- Lecture 20: Monday November 26th
- Plan:
Assignments:
Problem Set 1 (due October 10th)
Problem Set 2 (due October 17th)
Problem Set 3 (due October 24th)
Problem Set 4 (due October 31st)
Problem Set 5&6 (due November 11th)
Problem Set 7 (due November 21st)
Problem Set 8 (due December 1st)
Last modified: Mon Nov 12 22:39:06 Central Standard Time 2012