CMSC 3590 - Large Scale Learning, Spring 2009

Syllabus

The course will focus on the theory and practice of working with large data sets, characterized by large numbers of data points and/or high dimensions. We will cover a set of computational tools that allow efficient storage, search and inference in such data sets, as well as theoretical results pertaining to these tools. We will consider both the statistical and the computational efficiency of the methods covered, along with their limitations, and discuss data structures and algorithms for implementing these methods in practice. The major topics discussed in the course will include the following:

Dimensionality reduction, including random projections, spectral methods and embeddings
Efficient indexing and search, including locality-sensitive hashing and metric trees
Large-scale non-parametric methods, including locally-weighted regression and example-based density estimation
Unsupervised statistical learning tasks, including clustering

Instructors:

Sham Kakade	`sham at tti-c.org`
Greg Shakhnarovich	`greg at tti-c.org`

Time and location:

Time:	Tue, Thu 1:30
Location:	TTI-Chicago, room 530
	6045 S. Kenwood Ave., 5th floor

Lecture slides and notes

Tue 3/31/09
- Dimensionality reduction: Karhunen-Loeve transform, PCA, SVD
- lecture notes (PDF)

Thu 4/2/09
- Johnson-Lindenstrauss lemma, random projections
- lecture notes (PDF)

Tue 4/7/09
- Classification & Regression, margins and random projections, risk of L_2 regularization vs. dimensionality reduction
- lecture notes (PDF)

Thu 4/9/09
- Regression, PCA, random features, and the risk of L_2 regularization
- lecture notes (PDF)

Tue 4/14/09
- Nearest neighbor rules for classification and regression
- Asymptotic bounds on risk, rates of convergence with finite samples
- lecture notes (PDF)

Thu 4/16/09
- Nonparametric regression: Nadaraya-Watson kernel estimator, local linear models
- lecture notes (PDF)

Tue 4/21/09
- kd-trees
- lecture notes (PDF)

Thu 4/23/09
- balanced box-decomposition (BBD) trees, approximate NN search
- lecture notes: posted soon

Tue 4/28/09
- Locality-sensitive hashing in L₁ and Hamming spaces
- lecture notes: posted soon