The course will focus on the theory and practice of working with large
data sets, characterized by large numbers of data points, high
dimensionality, or both. We will cover a set of computational tools that allow
efficient storage, search and inference in such data sets, as well as
theoretical results pertaining to these tools. We will consider both
the statistical and the computational efficiency of the methods covered,
along with their limitations, and discuss data structures and algorithms
for implementing these methods in practice.
The major topics discussed in the course will include the following:
- Dimensionality reduction, including random projections, spectral
  methods and embeddings
- Efficient indexing and search, including locality-sensitive hashing and
  metric trees
- Large-scale non-parametric methods, including locally weighted
  regression and example-based density estimation
- Unsupervised statistical learning tasks, including clustering
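
To give a flavor of the first topic, here is a minimal sketch of a Gaussian random projection, which maps high-dimensional points to a lower dimension while approximately preserving pairwise distances. It is an illustration only, not course material: the function names (`random_projection`, `distance`), the dimensions (1000 reduced to 200), and the seeds are all assumptions chosen for the example, and it uses only the Python standard library.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Project d-dimensional points down to k dimensions.

    Uses a k x d matrix with i.i.d. N(0, 1/k) entries, so that squared
    Euclidean lengths are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    # Each projected coordinate is the dot product of one matrix row with the point.
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix] for p in points]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: 5 random points in 1000 dimensions.
data_rng = random.Random(1)
points = [[data_rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(5)]

projected = random_projection(points, k=200)

# Ratio of a projected distance to the original one; values near 1
# indicate that the projection roughly preserved that distance.
ratio = distance(projected[0], projected[1]) / distance(points[0], points[1])
print(f"distance ratio after projection: {ratio:.3f}")
```

The ratio fluctuates around 1, and the fluctuation shrinks as the target dimension `k` grows. This trade-off between target dimension and distortion is the kind of statistical versus computational question the course description refers to.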