CMSC 35900-2:
A Probabilistic Approach to Machine Learning
Fall 2008, Tuesdays and Thursdays 10:30-12:00 at TTI-C
Instructor: Nati Srebro
What is this class?
We will consider selected machine learning topics from a
probabilistic, and often Bayesian, perspective. That is, we will
present an approach to machine learning that focuses on constructing
a probabilistic model and then making predictions using the posterior
distribution given the observations.
Pre-Requisites and Intended Audience
The class assumes an understanding of basic concepts in machine
learning (not necessarily from a Bayesian perspective). It is
generally not intended as an introductory class to machine learning,
but rather as a class introducing an alternative perspective, and
relevant techniques, to those already (at least somewhat) familiar
with machine learning.
Pre-requisites:
- An introductory class to machine learning, such as "CMSC35400
Machine Learning" or "CMSC35420/TTIC103 Statistical Methods in
AI".
- Basic probability theory.
Topics
We will discuss the fundamental principles of the probabilistic
approach, the relationship to other approaches, inference techniques
and specific probabilistic models commonly used in machine
learning.
Topics may include:
- Introduction to Bayesian inference.
- Conjugate priors. The binomial, multinomial and Gaussian
distributions.
- Discriminative probabilistic models and the conditional likelihood.
- Approximate inference, including Laplace approximation and
variational methods.
- Sampling techniques, including Gibbs Sampling and general MCMC.
- The EM Algorithm as a variational technique.
- Topic / latent variable models.
- Nonparametric Bayesian models.
- Gaussian processes and their relationship to SVM/Kernel methods.
- The Dirichlet process: clustering and nonparametric topic models.
- Neural Networks as Inference.
- Boltzmann Machines and Deep Belief Networks.
References
The primary text we will use for about half of the topics is:
- David MacKay: Information Theory, Inference and Learning Algorithms.
- The entire book, as well as some extra material, is available online (of course, you can also purchase a hardbound physical copy of the book).
Other texts that will be used to cover specific topics:
- Carl Rasmussen and Christopher Williams: Gaussian
Processes for Machine Learning.
- This book is also available online or for purchase.
- Radford Neal: Probabilistic Inference Using Markov Chain Monte Carlo Methods.
- An excellent, detailed survey of MCMC sampling techniques, available only online.
- Andrew Gelman, John Carlin, Hal Stern and Donald Rubin: Bayesian Data Analysis, 2nd Edition.
- Available for purchase, but unfortunately not available online.
- Michael Jordan: An Introduction to Probabilistic Graphical Models.
- This book has not yet been published. Hardcopies of relevant chapters will be provided to students attending the class.
Problem Sets
Detailed Schedule and Topics Covered
- Thursday October 2nd
- What is learning? Importance of prior knowledge to learning. Representing prior knowledge via a probability distribution.
- Bayesian inference and "inverse probability" calculations. The prior, likelihood, posterior and evidence.
- Tuesday October 7th
- Bayesian inference. The posterior distribution of the parameters and over future events. The MAP and Maximum Likelihood estimates of the parameters.
- The Bayesian evidence and its use in model comparison.
- Reading: MacKay Sections 2.1-2.3, Chapter 3
- Exchangeability and de Finetti's Theorem.
- What is a Conjugate Prior?
- The Beta distribution: definition, properties, calculations.
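For concreteness, here is the standard Beta-Bernoulli conjugate update discussed above, written out as a worked example in LaTeX (the symbols a, b, k, n are illustrative; the likelihood is for a particular sequence of n Bernoulli outcomes with k successes):

    \[
    p(\theta) \propto \theta^{a-1}(1-\theta)^{b-1},
    \qquad
    p(D \mid \theta) = \theta^{k}(1-\theta)^{n-k}
    \]
    \[
    p(\theta \mid D) \propto \theta^{a+k-1}(1-\theta)^{b+n-k-1}
    \;\Longrightarrow\;
    \theta \mid D \sim \mathrm{Beta}(a+k,\; b+n-k)
    \]

The posterior is again a Beta distribution, which is exactly what makes the Beta prior conjugate to the Bernoulli/binomial likelihood.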
- Thursday October 9th: no lecture
- Friday October 10th:
- Supervised learning using the Naive Bayes model.
- The Posterior Mean parameter estimate and its relationship to the full posterior in the Naive Bayes model.
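As a quick worked identity for the posterior-mean bullet above (a standard result; a, b, k, n as in the Beta-Bernoulli example from October 7th):

    \[
    \mathbb{E}[\theta \mid D] = \frac{a + k}{a + b + n}
    \]

The posterior predictive probability of success on the next trial equals this posterior mean, so in this model predicting with the posterior-mean parameter coincides with prediction under the full posterior.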
- Tuesday October 14th: no lecture.
- Thursday October 16th
- Model selection in the Naive Bayes model.
- The relationship between the Naive Bayes model and Logistic Linear Regression (see the identity after this entry).
- A Probabilistic Model for Logistic Linear Regression.
- Generative vs. Discriminative Learning.
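The identity behind the Naive Bayes / logistic regression relationship mentioned above, stated here for reference (a one-line consequence of Bayes' rule and the Naive Bayes factorization):

    \[
    \log \frac{P(y=1 \mid x)}{P(y=0 \mid x)}
    = \log \frac{P(y=1)}{P(y=0)}
    + \sum_j \log \frac{P(x_j \mid y=1)}{P(x_j \mid y=0)}
    \]

For binary features with Bernoulli class-conditionals, each term in the sum is affine in x_j, so the log-odds are linear in x and P(y=1|x) takes the logistic-regression form; the difference is that Naive Bayes fits the parameters generatively while logistic regression fits them discriminatively.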
- Friday October 17th
- Conditional Independence and Factorization in Directed Graphical Models.
- The "Bayes Ball" algorithm.
- Reading: Jordan Section 2.1
- Problem Set 1 out.
- Tuesday October 21st
- From Linear Prediction to Gaussian Processes.
- Gaussian Processes for Regression (a numpy sketch follows this entry).
- Reading: Rasmussen and Williams Sections 2.1-2.2.
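A minimal numpy sketch of the Gaussian-process regression equations, following the standard Cholesky-based formulation (as in Rasmussen and Williams, Algorithm 2.1); the squared-exponential kernel and all hyperparameter values here are illustrative assumptions, not a prescribed choice:

    import numpy as np

    def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
        """Squared-exponential covariance between the row-vectors of A and B."""
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return signal_var * np.exp(-0.5 * sq / lengthscale**2)

    def gp_regression(X, y, Xstar, noise_var=0.1):
        """Posterior mean and variance of a zero-mean GP at test inputs Xstar."""
        K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
        L = np.linalg.cholesky(K)                       # K = L L^T
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        Ks = rbf_kernel(X, Xstar)
        mean = Ks.T @ alpha                             # predictive mean
        v = np.linalg.solve(L, Ks)
        var = np.diag(rbf_kernel(Xstar, Xstar)) - np.sum(v**2, 0)  # predictive variance
        return mean, var

    # toy usage on noisy samples of a sine function
    X = np.linspace(0, 5, 20)[:, None]
    y = np.sin(X).ravel() + 0.1 * np.random.randn(20)
    mu, var = gp_regression(X, y, np.linspace(0, 5, 100)[:, None])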
- Thursday October 23rd
- Tuesday October 28th
- Thursday October 30th
- Model Selection and the Laplace Approximation to the Bayesian Evidence (see the formula after this entry).
- Relationship to Regularization and to SVMs.
- Reading: Rasmussen and Williams Sections 5.1-5.2,5.4-5.5,6.4
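For reference, the Laplace approximation to the evidence mentioned above (a standard second-order expansion around the MAP estimate theta*; d denotes the parameter dimension):

    \[
    \log p(D) \approx \log p(D \mid \theta^{*}) + \log p(\theta^{*})
    + \frac{d}{2}\log 2\pi - \frac{1}{2}\log\lvert A\rvert,
    \qquad
    A = -\nabla^{2}\log\bigl[p(D \mid \theta)\,p(\theta)\bigr]\Big|_{\theta=\theta^{*}}
    \]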
- Tuesday November 4th
- Introduction to sampling techniques:
- Integration through sampling.
- Rejection sampling.
- Importance sampling (see the sketch after this entry).
- Introduction to Markov Chains and the MCMC technique:
- Sampling using a Markov chain.
- The stationary distribution.
- Ergodic and non-ergodic Markov chains.
- Detailed balance and reversible Markov chains.
- Reading: MacKay Sections 29.1-29.3,29.6
- More detailed reading: Neal Chapter 3
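A minimal sketch of the self-normalized importance-sampling estimator referenced above (the function names, toy target, and proposal are illustrative assumptions; note that the weights only require the target density up to a normalizing constant):

    import numpy as np

    def importance_estimate(f, log_p, sample_q, log_q, n=100000):
        """Estimate E_p[f(x)] using samples from a proposal q:
        E_p[f] = E_q[f(x) p(x)/q(x)], with self-normalized weights so that
        log_p and log_q may each omit their normalizing constants."""
        x = sample_q(n)
        log_w = log_p(x) - log_q(x)
        w = np.exp(log_w - log_w.max())    # subtract max for numerical stability
        w /= w.sum()
        return np.sum(w * f(x))

    # toy check: E[x^2] = 1 under N(0,1), using a wider N(0,4) proposal
    est = importance_estimate(lambda x: x**2,
                              lambda x: -0.5 * x**2,     # log N(0,1), up to a constant
                              lambda n: 2.0 * np.random.randn(n),
                              lambda x: -x**2 / 8.0)     # log N(0,4), up to a constant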
- Thursday November 6th
- The Metropolis algorithm (a code sketch follows this entry).
- The Metropolis-Hastings algorithm.
- The Langevin Method.
- Gibbs Sampling.
- Reading: MacKay Sections 29.4-29.5,41.4
- More detailed reading: Neal Chapter 4
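A minimal random-walk Metropolis sketch for the algorithm above (the proposal scale, toy target, and names are illustrative; with a symmetric Gaussian proposal the Hastings correction drops out):

    import numpy as np

    def metropolis(log_p, x0, steps=10000, step_size=0.5):
        """Random-walk Metropolis with a symmetric Gaussian proposal, so the
        acceptance probability reduces to min(1, p(x') / p(x))."""
        x = np.array(x0, dtype=float)
        samples = []
        for _ in range(steps):
            x_prop = x + step_size * np.random.randn(*x.shape)
            if np.log(np.random.rand()) < log_p(x_prop) - log_p(x):
                x = x_prop                 # accept; otherwise keep the current x
            samples.append(x.copy())
        return np.array(samples)

    # toy target: unnormalized log-density of N(3, 1)
    chain = metropolis(lambda x: -0.5 * np.sum((x - 3.0) ** 2), x0=[0.0])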
- Tuesday November 11th
- Calculating the Bayesian Evidence using MCMC.
- MCMC Sampling vs. Optimization.
- Simulated Annealing.
- Reading: Neal Section 6.1-6.2
- Multi-layered Feed-forward Networks.
- Neural Networks as Inference.
- Reading: MacKay Chapters 41, 44 (and background from Chapters 38-39).
- Thursday November 13th
- Boltzmann Machines.
- Reading: MacKay Chapters 42-43
- Tuesday November 18th
- Restricted Boltzmann Machines and Deep Belief Networks.
- Thursday November 20th
- Latent Variables.
- Clustering Models.
- The EM Algorithm.
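A minimal EM sketch for the clustering model above, using a one-dimensional mixture of K Gaussians (the initialization and the fixed iteration count are illustrative simplifications; the E-step is exactly the posterior computation over the latent cluster assignments):

    import numpy as np
    from scipy.stats import norm

    def em_gmm(x, K=2, iters=100):
        """EM for a 1-D Gaussian mixture model."""
        n = len(x)
        pi = np.full(K, 1.0 / K)             # mixing proportions
        mu = np.random.choice(x, K)          # crude initialization from the data
        sigma = np.full(K, x.std())
        for _ in range(iters):
            # E-step: responsibilities r[i, k] = P(z_i = k | x_i, params)
            r = pi * norm.pdf(x[:, None], mu, sigma)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: responsibility-weighted maximum-likelihood updates
            nk = r.sum(axis=0)
            pi = nk / n
            mu = (r * x[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        return pi, mu, sigma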
- Tuesday November 25th