CMSC 35900-2:
A Probabilistic Approach to Machine Learning
Fall 2008, Tuesdays and Thursdays 10:30-12:00 at TTI-C
Instructor: Nati Srebro
What is this class?
We will consider selected machine learning topics from a
probabilistic, and often Bayesian, perspective. That is, we will
present an approach to machine learning that focuses on constructing
a probabilistic model and then making predictions using the posterior
distribution given the observations.
Pre-Requisites and Intended Audience
The class assumes an understanding of basic concepts in machine
learning (not necessarily from a Bayesian perspective). It is
generally not intended as an introductory class to machine learning,
but rather as a class introducing an alternative perspective, and
relevant techniques, to those already (at least somewhat) familiar
with machine learning.
Pre-requisites:
- An introductory class to machine learning, such as "CMSC35400
Machine Learning" or "CMSC35420/TTIC103 Statistical Methods in
AI".
- Basic probability theory.
Topics
We will discuss the fundamental principles of the probabilistic
approach, the relationship to other approaches, inference techniques
and specific probabilistic models commonly used in machine
learning.
Topics may include:
- Introduction to Bayesian inference.
- Conjugate priors. The binomial, multinomial and Gaussian
distributions.
- Discriminative probabilistic models and the conditional likelihood.
- Approximate inference, including Laplace approximation and
variational methods.
- Sampling techniques, including Gibbs Sampling and general MCMC.
- The EM Algorithm as a variational technique.
- Topic / latent variable models.
- Nonparametric Bayesian models.
- Gaussian processes and their relationship to SVM/Kernel methods.
- The Dirichlet process: clustering and nonparametric topic models.
- Neural Networks as Inference.
- Boltzmann Machines and Deep Belief Networks.
References
The primary text we will use for about half of the topics is:
- David MacKay: Information Theory, Inference and Learning Algorithms.
- The entire book, as well as some extra material, is available online (of course, you can also purchase a hardbound physical copy of the book).
Other texts that will be used to cover specific topics:
- Carl Rasmussen and Christopher Williams: Gaussian
Processes for Machine Learning.
- This book is also available online or for purchase.
- Radford Neal: Probabilistic Inference Using Markov Chain Monte Carlo Methods.
- An excellent, detailed survey of MCMC sampling techniques, available only online.
- Andrew Gelman, John Carlin, Hal Stern and Donald Rubin: Bayesian Data Analysis, 2nd Edition.
- Available for purchase, but unfortunately not available online.
- Michael Jordan: An Introduction to Probabilistic Graphical Models.
- This book has not yet been published. Hardcopies of relevant chapters will be provided to students attending the class.
Problem Sets
Detailed Schedule and Topics Covered
- Thursday October 2nd
- What is learning? Importance of prior knowledge to learning. Representing prior knowledge via a probability distribution.
- Bayesian inference and "inverse probability" calculations. The prior, likelihood, posterior and evidence.
- Tuesday October 7th
- Bayesian inference. The posterior distribution of the parameters and over future events. The MAP and Maximum Likelihood estimates of the parameters.
- The Bayesian evidence and its use in model comparison.
- Reading: MacKay Sections 2.1-2.3, Chapter 3
- Exchangeability and de Finetti's Theorem.
- What is a Conjugate Prior?
- The Beta distribution: definition, properties, calculations.
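For concreteness, here is the standard Beta-Bernoulli conjugate update discussed above, written out as a worked example in LaTeX (the symbols a, b, k, n are illustrative; the likelihood is for a particular sequence of n Bernoulli outcomes with k successes):

    \[
    p(\theta) \propto \theta^{a-1}(1-\theta)^{b-1},
    \qquad
    p(D \mid \theta) = \theta^{k}(1-\theta)^{n-k}
    \]
    \[
    p(\theta \mid D) \propto \theta^{a+k-1}(1-\theta)^{b+n-k-1}
    \;\Longrightarrow\;
    \theta \mid D \sim \mathrm{Beta}(a+k,\; b+n-k)
    \]

The posterior is again a Beta distribution, which is exactly what makes the Beta prior conjugate to the Bernoulli/binomial likelihood.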
- Thursday October 9th: no lecture
- Friday October 10th:
- Supervised learning using the Naive Bayes model.
- The Posterior Mean parameter estimate and its relationship to the full posterior in the Naive Bayes model.
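As a quick worked identity for the posterior-mean bullet above (a standard result; a, b, k, n as in the Beta-Bernoulli example from October 7th):

    \[
    \mathbb{E}[\theta \mid D] = \frac{a + k}{a + b + n}
    \]

The posterior predictive probability of success on the next trial equals this posterior mean, so in this model predicting with the posterior-mean parameter coincides with prediction under the full posterior.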
- Tuesday October 14th: no lecture.
- Thursday October 16th
- Model selection in the Naive Bayes model.
- The relationship between the Naive Bayes model and Logistic Linear Regression (see the identity after this entry).
- A Probabilistic Model for Logistic Linear Regression.
- Generative vs. Discriminative Learning.
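The identity behind the Naive Bayes / logistic regression relationship mentioned above, stated here for reference (a one-line consequence of Bayes' rule and the Naive Bayes factorization):

    \[
    \log \frac{P(y=1 \mid x)}{P(y=0 \mid x)}
    = \log \frac{P(y=1)}{P(y=0)}
    + \sum_j \log \frac{P(x_j \mid y=1)}{P(x_j \mid y=0)}
    \]

For binary features with Bernoulli class-conditionals, each term in the sum is affine in x_j, so the log-odds are linear in x and P(y=1|x) takes the logistic-regression form; the difference is that Naive Bayes fits the parameters generatively while logistic regression fits them discriminatively.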
- Friday October 17th
- Conditional Independence and Factorization in Directed Graphical Models.
- The "Bayes Ball" algorithm.
- Reading: Jordan Section 2.1
- Problem Set 1 out.
- Tuesday October 21st
- From Linear Prediction to Gaussian Processes.
- Gaussian Processes for Regression (a numpy sketch follows this entry).
- Reading: Rasmussen and Williams Sections 2.1-2.2.
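A minimal numpy sketch of the Gaussian-process regression equations, following the standard Cholesky-based formulation (as in Rasmussen and Williams, Algorithm 2.1); the squared-exponential kernel and all hyperparameter values here are illustrative assumptions, not a prescribed choice:

    import numpy as np

    def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
        """Squared-exponential covariance between the row-vectors of A and B."""
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return signal_var * np.exp(-0.5 * sq / lengthscale**2)

    def gp_regression(X, y, Xstar, noise_var=0.1):
        """Posterior mean and variance of a zero-mean GP at test inputs Xstar."""
        K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
        L = np.linalg.cholesky(K)                       # K = L L^T
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        Ks = rbf_kernel(X, Xstar)
        mean = Ks.T @ alpha                             # predictive mean
        v = np.linalg.solve(L, Ks)
        var = np.diag(rbf_kernel(Xstar, Xstar)) - np.sum(v**2, 0)  # predictive variance
        return mean, var

    # toy usage on noisy samples of a sine function
    X = np.linspace(0, 5, 20)[:, None]
    y = np.sin(X).ravel() + 0.1 * np.random.randn(20)
    mu, var = gp_regression(X, y, np.linspace(0, 5, 100)[:, None])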
- Thursday October 23rd
- Tuesday October 28th
- Thursday October 30th
- Model Selection and the Laplace Approximation to the Bayesian Evidence (see the formula after this entry).
- Relationship to Regularization and to SVMs.
- Reading: Rasmussen and Williams Sections 5.1-5.2,5.4-5.5,6.4
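For reference, the Laplace approximation to the evidence mentioned above (a standard second-order expansion around the MAP estimate theta*; d denotes the parameter dimension):

    \[
    \log p(D) \approx \log p(D \mid \theta^{*}) + \log p(\theta^{*})
    + \frac{d}{2}\log 2\pi - \frac{1}{2}\log\lvert A\rvert,
    \qquad
    A = -\nabla^{2}\log\bigl[p(D \mid \theta)\,p(\theta)\bigr]\Big|_{\theta=\theta^{*}}
    \]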
- Tuesday November 4th
- Introduction to sampling techniques:
- Integration through sampling.
- Rejection sampling.
- Importance sampling (see the sketch after this entry).
- Introduction to Markov Chains and the MCMC technique:
- Sampling using a Markov chain.
- The stationary distribution.
- Ergodic and non-ergodic Markov chains.
- Detailed balance and reversible Markov chains.
- Reading: MacKay Sections 29.1-29.3,29.6
- More detailed reading: Neal Chapter 3
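A minimal sketch of the self-normalized importance-sampling estimator referenced above (the function names, toy target, and proposal are illustrative assumptions; note that the weights only require the target density up to a normalizing constant):

    import numpy as np

    def importance_estimate(f, log_p, sample_q, log_q, n=100000):
        """Estimate E_p[f(x)] using samples from a proposal q:
        E_p[f] = E_q[f(x) p(x)/q(x)], with self-normalized weights so that
        log_p and log_q may each omit their normalizing constants."""
        x = sample_q(n)
        log_w = log_p(x) - log_q(x)
        w = np.exp(log_w - log_w.max())    # subtract max for numerical stability
        w /= w.sum()
        return np.sum(w * f(x))

    # toy check: E[x^2] = 1 under N(0,1), using a wider N(0,4) proposal
    est = importance_estimate(lambda x: x**2,
                              lambda x: -0.5 * x**2,     # log N(0,1), up to a constant
                              lambda n: 2.0 * np.random.randn(n),
                              lambda x: -x**2 / 8.0)     # log N(0,4), up to a constant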
- Thursday November 6th
- The Metropolis algorithm (a code sketch follows this entry).
- The Metropolis-Hastings algorithm.
- The Langevin Method.
- Gibbs Sampling.
- Reading: MacKay Sections 29.4-29.5,41.4
- More detailed reading: Neal Chapter 4
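A minimal random-walk Metropolis sketch for the algorithm above (the proposal scale, toy target, and names are illustrative; with a symmetric Gaussian proposal the Hastings correction drops out):

    import numpy as np

    def metropolis(log_p, x0, steps=10000, step_size=0.5):
        """Random-walk Metropolis with a symmetric Gaussian proposal, so the
        acceptance probability reduces to min(1, p(x') / p(x))."""
        x = np.array(x0, dtype=float)
        samples = []
        for _ in range(steps):
            x_prop = x + step_size * np.random.randn(*x.shape)
            if np.log(np.random.rand()) < log_p(x_prop) - log_p(x):
                x = x_prop                 # accept; otherwise keep the current x
            samples.append(x.copy())
        return np.array(samples)

    # toy target: unnormalized log-density of N(3, 1)
    chain = metropolis(lambda x: -0.5 * np.sum((x - 3.0) ** 2), x0=[0.0])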
- Tuesday November 11th
- Calculating the Bayesian Evidence using MCMC.
- MCMC Sampling vs. Optimization.
- Simulated Annealing.
- Reading: Neal Section 6.1-6.2
- Multi-layered Feed-forward Networks.
- Neural Networks as Inference.
- Reading: MacKay Chapters 41, 44 (and background from Chapters 38-39).
- Thursday November 13th
- Boltzmann Machines.
- Reading: MacKay Chapters 42-43
- Tuesday November 18th
- Restricted Boltzmann Machines and Deep Belief Networks.
- Thursday November 20th
- Latent Variables.
- Clustering Models.
- The EM Algorithm.
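A minimal EM sketch for the clustering model above, using a one-dimensional mixture of K Gaussians (the initialization and the fixed iteration count are illustrative simplifications; the E-step is exactly the posterior computation over the latent cluster assignments):

    import numpy as np
    from scipy.stats import norm

    def em_gmm(x, K=2, iters=100):
        """EM for a 1-D Gaussian mixture model."""
        n = len(x)
        pi = np.full(K, 1.0 / K)             # mixing proportions
        mu = np.random.choice(x, K)          # crude initialization from the data
        sigma = np.full(K, x.std())
        for _ in range(iters):
            # E-step: responsibilities r[i, k] = P(z_i = k | x_i, params)
            r = pi * norm.pdf(x[:, None], mu, sigma)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: responsibility-weighted maximum-likelihood updates
            nk = r.sum(axis=0)
            pi = nk / n
            mu = (r * x[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        return pi, mu, sigma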
- Tuesday November 25th