Maximum Margin Matrix Factorization



Maximum Margin Matrix Factorizations
Nathan Srebro, Jason Rennie and Tommi Jaakkola
Advances in Neural Information Processing Systems (NIPS) 17, 2005 (December 2004 conference)
[PDF], [Slides in PDF], [Poster in PDF]
We present a novel approach to collaborative prediction, using low-norm instead of low-rank factorizations. The approach is inspired by, and has strong connections to, large-margin linear discrimination. We show how to learn low-norm factorizations by solving a semi-definite program, and present generalization error bounds based on analyzing the Rademacher complexity of low-norm factorizations.

Learning with Matrix Factorizations
Nathan Srebro
PhD Thesis, Massachusetts Institute of Technology, August 2004.
[PDF], [Defense slides PDF]
A more elaborate description of MMMF, with full proofs and more details about the optimization problem and generalization error bounds can be found in Chapters 5 and 6.2 of the Thesis.

Rank, Trace-Norm and Max-Norm
Nathan Srebro and Adi Shraibman
18th Annual Conference on Learning Theory (COLT), June 2005.
[Conference proceedings PDF], [Slides in PDF]
We study the rank, trace-norm and max-norm as complexity measures of matrices, focusing on the problem of fitting a matrix with matrices having low complexity. We present generalization error bounds for predicting unobserved entries that are based on these measures. We also consider the possible relations between these measures. We show gaps between them, and bounds on the extent of such gaps.

Fast Maximum Margin Matrix Factorization for Collaborative Prediction
Jason Rennie and Nathan Srebro
22nd International Conference on Machine Learning (ICML), August 2005.
[Conference proceedings PDF], [Jason's Slides PDF]
MMMF can be formulated as a semi-definite programming (SDP) and learned using standard SDP solvers. However, current SDP solvers can only handle MMMF problems on matrices of dimensionality up to a few hundred. Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. We compare against state-of-the-art collaborative prediction results on large movie-rating data sets and find the that MMMF substantially outperforms other methods.

Other relevant publications

Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices
Nathan Srebro, Noga Alon and Tommi Jaakkola
Advances in Neural Information Processing Systems (NIPS) 17, 2005 (December 2004 conference)
[PDF], [Poster in PDF]
See also Section 6.1 of my PhD Thesis
Here, we provide generalization error bounds for the more common approach of completing a matrix using a low-rank, rather then low-norm, factorization. These bounds are compared against in the above work on MMMF

Loss Functions for Preference Levels: Regression with Discrete Ordered Labels
Jason Rennie and Nathan Srebro
IJCAI-05 Multidisciplinary Workshop on Advances in Preference Handling, July 2005.
[Proceedings PDF], [Slides PDF].
The collaborative prediction data sets used in the MMMF experiments include discrete ordered labels, and the loss functions described here were developed for use with MMMF. This paper describes their use in a more straight-forward regression setting

Nati Srebro
Last modified: Fri Jan 7 00:42:10 EST 2005