Invited Talk Abstracts

George Papandreou: Perturb-and-MAP Random Fields

Probabilistic Bayesian methods such as Markov random fields are well suited to describing complex data, providing a natural conceptual framework for representing the uncertainty in interpreting them and for automatically learning model parameters from training examples. However, Bayesian techniques pose significant computational challenges in computer vision and other large-data domains, and practitioners often prefer computationally cheaper deterministic energy minimization techniques instead.

I will present a new computationally efficient probabilistic random field model, which can be best described as a “Perturb-and-MAP” generative process: We obtain a random sample from the whole field by first injecting noise into the system's energy function, then solving an optimization problem to find the least energy configuration of the perturbed system. With Perturb-and-MAP random fields we thus turn powerful deterministic energy minimization methods into efficient probabilistic one-shot random sampling algorithms. I will discuss how the Perturb-and-MAP model relates to the standard Gibbs MRF and how it can be used in conjunction with other approximate Bayesian computation techniques such as MCMC or variational Bayes. I will illustrate these ideas with applications in image inpainting and deblurring, image segmentation, and scene labeling, showing how the Perturb-and-MAP model enables Bayesian inference in challenging large-scale computer vision problems.
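On a model small enough to enumerate, the Perturb-and-MAP process can be sketched with the Gumbel-max construction: perturbing every configuration's energy with i.i.d. Gumbel noise and returning the perturbed minimizer reproduces the Gibbs distribution exactly (the practical method in the talk instead uses low-dimensional perturbations with an efficient MAP solver). The toy energies and function names below are illustrative, not from the talk:

```python
import math
import random
from collections import Counter

def gumbel(rng):
    # Standard Gumbel noise via inverse-CDF sampling.
    return -math.log(-math.log(rng.random()))

def perturb_and_map(energies, rng):
    """One-shot sample: inject Gumbel noise into the energy of every
    configuration, then return the minimum-energy configuration of the
    perturbed system.  With a full (exponential-size) perturbation this
    samples exactly from the Gibbs distribution exp(-E(x)) / Z."""
    return min(energies, key=lambda s: energies[s] - gumbel(rng))

rng = random.Random(0)
# Toy energy function over the four configurations of two binary variables.
energies = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.5}
counts = Counter(perturb_and_map(energies, rng) for _ in range(5000))
# The lowest-energy configuration (0, 0) is sampled most often.
```

The point of the construction is that the sampling step is itself an optimization problem, so any fast energy minimizer can be reused as a sampler.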

More information and papers related to Perturb-and-MAP are available from the speaker.

Max Welling: Herding versus Perturb-and-MAP

In this talk I will illustrate herding by applying it to the problem of estimating the final outcome of a game of GO given some current game state. An application of herding to simultaneous image segmentation and labeling will also be discussed. Finally, I will discuss the similarities and differences between herding and the recently introduced “Perturb-and-MAP” method. What are the key ingredients that these methods share, which may explain their success?
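For readers unfamiliar with herding, its core update is deterministic: pick the state maximizing the inner product of a weight vector with the feature map, then move the weights toward the target moments, w_{t+1} = w_t + E[phi] - phi(s_t). A minimal sketch with indicator features on a toy discrete distribution (the function name and target are illustrative, not from the talk):

```python
def herd(target_probs, steps):
    """Herding with indicator features: repeatedly pick the state with
    the largest weight (the maximization step), then add the target
    moments to all weights and subtract the realised feature.  The
    deterministic sequence matches the target frequencies at rate O(1/T),
    faster than the O(1/sqrt(T)) of i.i.d. sampling."""
    w = list(target_probs)               # weights initialised at the moments
    counts = [0] * len(target_probs)
    for _ in range(steps):
        s = max(range(len(w)), key=lambda i: w[i])
        counts[s] += 1
        w = [wi + pi for wi, pi in zip(w, target_probs)]
        w[s] -= 1.0                      # subtract the indicator feature of s
    return counts

counts = herd([0.5, 0.3, 0.2], 1000)     # toy target distribution
# counts track 1000 * target_probs essentially exactly.
```

Note the structural similarity to Perturb-and-MAP that the talk examines: both replace sampling with repeated maximization, one driven by deterministic weight dynamics and the other by random perturbations.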

Amir Globerson: A Simple Geometric Interpretation of SVM using Stochastic Adversaries

It has long been clear that support vector machines are related to notions of geometric robustness. However, most characterizations of robustness in this context have involved somewhat elaborate perturbation schemes. Here we propose considering robustness to stochastic perturbations and show that this yields a very simple interpretation of the SVM objective and its regularization constant. We also present extensions to other regularized loss minimization problems and to multiclass learning problems.

Andrea Montanari: Approximate Message Passing Algorithms

Approximate Message Passing (AMP) algorithms are a class of iterative algorithms for non-linear statistical estimation. They build on the theory of belief propagation in graphical models but apply to problems without any (obvious) sparse graph structure. A rich class of applications is provided by signal processing problems in which a structured signal has to be recovered from noisy linear observations. I will describe several applications, as well as the connections with graphical-models ideas.
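For reference, in the compressed sensing setting (recover $x$ from $y = Ax + w$, with $A \in \mathbb{R}^{m\times n}$ and $\delta = m/n$), the basic AMP iteration takes, up to notational variations, the form

```latex
\begin{aligned}
x^{t+1} &= \eta_t\!\left(x^{t} + A^{*} z^{t}\right),\\
z^{t}   &= y - A x^{t}
           + \frac{1}{\delta}\, z^{t-1}\,
             \Big\langle \eta'_{t-1}\!\left(x^{t-1} + A^{*} z^{t-1}\right) \Big\rangle,
\end{aligned}
```

where $\eta_t$ is a scalar denoiser applied componentwise (e.g., soft thresholding), $\langle \cdot \rangle$ denotes the average over components, and the final "Onsager correction" term, inherited from the belief-propagation derivation, is what distinguishes AMP from naive iterative thresholding.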

Based on joint work with Mohsen Bayati, David Donoho, Adel Javanmard, Iain Johnstone, Arian Maleki.

Tamir Hazan: Learning with Random Maximum A-Posteriori Perturbations

Learning and inference in complex models drives much of the research in machine learning applications, from computer vision and natural language processing to computational biology. The inference problem in such cases involves assessing the weights of possible structures, whether objects, parses, or molecular structures. Although it is often feasible to find only the most likely or maximum a-posteriori (MAP) assignment rather than to consider all possible assignments, MAP inference is limited when there are other likely assignments. A fully probabilistic treatment considers all possible alternative assignments, and thus requires summing over the assignments with their respective weights, which is considerably harder (#P-hard vs. NP-hard). The main surprising result of our work is that MAP inference (maximization) can be used to approximate and bound this weighted counting. This leads us to a new approximate inference framework that is based on MAP statistics and therefore does not depend on pseudo-probabilities, in contrast to the current framework of Bethe approximations, which lacks statistical meaning. This approach excels in regimes where there are several, but not exponentially many, prominent assignments. For example, this happens when observations carry strong signals (local evidence) but are also guided by strong consistency constraints (couplings).
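The connection between maximization and weighted counting can be checked numerically on a tiny model: when every configuration receives its own Gumbel perturbation, the expected perturbed maximum equals log Z plus Euler's constant, so averaging MAP values of randomly perturbed models estimates the log-partition function. A sketch under that full-perturbation assumption (the practical framework in the talk uses cheaper low-dimensional perturbations, which yield bounds rather than exact values):

```python
import math
import random

EULER = 0.5772156649015329  # mean of a standard Gumbel variable

def gumbel(rng):
    # Standard Gumbel noise via inverse-CDF sampling.
    return -math.log(-math.log(rng.random()))

def log_partition_estimate(potentials, rng, trials=20000):
    """Estimate log Z = log sum_x exp(theta(x)) by averaging the MAP
    value of randomly perturbed models:
        E[ max_x (theta(x) + gamma_x) ] = log Z + EULER
    when each configuration x has its own i.i.d. Gumbel variable gamma_x."""
    total = 0.0
    for _ in range(trials):
        total += max(th + gumbel(rng) for th in potentials)
    return total / trials - EULER

rng = random.Random(1)
potentials = [0.0, -1.0, -1.0, -0.5]      # theta(x) = -energy(x), toy model
exact = math.log(sum(math.exp(th) for th in potentials))
estimate = log_partition_estimate(potentials, rng)
# estimate agrees with exact up to Monte Carlo error.
```

Every evaluation inside the loop is a MAP computation, which is why this turns an (in general #P-hard) summation into repeated (NP-hard but often practically tractable) maximizations.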

Pascal Vincent: Learning Robust Representations Through Input Perturbations: From Stochastic Perturbations to Analytic Criteria and Back

The notion that higher-level representations (features) computed from low-level inputs should be robust to small input perturbations appears both natural and simple. Yet it proves to be a powerful inductive bias for guiding the learning of useful representations. I will explain how it has inspired several different learning criteria and successful algorithms for unsupervised representation learning. More specifically, I will highlight how it can be implemented either through stochastic input perturbations or, in the infinitesimal limit, through an analytic criterion, yielding different regularized autoencoder variants (denoising and contractive) with different strengths. I will then present an inductive principle for learning unnormalized probability densities that similarly uses input perturbations and is related to (regularized) score matching. Finally, I will discuss results suggesting that such approaches can and do implicitly model low-dimensional non-linear data manifolds, and will propose enhanced variants that more directly bias the learning towards such manifold structures. Throughout the presentation I will move back and forth between stochastic perturbations and related deterministic analytic criteria, which I hope may spark interesting discussions on the interface between, and the merits of, both outlooks.
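The passage from stochastic corruption to an analytic criterion can be made concrete with a second-order Taylor argument: for a reconstruction function $r$ with Jacobian $J_r$ and small Gaussian corruption $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$,

```latex
\mathbb{E}_{\varepsilon}\,\big\| r(x+\varepsilon) - x \big\|^2
\;\approx\;
\big\| r(x) - x \big\|^2
\;+\;
\sigma^2\, \big\| J_r(x) \big\|_F^2 ,
```

since the first-order cross term vanishes in expectation. In the infinitesimal-noise limit the denoising criterion thus reduces to plain reconstruction plus a contractive (Jacobian Frobenius-norm) penalty, which is the sense in which the denoising and contractive autoencoder variants are two faces of the same robustness bias.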

Ryan Adams: Building Probabilistic Models Around Deterministic Optimization Procedures

In applications such as those found in image labeling or point-matching problems in computer vision, we have access to highly efficient discrete optimization (or MAP inference) procedures, but far less efficient marginal inference procedures. The main difficulty stems from the computational hardness of computing the partition function under the standard Gibbs distribution that arises when working with these models. We explore alternative distributions over high dimensional structured discrete spaces that are specified in terms of generative models based on optimization procedures, such as graph cuts or the Hungarian algorithm.

This is joint work with Daniel Tarlow and Richard Zemel.

Shie Mannor: All Learning is Robust

Controlling overfitting is a long-standing topic of study in machine learning. Regularization is a commonly used technique for controlling overfitting, in which a penalty is added to the cost function (typically the classification or regression error). The success of regularization in a host of different algorithms is usually interpreted as coming from penalizing the complexity of the resulting decision rules, favoring “simple” rules. In this talk we propose a different perspective on learning, based on robust optimization: assuming that each sample is corrupted by a certain disturbance, we seek the best decision under the most adversarial perturbation. We show that a special choice of the perturbation exactly recovers the solution obtained by penalizing complexity via regularization. Both support vector machines and the Lasso can be re-derived from a robust optimization perspective.
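Schematically, for SVMs the equivalence takes the following form (up to the technical non-separability conditions made precise in the JMLR paper cited below): with an uncertainty set that bounds the total perturbation budget, the robust hinge-loss problem coincides with the regularized one,

```latex
\min_{w}\;
\max_{\substack{(\delta_1,\dots,\delta_n):\\ \sum_i \|\delta_i\| \le c}}
\;\sum_{i=1}^{n} \big( 1 - y_i \langle w,\, x_i - \delta_i \rangle \big)_+
\;=\;
\min_{w}\; c\,\|w\|_{*} \;+\; \sum_{i=1}^{n} \big( 1 - y_i \langle w, x_i \rangle \big)_+ ,
```

where $\|\cdot\|_{*}$ is the dual of the norm measuring the perturbations. The regularizer is thus not an ad hoc complexity penalty but the price of the adversary's budget, and the choice of perturbation norm dictates the choice of regularizer.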

The equivalence relationship between regularization and robustness gives a physical interpretation of the regularization process. Moreover, it helps us explain from a robustness point of view why support vector machines are statistically consistent, and why Lasso produces sparse solutions.

Generalizing these results, we use the robustness perspective to derive new algorithms, in new domains, that have both favorable statistical and computational properties. We finally argue that robustness is a necessary and sufficient condition for the consistency of learning algorithms, and that in fact every useful learning algorithm must possess some robustness properties.

The work presented in this talk is based on the following papers, available from the speaker's home page:
H. Xu, C. Caramanis, and S. Mannor. Robustness and Regularization of Support Vector Machines. Journal of Machine Learning Research, 10(Jul):1485-1510, 2009.
H. Xu, C. Caramanis, and S. Mannor. Robust Regression and Lasso. IEEE Transactions on Information Theory, 56(7):3561-3574, 2010.
H. Xu and S. Mannor. Robustness and Generalization. Machine Learning, 86(3):391-423, 2012.