ICML 2016 Workshop on
Multi-View Representation Learning (MVRL)

June 23, 2016
New York City, NY, USA

Workshop Abstract

Multi-view data are becoming increasingly available in machine learning and its applications. Such data may consist of multi-modal measurements of an underlying signal, such as audio+video, audio+articulation, video+fMRI, image+text, webpage+click-through data, and text in different languages; or may consist of synthetic views of the same measurements, such as different time steps of a time sequence, word+context words, or different parts of a parse tree. The different views often contain complementary information, and multi-view learning methods can take advantage of this information to learn representations/features that are useful for understanding the structure of the data and that are beneficial for downstream tasks.

Research activity in this direction has been increasing, including exploration of different objectives (e.g., latent variable models, information bottleneck, contrastive losses, correlation-based objectives, multi-view auto-encoders, and deep restricted Boltzmann machines), deep learning models, the learning/inference problems that come with these models, and theoretical understanding of the methods.

The purpose of this workshop is to bring together researchers and practitioners in this area to share their latest results, to express their opinions, and to stimulate future research directions. We expect the workshop to help consolidate the understanding of various approaches proposed by different research groups, to help practitioners find the most appropriate tools for their applications, and to promote better understanding of the challenges in specific applications.

Possible topics include but are not limited to

Invited Speakers

Confirmed speakers:
Chris Dyer, Carnegie Mellon University
Sham Kakade, University of Washington
Honglak Lee, University of Michigan
Ruslan Salakhutdinov, Carnegie Mellon University

Additional speakers TBA!

Organizing Committee

Xiaodong He, Microsoft Research
Karen Livescu, TTI-Chicago
Weiran Wang, TTI-Chicago
Scott Wen-tau Yih, Microsoft Research

