Workshop on Machine Learning in Speech and Language Processing

September 13, 2016
San Francisco, CA, USA
Speaker: Kai Yu (Shanghai Jiao Tong University)

Title: Structured deep learning for context awareness in speech and language processing

Abstract:
Real-world speech and language are always produced in context, such as the acoustic environment, speaking style, or topic. A widely used processing paradigm, referred to as multi-style training, is to pool all kinds of data together and rely on a powerful general model to implicitly remove the effect of irrelevant contexts. Although multi-style deep learning approaches have shown impressive overall performance, analysis shows that large performance gaps still exist between different contexts. In this talk, an alternative paradigm, structured training, is discussed in detail within a deep learning framework. Here, contexts are explicitly modelled and combined with the model for the variabilities of primary interest. Various structured context representations, combined with feedforward and recurrent neural networks for acoustic and language modelling, are reviewed. Structured deep learning is an extension of the traditional adaptation framework. It is argued that no matter what advanced deep learning approaches are used, an appropriate context model is still likely to yield additional gains. Hence the investigation of acoustic and language context representation and modelling is of great importance to the speech community.