On March 13 and 14, 2015, ICSI researchers, alumni, and friends gathered to celebrate the career of Nelson Morgan, the founding leader of ICSI’s Speech Group and the director of ICSI for 12 years.

Ten of his former students and current colleagues gave talks inspired by Morgan’s contributions both to science and to ICSI.

The Spread of Innovation in Speech and Language Processing
Professor Dan Jurafsky
Chair of Linguistics,
Professor of Computer Science, Stanford

Understanding how innovation spreads is crucial to understanding an academic field. In this talk I'll describe our work on a kind of "computational history of science," in which we use natural language processing on the text of online papers to model how ideas spread from field to field. By tracing the rise and fall of topics in the text, and by following the citation network and the trajectories of individual authors, we show how the current field of computational linguistics matured and developed. We show the crucial role of interdisciplinarity, and of government influence, in the spread of scientific innovation, and discuss the key role that speech recognition researchers like Nelson Morgan played in pollinating natural language processing with probabilistic and machine learning innovations. Joint work with David Hall, Chris Manning, Dan McFarland, Ashton Anderson, and Adam Vogel.


The End of General-Purpose Computers (Again)
Krste Asanović
Professor of Computer Science
UC Berkeley

With their inherent flexibility and rapidly accelerating performance, general-purpose computers have been the primary engine driving computing into all aspects of modern life. But over twenty years ago, there was a flurry of activity building specialized computers to handle neural network algorithms, as it was thought that specialization could bring large efficiency gains. These "sixth-generation" machines followed the earlier "fifth-generation" computing effort to build specialized computers for artificial intelligence. None of these specialized computers came close to being commercially viable. Now, driven by the big-data buzz, there is considerable interest in a new wave of specialized hardware for neural network training. What's changed (if anything)?


*NN: Neural Network Acoustic Modelling Across the Decades
Professor Steve Renals
Professor of Speech Technology
University of Edinburgh

Two decades ago neural networks were the hot topic in acoustic modelling for speech recognition - as they are now. Most of the key research questions in the early 1990s - adaptation, context modelling, robustness - form the research challenges that drive speech recognition research today. (Indeed, there are also close relations to research questions addressed in the early 1960s...) In this talk I'll link some of the work done to address these questions then and now, discuss the progress that has been made, and consider how the work that Morgan and colleagues did twenty years ago (and more) has influenced where we are now - and where we are going.


Why Big Dumb Neural Nets Are Even Smarter Than We Thought
Brian Kingsbury
IBM Thomas J. Watson Research Center

Surprise that large neural networks could be reliably trained, with training from different random weight initializations leading to networks having substantially the same final performance, is a recurring theme in the early literature on using big networks for speech recognition. The unfulfilled expectation behind this surprise was that it would be easy for large networks to become trapped in poor local optima, much like had been seen for smaller networks. In this talk I will review some recent results drawing on random matrix theory, the physics of spin glasses, and empirical studies of large networks, that shed light on this phenomenon.
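As a toy illustration of the empirical regularity behind this surprise (my sketch, not code from the talk; the network, task, and hyperparameters are all invented for the example), an overparameterized one-hidden-layer network trained on XOR from different random initializations lands at essentially the same low final loss:

```python
import numpy as np

def train_xor(seed, hidden=32, lr=0.1, steps=5000):
    """Train a one-hidden-layer tanh network on XOR by full-batch
    gradient descent, starting from a seed-dependent random init."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0.0], [1.0], [1.0], [0.0]])
    W1 = 0.5 * rng.standard_normal((2, hidden)); b1 = np.zeros(hidden)
    W2 = 0.5 * rng.standard_normal((hidden, 1)); b2 = np.zeros(1)
    for _ in range(steps):
        h = np.tanh(X @ W1 + b1)              # hidden activations
        err = (h @ W2 + b2) - y               # output residual
        # gradients of (1/2) * mean squared error
        gW2 = h.T @ err / 4;  gb2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1.0 - h ** 2)    # backprop through tanh
        gW1 = X.T @ dh / 4;   gb1 = dh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    # final mean squared error on the training set
    return float(np.mean(((np.tanh(X @ W1 + b1) @ W2 + b2) - y) ** 2))
```

Running `train_xor` with several different seeds should yield near-identical, near-zero losses - the phenomenon the talk examines at much larger scale and with far deeper theory.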


Speech Recognition: HMM (1975-today), HMM/ANN (1985-today), HMM/DNN (2005-today), HMM/CS (2010-today), and Others (2010-today). But Are We Really Making Progress?
Professor Hervé Bourlard
Director, Idiap Research Institute
Professor, EPFL

The speech recognition community has always been excellent at fully exploiting breakthroughs coming from other disciplines. Indeed, the community has often played a pioneering role in applying complex theoretical principles to a problem as hard as continuous speech recognition. While "big data" is today's buzzword, and execution often lags the hype, this community was certainly among the first to really understand the importance of very large datasets, and to develop efficient techniques to process and fully exploit such data, resulting in a "deep understanding" of the underlying process. This progress has benefited, and still benefits, many other pattern recognition problems (sequential or not).

After a very brief discussion from this perspective, we will try to identify where new understanding actually occurred, which "new" ideas can have a lasting impact on the field, and why. From experience, it is clear that real progress can only be achieved through understanding of the full system, and that "ignorance-based systems" will always quickly reach serious limitations; the right tradeoff probably lies somewhere in between. In all cases, our community should make sure to remain at the forefront of new theoretical developments.

This talk will assume basic knowledge of all the fields and acronyms (and a few more) listed in the title.


Deep vs. Wide: Questions for Continuous-Space Language Processing
Professor Mari Ostendorf
Professor of Electrical Engineering
University of Washington

Continuous-space distributional and neural network models of language are attracting a lot of attention due to success on a variety of tasks, from semantic similarity to sentiment recognition to machine translation. As the field develops, many interesting parallels with robust speech recognition are emerging, including the utility of bottleneck representations, multiple streams, and the handling of new words, as well as general learning issues. How far does the analogy go, and what can we learn from the results?


Robustness, Separation, and Pitch
Professor Dan Ellis
Professor of Electrical Engineering
Columbia University

The challenge of robust speech recognition is still very much with us, and despite the great advances made possible by powerful classifiers and multi-condition training, there is still a desire to separate speech energy from other sources present in the environment. While the dream of ideal enhancement - recovering an isolated voice without interference - is probably too difficult, and likely a more stringent requirement than recognition needs, successful systems for recognizing speech in mixtures have used more-or-less explicit separation. One of the strongest cues, for listeners and machines alike, is the strong and relatively slowly-varying local periodicity of the speech signal, which we perceive as pitch.

I'll review some approaches to robust speech recognition with special attention to those that attempt to separate the speech energy, and talk about the role of pitch and how useful it can be in aiding recognition.
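As a minimal illustration of that periodicity cue (a sketch of classic autocorrelation-based pitch estimation on a synthetic harmonic signal - not code from the talk, and with invented signal parameters):

```python
import numpy as np

fs = 16000                                    # sample rate (Hz)
f0 = 200.0                                    # true fundamental of the toy "voice"
t = np.arange(int(0.04 * fs)) / fs            # one 40 ms analysis frame
# harmonic-rich periodic signal, loosely mimicking a voiced sound
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

def estimate_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate f0 as the autocorrelation peak within a plausible pitch range."""
    frame = frame - frame.mean()
    # one-sided autocorrelation: ac[lag] for lag = 0 .. len(frame)-1
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # candidate lags in samples
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag
```

Here `estimate_pitch(frame, fs)` recovers the 200 Hz fundamental; real speech additionally demands voicing decisions, smoothing across frames, and robustness to exactly the interference discussed above.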


Dealing with Irrelevant Variability in Speech
Professor Hynek Hermansky
The Johns Hopkins University

Besides the message, speech carries information from a number of additional sources, which introduces irrelevant variability (noise). The unpredictable effects of noise are typically dealt with by extensively training a machine on noisy data. This is currently the technique of choice in most practical situations, and it is often hard to beat. It is typically combined with various schemes for noise suppression in the recognizer front end. Some believe that recent advances in artificial neural net classifiers may eliminate the need for explicit noise suppression schemes altogether.

Taking guidance from nature, some organisms (such as some insects) indeed make do with a simple sensor followed by a one-stage perceptual system. However, more advanced organisms (including humans) possess much more structured perceptual systems, shaped by the forces of nature to deal with the effects of unwanted noise. Along these lines, we argue for the continuing need for built-in structure in feature extraction and classifiers, based on our advancing knowledge of the properties of cognitive signals such as speech.


Can We Understand the HMM's Failures to Model Speech Well Enough to Replace It?
Steven Wegmann
Director of Speech Research, ICSI

The underlying structures of the models that we use for speech recognition have not changed in the last 2-3 decades despite their glaring deficiencies and repeated attempts to replace them. In particular, why has the HMM, whose very strong model assumptions are clearly violated by speech data, been so hard to remove from the core of nearly all speech recognition systems? In this talk I will describe results from the diagnostic research at ICSI that is attempting to answer this question by developing a deep, quantitative understanding of the HMM's weaknesses. This overview will include recent work that is trying to understand why neural-network-based features and marginals are so successful within the HMM framework.


Morgan - A History of Collaboration in Speech Research
Horacio Franco, Elizabeth Shriberg
SRI International
Andreas Stolcke
Microsoft Research

Besides making outstanding and well-known contributions to the technical state of the art in speech processing, Morgan has also had a significant impact on the shape of the speech research community since the late 1980s. In this talk we will chart a short history of the projects and programs in speech and natural language technology that Morgan has been involved in for as long as any of us can remember, and their broader impact on the research community. Special attention will be given to the collaboration between ICSI's Speech Group and the Speech Technology and Research Laboratory at SRI International, and to the recent joint projects with Microsoft Research.