Profile: Roberto Pieraccini

Monday, April 23, 2012

Many notable computer scientists work at ICSI, and in recognition of their contributions to computer science, we will profile some of our current scientists and visitors, as well as notable alumni, here on the blog. We begin with Roberto Pieraccini, ICSI’s new director as of January 15, 2012. A speech scientist best known for applying statistical methods to spoken language understanding and dialog, Roberto has 30 years of experience in research and development at corporate laboratories and technology companies in the U.S. and Italy.

Roberto was born in 1955 and grew up in Viareggio, a small Tuscan town on the northwestern coast of Italy. When he was a child, his lifelong interest in advanced technology was sparked by science fiction movies such as 2001: A Space Odyssey, which features Hal 9000, a computer capable of holding conversations with humans, and by the Apollo 11 mission, which landed the first human on the moon. “I spent two days in front of the television,” Roberto said. “I saw NASA’s control room with all these lights and buttons and dials and I thought, ‘I want to do something like that.’” Fascinated with technology and electronics, he would take old televisions apart and build gadgets in his bedroom.

Recognition, Understanding, Dialog

After receiving his doctorate in electrical engineering from the Università degli Studi di Pisa, he took a research position in the speech department of the Italian telephone company’s laboratories, Centro Studi e Laboratori Telecomunicazioni (CSELT), in Torino, where he built his first rudimentary “speech understanding machines.”

In 1988, he joined Bell Labs in Murray Hill, New Jersey, for a one-year appointment under Larry Rabiner, who has produced much of the seminal work in speech research. After a brief return to Italy, Bell Labs offered Roberto a permanent position, and he decided to cross the Atlantic Ocean for good.

At Bell Labs, he continued to work on speech recognition, but was also interested in the problem of language understanding. In 1994, he and Esther Levin, another Bell Labs researcher, built a system that received the best scores at the DARPA evaluation of language understanding systems within the Air Travel Information Systems program. It was the first application of statistical methods (similar to those used in speech recognition) to the problem of language understanding, an approach now considered mainstream even in commercial applications.

In the late 1990s, Roberto began to focus on dialog machines, which, besides understanding what a human says, also make decisions on how to interact with humans in order to reach a goal. In 2000, he and Esther again made a breakthrough when they built a system that learned to interact with humans in a dialog from positive or negative reinforcement signals, using what is known as “reinforcement learning theory.” That initial experiment gave rise to a new discipline, statistical dialog modeling, which is now pursued by several research labs around the world.

Bridging the Research-Industry Chasm

In 1999, he left the research world to become the director of dialog technology at SpeechWorks, which later became Nuance. There, he built systems in a commercial, rather than research-oriented, environment. “It was an extremely useful experience,” Roberto said. “Your success in research depends not only on how you solve a problem, but also on the relevance of the problems you choose to solve. Working in industry showed me the importance of choosing problems carefully and building things that ‘work.’”

Roberto attributes the “research-industry chasm” to the fact that the two communities are, in most cases, disjoint. “Research and industry people don’t hang out,” he said.

Roberto returned to research in 2003, taking a position at the IBM T.J. Watson Research Center in Yorktown Heights, New York. In addition to speech understanding and dialog, he also worked on Web search in different languages, such as Arabic and Farsi.

In 2005, he became the chief technology officer of SpeechCycle, a company that produced sophisticated dialog systems for telephone services and, later, smartphones; in 2012, he joined ICSI as our director.

Speech Research: History and Future

In March, MIT Press released Roberto’s book, The Voice in the Machine, about the history of speech technology, both as a research field and as an industry. “People have a fascination with machines that can speak with humans,” he said. Scientists were constructing machines that could speak as early as 1790, but the first moderately successful system that could “understand” speech was Audrey, built in the 1950s. Speech recognition research continued through the 1960s, which saw the introduction of digital computers and advances in artificial intelligence. Then, in the 1970s, researchers began applying statistical methods to the problem.

Roberto says it is difficult to build computers that understand speech as well as humans can because speech and human language are extremely complex. “We will be happy when we will be able to build machines that can make sense of speech and language the same way humans do,” he said. “But speech recognition does not work that well yet.” He believes the major challenge facing speech research is to improve features so that systems can deal, for example, with heavy accents and audio of poor quality.

Roberto believes, however, that the future of speech technology is not necessarily an interactive, anthropomorphic computer like Hal 9000. For example, speech recognition is used to search for particular words or sentences in audio and video clips on the Web, or to find “who spoke when” in a given audio track. “The future may not be what it used to be,” he said.