ICSI Speech Technology Improves Accessibility of IT in Tamil Nadu, India

"I live in a village six kilometers from here in the hills. I came here [Arthoor] to register my boy for school next year. Afterwards, I planned to go to Dindigul [another twenty kilometers away] to get information on plant disease and treatment. [...] My banana crops and all of the banana crops in my village are affected by a disease. [...] Now that I have met you and used this system, I am satisfied. I think that my crops are affected with nematodes or bore weevils. I will go home now and try the recommended treatments."

A banana farmer, who had never attended school but had taught himself to read, was one of several villagers to try a speech-driven dialog system built by ICSI researchers specifically for the needs and conditions of people living in developing regions like Tamil Nadu, India.

ICSI researchers Dr. Madelaine Plauché and Joyojeet Pal, along with Divya Ramachandran and Richard Carlson, were awarded third prize in the CITRIS white paper competition for their proposed work on Simple, Scalable Speech technologies to improve access to Information Technologies (IT) in developing regions. The project, supervised at ICSI by Dr. Chuck Wooters, is part of the UC Berkeley TIER (Technology and Infrastructure for Emerging Regions) project.

Last month in rural Tamil Nadu, where illiteracy rates range from 50% for men to 80% for women, a collaboration between the ICSI researchers and the staff of M.S. Swaminathan Research Foundation (MSSRF) in Sempatti resulted in the rapid creation and deployment of a low-cost, speech- and touch-screen driven application that enables villagers of all literacy levels to access existing written information.

Plauché, a linguist, and Carlson, a software developer, traveled to Tamil Nadu to meet with Udhaykumar N., a computer science student at Amrita University in Coimbatore, and local experts in agriculture, horticulture, and rural development at the MSSRF village resource center in Sempatti. After three weeks of collaborative design sessions, the team converted text from the MSSRF website into a user-friendly interface that provides recommended agricultural practices, pest protection, and yields for local varieties of banana crops, presented as pre-recorded Tamil speech and digital pictures. The Banana Crop interface is based on automatic speech recognition (ASR) technology originally developed at ICSI, which was then customized to the needs of local banana farmers in the Sempatti area.

The multi-modal application was quickly adopted by men and women of varying degrees of education and familiarity with technology. They either spoke single-word commands in Tamil (e.g., "pests" or "intercropping") or pressed buttons to navigate the visual user interface. The ASR technology accurately recognized speech input despite variations in dialect, environment, and background noise. Young children were especially adept at operating the system, and many people expressed great pride at hearing a computer speak in their dialect about content relevant to their day-to-day lives. Feedback from the villagers who used the system indicated a strong desire to have information on other crops and other topics in this form. In addition to relying on speech cues and easily recognizable images for ease of navigation, the application conveniently operates via either telephone or PC.
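To make the interaction model concrete, the short Python sketch below shows how a small-vocabulary, single-word command menu of this kind could be wired together. It is purely illustrative: the topic names, audio file paths, and the recognize() stub are assumptions for the example, not the actual ICSI/MSSRF implementation.

```python
# A minimal sketch (not the deployed Banana Crop code) of navigating a
# pre-recorded menu with single-word spoken commands.

# Hypothetical mapping from recognized keywords (romanized here) to
# pre-recorded Tamil prompts; the file names are illustrative.
MENU = {
    "pests":         "audio/pests_ta.wav",
    "intercropping": "audio/intercropping_ta.wav",
    "yields":        "audio/yields_ta.wav",
}

def recognize(audio_bytes: bytes) -> str:
    """Placeholder for the small-vocabulary ASR step; in the pilot this
    role was played by ICSI's recognizer."""
    raise NotImplementedError

def handle_turn(audio_bytes: bytes) -> str:
    """One dialog turn: recognize a single-word command and return the
    matching recorded answer, or a re-prompt if the word is unknown."""
    word = recognize(audio_bytes)          # e.g. "pests"
    return MENU.get(word, "audio/repeat_please_ta.wav")
```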

Speech technologies that offer easy access to relevant, up-to-date information are ideally suited for remote regions of the world and regions with high illiteracy rates. According to Dr. Plauché, "Access to local, relevant information is extremely valuable for effective short-term and long-term decision making. By creating simple, easy-to-use speech tools, we hope to allow communities with a need for greater access [to this information] to make their own interactive applications."

With that goal in mind, Plauché, Carlson, and Udhaykumar are currently developing Open Sesame, an open-source toolkit that will allow non-expert speakers of any dialect to convert local-language text into accessible, multi-modal applications using text-to-speech, ASR, and custom localization tools. The application is built with open-source software rather than proprietary Windows software. The researchers believe open-source software is better suited to developing regions because it is free and, more importantly, easily customizable. MSSRF plans to install a version of the speech application in 100 community village centers throughout Tamil Nadu, Maharashtra, Andhra Pradesh, Rajasthan, and Orissa in the next three years. The new version will support three additional languages (Marathi, Hindi, and Oriya) and will greatly improve rural access to information on additional crops as well as topics such as animal husbandry, disaster preparedness, how to start a self-help group, local education and employment opportunities, and basic health and sanitation.
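As a rough illustration of what such a toolkit's text-to-application pipeline might look like, here is a brief Python sketch. The names used (Topic, synthesize, build_app) and the idea of filling audio prompts from either TTS or recordings by local speakers are assumptions made for this example; Open Sesame's actual interfaces may differ.

```python
# A hedged sketch of a pipeline that turns local-language text entries into
# a multi-modal menu pairing each topic with an image, an audio prompt, and
# the keyword the recognizer should listen for. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Topic:
    keyword: str      # word the recognizer listens for
    text: str         # source text, e.g. taken from MSSRF content
    image: str        # path to an easily recognizable picture
    audio: str = ""   # optional recording by a local speaker

def synthesize(text: str, lang: str) -> str:
    """Placeholder for a text-to-speech step; a deployment could instead
    use pre-recorded speech, as in the Banana Crop pilot."""
    return f"tts/{lang}/{abs(hash(text))}.wav"  # illustrative output path

def build_app(topics: list[Topic], lang: str = "ta") -> dict:
    """Assemble the multi-modal menu: keyword -> image and audio prompt."""
    return {
        t.keyword: {"image": t.image,
                    "audio": t.audio or synthesize(t.text, lang)}
        for t in topics
    }
```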