FrameNet Brazil Works on Trilingual Tourism and Soccer Dictionary for 2014 World Cup in Brazil

Thursday, July 25, 2013

Researchers working on FrameNet Brazil, a machine-readable lexicon based on the original English FrameNet housed at ICSI, are helping build a trilingual dictionary – in English, Spanish, and Portuguese – in preparation for the FIFA World Cup soccer championships, which will be held in Brazil next year. The dictionary will have an emphasis on words and phrases related to tourism and soccer. FrameNet Brazil, or FN-Br, was established in 2007 and now comprises seven researchers and more than two dozen students, from undergraduates to postdocs.

Linguists and computer scientists around the world have established framenets for several languages other than English, including Chinese, German, Hebrew, Japanese, and Swedish. The work is based on the theory of frame semantics developed by Professor Charles Fillmore and his colleagues. Linguists annotate text by hand in order to build a database of language, usable by both humans and machines, that shows words in each of their meanings and that describes the relationships between words.

According to Tiago Torrent, an FN-Br staff linguist who manages research for the project, developing a framenet for Brazilian Portuguese required creating new labels for grammatical functions and phrase types and changing some culturally specific frames. Frames are schematic representations of the situation types that words participate in.

Framenets group words according to their frames and describe the patterns in which they combine with other words and phrases according to how frame elements are expressed. Loosely defined, frame elements are the things that are worth talking about within the frame activated by a word. For example, verbs of revenge involve a prior event, an avenger, an offender, an injured party, and a punishment.
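As a rough illustration, a frame can be pictured as a small record that names its frame elements and lists the words that evoke it. The sketch below is not FrameNet's actual data model: the class name, the field names, the frame name "Revenge," and the three sample verbs are assumptions made for the example. Only the five frame elements come from the revenge example above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Frame:
    """A toy frame: a situation type, its frame elements, and its evoking words."""
    name: str
    frame_elements: List[str]
    lexical_units: Set[str] = field(default_factory=set)


# Frame elements follow the revenge example in the text; the frame name
# and the lexical units are illustrative, not taken from FrameNet data.
revenge = Frame(
    name="Revenge",
    frame_elements=["prior_event", "avenger", "offender", "injured_party", "punishment"],
    lexical_units={"avenge", "retaliate", "revenge"},
)


def frames_evoked(word, frames):
    """Return the names of all frames in which `word` is a lexical unit."""
    return [f.name for f in frames if word in f.lexical_units]
```

Looking up a word then amounts to asking which frames list it, for example `frames_evoked("avenge", [revenge])`.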

Tiago Torrent of FN-Br

Tiago says FrameNet serves as a good foundation for a trilingual dictionary for tourists who visit Brazil for the World Cup in 2014 because it uses information about the frames evoked by words to infer the intended meaning from several possible meanings. He points to the example of the Portuguese verb chegar, which can mean “to arrive” or “to make it to.” Say a tourist who does not speak Portuguese wants to know the meaning of the sentence Brasil chegou à semifinal. A traditional multilingual dictionary would give only the two possible meanings: Brazil arrived at the semifinals, or Brazil made it to the semifinals. FrameNet, however, knows that the “to make it to” sense involves frame elements for a team and a specific phase of the World Cup playoffs, and would infer the correct translation: Brazil made it to the semifinals.

“FrameNet allows us to provide users with the most probable meaning and the most complete meaning,” Tiago said.
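The chegar example can be sketched as a small scoring routine: each sense lists the frame elements it expects, and the sense whose elements are best matched by the other words in the sentence wins. This is a toy under stated assumptions, not FN-Br's method. The sense table, the frame-element names, and the word-type table are all hypothetical, and "Brasil" is typed only as a team to keep the example short.

```python
# Hypothetical senses of "chegar"; the frame-element names are illustrative.
SENSES = {
    "arrive": {"traveler", "place"},
    "make it to": {"team", "competition_phase"},
}

# Toy semantic types for a few words; a real system would rely on annotated data.
WORD_TYPES = {
    "Brasil": "team",
    "semifinal": "competition_phase",
    "turista": "traveler",
    "hotel": "place",
}


def choose_sense(words):
    """Pick the sense whose frame elements are best covered by the words' types."""
    types = {WORD_TYPES[w] for w in words if w in WORD_TYPES}
    # On a tie, max() keeps the first sense listed in SENSES.
    return max(SENSES, key=lambda sense: len(SENSES[sense] & types))


print(choose_sense(["Brasil", "chegou", "à", "semifinal"]))  # make it to
```

A plain bilingual word list would stop at returning both senses; the overlap count is what lets the sketch rank them.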

The dictionary will be online and accessible by mobile devices. In addition to searching for translations of complete sentences, users can search for individual words and see a list of frames evoked. The work is in collaboration with SemanTec, a research group of the university Unisinos.

FN-Br’s PI is Margarida Salomão, a former PhD student of Fillmore’s. In 2007, when she returned to Brazil from a senior post-doc in Berkeley, she established the project at the Federal University of Juiz de Fora (UFJF). FN-Br received its first funding in 2008. It is a research initiative of the graduate program at UFJF, and 17 graduate students now work in the program. Two PhD theses and four master’s dissertations have been written about the work. Tiago says this is one difference between FN-Br and ICSI’s FrameNet. “We have a lot of people writing their theses and dissertations about it,” he said. “It gives us different approaches to FrameNet in the sense that there is always a group of people dedicating a lot of their time to raising issues about FrameNet and discussing them.”

Tiago says another difference is that FN-Br tries to more fully integrate a constructicon into its database, which, like ICSI’s FrameNet, is available on the Web. In 2010, the project received funding to develop a FN-Br constructicon.

Tiago says the best way to think about a constructicon is to compare it with a lexicon, a repository of words and their meanings. Lexicons such as FrameNet are built to support applications like machine translation and automatic summarization. However, computers must also be given information about the ways the words are put together, as these constructions affect the meaning of text. Just as a lexicon is a repository of lexical units, so a constructicon is a repository of constructions. “Languages are more than lexical units,” Tiago says. “That’s the key motivation for building constructicons.”

Tiago points to the example of the differences between how questions are asked in English and Brazilian Portuguese. In English, the auxiliary verb is put before the subject, as in “Do you want to go to the library?”; in Portuguese, it is not. In order for a machine to translate between the two, it must first be given the information that the construction will be different.
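The contrast above can be pictured as a tiny constructicon with one yes/no-question rule per language. The sketch is an assumption-laden toy, not the FN-Br constructicon: the function names, the language keys, and the Portuguese sample sentence are made up for illustration, and the English rule handles only subjects that take "do."

```python
def english_question(subject, verb_phrase):
    # English places the auxiliary "do" before the subject.
    return f"Do {subject} {verb_phrase}?"


def portuguese_question(subject, verb_phrase):
    # Brazilian Portuguese keeps declarative word order; in writing,
    # only the question mark signals the question.
    return f"{subject.capitalize()} {verb_phrase}?"


# A toy constructicon: one question-forming construction per language.
CONSTRUCTICON = {
    "en": english_question,
    "pt-BR": portuguese_question,
}

print(CONSTRUCTICON["en"]("you", "want to go to the library"))
print(CONSTRUCTICON["pt-BR"]("você", "quer ir à biblioteca"))
```

A translation system that knew only the words, and not these two rules, would have no basis for adding or dropping the auxiliary when moving between the languages.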

FN-Br has also applied for the opportunity to develop a dictionary for the 2016 Summer Olympics, which will also be held in Brazil.

ICSI’s FrameNet and FN-Br work closely together. Last year, FN-Br and the UFJF graduate program hosted the International School on Frame Semantics and Its Technological Applications, which was taught by Miriam Petruck, a long-time FrameNet collaborator, and FrameNet researcher Michael Ellsworth.

Visit the FN-Br Web site.