Publication Details
Title: Taxonomic Data Integration from Multilingual Wikipedia Editions
Author: G. de Melo and G. Weikum
Group: AI
Date: 2013
PDF: [Not available online]
Overview:
Information systems are increasingly making use of taxonomic knowledge about words and entities. A taxonomic knowledge base may reveal that the Lago di Garda is a lake and that lakes as well as ponds, reservoirs, and marshes are all bodies of water. As the number of available taxonomic knowledge sources grows, there is a need for techniques to integrate such data into combined, unified taxonomies. In particular, the Wikipedia encyclopedia has been used by a number of projects, but its multilingual nature has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is one of the largest multilingual lexical knowledge bases currently available.
Acknowledgements:
This work was partially funded by the Deutscher Akademischer Austausch Dienst (DAAD) through a postdoctoral fellowship.
Bibliographic Information:
To appear in Knowledge and Information Systems
Bibliographic Reference:
G. de Melo and G. Weikum. Taxonomic Data Integration from Multilingual Wikipedia Editions. To appear in Knowledge and Information Systems, 2013.
