Publication Details
Title: Unsupervised Translation Sense Clustering
Author: M. Bansal, J. DeNero, and D. Lin
Group: AI
Date: June 2012
PDF: https://www.icsi.berkeley.edu/pubs/ai/unsupervisedtranslation12.pdf
Overview:
We propose an unsupervised method for clustering the translations of a word, such that the translations in each cluster share a common semantic sense. Words are assigned to clusters based on their usage distribution in large monolingual and parallel corpora using the soft K-Means algorithm. In addition to describing our approach, we formalize the task of translation sense clustering and describe a procedure that leverages WordNet for evaluation. By comparing our induced clusters to reference clusters generated from WordNet, we demonstrate that our method effectively identifies sense-based translation clusters and benefits from both monolingual and parallel corpora. Finally, we describe a method for annotating clusters with usage examples.
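For readers unfamiliar with soft K-Means, the following is a minimal generic sketch of the algorithm, not the authors' implementation. The toy two-dimensional vectors, the stiffness parameter `beta`, and the function name `soft_kmeans` are all assumptions made for illustration; the paper's actual features come from usage distributions in monolingual and parallel corpora.

```python
import math


def soft_kmeans(points, centroids, beta=2.0, iterations=20):
    """Generic soft K-Means: every point gets a fractional responsibility per cluster.

    beta (assumed name) controls how sharply responsibilities favor the nearest centroid.
    """
    centroids = [list(c) for c in centroids]
    resp = []
    for _ in range(iterations):
        # Assignment step: responsibility is proportional to exp(-beta * squared distance).
        resp = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            nearest = min(d2)  # subtract the minimum for numerical stability
            weights = [math.exp(-beta * (d - nearest)) for d in d2]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # Update step: each centroid becomes the responsibility-weighted mean of all points.
        for k in range(len(centroids)):
            mass = sum(r[k] for r in resp)
            if mass > 0:
                centroids[k] = [
                    sum(r[k] * p[j] for r, p in zip(resp, points)) / mass
                    for j in range(len(points[0]))
                ]
    return centroids, resp


# Hypothetical feature vectors for four translations of one source word.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids, resp = soft_kmeans(points, [[1.0, 0.0], [0.0, 1.0]])
for p, r in zip(points, resp):
    print(p, [round(x, 3) for x in r])
```

Unlike hard K-Means, each row of `resp` is a probability distribution over clusters, so a translation that fits more than one sense can carry weight in several clusters at once.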
Bibliographic Information:
Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL HLT 2012), Montreal, Canada, pp. 773-782
Bibliographic Reference:
M. Bansal, J. DeNero, and D. Lin. Unsupervised Translation Sense Clustering. Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL HLT 2012), Montreal, Canada, pp. 773-782, June 2012
