Uses of the Cross-Lingual Link Structure of Wikipedia
We recently became interested in obtaining topically aligned data from Wikipedia via cross-lingual links. For example, a document tuple such as (George Bush, جيورج بوش) is useful for a variety of tasks, even without sentence alignment. There have been a few recent, related papers, among them:
- Kevin Duh. 2011. Providing Cross-Lingual Editing Assistance to Wikipedia Users. In CICLING.
- Gerard de Melo and Gerhard Weikum. 2010. Untangling the cross-lingual structure of Wikipedia. In ACL.
- Philipp Sorg and Philipp Cimiano. 2008. Enriching the Crosslingual Link Structure of Wikipedia – A Classification-Based Approach. In AAAI.
The consensus seems to be that cleanup is required prior to information extraction. However, this observation is language-pair specific. For English-Arabic at least, we have not noticed an improvement from applying the algorithms of the second paper above (de Melo and Weikum, 2010).
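For readers who want to collect such document tuples themselves, here is a minimal sketch of one way to do it, assuming the public MediaWiki API (`action=query` with `prop=langlinks`) and its default JSON response layout. The function names `langlinks_url` and `extract_pairs` are our own illustrative choices, and this is not the pipeline used in any of the papers above.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard per-language MediaWiki API endpoint.
API_TEMPLATE = "https://{lang}.wikipedia.org/w/api.php"


def langlinks_url(title, source_lang="en", target_lang="ar"):
    """Build a query URL asking for the cross-lingual links of one article."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,  # restrict results to the target language
        "lllimit": "max",
        "format": "json",
    }
    return API_TEMPLATE.format(lang=source_lang) + "?" + urlencode(params)


def extract_pairs(response_text, target_lang="ar"):
    """Turn a langlinks JSON response into (source title, target title) tuples.

    Assumes the default response format, where each link stores the
    target title under the "*" key. Pages without links are skipped.
    """
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    pairs = []
    for page in pages.values():
        for link in page.get("langlinks", []):
            if link.get("lang") == target_lang:
                pairs.append((page["title"], link["*"]))
    return pairs
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the response body to `extract_pairs` yields the raw title pairs. Any cleanup of the link structure, as discussed in the papers above, would be a separate step.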