Spence Green

التكرار يعلم الحمار ("Repetition teaches the donkey.")

I work at Lilt. In addition to computers and languages, my interests include travel, running, and scuba diving.

Uses of the Cross-Lingual Link Structure of Wikipedia


We recently became interested in obtaining topically aligned data from Wikipedia via its cross-lingual links. For example, the document tuple (George Bush, جيورج بوش) is useful for many purposes, even without sentence alignment. Several recent papers address related problems, among them:

  1. Kevin Duh. 2011. Providing Cross-Lingual Editing Assistance to Wikipedia Users. In CICLing.
  2. Gerard de Melo and Gerhard Weikum. 2010. Untangling the Cross-Lingual Link Structure of Wikipedia. In ACL.
  3. Philipp Sorg and Philipp Cimiano. 2008. Enriching the Crosslingual Link Structure of Wikipedia: A Classification-Based Approach. In AAAI.
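Collecting document tuples like the one above amounts to following each page's cross-lingual links into the target language. The sketch below assumes the links are already loaded as rows shaped like the columns of MediaWiki's `langlinks` table (source page id, target language code, target title); the function name `aligned_document_pairs` and the in-memory input format are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape, modeled on MediaWiki's `langlinks` columns:
# (ll_from: source page id, ll_lang: target language code, ll_title: target title).
LangLink = Tuple[int, str, str]


def aligned_document_pairs(
    pages: Dict[int, str],
    langlinks: Iterable[LangLink],
    target_lang: str = "ar",
) -> List[Tuple[str, str]]:
    """Return sorted (source title, target title) tuples for every source page
    that has a cross-lingual link into `target_lang`."""
    pairs = []
    for page_id, lang, title in langlinks:
        # Keep only links into the requested language from pages we know about.
        if lang == target_lang and page_id in pages:
            pairs.append((pages[page_id], title))
    return sorted(pairs)


if __name__ == "__main__":
    pages = {1: "George Bush", 2: "Wikipedia"}
    links = [(1, "ar", "جيورج بوش"), (1, "de", "George Bush"), (2, "ar", "ويكيبيديا")]
    print(aligned_document_pairs(pages, links))
```

Each resulting tuple is a topically aligned document pair; no sentence alignment is attempted at this stage.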

The consensus in these papers seems to be that the link structure must be cleaned up before information extraction. However, this finding appears to be language-pair specific. For English-Arabic, at least, we have not observed any improvement from applying the algorithms of [2].
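One symptom of noisy cross-lingual links is that following them transitively can connect two different articles in the same language. The sketch below only detects such cases: it groups articles into connected components of the undirected link graph with union-find and flags components that contain a language more than once. It is not an implementation of the algorithms of [2], and the `(language, title)` article encoding and function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical encoding: an article is (language code, title).
Article = Tuple[str, str]


def find(parent: Dict[Article, Article], x: Article) -> Article:
    """Union-find root lookup with path halving; registers unseen articles."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def inconsistent_components(links: Iterable[Tuple[Article, Article]]) -> List[Set[Article]]:
    """Return connected components that contain two or more articles
    from the same language."""
    parent: Dict[Article, Article] = {}
    for a, b in links:
        # Treat each cross-lingual link as an undirected edge and merge components.
        parent[find(parent, a)] = find(parent, b)

    components: Dict[Article, Set[Article]] = defaultdict(set)
    for article in list(parent):
        components[find(parent, article)].add(article)

    flagged = []
    for members in components.values():
        langs = [lang for lang, _ in members]
        if len(langs) != len(set(langs)):  # some language occurs twice
            flagged.append(members)
    return flagged
```

A flagged component only signals that some link is suspect; deciding which link to drop is the harder problem that cleanup methods such as [2] address.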

Written by Spence

February 8th, 2011 at 4:56 pm

Posted in Corpora, NLP
