Spence Green

التكرار يعلم الحمار (Repetition teaches the donkey)

I work at Lilt. In addition to computers and languages, my interests include travel, running, and scuba diving.

Archive for the ‘NLP’ Category

Taxonomy of MT Systems

How useful! I found this taxonomy buried in the Moses documentation:

  • hierarchical phrase-based: no linguistic syntax
  • string-to-tree: linguistic syntax only in output language
  • tree-to-string: linguistic syntax only in input language
  • tree-to-tree: linguistic syntax in both languages
  • target-syntactified: linguistic syntax only in output language
  • syntax-augmented: linguistic syntax only in output language
  • syntax-directed: linguistic syntax only in input language
  • syntax-based: unclear, we use it for models that have any linguistic syntax
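
The list is easier to keep straight when each label is encoded by which side carries linguistic syntax. A minimal Python sketch follows; the names `TAXONOMY`, `labels_with`, and `is_syntax_based` are made up for illustration and do not come from Moses. The catch-all "syntax-based" label is modeled as a predicate rather than a table entry, since it covers any model with linguistic syntax.

```python
# Hypothetical encoding of the taxonomy above.
# label -> (linguistic syntax on input side, linguistic syntax on output side)
TAXONOMY = {
    "hierarchical phrase-based": (False, False),
    "string-to-tree": (False, True),
    "tree-to-string": (True, False),
    "tree-to-tree": (True, True),
    "target-syntactified": (False, True),
    "syntax-augmented": (False, True),
    "syntax-directed": (True, False),
}


def labels_with(source, target):
    """Return, sorted, the labels that use syntax on exactly these sides."""
    return sorted(
        label for label, sides in TAXONOMY.items() if sides == (source, target)
    )


def is_syntax_based(label):
    """True if the model uses linguistic syntax on either side."""
    source, target = TAXONOMY[label]
    return source or target
```

For instance, `labels_with(False, True)` collects the three labels that all mean "linguistic syntax only in the output language".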

 

Written by Spence

February 23rd, 2011 at 5:03 pm

2011 Arabic Linguistics Symposium Programme Posted

The 2011 ALS starts on 4 March. I’ve always wanted to attend this conference, and have fooled myself into believing that I might submit to it one day. Reading through the bound proceedings of ALS counts as one of my earlier grad school memories.

Written by Spence

February 23rd, 2011 at 5:01 pm

Uses of the Cross-Lingual Link Structure of Wikipedia

We recently became interested in obtaining topically aligned data from Wikipedia via cross-lingual links. For example, we can use the document tuple (George Bush, جيورج بوش) for all sorts of things, even without sentence alignment. There have been a few recent, related papers, among them:

  1. Kevin Duh. 2011. Providing Cross-Lingual Editing Assistance to Wikipedia Users. In CICLING.
  2. Gerard de Melo and Gerhard Weikum. 2010. Untangling the Cross-Lingual Structure of Wikipedia. In ACL.
  3. Philipp Sorg and Philipp Cimiano. 2008. Enriching the Crosslingual Link Structure of Wikipedia – A Classification-Based Approach. In AAAI.

The consensus seems to be that cleanup is required prior to information extraction. However, this observation is language-pair specific. For English-Arabic at least, we have not noticed an improvement from applying the algorithms of [2].
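
A rough sketch of how such document tuples could be collected, not necessarily the pipeline used here. It assumes the public MediaWiki API (`action=query` with `prop=langlinks`) and its default JSON layout, in which the linked title sits under the `"*"` key. The helper names `langlinks_url` and `parse_langlinks` are hypothetical.

```python
import json
from urllib.parse import urlencode

# English Wikipedia endpoint; swap the host for another source language.
API = "https://en.wikipedia.org/w/api.php"


def langlinks_url(title, target_lang="ar"):
    """Build a query URL for one article's cross-lingual link into target_lang."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def parse_langlinks(response_text):
    """Extract (source title, target title) tuples from a langlinks response.

    Assumes the default JSON layout (formatversion=1). Pages that have no
    link into the target language are skipped.
    """
    pages = json.loads(response_text).get("query", {}).get("pages", {})
    pairs = []
    for page in pages.values():
        for link in page.get("langlinks", []):
            pairs.append((page["title"], link["*"]))
    return pairs
```

Fetch the URL with any HTTP client and pass the response body to `parse_langlinks`. The resulting tuples are only topically aligned at the document level, which is all the use case above requires.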


Written by Spence

February 8th, 2011 at 4:56 pm

Posted in Corpora, NLP