Spence Green

Repetition teaches the donkey

I work at Lilt. In addition to computers and languages, my interests include travel, running, and scuba diving.

NLP Software That People Actually Use

A tired lament in NLP is that people don’t release their code, or that they release incomprehensible code, or that they wrote code in Haskell, or whatever. As models get more complicated, the burden of software engineering increases, making it hard to quickly test new ideas. It’s getting to the point where you have to invest $100k without reading a prospectus. I’ve been thinking about the good libraries that people do actually use, and why people use them. Here is the list I made (in no particular order):

  1. OpenFST — Finite-state toolkit
  2. SRILM — Language modeling
  3. Charniak / Berkeley / Stanford / Bikel parsers — Statistical constituency parsing
  4. MST / MALT dependency parsers
  5. Stanford NER system — Named entity recognition!
  6. LingPipe — The kitchen sink
  7. Mallet — A smaller kitchen sink
  8. GIZA++ — Word alignment
  9. Moses — Phrase-based machine translation
  10. Joshua — Hierarchical machine translation

I don’t know the histories of all of these packages, but a few conservative generalizations are:

  • They work.
  • They don’t necessarily provide “best published” performance, but they get very close.
  • Most of them started as someone’s grad school project, or at least had significant student contributions.
  • You can easily name a person associated with each of them.

The end result: a good open-source package helps other people and makes you famous. That sounds like a good bargain.


Written by Spence

February 1st, 2011 at 4:14 pm

Posted in NLP,Software
