Spence Green

التكرار يعلم الحمار (Repetition teaches the donkey)

I work at Lilt. In addition to computers and languages, my interests include travel, running, and scuba diving.

Archive for the ‘NLP’ Category

ArabTeX patch for the ACL and Computational Linguistics Stylesheets

I’ve recently been preparing two submissions to CL on Arabic. Not surprisingly, the CL stylesheet (clv2.cls) conflicted with ArabTeX. The editors contacted Klaus Lagally, who wrote a patch. Just place the file in the directory that contains your *.tex files.

The most recent ACL stylesheet also conflicts with ArabTeX, but this issue can be quickly fixed by modifying the abstract environment in acl-hlt2011.sty as follows (showing a diff from the distributed stylesheet to the modified version):

< \renewenvironment{abstract}{\centerline{\large\bf...
---
> \newenvironment{abstractX}{\centerline{\large\bf... 
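
With that rename in place, the paper source has to call the renamed environment. Here is a minimal sketch of how a submission might then look; the package names follow the ACL HLT 2011 template, and the title, author, and text are placeholders:

% Minimal sketch: the stylesheet now defines abstractX instead of renewing
% abstract, so the standard abstract environment that conflicts with ArabTeX
% is left untouched.
\documentclass[11pt]{article}
\usepackage{acl-hlt2011}   % patched stylesheet with the renamed environment
\usepackage{arabtex}       % ArabTeX

\begin{document}
\title{Placeholder Title}
\author{Placeholder Author}
\maketitle

\begin{abstractX}          % renamed environment from the modified stylesheet
Placeholder abstract text.
\end{abstractX}

\section{Introduction}
Placeholder body text.
\end{document}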

Written by Spence

June 1st, 2011 at 2:31 pm

Posted in Arabic, NLP

Using Humans to Translate the Web: Duolingo

Written by Spence

May 5th, 2011 at 7:34 am

Michael Jordan on the Top Open Problems in Bayesian Statistics

A colleague sent along the March 2011 issue of the ISBA Bulletin in which Michael Jordan lists the top five open problems in Bayesian statistics. The doyen at the intersection of ML/statistics/Bayesian stuff polled his peers and came up with this list:

  1. Model selection and hypothesis testing: Given data, how do we select from a set of potential models? How can we be certain that our selection was correct?
  2. Computation and statistics: When MCMC is too slow/infeasible, what do we do? More importantly: “Several respondents asked for a more thorough integration of computational science and statistical science, noting that the set of inferences that one can reach in any given situation are jointly a function of the model, the prior, the data and the computational resources, and wishing for more explicit management of the tradeoffs among these quantities.”
  3. Bayesian/frequentist relationships: This gets at the situation for high-dimensional models, where a subjective prior is hard to specify and a simple prior is misleading. Can we then “…give up some Bayesian coherence in return for some of the advantages of the frequentist paradigm, including simplicity of implementation and computational tractability”?
  4. Priors: No surprise here. One respondent had a fascinating comment: when we want to model data that arise from human behavior and human beliefs, we would expect/desire effects on both the prior and the likelihood. What do we do then?
  5. Nonparametrics and semi-parametrics: What are the classes of problems for which NP Bayes methods are appropriate/”worth the trouble”? In NLP, clustering (e.g., with DP priors) is certainly an area in which nonparametrics have been successful. (A sketch of the DP predictive rule follows this list.)
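
As background for that last item: the basic mechanism behind DP-based clustering is the Dirichlet process predictive rule, usually described as the Chinese restaurant process. A standard statement, with concentration parameter \alpha and notation chosen here only for illustration:

% Predictive rule of a Dirichlet process (Chinese restaurant process).
% z_n is the cluster assignment of the n-th item and n_k is the number of
% earlier items already assigned to cluster k.
P(z_n = k \mid z_1, \dots, z_{n-1}) =
  \begin{cases}
    \dfrac{n_k}{n - 1 + \alpha}    & \text{if } k \text{ is an existing cluster,} \\
    \dfrac{\alpha}{n - 1 + \alpha} & \text{if } k \text{ is a new cluster.}
  \end{cases}

The number of clusters is not fixed in advance but grows with the data (roughly as \alpha \log n in expectation), which is why DP priors are a natural fit for clustering problems like those in NLP.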

Written by Spence

May 4th, 2011 at 12:54 pm