## Archive for the ‘Machine Learning’ Category

## Entity Clustering Across Languages

I posted the final version of our NAACL 2012 paper on entity clustering across languages. The idea is to identify textual mentions of real-world entities (e.g., people, places, and things) in multiple languages. We've tried to stay away from language-specific feature engineering. Moreover, we only used training resources that are abundant for many languages. For example, we make use of the topical structure of Wikipedia, in which a single topic (e.g., "Steve Jobs") is discussed in many languages. Our techniques should be especially useful for low-resource languages.

We wrote the first few lines of code for this project in June 2010. The code for this paper now exceeds 20k lines and I shudder to think of the rolling brownouts our experiments have likely caused throughout Maryland. We’ve burned up the COE computing cluster. I am thankful that this work will finally receive a public hearing.

Names are fascinating objects. They tend to originate in one language, usually the language of the culture in which the entity originates. Then the name spreads. Nicknames and aliases develop. New variants arise in other languages and writing systems. We’ve started to think of this phenomenon as a phylogenetic process, much like the proliferation of linguistic cognates or even bird species. Nick and Jason have been developing a model for this process, an initial version of which they recently presented at the NIPS NP Bayes workshop.

## Michael Jordan on the Top Open Problems in Bayesian Statistics

A colleague sent along the March 2011 issue of the ISBA Bulletin in which Michael Jordan lists the top five open problems in Bayesian statistics. The doyen at the intersection of ML/statistics/Bayesian stuff polled his peers and came up with this list:

- **Model selection and hypothesis selection**: Given data, how do we select from a set of potential models? How can we be certain that our selection was correct?
- **Computation and statistics**: When MCMC is too slow or infeasible, what do we do? More importantly: "Several respondents asked for a more thorough integration of computational science and statistical science, noting that the set of inferences that one can reach in any given situation are jointly a function of the model, the prior, the data and the computational resources, and wishing for more explicit management of the tradeoffs among these quantities."
- **Bayesian/frequentist relationships**: This gets at the situation for high-dimensional models, when a subjective prior is hard to specify and a simple prior is misleading. Can we then "…give up some Bayesian coherence in return for some of the advantages of the frequentist paradigm, including simplicity of implementation and computational tractability"?
- **Priors**: No surprise here. One respondent had a fascinating comment: when we want to model data that arises from human behavior *and* human beliefs, then we would expect (or desire) effects on both the prior and the likelihood. Then what do we do?
- **Nonparametrics and semi-parametrics**: What are the classes of problems for which NP Bayes methods are appropriate, or "worth the trouble"? In NLP, clustering (e.g., using DP priors) is certainly an area in which nonparametrics have been successful.
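To make the DP-priors-for-clustering point concrete, here is a minimal sketch of the Chinese restaurant process, the sequential view of a Dirichlet process prior over partitions. The function name and parameters are my own for illustration; this is the prior alone, not a full clustering model.

```python
import random

def crp_partition(n_items, alpha, seed=0):
    """Sample a partition of n_items from the Chinese restaurant process
    with concentration alpha (the sequential view of a DP prior).

    Item i joins existing table k with probability proportional to the
    number of items already at table k, or opens a new table with
    probability proportional to alpha. The number of clusters is not
    fixed in advance -- it grows (roughly logarithmically) with the data.
    """
    rng = random.Random(seed)
    tables = []       # tables[k] = number of items seated at table k
    assignments = []  # assignments[i] = table index for item i
    for i in range(n_items):
        # Existing table k has weight tables[k]; a new table has weight alpha.
        weights = tables + [alpha]
        r = rng.uniform(0, i + alpha)  # total weight so far is i + alpha
        cum = 0.0
        for k, w in enumerate(weights):
            cum += w
            if r <= cum:
                break
        if k == len(tables):
            tables.append(1)   # open a new table
        else:
            tables[k] += 1     # join an existing table
        assignments.append(k)
    return assignments

# Ten items; a small alpha yields few clusters, a large alpha yields many.
print(crp_partition(10, alpha=1.0))
```

In a full DP mixture model, each table would additionally carry a parameter drawn from a base distribution, and inference (e.g., Gibbs sampling) would re-seat items conditioned on the observed data.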

## Reading Up On Bayesian Methods

For the next few months I’ve decided to focus on semi-supervised learning in a Bayesian setting. At Johns Hopkins last summer I was introduced to “fancy generative models,” i.e., various flavors of the Dirichlet process, but I was slow on the uptake. Now I’m trying to catch up. Here are some helpful reading lists:

- Tom Griffiths’ Reading list on Bayesian methods
- Sharon Goldwater’s Bayesian language modeling reading list — Has a higher proportion of application references.
- Yee Whye Teh and Frank Wood have both posted some excellent tutorials and example code.

In addition to a thorough understanding of MCMC (which is relatively simple), it’s also important to have at least an awareness of variational methods (which are relatively hard). Jason Eisner recently wrote a high-level introduction to variational inference that is a soft(er) encounter with the subject than the canonical reference:

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. *Machine Learning*, 1999.
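As a sense of why MCMC is “relatively simple,” here is a minimal random-walk Metropolis sampler, perhaps the shortest correct MCMC algorithm. The function name and parameters are mine; it targets any unnormalized log-density, which is exactly the setting where the posterior’s normalizing constant is intractable.

```python
import math
import random

def metropolis_sample(log_p, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: draw correlated samples whose stationary
    distribution is proportional to exp(log_p(x)).

    log_p only needs to be known up to an additive constant, which is why
    MCMC applies to posteriors with intractable normalizers.
    """
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)      # symmetric Gaussian proposal
        lp_prop = log_p(prop)
        # Accept with probability min(1, p(prop)/p(x)), in log space.
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Target a standard normal via its unnormalized log-density, -x^2/2.
samples = metropolis_sample(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
# After discarding burn-in, the sample mean should be close to 0.
mean = sum(samples[5000:]) / len(samples[5000:])
```

The entire algorithm fits in a dozen lines; the hard part in practice is diagnosing mixing and choosing the proposal scale, not the sampler itself. Variational methods trade this simplicity for speed by turning inference into optimization.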

Where will this lead? It is argued that the Bayesian framework offers a more appealing cognitive model. That may be. What interests me is the pairing of Bayesian updating with data collection from the web. Philip Resnik recently covered efforts to translate voicemails during the revolution in Egypt as one method of re-connecting that country with the world. This data is clearly useful, but what is unclear is how to use it to retrain standard (e.g., frequentist) probabilistic NLP models. Cache models, at least in principle, offer an alternative.