Archive for the ‘Software’ Category
NLP Software That People Actually Use
A tired lament in NLP is that people don’t release their code, or that they release incomprehensible code, or that they wrote code in Haskell, or whatever. As models get more complicated, the burden of software engineering increases, making it hard to quickly test new ideas. It’s getting to the point where you have to invest $100k without reading a prospectus. I’ve been thinking about the good libraries that people do actually use, and why people use them. Here is the list I made (in no particular order):
- OpenFST — Finite-state toolkit
- SRILM — Language modeling
- Charniak / Berkeley / Stanford / Bikel parsers — Statistical constituency parsing
- MST / MALT dependency parsers
- Stanford NER system — named entity recognition!
- LingPipe — The kitchen sink
- Mallet — A smaller kitchen sink
- GIZA++ — Word alignment
- Moses — Phrase-based machine translation
- Joshua — Hierarchical machine translation
I don’t know the histories of all of these packages. But a few conservative generalizations are:
- They work.
- They don’t necessarily provide “best published” performance, but they get very close.
- Most of them started as someone’s grad school project, or at least had significant student contributions.
- You can easily name a person associated with each of them.
The end result: a good open-source package helps other people and makes you famous. That sounds like a good bargain.
Programmers and Scientists
A distinction should be made between Computer Science and computer programming that is more substantial than orthographic convention. Some might stop at the observation that the first is an academic discipline while the other is a vocation, hence the two conventions. Indeed, electricians do not study signal theory, and electrical engineers avoid cable installation. The same cannot be said about our discipline: in school, and at work, we spend most of our time contemplating software. If that is so, then perhaps software quality should be the metric. Does it follow that highly trained computer scientists should produce better code? In fact, the opposite is almost always true: research software is often disorganized and unstable, more like a bicycle made from sticks and glue than a polished instrument. Software practitioners use this fact to arrive at an equally errant conclusion: a good programmer is a scientist (Jeff Atwood’s recent feature on NP-completeness illustrates this fallacy). For every parry from academia, there is a riposte from industry. In the end, neither man leaves the field unharmed.
The Brave New World, Briefly Revisited
The Toyota Production System (TPS) was the progenitor of a variety of change-oriented manufacturing techniques. Six Sigma, Lean, and other such constructs trace their heritage to TPS. Because Agile methodologies were influenced by “lean” thinking and an abhorrence of “Big M” processes, they too have Eastern roots. For me, the allure of Agile methods, regardless of flavor, has always been the recognition of software as a human act: programmers are not automata on an assembly line tacking trunk lids to mechanical foetuses. Incidentally, the Japanese reached the same conclusion decades ago, as described by Teruyuki Minoura, a Toyota executive:
“An environment where people have to think brings with it wisdom, and this wisdom brings with it kaizen (continuous improvement),” notes Minoura. “If asked to produce only one unit at a time, to produce according to the flow, a typical line worker is likely to be flummoxed. It’s a basic characteristic of human beings that they develop wisdom from being put under pressure. Perhaps the greatest strength of the Toyota Production System is the way it develops people.
There can be no successful monozukuri (making things) without hito-zukuri (making people). To keep coming up with revolutionary new production techniques, we need to develop unique ideas and knowledge by thinking about problems in terms of genchi genbutsu. This means it’s necessary to think about how we can develop people who can come up with these ideas. As our operations become increasingly global, there’s also a need to think how to implant the Toyota DNA in our overseas personnel.