Linux Today: Linux News On Internet Time.

developerWorks: Charming Python: The Natural Language Toolkit

Jun 28, 2004, 03:30
(Other stories by David Mertz)


"Your humble writer knows a little bit about a lot of things; but despite writing a fair amount about text processing (a book, for example), linguistic processing is a relatively novel area for me. Forgive me if I stumble through my explanations of the quite remarkable Natural Language Toolkit (NLTK), a wonderful tool for teaching, and working in, computational linguistics using Python. Computational linguistics, moreover, is closely related to the fields of artificial intelligence, language/speech recognition, translation, and grammar checking.

"It is natural to think of NLTK as a stacked series of layers that build on each other. Readers familiar with lexing and parsing of artificial languages (like, say, Python) will not have too much of a leap to understand the similar -- but deeper -- layers involved in natural language modeling. While NLTK comes with a number of corpora that have been pre-processed (often manually) to various degrees, conceptually each layer relies on the processing in the adjacent lower layer. Tokenization comes first; then words are tagged; then groups of words are parsed into grammatical elements, like noun phrases or sentences (according to one of several techniques, each with advantages and drawbacks); finally sentences or other grammatical units can be classified..."
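
The layering described in the excerpt (tokenize, then tag, then parse into phrases, then classify) can be illustrated with a small standard-library sketch. This is not NLTK's API: the function names, the tiny LEXICON table, and the noun-phrase and classification rules are all hypothetical stand-ins chosen for illustration, and a real system would likely use trained taggers and parsers instead.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch); a real tagger
# would be trained on a tagged corpus rather than use a fixed table.
LEXICON = {
    "the": "DET", "a": "DET",
    "quick": "ADJ", "lazy": "ADJ",
    "fox": "NOUN", "dog": "NOUN",
    "jumps": "VERB", "sleeps": "VERB",
    "over": "PREP",
}


def tokenize(text):
    """Layer 1: split raw text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Layer 2: pair each token with a part-of-speech tag (UNK if unknown)."""
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


def chunk_noun_phrases(tagged):
    """Layer 3: group runs shaped like DET? ADJ* NOUN into noun phrases."""
    phrases, current = [], []
    for word, pos in tagged:
        if pos == "DET":
            current = [word]          # a determiner starts a fresh phrase
        elif pos == "ADJ":
            current.append(word)      # adjectives extend the open phrase
        elif pos == "NOUN":
            current.append(word)      # the noun closes the phrase
            phrases.append(" ".join(current))
            current = []
        else:
            current = []              # any other tag abandons the phrase
    return phrases


def classify(tagged):
    """Layer 4: label the unit using features produced by the lower layers."""
    if tagged and tagged[-1][0] == "?":
        return "question"
    has_verb = any(pos == "VERB" for _, pos in tagged)
    return "statement" if has_verb else "fragment"


if __name__ == "__main__":
    tagged = tag(tokenize("The quick fox jumps over the lazy dog."))
    print(chunk_noun_phrases(tagged))
    print(classify(tagged))
```

Each function consumes only the output of the layer below it, which mirrors the excerpt's point that every layer relies on the processing in the adjacent lower layer.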
