Linux Today: Linux News On Internet Time.

LinuxNewbie.org: Text Processing Pipelines NHF

Jul 25, 2000, 12:39
(Other stories by Adrian J. Chung)

[ Thanks to Sensei for this link. ]

"Sure the command line is evil, but mastering it will unlock the powers of a Unix box that remain unrealized under modern graphical user interfaces. This article details the construction of text processing pipelines, using ordinary GNU utilities, to accomplish a few fairly challenging tasks."

"Suppose that for whatever reason, one is interested in the word usage of a piece of text, perhaps from an article such as this one, downloaded from the Net. One might want to know what word is most frequently used while ignoring all the non-words, like variable names in source code or other random bits of junk. Perhaps a ranking of word frequency is required. Should one resort to writing a special word counting program in Perl? Here's how to do it using a few GNU text utilities and the assistance of that great resource /usr/dict/words."

"First we begin by breaking up the sentences in the text file so that there is no more than one word per line. The "tr" tool is useful here. This tool translates files one character at a time. For example here is the essential USENET tool rot13 using "tr"

Complete Story

Related Stories: