Linux Today: Linux News On Internet Time.

Grep the Web

Sep 02, 1999, 15:31

[ The opinions expressed by authors on Linux Today are their own. They speak only for themselves and not for Linux Today. ]

By Martin Vermeer

As seasoned users of regular expressions for finding things on our hard disks or lines within our source files, we tend to forget that searching and selecting like this is far from a trivial activity. Regular expressions are a programming language; a simple one, but a programming paradigm nevertheless. As in all true programming languages, there is a non-trivial relationship between the code written and the results obtained; a relationship that opens up to the user only after an extended period of practice by trial and error.
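A small illustration of how non-obvious that relationship can be, sketched in Python rather than with grep itself. The helper name `grep` and the word list are made up for this example; only the standard `re` module is assumed. Three patterns that differ by a single character select three different sets of lines.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical sample input, one "line" per word.
words = ["color", "colour", "colouur", "collar"]

# 'u?' means "zero or one u".
optional_u = grep(r"colou?r", words)

# 'u*' means "zero or more u".
any_number_of_u = grep(r"colou*r", words)

# '.*' means "any run of characters", so this also lets "collar" through.
anything_between = grep(r"col.*r", words)

print(optional_u)
print(any_number_of_u)
print(anything_between)
```

Predicting which words each pattern lets through, before running it, is exactly the kind of skill that only comes with practice.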

Most ordinary computer users never see regular expressions. In fact, searching for generic filenames in an Explorer-type applet will likely baffle them, and files whose names are not remembered exactly will only be unearthed by the needle-in-a-haystack approach. Now, however, with the exploding popularity of the World Wide Web, one kind of regular expression is both available to, and potentially extremely useful for, the average, non-computer-knowledgeable user. The search expressions that the popular search engines accept can be seen as a very simple, and somewhat atypical, kind of regular expression.

Even using these very simple regular expressions requires learning by doing, though the people designing these search engines do everything in their power to make them as easy as possible to use. In my experience, most users who do not program simply never learn this well, and therefore never realize for themselves the potential of the Web as a source -- the most versatile and complete source in history -- of information on any subject you can possibly imagine.

There is a touching testimonial in The AltaVista Story (Osborne McGraw-Hill, ISBN 0-07-882435-4) which underlines this point. Annie Warren of Digital Equipment explains how she showed a friend with an active interest in genealogy how AltaVista could be harnessed to dig out relevant information; and, as she recounts about her friend, "all of a sudden, she needed a PC and an account with an Internet Service Provider." I too can testify from experience how powerfully rewarding it can be to show someone not just how to solve one problem, but the basic skills needed to solve a whole class of problems over and over again.

I venture that one effective form of free software advocacy is teaching the effective mining of the World Wide Web as a knowledge base for IT support. Even among computing professionals, the potential of "grepping the Web" (in a manner of speaking) is substantially underestimated. A colleague of mine, a computer professional with over a decade of experience on several operating systems, got stuck trying to change passwords on the Digital Unix Alpha mainframe of our institute. The machine claimed that the password file was "in use", as if it were somehow locked, the way a device like a modem can be if it is reserved by another user.

When told of this situation, I volunteered that there was probably a lock file somewhere that had been erroneously left over from an aborted operation and would have to be removed manually. As to the name and location of this lock file, I had nothing useful to offer, and nothing was found in the obvious places either. I therefore suggested that he paste the exact error message (in quotes) into the AltaVista search window, fully realizing that the chances of success were slim. After all, what may work for Linux is unlikely to work for an OS that you would be very unlikely to install on your home computer, that has a limited free software tradition (well, there is of course DECUS), and whose installed base is several orders of magnitude smaller.

Still, several documents came up, and the first one contained detailed instructions, including the identity of the lock file to be removed. Had my advice not worked, we presumably would have turned to vendor support, paying good money for it -- for a piece of knowledge freely available Out There!

This is a rather trivial example not even involving a true regular-expression search. I have found AltaVista invaluable for this kind of thing. Often I remember vaguely that something exists, and by craftily combining "plus" conditions that have to be met simultaneously, and using the "minus" prefix to exclude categories of irrelevant stuff coming up, I usually manage to find what I'm after. Google isn't bad either; its syntax assumes that conditions are always to be "anded" together, usually a realistic assumption.
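For readers who think more easily in code, the logic of the "plus" and "minus" prefixes can be sketched in a few lines of Python. This is only a toy model of the query semantics described above, not how AltaVista or Google work internally; the names `parse_query` and `matches`, the `implicit_and` switch, and the sample documents are all invented for illustration, and terms are matched as plain substrings, which a real engine would not do.

```python
import re

# A token is an optional +/- sign followed by a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Split a query into (required, excluded, optional) lists of terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if not term:
            continue
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def matches(query, document, implicit_and=False):
    """Decide whether a document satisfies the query.

    With implicit_and=True every unsigned term is treated as required,
    mimicking an engine that always "ands" its conditions together.
    """
    text = document.lower()
    required, excluded, optional = parse_query(query)
    if implicit_and:
        required, optional = required + optional, []
    if any(term in text for term in excluded):
        return False
    if not all(term in text for term in required):
        return False
    # With no required terms, at least one optional term must appear.
    return bool(required) or any(term in text for term in optional)

# Hypothetical documents standing in for indexed Web pages.
docs = [
    "The password file is in use: remove the stale lock file",
    "Windows password file recovery tool with lock support",
    "Unix lock daemons explained",
]

query = '+"password file" +lock -windows'
hits = [doc for doc in docs if matches(query, doc)]
print(hits)
```

Here only the first document survives: the second is thrown out by the "minus" term and the third lacks the required phrase. Running an unsigned query such as `password lock` against the third document shows the difference in the "anded" style: it matches by default, but not with `implicit_and=True`.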

The existing search engines are still notoriously inefficient, especially in inexperienced hands, and even the best of them have indexed only some 20% of the Web's content. There is ample scope for improvement, an issue that the people operating Google appear to be trying to address.

Recently I saw someone complain -- apropos of something entirely different -- that there were no Linux drivers for the IEEE-488 (GPIB) bus or tcl/tk tools for use with it in a laboratory environment. This perceived deficiency was what kept him locked into Visual Basic. Now I had read in Linux Journal about the Linux Lab project and remembered seeing something along these lines; a quick search and I could help set a Linux lover free.

Examples abound.

Here we have a tool to debunk the myth of the need for vendor support. Teach people to fish and they will be no more hungry; teach them regular expressions and they will be helpless nevermore. The glittering prize of empowerment, a literacy tool just for the taking by anyone with the patience to teach it.

"It's an enchanted world, Hobbes!"

Martin Vermeer is a research professor and department head at the Finnish Geodetic Institute, as well as "docent" at Helsinki University, Department of Geophysics. He uses Linux both at work and at home.