Linux Today: Linux News On Internet Time.

Martin Vermeer -- Follow the Scientists!

Mar 17, 1999, 17:22

by Martin Vermeer LyX

Professor Vermeer argues against the notion that word processing must forever be a species of finger painting voodoo and the province of proprietary software developers. His principal counterexample is the TEX, LATEX, LYX family of applications for structured document layout.

A funny place, this world we live in. Everybody sends me MS Word attachments and assumes I can read them. I've given up preaching and just use Star Office. No luck this time though, not even with genuine Word, until the sender exported to Word 6.0 format to match my dusty old Windows partition. As you guessed, the document contained only text.

The world is massively wasting resources attempting to exchange documents in multiple incompatible closed formats, even from different versions of the same software. Human adaptability as a curse. Why this unquestioning acceptance of document exchange as black magic?

Compare this to the Internet; there things are so much better. HTML is the standard language for Web pages; MIME the standard for email; and so on. Of course it all works well: scientists designed it for their own use!

When Tim Berners-Lee thought up the World Wide Web at CERN, Geneva, little could he foresee that some day the turnover of e-commerce would dwarf the funds budgeted for high-energy physics research. He just followed his scientific instincts for building something that works for the community: a file format based on a pre-existing specification, SGML (Standard Generalized Markup Language); and a text-based client-server protocol over TCP/IP. Open standards, text based, legible, editable, portable. And doing what it should: allowing timely worldwide publication of research results in multimedia form.

HTML has limitations. It makes visual-only mark-up too easy, so pages can be made to look good without concern for the document's logical structure. Therefore, we see developments that separate document creation from the definition of layout style, like Cascading Style Sheets and XML (Extensible Mark-up Language).

A lesser known, similar chain of events happened a decade before the Web. Don Knuth of Stanford designed TEX, a computer typesetting language that garnered fame for the visual quality of documents produced with it. Then Leslie Lamport (DEC) based a macro library, LATEX, on it, to address the same problem as with HTML: TEX is too visual a language, a typesetting engine rather than a document processor. With LATEX, you can code and structure your document without concern for the final layout; then you combine it with a document definition or class file, and out comes the printed version. Guaranteed pixel identical everywhere -- no publisher complaining about a nine page doc when you sent him only eight pages.
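
To make the separation of structure and layout concrete, here is a minimal sketch of a LATEX source file. The section titles and filler text are invented for illustration; scrartcl is named only as one example of an alternative class from the Komascript family.

```latex
% Structure only: no fonts, margins, or spacing are specified in this file.
% The class named here decides the layout. Replacing "article" with another
% class (for example scrartcl) changes the look of the output without
% touching the text below.
\documentclass{article}

\title{Follow the Scientists!}
\author{Martin Vermeer}

\begin{document}
\maketitle

\section{Closed formats}
Text of the first section.

\section{Open standards}
Text of the second section.

\end{document}
```

The author marks what each piece of text is (title, section, body); how it looks on paper is left entirely to the class file.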

The importance of "structured document authoring" cannot be overstated. LATEX encourages this practice. While most modern word processors support it through "style sheets", too few people use it. A colleague, a gifted professional, sent me a Word document where section headers had been produced by "finger paint": twice RETURN, section number, header text, painted bold, and twice RETURN again... of course they didn't show up in the table of contents, which had to be finger painted too... there is much education needed here. But how to make people learn when software appears so easy as to make learning superfluous?
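
The difference between finger painting and structured authoring can be sketched in LATEX itself. This is an illustrative contrast, not a reconstruction of the colleague's document; the header texts are made up.

```latex
\documentclass{article}
\begin{document}

% Built from the \section commands in this file; needs two LaTeX runs.
\tableofcontents

% Finger painting: looks like a header on paper, but LaTeX sees only
% bold text. It gets no entry in the table of contents, and the
% hand-typed number is invisible to LaTeX, so the real section below
% is also numbered 1.
\bigskip
\noindent\textbf{1 Introduction}
\bigskip

% Structured: numbered automatically and listed in the table of contents.
\section{Methods}
Body text.

\end{document}
```

Only the \section header appears in the generated table of contents; the painted one would have to be added and renumbered by hand.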

For years, LATEX has been the lingua franca of scientific document exchange. Journals accepted it. As a plain text mark-up language like HTML, it is robust, portable, "hackable" and compatible across platforms and versions. Scientists are used to the programming-like activity of writing mark-up code. However, general users are not. No problem in principle -- LATEX could be for scientists only. But a colleague's experience tells me that the Huns are at the gate. He got his article delayed by a year because a scientific journal couldn't handle his LATEX manuscript! So, if we want the open, structured document tradition to survive, we'd better make it accessible to non-scientists too.

Fortunately, an excellent initiative along these lines, LYX, has been ongoing for three years now. Most of the developers are scientists. LYX must be seen to be believed, especially the equation editor. If you thought MS Word's equation editor was good -- think again. The rest of this visual document processor is of similar excellence. Nearly everything in LATEX is supported now: sectioning headers, figures, tables, live links to numbered objects; even BibTEX, the great bibliography database manager by Oren Patashnik (Stanford). On-screen, the text looks roughly as it does on paper, including sectioning, formulas, tables, graphics etc., a bit like Word's "normal" mode. "View mode" means clicking the xdvi or Ghostscript renderer. Even "outline mode" exists, the table-of-contents window containing live sectioning links.

With LYX, the pain of writing LATEX mark-up is no more. Also, LYX is ready for the Internet age with its SGML (i.e. XML) DocBook export (only experimental for now). What is still weak, and needed to make LYX competitive, is support for easily setting up document type definitions. Currently only basic, document level properties are configurable, plus, surprisingly, the choice of bullet list symbols. A nice selection of classes is available: the base classes, those of the American Mathematical Society, the beautiful KOMA-Script classes, and more. It all has a scientific slant. Besides business letters, there is one nonscientific document format -- for film/movie scripts. If you want nontrivial deviations from these provided classes, you may browse for a suitable option or style file on the Internet. But at some point you'll find yourself grinding out LATEX code. For example, if you want to change the indent of bullet lists, you'll be embedding ERT (Evil Red Text, LYX geekspeak for raw LATEX) into your doc. A no-no for dummy users and a sales argument for commercial word processors.
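
For a sense of what such raw LATEX looks like, here is a minimal sketch of changing the bullet list indent in the standard classes. The value 1em is arbitrary, and that this is the kind of snippet one would paste in as ERT (or into the document preamble) is an assumption for illustration, not a description of any particular LYX dialog.

```latex
% \leftmargini is the standard LaTeX length that controls the
% indentation of first-level lists. The value 1em is an arbitrary example.
\setlength{\leftmargini}{1em}

\begin{itemize}
  \item First point
  \item Second point
\end{itemize}
```

One line of code, but code nonetheless: the user has to know that the length exists and what it is called.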

It has been argued that the open source development model is suitable only for producing infrastructural, "plumbing" type software, not end user friendly, high-usability applications. If the validity of this argument interests you, you should closely follow LYX, a great piece of software that has been developing at a spectacular pace. This could be a white raven disproving the assumption. And again, scientists doing it!

Martin Vermeer mv@liisa.pp.fi

Martin Vermeer is a research professor and department head at the Finnish Geodetic Institute, as well as "docent" (probably something like assistant professor) at Helsinki University, Department of Geophysics. He uses Linux both at work and at home.

File translated from TEX by TTH, version 1.55.