Linux Today: Linux News On Internet Time.

developerWorks: Parsing with the Spark Module

Jan 03, 2003, 05:30
(Other stories by David Mertz)

"In this article, which follows on an earlier installment of 'Charming Python' devoted to SimpleParse, I introduce some basic concepts in parsing and discuss the Spark module. Parsing frameworks are a rich topic that warrants quite a bit of study to get the full picture; these two articles make a good start, for both readers and myself.

"In my programming life, I have frequently needed to identify parts and structures that exist inside textual documents: log files, configuration files, delimited data, and more free-form (but still semi-structured) report formats. All of these documents have their own 'little languages' for what can occur within them. The way I have programmed these informal parsing tasks has always been somewhat of a hodgepodge of custom state-machines, regular expressions, and context-driven string tests. The pattern in these programs was always, roughly, 'read a bit of text, figure out if we can make something of it, maybe read a bit more text afterwards, keep trying.'

"Parsers distill descriptions of the parts and structures in documents into concise, clear, and declarative rules identifying what makes up a document. Most formal parsers use variants on Extended Backus-Naur Form (EBNF) to describe the 'grammars' of the languages they describe. Basically, an EBNF grammar gives names to the parts one might find in a document; additionally, larger parts are frequently composed of smaller parts. The frequency and order in which small parts may occur in larger parts is specified by operators..."
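To make the idea of named parts and ordering/repetition operators concrete, here is a minimal sketch in Python. It does not use Spark or SimpleParse and does not reflect either module's API; it is a small hand-written recursive-descent parser for a made-up "little language" of configuration entries. The grammar, the function names (`tokenize`, `expect`, `parse_entry`, `parse_config`), and the sample input are all assumptions chosen for illustration.

```python
import re

# Illustrative EBNF for a hypothetical config format (not from the article):
#   config ::= entry*
#   entry  ::= name "=" value ("," value)*
#   name   ::= [A-Za-z_]+
#   value  ::= [0-9]+

# One token per match: a name, a numeric value, or an operator ("=" or ",").
TOKEN_RE = re.compile(r"\s*(?:(?P<name>[A-Za-z_]+)|(?P<value>[0-9]+)|(?P<op>[=,]))")


def tokenize(text):
    """Split the input into (kind, text) pairs, skipping whitespace."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def expect(tokens, i, kind, literal=None):
    """Return the text of token i if it has the required kind (and literal)."""
    wanted = literal or kind
    if i >= len(tokens):
        raise SyntaxError(f"expected {wanted}, found end of input")
    tok_kind, tok_text = tokens[i]
    if tok_kind != kind or (literal is not None and tok_text != literal):
        raise SyntaxError(f"expected {wanted}, found {tok_text!r}")
    return tok_text


def parse_entry(tokens, i):
    """entry ::= name "=" value ("," value)*"""
    name = expect(tokens, i, "name")
    expect(tokens, i + 1, "op", "=")
    values = [int(expect(tokens, i + 2, "value"))]
    i += 3
    # The ("," value)* part: zero or more repetitions, in this order.
    while i < len(tokens) and tokens[i] == ("op", ","):
        values.append(int(expect(tokens, i + 1, "value")))
        i += 2
    return i, name, values


def parse_config(text):
    """config ::= entry*"""
    tokens = tokenize(text)
    i = 0
    config = {}
    while i < len(tokens):
        i, name, values = parse_entry(tokens, i)
        config[name] = values
    return config


if __name__ == "__main__":
    print(parse_config("ports = 80, 443\nretries = 3"))
```

Each grammar rule maps to one function, and the `*` operator maps to a loop. A parsing framework such as the ones the article discusses would let the grammar itself drive this work rather than requiring the loops to be written by hand.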
