Linux Today: Linux News On Internet Time.

developerWorks: Parsing with the Spark Module

Jan 03, 2003, 05:30
(Other stories by David Mertz)


"In this article, which follows on an earlier installment of 'Charming Python' devoted to SimpleParse, I introduce some basic concepts in parsing and discuss the Spark module. Parsing frameworks are a rich topic that warrants quite a bit of study to get the full picture; these two articles make a good start, for both readers and myself.

"In my programming life, I have frequently needed to identify parts and structures that exist inside textual documents: log files, configuration files, delimited data, and more free-form (but still semi-structured) report formats. All of these documents have their own 'little languages' for what can occur within them. The way I have programmed these informal parsing tasks has always been somewhat of a hodgepodge of custom state-machines, regular expressions, and context-driven string tests. The pattern in these programs was always, roughly, 'read a bit of text, figure out if we can make something of it, maybe read a bit more text afterwards, keep trying.'

"Parsers distill descriptions of the parts and structures in documents into concise, clear, and declarative rules identifying what makes up a document. Most formal parsers use variants on Extended Backus-Naur Form (EBNF) to describe the 'grammars' of the languages they describe. Basically, an EBNF grammar gives names to the parts one might find in a document; additionally, larger parts are frequently composed of smaller parts. The frequency and order in which small parts may occur in larger parts is specified by operators..."

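To make the EBNF idea in the excerpt concrete, here is a minimal sketch in plain Python. It does not use Spark or SimpleParse; the grammar for a bracketed list of integers and the names `tokenize` and `parse_list` are assumptions made up for this illustration. The comments show how the EBNF operators for optional parts (`[ ... ]`) and repetition (`{ ... }`) might map onto ordinary control flow in a hand-written parser.

```python
import re

# Hypothetical EBNF grammar for a bracketed list of integers:
#   list   = "[" , [ number , { "," , number } ] , "]" ;
#   number = digit , { digit } ;

# One token is a run of digits or a single punctuation mark,
# optionally preceded by whitespace.
TOKEN = re.compile(r"\s*(\d+|[\[\],])")


def tokenize(text):
    """Split text into number and punctuation tokens, rejecting anything else."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_list(tokens):
    """Parse tokens according to the 'list' rule and return the integers."""
    pos = 0

    def expect(literal):
        # Consume one literal token or fail with a positioned error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != literal:
            raise ValueError(f"expected {literal!r} at token {pos}")
        pos += 1

    expect("[")
    numbers = []
    # Optional part of the rule: [ number , { "," , number } ]
    if pos < len(tokens) and tokens[pos].isdigit():
        numbers.append(int(tokens[pos]))
        pos += 1
        # Repetition part of the rule: { "," , number }
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            if pos >= len(tokens) or not tokens[pos].isdigit():
                raise ValueError(f"expected number at token {pos}")
            numbers.append(int(tokens[pos]))
            pos += 1
    expect("]")
    if pos != len(tokens):
        raise ValueError("trailing tokens after list")
    return numbers
```

Calling `parse_list(tokenize("[1, 2, 3]"))` yields the list of integers, while an input such as `"[1,]"` raises `ValueError` because the grammar requires a number after every comma. A parsing framework would presumably derive this kind of logic from the grammar itself rather than have it written out by hand.
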
