OO Approach to file parsing with Python?

Jon McLin jon_mclin at attglobal.net
Tue Jun 27 00:10:55 EDT 2000


Our company has decided to try an initial deployment of Python in a file-parsing
role.  We need to translate some customer-provided structured ASCII files into
our standard schema for use by legacy applications.    A procedural
implementation is obvious, but completely non-reusable, and I'd like to be able
to leverage this in the future.  (different inputs, the same outputs.)  Can
someone point me to examples of OO approaches to this type of problem,
preferrably implemented in Python?

Some data, and thoughts:

The basic file structure is

Multi-Line Header
One or More DataSets
    where each Data Set consists of
    a single line header, and
    one or more lines of tab-delimited data (fixed structure across all sets).

The data sets are identified by the structure of their header lines.

A nested loop processor is the obvious procedural approach, assigning each input
field to a corresponding field in an output object.

One seemingly-OO approach is to create classes for each construct on the input
side.  The constructors would actually parse the file, and call subordinate
constructors (member classes) as appropriate.   (This seems awkward).
The output classes would match our schema, and of course these would be
reusable.

Since there is not a one-for-one correspondence between input and output in
either classes or records, the next step would be to derive the set of  outputs
from the inputs (quantity of output instances).    These would be constructed,
and the input structure would be walked, telling each instance to populate it's
output analogue.  Finally, the output hierarchy would be walked, telling each
instance to "Publish" itself.

With this basic model, the overall structure and the output classes would be
reusable, while the input classes and the translators (if not part of the input
classes) would be application specific.

Conceptual Code Snippet:
#  Parse and Create an object hierarchy
#    MyInput becomes an object containing member objects, dictionaries, etc..
#    SourceData is a class in an ApplicationSpecific
MyInputObject=SourceData("MyFile")

# Translate and create output hierarchy.
#  The custom classes will need to import the standard OutputObject module,
which contains constructors
#    for all output classes.
MyOutputObject=MyInputObject.Translate()

#  Create the desired (standard) output
MyOutputObject.Publish()

But as I noted, parsing with this model seems awkward.

Any pointers/constructive criticisms/suggestions are much appreciated.
Non-constructive criticism will be deflected without ego damage.   After all, I
know that I know far less than I know not, and that furthermore far more is
unknown than is known to be unknown.   Sigh.   ;>

Thanks,
Jon






More information about the Python-list mailing list