Fixed-length fields in Martel

Andrew Dalke adalke at mindspring.com
Wed Jun 4 12:34:17 EDT 2003


Michael Schmit:
> > Do you mean my Martel, the regexp parser which generates SAX events?
>
> Yes.

So people are actually using it.  Cool!

> What would you suggest for processing this tabular data? The desired output
> would be a table (list of lists). Naming each column might be too
> complicated for many columns. Pass the whole table as event and disassemble
> in the ContentHandler? Pass lines as events? This would make the
> ContentHandler to disassemble the table again. How to avoid this
> redundancy?

What do you need the output to look like?  You could give every field
the same name ("field") if naming each one differently is inappropriate.  You
could also push the parsing into the ContentHandler.  (I've got some
experimental code in the latest Martel that lets content handlers say they
are willing to do extra processing, for performance's sake.)

Also, take a look at the "LAX" content handler, included with Martel.
It's meant to be a simple way to read lists of fields from flat XML records
and helps with columnar data.
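
To make the "name everything 'field'" idea concrete, here is a rough sketch
of a content handler that turns such events into a list of lists.  It uses
only the standard xml.sax module, not Martel or LAX.  The element names
"record" and "field" and the helper parse_table are assumptions made for the
illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TableHandler(ContentHandler):
    """Collect every "field" element into rows, one row per "record".

    The element names are hypothetical; use whatever names the format
    definition actually emits.
    """

    def __init__(self):
        ContentHandler.__init__(self)
        self.table = []     # finished rows
        self._row = None    # row currently being built
        self._text = None   # text chunks of the current field, else None

    def startElement(self, name, attrs):
        if name == "record":
            self._row = []
        elif name == "field":
            self._text = []

    def characters(self, content):
        # Only keep text that appears inside a field element.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "field":
            self._row.append("".join(self._text))
            self._text = None
        elif name == "record":
            self.table.append(self._row)
            self._row = None


def parse_table(xml_bytes):
    """Hypothetical helper: feed XML bytes through the handler."""
    handler = TableHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.table
```

Because every column shares one element name, the handler never needs to
know how many columns there are; the position in each row identifies the
column.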

> Validity:
> If the lines have no fixed part (like "ATOM") the combination of fixed_width
> expressions matches any line of appropriate length. How to make that more
> robust. Require whitespace separation between columns?

So fixed number of characters in a field plus whitespace between them?

  Re("[^\s]{6}") + Re("[ \t]+") + .... + AnyEol()

This reads 6 non-space characters followed by one or more spaces or
tabs, etc. and then the newline.
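
The same pattern can be tried out with Python's standard re module before
writing the Martel expression.  This is only a sketch: the three-column
layout and the name parse_line are made up for the example, and the
end-of-line alternation is my assumption of what an "any newline" match
should accept.

```python
import re

# Each field is exactly 6 non-whitespace characters.
FIELD = r"([^\s]{6})"
# Columns are separated by one or more spaces or tabs.
GAP = r"[ \t]+"
# Accept \r\n, \r, or \n as the line ending (an assumption).
EOL = r"(?:\r\n|\r|\n)"

# Hypothetical three-column line.
LINE = re.compile(FIELD + GAP + FIELD + GAP + FIELD + EOL)


def parse_line(line):
    """Return the list of fields, or None if the line does not fit."""
    m = LINE.fullmatch(line)
    return list(m.groups()) if m else None
```

A line whose fields contain embedded spaces, or run longer than six
characters, no longer matches, which is the extra robustness the whitespace
requirement buys.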

                    Andrew
                    dalke at dalkescientific.com