Fixed-length fields in Martel
Michael Schmitt
nomail at nomail.com
Wed Jun 4 06:15:26 EDT 2003
Hello Andrew.
> Michael Schmitt:
>> Is there a way to handle fixed-length fields with Martel? The often-used
>> 'split-by-whitespace' approach fails if fields can be empty.
>
> Do you mean my Martel, the regexp parser which generates SAX events?
Yes.
> You could do
>
> field = Group("fixed_field", Re(".{10}"))
> fields = Rep(field)
>
> That would parse your whole file in groups of 10.
>
> If you had something more complicated, like
>
>   6 characters for the field name
>     - if the field name is "ATOM  " then
>        1 space
>        4 characters for the atom name
>        6 characters * 3 for the coordinates
>
> you could do
>
> def fixed_width(name, size):
>     return Group(name, Re("." * size))
>
> ATOM_LINE = (Str("ATOM  ") + Str(" ") + fixed_width("name", 4) +
>              fixed_width("x", 6) + fixed_width("y", 6) + fixed_width("z", 6) +
>              AnyEol())
>
That is almost exactly what I was looking for.
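For the archive, the problem I started from can be reproduced without Martel. This is only a minimal sketch using Python's standard re module: the sample line and the 4/6/6/6 column widths are invented for illustration, and the named groups merely mimic what Group(name, Re("." * size)) does.

```python
import re

# Hypothetical record: name(4) x(6) y(6) z(6); the y column is left blank.
line = "CA  " + "  1.50" + "      " + "  3.25"

# Splitting on whitespace silently drops the empty column,
# so the remaining values shift one position to the left.
print(line.split())

# One named group per fixed-width column.
def fixed_width(name, size):
    return "(?P<%s>.{%d})" % (name, size)

pattern = re.compile(fixed_width("name", 4) + fixed_width("x", 6) +
                     fixed_width("y", 6) + fixed_width("z", 6) + "$")

# Every column is present, including the blank one.
fields = pattern.match(line).groupdict()
print(fields)
```

The whitespace split returns three items for a four-column line, while the fixed-width pattern keeps the blank y column in its place.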
Some more questions:
Postprocessing:
What would you suggest for processing this tabular data? The desired output
is a table (a list of lists). Naming each column might be too cumbersome
when there are many columns. Should I pass the whole table as one event and
take it apart in the ContentHandler? Or pass each line as an event? Either
way the ContentHandler would have to split the text into fields again, which
repeats work the parser has already done. How can I avoid this redundancy?
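One possibility I can think of is to give every column the same group name, so that the handler builds rows by position and no per-column names are needed. A rough sketch, under two assumptions I have not verified: that each line is wrapped in Group("row", ...) and each column in Group("field", ...), and that Martel reports every group as a startElement/endElement pair carrying the group name. The names "row", "field" and TableBuilder are my own. The events are simulated by hand here, so the sketch runs without Martel.

```python
from xml.sax import handler

class TableBuilder(handler.ContentHandler):
    """Collect the text of each "field" inside a "row" into a list of lists."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.table = []     # finished rows
        self._row = None    # row currently being built
        self._text = None   # text pieces of the current field

    def startElement(self, name, attrs):
        if name == "row":
            self._row = []
        elif name == "field":
            self._text = []

    def characters(self, content):
        # Only keep text that arrives while a field is open.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "field":
            self._row.append("".join(self._text))
            self._text = None
        elif name == "row":
            self.table.append(self._row)
            self._row = None

# Simulate the events for two rows (the second has a blank column).
builder = TableBuilder()
for cells in (["CA  ", "  1.50"], ["N   ", "      "]):
    builder.startElement("row", {})
    for cell in cells:
        builder.startElement("field", {})
        builder.characters(cell)
        builder.endElement("field")
    builder.endElement("row")

print(builder.table)
```

With this layout the handler never re-parses a line: the parser has already cut the fields, and the handler only appends them in order.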
Validity:
If the lines have no fixed part (like "ATOM"), the combination of fixed_width
expressions matches any line of the appropriate length. How can I make that
more robust? Require whitespace separation between columns?
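To make the question concrete, here is a sketch of what I mean, written with Python's standard re module rather than Martel. The layout (a 4-character name and two 6-character numeric columns, separated by single spaces) is hypothetical, and I am only assuming that similar character classes could be passed to Re(...) in place of "." * size.

```python
import re

# Name column: letters, digits or blanks only.
NAME = r"(?P<name>[A-Za-z0-9 ]{4})"

def numeric(name, size):
    # Only blanks, digits, a sign and a decimal point are accepted,
    # so an all-blank (empty) column still matches.
    return r"(?P<%s>[ 0-9.+-]{%d})" % (name, size)

# A mandatory single space between columns catches shifted lines.
strict = re.compile(NAME + " " + numeric("x", 6) + " " + numeric("y", 6) + "$")

good = "CA  " + " " + "  1.50" + " " + "      "   # empty y column
bad_text = "CA  " + " " + "abcdef" + " " + "  3.25"  # letters in a numeric column
bad_shift = "CA  " + "X" + "  1.50" + " " + "  3.25"  # separator position occupied

print(strict.match(good) is not None)
print(strict.match(bad_text) is not None)
print(strict.match(bad_shift) is not None)
```

All three lines have the right length, so a pure "." * size pattern would accept them all. The stricter classes reject the two malformed ones while still allowing an empty column.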
Thanks a lot for your help.
Regards,
Michael