Fixed-length fields in Martel

Michael Schmitt nomail at nomail.com
Wed Jun 4 06:15:26 EDT 2003


Hello Andrew.

> Michael Schmitt:
>> Is there a way to handle fixed-length fields with Martel? The often-used
>> 'split-by-whitespace' approach fails if fields can be empty.
> 
> Do you mean my Martel, the regexp parser which generates SAX events?

Yes.

> You could do
> 
> field = Group("fixed_field", Re(".{10}"))
> fields = Rep(field)
> 
> That would parse your whole file in groups of 10.
> 
> If you had something more complicated, like
> 
> 6 characters for the field name
>   - if the field name is "ATOM  " then
>       1 space
>       4 characters for the atom name
>       6 characters * 3 for the coordinate
> 
> you could do
> 
> def fixed_width(name, size):
>   return Group(name, Re("."*size))
> 
> ATOM_LINE = (Str("ATOM  ") + Str(" ") + fixed_width("name", 4) +
>              fixed_width("x", 6) + fixed_width("y", 6) + fixed_width("z", 6) +
>              AnyEol())
> 

That is almost exactly what I was looking for.

Some more questions:

Postprocessing:
What would you suggest for processing this tabular data? The desired output
is a table (a list of lists). Naming each column might be too cumbersome
when there are many columns. Should I pass the whole table as a single event
and take it apart in the ContentHandler, or pass each line as an event?
Either way the ContentHandler would have to split the text into fields a
second time. How can I avoid this redundancy?
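
One possible shape for the postprocessing step, as a minimal sketch only: it
assumes every line arrives as a "row" element wrapping one "cell" element
per column. Those two tag names are hypothetical, and the events here come
from the standard library's xml.sax parser reading a small XML string, not
from Martel. Because all columns share one tag name, no per-column naming is
needed, and the handler never re-splits any text.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TableHandler(ContentHandler):
    """Collect 'row'/'cell' events into a list of lists (tag names are assumed)."""

    def __init__(self, row_tag="row", cell_tag="cell"):
        super().__init__()
        self.row_tag = row_tag
        self.cell_tag = cell_tag
        self.table = []      # finished rows
        self._row = None     # row currently being built
        self._chars = None   # text pieces of the cell currently open

    def startElement(self, name, attrs):
        if name == self.row_tag:
            self._row = []
        elif name == self.cell_tag:
            self._chars = []

    def characters(self, content):
        # Text may arrive in several chunks; keep it only while inside a cell.
        if self._chars is not None:
            self._chars.append(content)

    def endElement(self, name):
        if name == self.cell_tag:
            self._row.append("".join(self._chars))
            self._chars = None
        elif name == self.row_tag:
            self.table.append(self._row)
            self._row = None


def table_from_xml(data):
    """Stand-in event source: parse XML bytes and return the collected table."""
    handler = TableHandler()
    xml.sax.parseString(data, handler)
    return handler.table


SAMPLE = (b"<table>"
          b"<row><cell>N</cell><cell>1.0</cell><cell></cell></row>"
          b"<row><cell>CA</cell><cell></cell><cell>2.5</cell></row>"
          b"</table>")
table = table_from_xml(SAMPLE)
```

An empty cell produces a start and an end event with no text in between, so
it ends up as an empty string in its row rather than shifting the columns.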

Validity:
If the lines have no fixed part (like "ATOM"), the combination of fixed_width
expressions matches any line of the appropriate length. How can this be made
more robust? Should I require whitespace separation between columns?
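
To make the whitespace-separation idea concrete, here is a rough sketch using
the standard library's re module rather than Martel. The column widths, the
single-space separator, and the rule that the last three columns are either
blank or a right-justified number are all assumptions chosen for
illustration. The same pattern idea may carry over to Martel expressions,
but that has not been checked here.

```python
import re

# Hypothetical layout: a 4-character name, then three 6-character numbers,
# each pair of columns separated by exactly one space.
WIDTHS = (4, 6, 6, 6)
LINE = re.compile("^" + " ".join("(.{%d})" % w for w in WIDTHS) + "$")

# A numeric field is right-justified: optional leading blanks, then the number.
NUMBER = re.compile(r"^ *-?\d+(\.\d+)?$")


def parse_line(line):
    """Return the stripped fields of one line, or None if the line is invalid."""
    match = LINE.match(line)
    if match is None:
        return None          # wrong length or a separator column is not blank
    name, *numbers = match.groups()
    for field in numbers:
        # Blank fields are allowed; anything else must look like a number.
        if field.strip() and not NUMBER.match(field):
            return None
    return [name.strip()] + [field.strip() for field in numbers]


good = "CA  " + " " + "  1.50" + " " + "      " + " " + " -2.25"
row = parse_line(good)
```

With this layout, a line of the right length but with text in a separator
column, or with non-numeric text in a numeric column, is rejected instead of
being silently accepted.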


Thanks a lot for your help.

Regards, 
Michael