how to parse numeric data files

Peter Hansen peter at engcorp.com
Tue Apr 29 13:41:37 EDT 2003


george young wrote:
> 
> We have several electronic testing machines (of various ages and
> manufacturers) that spew out testing data files in various ASCII formats.
> Currently we have a nasty mess of awk/shell/C/fortran programs that
> extract and process some data from these files.  I have a dream of
> a suite of simple, clear, maintainable python programs to do these tasks.
> 
> The trick is I hope to come up with something that our hardware
> engineers can understand and maintain easily without studying
> things like BNF, LALR etc. (they won't).
> 
> [Below is a sample of one of the worst formats, shortened from a 40MB file!]

The format is quite amenable to parsing with re, if rather large...
the real question is how much of that data do you need, and what do
you need to do with it?  What do your current scripts actually do?
Also, how much of the content you showed is *fixed* format, and how
much of the format can vary?  Is anything optional?
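As a rough illustration of the re approach, a single compiled pattern can pull every numeric field out of a line and convert it to float. The sample file isn't reproduced here, so the measurement line below and the definition of "numeric" used by the pattern are assumptions, not the real format:

```python
import re

# Assumed definition of "numeric": optional sign, digits with an optional
# decimal part (or a leading-dot decimal), and optional exponent, e.g.
# 3, -0.512, .25, 1.5e-9.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def numbers_in(line):
    """Return every numeric token found in a line, converted to float."""
    return [float(tok) for tok in NUMBER.findall(line)]

if __name__ == "__main__":
    # Hypothetical measurement line, made up for illustration.
    print(numbers_in("site 3  vth= -0.512  ileak=1.5e-9 A"))
```

Note that a pattern this loose will also pick up digits embedded in identifiers (a name like "die-12" yields -12.0), which is one reason it matters how much of the format is fixed.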

If you want the hardware engineers to be able to maintain it, you might
want to support a kind of "template" specification, where you list the
names of the tags to be recognized (e.g. "PH_lot_id:") and the program
automatically extracts the value that follows each one.
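A minimal sketch of that template idea, assuming each tag starts a line and its value is the next whitespace-separated token. Only "PH_lot_id:" comes from the thread; the other tag names, the type column, and the extract() helper are hypothetical:

```python
import re

# Hypothetical template: tag -> type used to convert the value after it.
# Engineers would edit this table, not the parsing code below.
TEMPLATE = {
    "PH_lot_id:": str,
    "PH_wafer:": int,
    "PH_temp:": float,
}

def extract(lines, template):
    """Return {tag: value} for each template tag found at the start of a line.

    The value is the first whitespace-separated token after the tag,
    converted with the type given in the template. If a tag appears more
    than once, the last occurrence wins.
    """
    # Build one alternation of all tags; longest first so that a tag that is
    # a prefix of another tag cannot match in its place.
    tags = sorted(template, key=len, reverse=True)
    pattern = re.compile(r"\s*(" + "|".join(map(re.escape, tags)) + r")\s*(\S+)")
    found = {}
    for line in lines:
        m = pattern.match(line)
        if m:
            tag, raw = m.groups()
            found[tag] = template[tag](raw)
    return found

if __name__ == "__main__":
    sample = [
        "PH_lot_id: A1234",
        "PH_wafer: 7",
        "some unrelated line",
        "PH_temp: 25.0",
    ]
    print(extract(sample, TEMPLATE))
```

The design choice here is that all the format knowledge lives in the TEMPLATE table, so supporting a new tag means adding one line to it rather than touching a regular expression.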

What you're trying to do isn't really that complex, so I think you
should be able to find a good, simple solution for it.
As you said, this _is_ Python...

-Peter
