Best way to parse file into db-type layout?

John Machin sjmachin at lexicon.net
Fri Apr 29 18:10:33 EDT 2005


On Fri, 29 Apr 2005 18:54:54 GMT, Peter A. Schott
<paschott at no.yahoo.spamm.com> wrote:

>That looks promising.

> The field numbers are pre-defined at the mainframe level.

Of course. Have you managed to acquire a copy of the documentation, or
do you have to reverse-engineer it?

>This may help me get to my ultimate goal which is to pump these into a DB on a
>row-by-row basis ( :-P  ) 

That's your *ultimate* goal? Are you running a retro-computing museum
or something? Don't you want to *USE* the data?

> I'll have to do some playing around with this.  I
>knew that it looked like a dictionary, but wasn't sure how best to handle this.
>
>One follow-up question:  I'll end up getting multiple records for each "type".

What does that mean?? If it means that more than one customer will get
the "please settle your account" letter, and more than one customer
will get the "please buy a spangled fritzolator, only $9.99" letter,
you are stating the obvious -- otherwise, please explain.

>Would I be referencing these by row[#][field#]?

Not too sure what you mean by that -- whether you can get away with a
(read a row, write a row) way of handling the data depends on its
structure (like what are the relationships if any between different
rows) and what you want to do with it -- both murky concepts at the
moment.
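If row[#][field#] means "every record of a given type, indexed by position and then by field number", one way to get that is a dict of lists of dicts. This is a rough sketch that assumes each line is type,extra,field#,value,field#,value,... with no commas inside values. `group_by_type` and the sample lines are made up for illustration. It holds the whole file in memory, which a read-a-row/write-a-row approach would avoid.

```python
from collections import defaultdict


def group_by_type(lines):
    """Collect records into {letter_type: [{field_no: value}, ...]}.

    ASSUMPTION: each line is type,extra,field#,value,field#,value,...
    and no value contains a comma.
    """
    by_type = defaultdict(list)
    for line in lines:
        items = line.rstrip("\r\n").split(",")
        letter_type, rest = items[0], items[2:]  # items[1] is the unexplained extra "1"
        # even positions are field numbers, odd positions are their values
        fields = dict(zip((int(n) for n in rest[0::2]), rest[1::2]))
        by_type[letter_type].append(fields)
    return dict(by_type)


# HYPOTHETICAL sample records, for illustration only
sample = [
    "101,1,1,Smith,2,42.50",
    "101,1,1,Jones,2,17.00",
    "205,1,1,Brown,3,fritzolator",
]
by_type = group_by_type(sample)
print(by_type["101"][1][1])  # second "101" record, field 1 -> Jones
```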

>
>A minor revision to the format is that it starts like:
>###,1,1,val_1,....

How often do these "minor revisions" happen? How flexible do you have
to be? And the extra "1" means what? Is it ever any other number?
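Until those questions are answered, here is a minimal sketch of splitting one record. It assumes, without confirmation from the thread, that the first item is the letter type, the second is the extra "1" (meaning unknown), and the rest alternate field number / value. `parse_record` is a hypothetical name.

```python
import csv


def parse_record(line):
    """Split one record into (letter_type, extra, {field_number: value}).

    ASSUMPTION: layout is type,extra,field#,value,field#,value,...
    The meaning of the 'extra' item is not known.
    """
    # csv copes with quoted values that contain commas, if any exist
    row = next(csv.reader([line.rstrip("\r\n")]))
    # type + extra + N (field#, value) pairs is always an even count
    if len(row) < 2 or len(row) % 2:
        raise ValueError("unexpected record layout: %r" % line)
    letter_type, extra, rest = row[0], row[1], row[2:]
    # pair even positions (field numbers) with odd positions (values)
    fields = {int(num): value for num, value in zip(rest[0::2], rest[1::2])}
    return letter_type, extra, fields


print(parse_record("123,1,1,val_1,2,val_2"))
# ('123', '1', {1: 'val_1', 2: 'val_2'})
```

Raising on an odd item count is deliberate: if a "minor revision" adds or drops a column, you find out at once instead of silently pairing the wrong numbers with the wrong values.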

>
>
>I think right now the plan is to parse through the file and insert the pairs
>directly into a DB table.  Something like RowID, LetterType, date, Field#,
>Value.

Again, I'd recommend you lose the "Field#" in favour of a better
representation, ASAP.
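One reading of "a better representation": map each documented field number to a column name once, then store each record as one row with named columns instead of one (Field#, Value) row per field. The sketch below uses the standard-library sqlite3 module only as a stand-in for whatever DB you are really using. The table, the columns and the `FIELD_NAMES` mapping are all hypothetical.

```python
import sqlite3

# HYPOTHETICAL field-number -> column-name map; the real one would come
# from the mainframe documentation.
FIELD_NAMES = {1: "customer_name", 2: "amount_due"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE letters ("
    " row_id INTEGER PRIMARY KEY,"
    " letter_type TEXT, run_date TEXT,"
    " customer_name TEXT, amount_due TEXT)"
)


def insert_record(conn, letter_type, run_date, fields):
    """Store one record as one row with named columns.

    `fields` is {field_number: value}. An undocumented field number
    raises KeyError instead of being stored silently.
    """
    params = {name: None for name in FIELD_NAMES.values()}
    for num, value in fields.items():
        params[FIELD_NAMES[num]] = value
    params.update(letter_type=letter_type, run_date=run_date)
    conn.execute(
        "INSERT INTO letters (letter_type, run_date, customer_name, amount_due)"
        " VALUES (:letter_type, :run_date, :customer_name, :amount_due)",
        params,
    )


insert_record(conn, "101", "2005-04-29", {1: "Smith", 2: "42.50"})
print(conn.execute("SELECT letter_type, customer_name FROM letters").fetchall())
# [('101', 'Smith')]
```

With named columns, a question like "who owes more than X?" is one ordinary WHERE clause; with a Field#/Value table it becomes a self-join or pivot for every field you touch.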

>  I can get RowID and LetterType overall, date is a constant, the rest
>would involve reading each pair and inserting both values into the table.  Time
>to hit the books a little more to get up to speed on all of this.

What you need to do is (a) get a clear appreciation of what you are
trying to do with the data at a high level, (b) then develop an
understanding of the underlying data model, and (c) then, and only
then, worry about technical details.

Good luck,
John




