Best way to parse file into db-type layout?

Fri Apr 29 14:54:54 EDT 2005

That looks promising.  The field numbers are pre-defined at the mainframe level.
This may help me get to my ultimate goal which is to pump these into a DB on a
row-by-row basis ( :-P  )  I'll have to do some playing around with this.  I
knew that it looked like a dictionary, but wasn't sure how best to handle this.

One follow-up question:  I'll end up getting multiple records for each "type".
Would I be referencing these by row[#][field#]?

Minor revision to the format is that starts like:
###,1,1,val_1,....

I think right now the plan is to parse through the file and insert the pairs
directly into a DB table.  Something like RowID, LetterType, date, Field#,
Value.  I can get RowID and LetterType overall, date is a constant, the rest
would involve reading each pair and inserting both values into the table.  Time
to hit the books a little more to get up to speed on all of this.

I really appreciate the help.  I knew there had to be a better way to do this,
just wasn't sure what it was.

-Pete

John Machin <sjmachin at lexicon.net> wrote:

> On Thu, 28 Apr 2005 23:34:31 GMT, Peter A. Schott
> <paschott at no.yahoo.spamm.com> wrote:
> 
> >I've got a file that seems to come across more like a dictionary from what I can
> >tell.  Something like the following format:
> >
> >###,1,val_1,2,val_2,3,val_3,5,val_5,10,val_10
> >###,1,val_1,2,val_2,3,val_3,5,val_5,11,val_11,25,val_25,967,val_967
> >
> >In other words, different layouts (defined mostly by what is in val_1, val_2,
> >val_3).
> >
> >The ,#, fields indicate what "field" from our mainframe the corresponding value
> >represents.
> >
> >Is there a good way to parse this into a DB-type format where I only pull out
> >the values corresponding to the appropriate field numbers?  Seems like
> >converting to a dictionary of some sort would be best, but I don't quite know
> >how I would go about that.
> >
> >In this case, the first field is a value unto itself - represents a "letter
> >type" that would be associated with the rest of the record.  The fields are
> >either present or not, no placeholder if there's no value for e.g. Field #4.
> 
> Here's a sketch, tested as you'll see, but totally devoid of the
> error-checking that would be necessary for any data coming from an MF.
> 
> C:\junk>type schott.txt
> pers,1,xxx,2,yyy,3,zzz,100,SMITH,101,JOHN,102,ALOYSIUS,103,1969-12-31
> addr,1,qqq,2,www,3,eee,200,"""THE LODGE"", 123 MAIN ST",205,WALLA
> WALLA,206,WA
> 
> C:\junk>type schott.py
> import csv
> for row in csv.reader(open('schott.txt', 'rb')):
>     rectype = row[0]
>     recdict = {}
>     for k in range(1, len(row), 2):
>         recdict[int(row[k])] = row[k+1]
>     print rectype, recdict
> 
> C:\junk>python schott.py
> pers {1: 'xxx', 2: 'yyy', 3: 'zzz', 100: 'SMITH', 101: 'JOHN', 102:
> 'ALOYSIUS', 103: '1969-12-31'}
> addr {1: 'qqq', 2: 'www', 3: 'eee', 200: '"THE LODGE", 123 MAIN ST',
> 205: 'WALLA WALLA', 206: 'WA'}
> 
> Hint: you'll probably go nuts if you don't implement some sort of
> naming convention instead of those numbers.
> 
> One way would be like this:
> 
> mf_surname = 100
> mf_given_1 = 101
> ...
> mf_state = 206
> 
> then you can refer to recdict[mf_state] instead of recdict[206].
> 
> Going upmarket a bit:
> 
> Have a mf_map = {100: 'surname', 206: 'state', } # etc etc
> 
> then you do
> 
> class Record(object):
>     pass
> 
> # for each row:
>    rec = Record()
>    rec.rectype = row[0]
>    for k in range(1, len(row), 2):
>        setattr(rec, mf_map[int(row[k])], row[k+1])
> 
> Then you can refer to rec.state instead of recdict[mf_state] or
> recdict[206].
> 
> Further upmarket would involve gathering basic "type" information
> about the MF fields (free text, alpha code, identifier (e.g. SSN),
> money, quantity, date, etc etc) so that you can do validations and
> format conversions as appropriate.
> 
> HTH,
> John
> 
>