Best way to parse file into db-type layout?
Peter A. Schott
paschott at no.yahoo.spamm.com
Fri Apr 29 14:54:54 EDT 2005
That looks promising. The field numbers are pre-defined at the mainframe level.
This may help me get to my ultimate goal which is to pump these into a DB on a
row-by-row basis ( :-P ) I'll have to do some playing around with this. I
knew that it looked like a dictionary, but wasn't sure how best to handle this.
One follow-up question: I'll end up getting multiple records for each "type".
Would I be referencing these by row[#][field#]?
Minor revision to the format is that starts like:
###,1,1,val_1,....
I think right now the plan is to parse through the file and insert the pairs
directly into a DB table. Something like RowID, LetterType, date, Field#,
Value. I can get RowID and LetterType overall, date is a constant, the rest
would involve reading each pair and inserting both values into the table. Time
to hit the books a little more to get up to speed on all of this.
I really appreciate the help. I knew there had to be a better way to do this,
just wasn't sure what it was.
-Pete
John Machin <sjmachin at lexicon.net> wrote:
> On Thu, 28 Apr 2005 23:34:31 GMT, Peter A. Schott
> <paschott at no.yahoo.spamm.com> wrote:
>
> >I've got a file that seems to come across more like a dictionary from what I can
> >tell. Something like the following format:
> >
> >###,1,val_1,2,val_2,3,val_3,5,val_5,10,val_10
> >###,1,val_1,2,val_2,3,val_3,5,val_5,11,val_11,25,val_25,967,val_967
> >
> >In other words, different layouts (defined mostly by what is in val_1, val_2,
> >val_3).
> >
> >The ,#, fields indicate what "field" from our mainframe the corresponding value
> >represents.
> >
> >Is there a good way to parse this into a DB-type format where I only pull out
> >the values corresponding to the appropriate field numbers? Seems like
> >converting to a dictionary of some sort would be best, but I don't quite know
> >how I would go about that.
> >
> >In this case, the first field is a value unto itself - represents a "letter
> >type" that would be associated with the rest of the record. The fields are
> >either present or not, no placeholder if there's no value for e.g. Field #4.
>
> Here's a sketch, tested as you'll see, but totally devoid of the
> error-checking that would be necessary for any data coming from an MF.
>
> C:\junk>type schott.txt
> pers,1,xxx,2,yyy,3,zzz,100,SMITH,101,JOHN,102,ALOYSIUS,103,1969-12-31
> addr,1,qqq,2,www,3,eee,200,"""THE LODGE"", 123 MAIN ST",205,WALLA
> WALLA,206,WA
>
> C:\junk>type schott.py
> import csv
> for row in csv.reader(open('schott.txt', 'rb')):
> rectype = row[0]
> recdict = {}
> for k in range(1, len(row), 2):
> recdict[int(row[k])] = row[k+1]
> print rectype, recdict
>
> C:\junk>python schott.py
> pers {1: 'xxx', 2: 'yyy', 3: 'zzz', 100: 'SMITH', 101: 'JOHN', 102:
> 'ALOYSIUS', 103: '1969-12-31'}
> addr {1: 'qqq', 2: 'www', 3: 'eee', 200: '"THE LODGE", 123 MAIN ST',
> 205: 'WALLA WALLA', 206: 'WA'}
>
> Hint: you'll probably go nuts if you don't implement some sort of
> naming convention instead of those numbers.
>
> One way would be like this:
>
> mf_surname = 100
> mf_given_1 = 101
> ...
> mf_state = 206
>
> then you can refer to recdict[mf_state] instead of recdict[206].
>
> Going upmarket a bit:
>
> Have a mf_map = {100: 'surname', 206: 'state', } # etc etc
>
> then you do
>
> class Record(object):
> pass
>
> # for each row:
> rec = Record()
> rec.rectype = row[0]
> for k in range(1, len(row), 2):
> setattr(rec, mf_map[int(row[k])], row[k+1])
>
> Then you can refer to rec.state instead of recdict[mf_state] or
> recdict[206].
>
> Further upmarket would involve gathering basic "type" information
> about the MF fields (free text, alpha code, identifier (e.g. SSN),
> money, quantity, date, etc etc) so that you can do validations and
> format conversions as appropriate.
>
> HTH,
> John
>
>
More information about the Python-list
mailing list