Best way to parse file into db-type layout?

John Machin sjmachin at lexicon.net
Thu Apr 28 20:41:16 EDT 2005


On Thu, 28 Apr 2005 23:34:31 GMT, Peter A. Schott
<paschott at no.yahoo.spamm.com> wrote:

>I've got a file that seems to come across more like a dictionary from what I can
>tell.  Something like the following format:
>
>###,1,val_1,2,val_2,3,val_3,5,val_5,10,val_10
>###,1,val_1,2,val_2,3,val_3,5,val_5,11,val_11,25,val_25,967,val_967
>
>In other words, different layouts (defined mostly by what is in val_1, val_2,
>val_3).
>
>The ,#, fields indicate what "field" from our mainframe the corresponding value
>represents.
>
>Is there a good way to parse this into a DB-type format where I only pull out
>the values corresponding to the appropriate field numbers?  Seems like
>converting to a dictionary of some sort would be best, but I don't quite know
>how I would go about that.
>
>In this case, the first field is a value unto itself - represents a "letter
>type" that would be associated with the rest of the record.  The fields are
>either present or not, no placeholder if there's no value for e.g. Field #4.

Here's a sketch, tested as you'll see, but totally devoid of the
error-checking that would be necessary for any data coming from an MF.

C:\junk>type schott.txt
pers,1,xxx,2,yyy,3,zzz,100,SMITH,101,JOHN,102,ALOYSIUS,103,1969-12-31
addr,1,qqq,2,www,3,eee,200,"""THE LODGE"", 123 MAIN ST",205,WALLA
WALLA,206,WA

C:\junk>type schott.py
import csv
for row in csv.reader(open('schott.txt', 'rb')):
    rectype = row[0]
    recdict = {}
    for k in range(1, len(row), 2):
        recdict[int(row[k])] = row[k+1]
    print rectype, recdict

C:\junk>python schott.py
pers {1: 'xxx', 2: 'yyy', 3: 'zzz', 100: 'SMITH', 101: 'JOHN', 102:
'ALOYSIUS', 103: '1969-12-31'}
addr {1: 'qqq', 2: 'www', 3: 'eee', 200: '"THE LODGE", 123 MAIN ST',
205: 'WALLA WALLA', 206: 'WA'}

Hint: you'll probably go nuts if you don't implement some sort of
naming convention instead of those numbers.

One way would be like this:

mf_surname = 100
mf_given_1 = 101
...
mf_state = 206

then you can refer to recdict[mf_state] instead of recdict[206].

Going upmarket a bit:

Have a mf_map = {100: 'surname', 206: 'state', } # etc etc

then you do

class Record(object):
    pass

# for each row:
   rec = Record()
   rec.rectype = row[0]
   for k in range(1, len(row), 2):
       setattr(rec, mf_map[int(row[k])], row[k+1])

Then you can refer to rec.state instead of recdict[mf_state] or
recdict[206].

Further upmarket would involve gathering basic "type" information
about the MF fields (free text, alpha code, identifier (e.g. SSN),
money, quantity, date, etc etc) so that you can do validations and
format conversions as appropriate.

HTH,
John






More information about the Python-list mailing list