Newbie code review of parsing program Please
Mark Tolonen
M8R-yfto6h at mailinator.com
Sun Nov 16 13:40:59 EST 2008
"len" <lsumnler at gmail.com> wrote in message
news:fc3ef718-edc4-4892-8418-3eeff0975edc at u18g2000pro.googlegroups.com...
> I have created the following program to read a text file which happens
> to be a COBOL file definition. The program then outputs a file that is
> essentially a list definition, which I can later copy and paste into a
> Python program. I will eventually expand the program to also output an
> SQL script to create an SQL file in MySQL.
>
> The program still needs a little work; it does not handle the following
> items yet:
>
> 1. It does not handle OCCURS yet.
> 2. It does not handle REDEFINES yet.
> 3. GROUP structures will need work.
> 4. Does not create SQL script yet.
>
> I anticipate that any files created by this program may need manual
> tweaking, but I have a large number of COBOL file definitions which I
> may need to work with, and this seemed like a better solution than
> typing each list definition and SQL create file script by hand.
>
> What I would like is for some kind soul to review my code and give me
> some suggestions on how I might improve it. I think the use of regular
> expressions might cut the code down or at least simplify the parsing,
> but I'm just starting to read those chapters in the book ;)
>
> *** SAMPLE INPUT FILE ***
>
> 000100 FD SALESMEN-FILE
> 000200 LABEL RECORDS ARE STANDARD
> 000300 VALUE OF FILENAME IS "SALESMEN".
> 000400
> 000500 01 SALESMEN-RECORD.
> 000600 05 SALESMEN-NO PIC 9(3).
> 000700 05 SALESMEN-NAME PIC X(30).
> 000800 05 SALESMEN-TERRITORY PIC X(30).
> 000900 05 SALESMEN-QUOTA PIC S9(7) COMP.
> 001000 05 SALESMEN-1ST-BONUS PIC S9(5)V99 COMP.
> 001100 05 SALESMEN-2ND-BONUS PIC S9(5)V99 COMP.
> 001200 05 SALESMEN-3RD-BONUS PIC S9(5)V99 COMP.
> 001300 05 SALESMEN-4TH-BONUS PIC S9(5)V99 COMP.
>
> *** PROGRAM CODE ***
>
> #!/usr/bin/python
>
> import sys
>
> f_path = '/home/lenyel/Bruske/MCBA/Internet/'
> f_name = sys.argv[1]
>
> fd = open(f_path + f_name, 'r')
>
> def fmtline(fieldline):
>     size = ''
>     type = ''
>     dec = ''
>     codeline = []
>     if fieldline.count('COMP.') > 0:
>         left = fieldline[3].find('(') + 1
>         right = fieldline[3].find(')')
>         num = fieldline[3][left:right].lstrip()
>         if fieldline[3].count('V'):
>             left = fieldline[3].find('V') + 1
>             dec = int(len(fieldline[3][left:]))
>             # integer division, so the size stays a whole number of bytes
>             size = ((int(num) + int(dec)) // 2) + 1
>         else:
>             size = (int(num) // 2) + 1
>             dec = 0
>         type = 'Pdec'
>     elif fieldline[3][0] in ('X', '9'):
>         dec = 0
>         left = fieldline[3].find('(') + 1
>         right = fieldline[3].find(')')
>         size = int(fieldline[3][left:right].lstrip('0'))
>         if fieldline[3][0] == 'X':
>             type = 'Xstr'
>         else:
>             type = 'Xint'
>     else:
>         dec = 0
>         left = fieldline[3].find('(') + 1
>         right = fieldline[3].find(')')
>         size = int(fieldline[3][left:right].lstrip('0'))
>         if fieldline[3][0] == 'X':
>             type = 'Xint'
>     codeline.append(fieldline[1].replace('-', '_').replace('.', '').lower())
>     codeline.append(size)
>     codeline.append(type)
>     codeline.append(dec)
>     return codeline
>
> wrkfd = []
> rec_len = 0
>
> for line in fd:
>     if line[6] == '*': # drop comment lines
>         continue
>     newline = line.split()
>     if len(newline) == 1: # drop blank line
>         continue
>     newline = newline[1:]
>     if 'FILENAME' in newline:
>         filename = newline[-1].replace('"','').lower()
>         filename = filename.replace('.','')
>         output = open('/home/lenyel/Bruske/MCBA/Internet/'+filename
>                       +'.fd', 'w')
>         code = filename + ' = [\n'
>         output.write(code)
>     elif newline[0].isdigit() and 'PIC' in newline:
>         wrkfd.append(fmtline(newline))
>         rec_len += wrkfd[-1][1]
>
> fd.close()
>
> fmtfd = []
>
> for wrkline in wrkfd[:-1]:
>     fmtline = str(tuple(wrkline)) + ',\n'
>     output.write(fmtline)
>
> fmtline = tuple(wrkfd[-1])
> fmtline = str(fmtline) + '\n'
> output.write(fmtline)
>
> lastline = ']\n'
> output.write(lastline)
>
> lenrec = filename + '_len = ' + str(rec_len)
> output.write(lenrec)
>
> output.close()
>
> *** RESULTING OUTPUT ***
>
> salesmen = [
> ('salesmen_no', 3, 'Xint', 0),
> ('salesmen_name', 30, 'Xstr', 0),
> ('salesmen_territory', 30, 'Xstr', 0),
> ('salesmen_quota', 4, 'Pdec', 0),
> ('salesmen_1st_bonus', 4, 'Pdec', 2),
> ('salesmen_2nd_bonus', 4, 'Pdec', 2),
> ('salesmen_3rd_bonus', 4, 'Pdec', 2),
> ('salesmen_4th_bonus', 4, 'Pdec', 2)
> ]
> salesmen_len = 83
>
> If you find this code useful please feel free to use any or all of it
> at your own risk.
>
> Thanks
> Len S
You might want to check out the pyparsing library.
-Mark
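
For comparison with the string slicing in the quoted program, here is a
minimal regular-expression sketch using only the standard library. It is
not pyparsing and not code from this thread. The names PIC_FIELD and
parse_field are made up for illustration, and the size and type rules
simply mirror what the quoted program prints for the sample input, so
treat it as an assumption-laden starting point rather than a general
COBOL PICTURE parser.

```python
import re

# Hypothetical pattern for one elementary item such as
# "000600 05 SALESMEN-NO PIC 9(3)." (the sequence number is optional).
PIC_FIELD = re.compile(
    r"""^\s*(?:\d{6}\s+)?          # optional 6-digit sequence number
        (?P<level>\d{2})\s+        # level number, e.g. 05
        (?P<name>[A-Z0-9-]+)\s+    # data name
        PIC\s+
        (?P<sign>S?)               # optional sign
        (?P<kind>[X9])\((?P<digits>\d+)\)   # X(30) or 9(3)
        (?:V(?P<frac>9+))?         # optional implied decimals, e.g. V99
        (?P<comp>\s+COMP)?         # optional COMP usage
        \s*\.\s*$""",
    re.VERBOSE,
)


def parse_field(line):
    """Return (name, size, type, decimals), or None if the line is not a PIC item."""
    m = PIC_FIELD.match(line)
    if m is None:
        return None
    digits = int(m.group("digits"))
    dec = len(m.group("frac") or "")
    if m.group("comp"):
        # Same size rule as the quoted program: (digits + decimals) // 2 + 1.
        size, ftype = (digits + dec) // 2 + 1, "Pdec"
    else:
        # Assumption: anything that is not X is reported as 'Xint', including
        # signed fields without COMP (the quoted program leaves those untyped).
        size = digits
        ftype = "Xstr" if m.group("kind") == "X" else "Xint"
    name = m.group("name").replace("-", "_").lower()
    return (name, size, ftype, dec)


if __name__ == "__main__":
    print(parse_field("000700 05 SALESMEN-NAME PIC X(30)."))
```

Lines without a PIC clause, such as the FD header or the 01 group item,
return None, so the caller can skip them without a separate check.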