Newbie code review of parsing program Please

Sun Nov 16 13:40:59 EST 2008

"len" <lsumnler at gmail.com> wrote in message 
news:fc3ef718-edc4-4892-8418-3eeff0975edc at u18g2000pro.googlegroups.com...
> I have created the following program to read a text file that happens
> to be a COBOL file definition.  The program then writes a file
> containing a list definition, which I can later copy and paste into a
> Python program.  I will eventually expand the program to also output
> an SQL script to create a SQL file in MySQL.
>
> The program still needs a little work; it does not handle the
> following items yet:
>
> 1.  It does not handle OCCURS yet.
> 2.  It does not handle REDEFINES yet.
> 3.  GROUP structures will need work.
> 4.  Does not create SQL script yet.
>
> I anticipate that any files created by this program may need manual
> tweaking, but I have a large number of COBOL file definitions that I
> may need to work with, and this seemed like a better solution than
> typing each list definition and SQL create-file script by hand.
>
> What I would like is for some kind soul to review my code and give me
> some suggestions on how I might improve it.  I think the use of
> regular expressions might cut the code down or at least simplify the
> parsing, but I'm just starting to read those chapters in the book ;)
>
> *** SAMPLE INPUT FILE ***
>
> 000100 FD  SALESMEN-FILE
> 000200     LABEL RECORDS ARE STANDARD
> 000300     VALUE OF FILENAME IS "SALESMEN".
> 000400
> 000500 01  SALESMEN-RECORD.
> 000600     05  SALESMEN-NO                PIC 9(3).
> 000700     05  SALESMEN-NAME              PIC X(30).
> 000800     05  SALESMEN-TERRITORY         PIC X(30).
> 000900     05  SALESMEN-QUOTA             PIC S9(7) COMP.
> 001000     05  SALESMEN-1ST-BONUS         PIC S9(5)V99 COMP.
> 001100     05  SALESMEN-2ND-BONUS         PIC S9(5)V99 COMP.
> 001200     05  SALESMEN-3RD-BONUS         PIC S9(5)V99 COMP.
> 001300     05  SALESMEN-4TH-BONUS         PIC S9(5)V99 COMP.
>
> *** PROGRAM CODE ***
>
> #!/usr/bin/python
>
> import sys
>
> f_path = '/home/lenyel/Bruske/MCBA/Internet/'
> f_name = sys.argv[1]
>
> fd = open(f_path + f_name, 'r')
>
> def fmtline(fieldline):
>    size = ''
>    type = ''
>    dec = ''
>    codeline = []
>    if fieldline.count('COMP.') > 0:
>        left = fieldline[3].find('(') + 1
>        right = fieldline[3].find(')')
>        num = fieldline[3][left:right].lstrip()
>        if fieldline[3].count('V'):
>            left = fieldline[3].find('V') + 1
>            dec = int(len(fieldline[3][left:]))
>            size = ((int(num) + int(dec)) // 2) + 1
>        else:
>            size = (int(num) // 2) + 1
>            dec = 0
>        type = 'Pdec'
>    elif fieldline[3][0] in ('X', '9'):
>        dec = 0
>        left = fieldline[3].find('(') + 1
>        right = fieldline[3].find(')')
>        size = int(fieldline[3][left:right].lstrip('0'))
>        if fieldline[3][0] == 'X':
>            type = 'Xstr'
>        else:
>            type = 'Xint'
>    else:
>        dec = 0
>        left = fieldline[3].find('(') + 1
>        right = fieldline[3].find(')')
>        size = int(fieldline[3][left:right].lstrip('0'))
>        if fieldline[3][0] == 'S':   # 'X' can never reach this branch; signed numeric (S9...) assumed
>            type = 'Xint'
>    codeline.append(fieldline[1].replace('-', '_').replace('.', '').lower())
>    codeline.append(size)
>    codeline.append(type)
>    codeline.append(dec)
>    return codeline
>
> wrkfd = []
> rec_len = 0
>
> for line in fd:
>    if line[6] == '*':      # drop comment lines
>        continue
>    newline = line.split()
>    if len(newline) == 1:   # drop blank line
>        continue
>    newline = newline[1:]
>    if 'FILENAME' in newline:
>        filename = newline[-1].replace('"','').lower()
>        filename = filename.replace('.','')
>        output = open(f_path + filename + '.fd', 'w')
>        code = filename + ' = [\n'
>        output.write(code)
>    elif newline[0].isdigit() and 'PIC' in newline:
>        wrkfd.append(fmtline(newline))
>        rec_len += wrkfd[-1][1]
>
> fd.close()
>
> fmtfd = []
>
> for wrkline in wrkfd[:-1]:
>    fmtline = str(tuple(wrkline)) + ',\n'
>    output.write(fmtline)
>
> fmtline = tuple(wrkfd[-1])
> fmtline = str(fmtline) + '\n'
> output.write(fmtline)
>
> lastline = ']\n'
> output.write(lastline)
>
> lenrec = filename + '_len = ' + str(rec_len)
> output.write(lenrec)
>
> output.close()
>
> *** RESULTING OUTPUT ***
>
> salesmen = [
> ('salesmen_no', 3, 'Xint', 0),
> ('salesmen_name', 30, 'Xstr', 0),
> ('salesmen_territory', 30, 'Xstr', 0),
> ('salesmen_quota', 4, 'Pdec', 0),
> ('salesmen_1st_bonus', 4, 'Pdec', 2),
> ('salesmen_2nd_bonus', 4, 'Pdec', 2),
> ('salesmen_3rd_bonus', 4, 'Pdec', 2),
> ('salesmen_4th_bonus', 4, 'Pdec', 2)
> ]
> salesmen_len = 83
>
> If you find this code useful please feel free to use any or all of it
> at your own risk.
>
> Thanks
> Len S

You might want to check out the pyparsing library.
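
If you would rather stay in the standard library for now, the regular
expressions you mention can handle the PIC clause too.  Here is a rough
sketch, assuming every field looks like the lines in your sample: a
six-digit sequence number, level, name, PIC, optional V9s, optional
COMP, and a closing period.  FIELD_RE and parse_field are names I made
up, and the size rule for non-COMP fields with a V is my guess, not
something your program does today.

```python
import re

# Hypothetical pattern for one elementary item, for example:
#   000600     05  SALESMEN-NO                PIC 9(3).
FIELD_RE = re.compile(
    r"""^\d{6}\s+                 # sequence number
        (?P<level>\d{2})\s+       # level number, e.g. 05
        (?P<name>[A-Z0-9-]+)\s+   # data name
        PIC\s+
        (?P<sign>S?)              # optional sign
        (?P<kind>[X9])\((?P<digits>\d+)\)
        (?:V(?P<dec>9+))?         # optional implied decimals, e.g. V99
        (?:\s+(?P<usage>COMP))?   # optional usage clause
        \s*\.\s*$""",
    re.VERBOSE,
)


def parse_field(line):
    """Return (name, size, type, decimals), or None if the line is not a field."""
    m = FIELD_RE.match(line)
    if m is None:
        return None
    digits = int(m.group("digits"))
    dec = len(m.group("dec") or "")
    if m.group("usage"):
        # Same size formula as the original program's COMP branch.
        size = (digits + dec) // 2 + 1
        ftype = "Pdec"
    else:
        # Assumption: one byte per character or digit, decimals included.
        size = digits + dec
        ftype = "Xstr" if m.group("kind") == "X" else "Xint"
    name = m.group("name").replace("-", "_").lower()
    return (name, size, ftype, dec)
```

Lines that are not elementary items (the FD and 01 lines, for instance)
come back as None, so the main loop can simply skip them.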

-Mark