Line Text Parsing

Thu Feb 5 10:36:27 EST 2004

I think one of the easiest ways to do this is to
write a class for each unique line type that knows
how to parse that line.  As you read through the file/table
and encounter a line like the first, create a new
instance of the matching class and pass it the line's
contents.  The __init__ method of the class can parse the
line and place each of the field values in an attribute
of the instance.

Something like (this is pseudocode):

class linetype01:
    #
    # Define a list that contains information about how to
    # parse a single linetype.  The info is fieldname,
    # beginning column, fieldlength
    #

    _parsinginfo=[('recnum',0,8),
                  ('linetype',8,3),
                  ('dataitem2',11,3),
                  # ...
                 ]

    def __init__(self, linetext):
        self.linetext=linetext
        #
        # Slice each field out of the line and store it as an
        # attribute named after the field
        #
        for fieldname, begincol, fieldlength in self._parsinginfo:
            setattr(self, fieldname, linetext[begincol:begincol+fieldlength])

You would define a class like this for each unique linetype.
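
A minimal sketch of how those per-linetype classes might share the
parsing loop, so that each one only lists its own fields.  The base
class name FixedWidthLine and every field name and width below are
made up for illustration; the real layouts have to come from the
ISAM record definitions.

```python
class FixedWidthLine:
    # Subclasses override this with (fieldname, begincol, fieldlength) tuples.
    _parsinginfo = []

    def __init__(self, linetext):
        self.linetext = linetext
        for fieldname, begincol, fieldlength in self._parsinginfo:
            # Slicing past the end of a short line just yields a shorter string.
            setattr(self, fieldname, linetext[begincol:begincol + fieldlength])


class linetype01(FixedWidthLine):
    # Hypothetical layout for a header record.
    _parsinginfo = [('recnum', 0, 8),
                    ('linetype', 8, 2),
                    ('rest', 10, 40)]


class linetype55(FixedWidthLine):
    # Hypothetical layout for a text instruction record.
    _parsinginfo = [('recnum', 0, 8),
                    ('linetype', 8, 2),
                    ('rest', 10, 60)]


# Second sample line from the original question.
pline = linetype55("015083915549           SHORT ON LAST ORDER")
print(pline.recnum, pline.linetype, pline.rest)
```

Keeping the layout as data means a new linetype is just a new
subclass with its own list of tuples.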

Then, in the main program:
import sys

#
# Insert code to open file/table here
#
for line in table:
    #
    # See which linetype it is
    #
    linetype=line[8:10]
    if linetype == "01":
        pline=linetype01(line)
        #
        # Now you can extract the values by accessing attributes of
        # the class.
        #
        recordnum=pline.recnum
        tlinetype=pline.linetype
        #
        # Do something with the values
        #

    elif linetype == "55":
        pline=linetype55(line)

    elif linetype == "20":
        pline=linetype20(line)
    else:
        print("ERROR-Illegal linetype encountered")
        sys.exit(2)
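
If the number of linetypes grows, the if/elif chain could be swapped
for a dictionary that maps the two-character code to its class.  This
is only a rough sketch of that variation: the stand-in _Line class,
the LINETYPES name and the parse_line helper are invented for the
example, and the stand-in parses nothing beyond the record number and
the linetype.

```python
class _Line:
    # Stand-in parser: only the record number and linetype are pulled out.
    def __init__(self, linetext):
        self.linetext = linetext
        self.recnum = linetext[0:8]
        self.linetype = linetext[8:10]


class linetype01(_Line):
    pass


class linetype55(_Line):
    pass


class linetype20(_Line):
    pass


# Map each linetype code to the class that knows how to parse it.
LINETYPES = {"01": linetype01, "55": linetype55, "20": linetype20}


def parse_line(line):
    code = line[8:10]
    try:
        cls = LINETYPES[code]
    except KeyError:
        raise ValueError("Illegal linetype encountered: %r" % code)
    return cls(line)


# The three sample lines from the original question.
table = [
    "01508390019002      11284361000002SUGARPLUM",
    "015083915549           SHORT ON LAST ORDER",
    "0150839220692 000002EA BMC   15 KG   001400",
]
for line in table:
    pline = parse_line(line)
    print(pline.__class__.__name__, pline.recnum)
```

Raising an exception instead of calling sys.exit lets the caller
decide whether a bad line should stop the whole run.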

Just one of many ways to solve this problem.

-Larry

"allanc" <kawNOSPAMenks at nospamyahoo.ca> wrote in message
news:Xns948575A2C930Aacuencacanadacom at 198.161.157.145...
> I'm new with python so bear with me.
>
> I'm looking for a way to elegantly parse fixed-width text data (as opposed
> to CSV) and save the parsed data into a database. The text data comes
> from an old ISAM-format table and each line may be a different record
> structure depending on key fields in the line.
>
> RegExp with match and split are of interest but it's been too long since
> I've dabbled with RE to be able to judge whether its use will make the
> problem more complex.
>
> Here's a sample of the records I need to parse:
>
> 01508390019002      11284361000002SUGARPLUM
> 015083915549           SHORT ON LAST ORDER
> 0150839220692 000002EA BMC   15 KG   001400
>
> 1st Line is a (portion of) header record.
> 2nd Line is a text instruction record.
> 3rd Line is a Transaction Line Item record.
>
> Each type of record has a different structure. But these set of lines
> appear in the one table.
>
>
> Any ideas would be greatly appreciated.
>
> Allan