Line Text Parsing

Dang Griffith noemail at noemail4u.com
Wed Feb 4 14:45:00 EST 2004


On Wed, 04 Feb 2004 19:35:52 GMT, allanc
<kawNOSPAMenks at nospamyahoo.ca> wrote:

>I'm new to Python, so bear with me.
>
>I'm looking for a way to elegantly parse fixed-width text data (as opposed 
>to CSV) and save the parsed data into a database. The text data comes 
>from an old ISAM-format table and each line may be a different record 
>structure depending on key fields in the line.
>
>Regular expressions with match and split are of interest, but it's been too 
>long since I've dabbled with REs to judge whether using them would make the 
>problem more complex.
>
>Here's a sample of the records I need to parse:
>
>01508390019002      11284361000002SUGARPLUM
>015083915549           SHORT ON LAST ORDER 
>0150839220692 000002EA BMC   15 KG   001400
>
>1st Line is a (portion of) header record.
>2nd Line is a text instruction record.
>3rd Line is a Transaction Line Item record.
>
>Each type of record has a different structure, but all of these lines 
>appear in the same table.

Are the key fields in fixed positions?  If so, pluck them out and use
them as an index into a dictionary of functions to call.  I can't tell
from your example where the keys are, so I'm assuming the first 8
characters are simply a line number and the next 4 are the key.

Maybe something along these lines:

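def header(x):
    print('header: %s' % x)  # process header

def textinstruction(x):
    print('text instruction: %s' % x)  # process text instruction

def lineitem(x):
    print('lineitem: %s' % x)  # process line item

def unknown(x):
    print('unknown record: %s' % x)  # no handler for this key

# Map the 4-character record key (columns 8-11) to its handler.
ptable = {'0190': header, '5549': textinstruction, '2069': lineitem}

with open("data.dat") as f:
    for line in f:
        line = line.rstrip('\n')
        # .get() avoids a KeyError on record types not in the table
        ptable.get(line[8:12], unknown)(line)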
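Once a record has been routed by its key, each handler still has to cut the line into fields. Here is a minimal Python 3 sketch of that step, assuming the 8-character line number and 4-character key guessed above. Every other field name and width in `LAYOUTS` (`body`, `text`, `code`, `qty`, `unit`, `desc`) is hypothetical and would have to come from the real ISAM record definitions. It uses plain string slicing rather than regular expressions, since fixed-width columns need no pattern matching.

```python
# Columns shared by every record, as guessed in the post: an 8-character
# line number followed by a 4-character record key.
PREFIX = [("lineno", 8), ("key", 4)]

# Hypothetical per-record layouts: (field_name, width) pairs that follow the
# prefix. A width of None means "the rest of the line". These names and
# widths are illustrative guesses, not the real ISAM record definitions.
LAYOUTS = {
    "0190": [("body", None)],                       # header
    "5549": [("text", None)],                       # text instruction
    "2069": [("code", 2), ("qty", 6), ("unit", 2),  # line item
             ("desc", None)],
}


def split_fields(line, layout):
    """Cut a fixed-width line into a dict of stripped, named fields."""
    fields = {}
    pos = 0
    for name, width in layout:
        end = None if width is None else pos + width
        fields[name] = line[pos:end].strip()
        if end is None:
            break  # the open-ended field consumed the rest of the line
        pos = end
    return fields


def parse_line(line):
    """Return the fields of one record, or None if its key is unknown."""
    line = line.rstrip("\r\n")
    layout = LAYOUTS.get(line[8:12])
    if layout is None:
        return None
    return split_fields(line, PREFIX + layout)


if __name__ == "__main__":
    sample = [
        "01508390019002      11284361000002SUGARPLUM",
        "015083915549           SHORT ON LAST ORDER ",
        "0150839220692 000002EA BMC   15 KG   001400",
    ]
    for rec in sample:
        print(parse_line(rec))
```

Keeping the layouts as data rather than code means a new record type is one more dictionary entry, and the resulting dicts map naturally onto named parameters for a database insert.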

    --dang



More information about the Python-list mailing list