Line Text Parsing
Dang Griffith
noemail at noemail4u.com
Wed Feb 4 14:45:00 EST 2004
On Wed, 04 Feb 2004 19:35:52 GMT, allanc
<kawNOSPAMenks at nospamyahoo.ca> wrote:
>I'm new with python so bear with me.
>
>I'm looking for a way to elegantly parse fixed-width text data (as opposed
>to CSV) and save the parsed data into a database. The text data comes
>from an old ISAM-format table and each line may be a different record
>structure depending on key fields in the line.
>
>RegExp with match and split are of interest but it's been too long since
>I've dabbled with RE to be able to judge whether its use will make the
>problem more complex.
>
>Here's a sample of the records I need to parse:
>
>01508390019002 11284361000002SUGARPLUM
>015083915549 SHORT ON LAST ORDER
>0150839220692 000002EA BMC 15 KG 001400
>
>1st Line is a (portion of) header record.
>2nd Line is a text instruction record.
>3rd Line is a Transaction Line Item record.
>
>Each type of record has a different structure, but these sets of lines
>all appear in the one table.
Are the key fields in fixed positions? If so, pluck them out and use
them as an index into a dictionary of functions to call. I can't tell
from your example where the keys are, so I'm assuming the first 8
characters are simply a line number and the next 4 are the key.
Maybe something along these lines:
def header(x):
    print('header: %s' % x)  # process header

def textinstruction(x):
    print('text instruction: %s' % x)  # process text instruction

def lineitem(x):
    print('lineitem: %s' % x)  # process line item

# Map the 4-character key in columns 8-11 to the function that handles it.
ptable = {'0190': header, '5549': textinstruction, '2069': lineitem}

for line in open("data.dat"):
    ptable[line[8:12]](line)
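The same idea can be pushed one step further so that each key also selects a field layout, and the handler receives named fields instead of the raw line. This is only a rough sketch: the layout table, the field names, and the column offsets are all hypothetical (guessed from the three sample lines, not from the real ISAM definition), and parse_line is a made-up helper name.

```python
# Hypothetical layouts: record names, field names, and column offsets are
# illustrative guesses, not the real ISAM table definition.
LAYOUTS = {
    "0190": ("header", {
        "line_no": slice(0, 8),
        "key": slice(8, 12),
        "rest": slice(12, None),
    }),
    "5549": ("text_instruction", {
        "line_no": slice(0, 8),
        "key": slice(8, 12),
        "text": slice(12, None),
    }),
    "2069": ("line_item", {
        "line_no": slice(0, 8),
        "key": slice(8, 12),
        "detail": slice(12, None),
    }),
}


def parse_line(line):
    """Return (record_type, fields) for one fixed-width line."""
    line = line.rstrip("\n")
    key = line[8:12]  # assumed key position, as in the loop above
    try:
        record_type, layout = LAYOUTS[key]
    except KeyError:
        # Unrecognized key: hand back the raw line instead of raising.
        return ("unknown", {"raw": line})
    # Slice each named field out of the line and trim the padding.
    fields = {name: line[cols].strip() for name, cols in layout.items()}
    return (record_type, fields)
```

Calling parse_line on each line of the file gives a record type plus a dictionary of fields, which is probably a convenient shape for building database inserts. Lines whose key is not in the table come back as "unknown" rather than stopping the loop with a KeyError.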
--dang