Converting a text data file from positional to tab delimited.

Lee Joramo lee at joramo.com
Tue Mar 13 09:09:52 EST 2001


I am looking for suggestions to speed up the conversion of a large
text data file from a 'positional' (fixed-width) layout to tab-delimited.
The data file is over 200MB and contains over 40,000 lines, each with over
600 fields.

I suspect that the 'for' loop that splits each line into tab-delimited
fields could be optimized. Perhaps it could be replaced with a regex or
another technique.

Thanks for any suggestions.

Lee Joramo

==================

#'layout' list elements: [field name, start position, end position]
#For brevity, I have only included 10 fields.
#The contents of 'layout' are extracted from a text file that
#describes the layout of the datafile.
layout = [
    ['STUDY', 0, 7],
    ['MDLNO', 8, 12],
    ['DASH', 13, 16],
    ['INCENT', 17, 17],
    ['CODE1', 18, 18],
    ['CODE2', 19, 19],
    ['COVLET', 20, 20],
    ['VERSN', 21, 21],
    ['MAILNO', 22, 23],
    ['MLDYY', 24, 27]]

inFile = open('rawdata.dat', 'r')
outFile = open('delimted.dat', 'w')
while 1:
    lines = inFile.readlines(1000)
    if not lines: break
    for line in lines:
        delimitedLine = ""
        delimit = ""
        for field in layout:
            #
            #can this loop be improved??
            #
            #end position is inclusive, so add 1 for the slice
            fieldValue = line[field[1]:field[2] + 1]
            delimitedLine = delimitedLine + delimit + fieldValue
            delimit = "\t"
        outFile.write(delimitedLine+"\n")
inFile.close()
outFile.close()
del lines
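
One possible direction for the inner loop, as a rough sketch rather than a measured result: build the slice objects once, outside the per-line loop, and assemble each output line with a single "\t".join() call instead of repeated string concatenation. This assumes the 'end position' in the layout is inclusive (hence the + 1), uses only three of the fields for brevity, and is written for current Python. The names to_tab_delimited and convert are hypothetical helpers, not part of the original script.

```python
# Sketch only: precompute slices once, then join the pieces in one call.
# Assumption: 'end position' in the layout is inclusive, hence end + 1.
layout = [
    ['STUDY', 0, 7],
    ['MDLNO', 8, 12],
    ['INCENT', 17, 17],
]

# Build the slice objects a single time, outside the per-line loop.
slices = [slice(start, end + 1) for _name, start, end in layout]


def to_tab_delimited(line):
    """Return the fixed-width fields of one line joined by tabs."""
    return "\t".join([line[s] for s in slices])


def convert(in_path, out_path):
    """Convert a whole file, streaming it one line at a time."""
    with open(in_path) as in_file, open(out_path, "w") as out_file:
        for line in in_file:
            # Drop the newline so it cannot leak into the last field.
            out_file.write(to_tab_delimited(line.rstrip("\n")) + "\n")
```

Whether this is actually faster on a 200MB file would need to be timed. The idea is only that join() avoids creating a new intermediate string for every one of the 600+ fields on each line.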



