Help optimize a script?

Joseph Santaniello someone at _no-spam_arbitrary.org
Wed Oct 17 13:51:51 EDT 2001


Hello All,

I have a simple script that I wrote to convert some fixed-width
files to tab-delimited.

It works, but some of my files are over 100MB and it takes forever.

First, does anyone know of a tool that does this so I don't have to
reinvent the wheel, and barring that, can anyone offer some tips on how to
optimize this code:

import string
import sys

indecies = { 'cob':[3,6,2,2,8,1,8], 'opend':[6,3,3,2,3,4,12,29] }
# above is trimmed for this example
# the lists in the dictionary above are the widths of the fields
# in the input files. The keys match the input file names just to
# keep things readable.


# while is used cuz line in readlines() used too much ram with
# huge files.
while 1:
        line = sys.stdin.readline()
        if not line:
                break
        new = ''
        start = 0
        for index in indecies[sys.argv[1]]:
                new = new + string.strip(line[start:start + index]) + '\t'
                start = start + index
        print new
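One small idea, sketched in modern Python (the names here are my own, not from the original script): since the widths never change, the running `start` offsets can be computed once per file rather than re-added on every line, turning each width list into absolute (start, end) slice pairs up front.

```python
widths = {'cob': [3, 6, 2, 2, 8, 1, 8], 'opend': [6, 3, 3, 2, 3, 4, 12, 29]}

def slice_pairs(field_widths):
    """Turn a list of field widths into absolute (start, end) offsets."""
    pairs, start = [], 0
    for w in field_widths:
        pairs.append((start, start + w))
        start += w
    return pairs

# e.g. slice_pairs(widths['cob']) gives (0, 3), (3, 9), (9, 11), ...
```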

So it reads in a line, then iterates over the list in the corresponding
dictionary, and prints the stripped substrings extracted according to the
field widths in the list, with a tab between each, then grabs a new
line and does it again.
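For what it's worth, here is a hedged sketch of how the same loop might run faster (written against current Python, so it won't paste straight into the script above): iterate the stream directly instead of calling readline(), join the fields with '\t'.join() instead of repeated string concatenation (which copies the partial string on every +), and hoist the offset arithmetic out of the inner loop. The function name `convert` is my own invention for illustration.

```python
import sys

def convert(stream, out, field_widths):
    """Convert fixed-width lines from stream to tab-delimited lines on out."""
    # Precompute absolute (start, end) offsets so the inner loop does no arithmetic.
    pairs, start = [], 0
    for w in field_widths:
        pairs.append((start, start + w))
        start += w
    write = out.write  # skip one attribute lookup per line
    for line in stream:  # file iteration buffers internally; no readlines() needed
        write('\t'.join(line[a:b].strip() for a, b in pairs) + '\n')

if __name__ == '__main__':
    widths = {'cob': [3, 6, 2, 2, 8, 1, 8], 'opend': [6, 3, 3, 2, 3, 4, 12, 29]}
    convert(sys.stdin, sys.stdout, widths[sys.argv[1]])
```

The join-based version builds each output line in one pass, which matters most on wide records, and iterating the file object keeps memory flat on 100MB inputs.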

Any suggestions on how to speed this up?

Thanks,

Joseph






