Fastest way to read / process / write text?

Pekka Niiranen krissepu at vip.fi
Wed Jun 5 03:28:21 EDT 2002


Use map() instead of a for-loop:

# data is a list of lines; whatever is your substitution pattern
pdata = map(lambda line: re.sub(whatever, '\t', line), data)

I managed to get a 10x speed increase myself with map().
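For example, a line-based version of Brent's script could look roughly like
this (a sketch, not tested; the regex is the one from his post and the
output path is just his example):

import sys, re

# Compile once so the pattern is not re-parsed for every line.
pattern = re.compile(r'[.](?=\d\d\d\d\d\d)|[.](?=\w+[ ][<|>])|[.](?=\w+[:])|[ ](?!0x)')

lines = sys.stdin.readlines()
# map() pushes the per-line loop into C, which is where the speedup comes from.
pdata = map(lambda line: pattern.sub('\t', line), lines)

outfile = open("/mnt/storage/output.txt", "w")
outfile.writelines(pdata)
outfile.close()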

Try also learning mxTextTools for further speedups (so I can ask for help from
you ;))

-pekka-

Brent Miller wrote:

> I'm somewhat new to python and programming in general and I'm trying to
> write a simple script that I can use to format large text files so that
> they are suitable for importing into MySQL. So far, I have come up with
> this:
>
> #!/usr/bin/python2
>
> import sys, re
>
> infile = sys.stdin
> data = infile.read()
> infile.close()
>
> data = re.sub('[.](?=\d\d\d\d\d\d)|[.](?=\w+[ ][<|>])|[.](?=\w+[:])|[ ](?!0x)', '\t', data)
>
> outfile = open("/mnt/storage/output.txt", "w")
> outfile.write(data)
> outfile.close()
>
> This works beautifully most of the time (it's super fast), except when I
> pipe large files (>50megs) to it, and then it usually dies halfway
> through, complaining of memory errors because it ran out of RAM.
>
> Short of buying more RAM, is there a way I can make this more efficient
> without taking a big performance hit? I tried the above where I used a
> "for line in data:" loop so it would only process one line at a time, but
> this seemed to take forever! I'm imagining there's a way to process the
> data in something like 1 meg chunks, but I'm not too sure how to do that.
>
> I'm using Python 2.2.1 on a RH Linux 7.2 system with 256 MB of RAM and an
> Athlon XP1700, if it makes any difference.
>
> Any suggestions?
>
> TIA,
> Brent
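For the 1 meg chunks Brent asks about above, one approach that keeps the
batching without reading the whole file is file.readlines(sizehint), which
returns roughly sizehint bytes worth of complete lines per call. A sketch
(assuming the pattern never needs to match across a line boundary, which
looks true for the regex above):

import sys, re

pattern = re.compile(r'[.](?=\d\d\d\d\d\d)|[.](?=\w+[ ][<|>])|[.](?=\w+[:])|[ ](?!0x)')

infile = sys.stdin
outfile = open("/mnt/storage/output.txt", "w")

while 1:
    # Read about one megabyte of complete lines at a time,
    # so memory use stays bounded no matter how large the input is.
    chunk = infile.readlines(1024 * 1024)
    if not chunk:
        break
    outfile.writelines(map(lambda line: pattern.sub('\t', line), chunk))

outfile.close()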



