Fastest way to read / process / write text?
Brent Miller
yidaki2 at excite.com
Tue Jun 4 19:47:55 EDT 2002
I'm somewhat new to Python and programming in general, and I'm trying to
write a simple script that I can use to format large text files so that
they are suitable for importing into MySQL. So far, I have come up with
this:
#!/usr/bin/python2
import sys, re
infile = sys.stdin
data = infile.read()
infile.close()
data = re.sub(r'[.](?=\d\d\d\d\d\d)|[.](?=\w+[ ][<|>])|[.](?=\w+[:])|[ ](?!0x)', '\t', data)
outfile = open("/mnt/storage/output.txt", "w")
outfile.write(data)
outfile.close()
This works beautifully most of the time (it's super fast), except when I
pipe large files (>50 MB) to it; then it usually dies halfway through,
complaining of memory errors because it ran out of RAM.
Short of buying more RAM, is there a way I can make this more efficient
without taking a big performance hit? I tried the above with a
"for line in data:" loop so it would only process one line at a time, but
this seemed to take forever! I'm imagining there's a way to process the
data in something like 1 MB chunks, but I'm not too sure how to do that.
I'm using Python 2.2.1 on a RH Linux 7.2 system with 256 MB of RAM and an
Athlon XP 1700+, if it makes any difference.
Any suggestions?
TIA,
Brent