Fastest way to read / process / write text?

Martin Franklin mfranklin1 at gatwick.westerngeco.slb.com
Wed Jun 5 09:37:00 EDT 2002


On Wednesday 05 Jun 2002 1:31 pm, you wrote:
> On Tuesday 04 Jun 2002 11:47 pm, you wrote:
> > I'm somewhat new to Python and programming in general, and I'm trying to
> > write a simple script that I can use to format large text files so that
> > they are suitable for importing into MySQL. So far, I have come up with
> > this:
> >
> > #!/usr/bin/python2
> >
> > import sys, re
> >
> > infile = sys.stdin
> > data = infile.read()
> > infile.close()
>
> This was the standard idiom (until xreadlines came along, I guess)
>
>
> while 1:
>     data = infile.readlines(100000) # hint to readlines for 100000 bytes....
>     if not data:
>         break ## EOF reached
>     for line in data:
>         ## process data
>
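On the xreadlines point: from 2.2 onward a file object is its own line iterator with internal buffering, so the same bounded-memory loop can be written as plain iteration. A tiny illustration (the counting body is just filler of my own, not from any posted script):

```python
def count_lines(infile):
    # The file (or file-like) object buffers internally and hands back
    # one line at a time, so only one line is ever held in memory.
    n = 0
    for line in infile:
        n = n + 1
    return n
```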
> > data = re.sub('[.](?=\d\d\d\d\d\d)|[.](?=\w+[ ][<|>])|[.](?=\w+[:])|[ ](?!0x)', '\t', data)
> >
> > outfile = open("/mnt/storage/output.txt", "w")
> > outfile.write(data)
> > outfile.close()
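If the per-line version felt slow, one likely cost is calling re.sub with a pattern string inside the loop. A sketch of the quoted script reworked to compile the pattern once and use the readlines(sizehint) idiom (the convert function and its name are mine; the pattern is the one quoted above):

```python
import re

# The substitution pattern quoted above, compiled once up front --
# compiling (or re-looking-up) the pattern on every sub() call adds
# overhead on multi-megabyte inputs.
PATTERN = re.compile(
    r'[.](?=\d\d\d\d\d\d)|[.](?=\w+[ ][<|>])|[.](?=\w+[:])|[ ](?!0x)')

def convert(infile, outfile, sizehint=100000):
    # readlines(sizehint) returns roughly sizehint bytes' worth of
    # complete lines, so memory stays bounded however big the input is.
    while 1:
        lines = infile.readlines(sizehint)
        if not lines:
            break  ## EOF reached
        for line in lines:
            outfile.write(PATTERN.sub('\t', line))
```

Called as convert(sys.stdin, open("/mnt/storage/output.txt", "w")) it matches the original script's plumbing.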



Having re-read your original version, this is how to do it ;-)

while 1:
    data = infile.read(100000)  # read the next chunk, up to 100000 bytes
    if not data:
        break ## EOF reached
    ## process data.....
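One caveat with a plain read(100000): a line, or a regex match, can straddle the chunk boundary, and the substitution would then miss it. A sketch of one way to handle that, carrying the partial last line over into the next chunk (the generator and its name are mine, not from any posted script):

```python
def chunked_lines(infile, chunksize=100000):
    # Yield complete lines from `infile`, reading `chunksize` bytes at a
    # time and carrying any partial last line over to the next chunk,
    # so no line is ever split at a chunk boundary.
    leftover = ''
    while 1:
        data = infile.read(chunksize)
        if not data:
            break  ## EOF reached
        lines = (leftover + data).split('\n')
        leftover = lines.pop()   # partial last line (or '' after a newline)
        for line in lines:
            yield line + '\n'
    if leftover:
        yield leftover           # final line with no trailing newline
```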


> >
> >
> > This works beautifully most of the time (it's super fast), except when I
> > pipe large files (>50 megs) to it, and then it usually dies halfway
> > through, complaining of memory errors because it ran out of RAM.
> >
> > Short of buying more RAM, is there a way I can make this more efficient
> > without taking a big performance hit? I tried the above where I used a
> > "for line in data:" loop so it would only process one line at a time, but
> > this seemed to take forever! I'm imagining there's a way to process the
> > data in something like 1 meg chunks, but I'm not too sure how to do that.
> >
> > I'm using Python 2.2.1 on a RH Linux 7.2 system with 256MB of RAM and an
> > Athlon XP1700, if it makes any difference.
> >
> > Any suggestions?
> >
> > TIA,
> > Brent


More information about the Python-list mailing list