.readline() - VERY SLOW compared to PERL

Duncan Booth duncan at rcp.co.uk
Tue Nov 21 06:48:32 EST 2000


h_schneider at marketmix.com (Harald Schneider) wrote in
<8vd94d$mjo$06$1 at news.t-online.com>: 

>Thanks for your reply.
>
>The alternate methods posted here (readlines with chunks) cut the weak
>performance down to nearly the results you posted.
>
>For everyone interested, here are the scripts used for testing:
>
>

Just out of interest, I wondered what difference, if any, it makes to the 
Python scripts you posted if you put everything inside a function so that 
it uses local variables in place of global variables.

I tried this and, although the times fluctuate substantially on different 
runs, I got your 5.6 second script running on my test file (48MB) in 12.66 
seconds. My version with local variables took 10.12 seconds.

Not a major difference, but possibly worthwhile. Interestingly, most of the 
speedup comes from using a local variable for string.split. Without this 
optimisation it takes about 12.06 seconds. Also note that I used the -O 
command line option as otherwise the times were all about 5 seconds slower.
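The same local-name trick still measurably helps in modern Python, because a global or attribute lookup happens on every loop iteration while a local lookup is a cheap array access. Here is a minimal, illustrative sketch (function names and the sample data are my own, and the measured timings will vary by machine):

```python
import timeit

def with_global_lookup(lines):
    # 'str' is looked up in the global/builtin namespace on every iteration
    out = []
    for line in lines:
        out.append(str.split(line, ';'))
    return out

def with_local_alias(lines):
    split = str.split        # bind the function to a local name once
    out = []
    append = out.append      # same trick for the bound method lookup
    for line in lines:
        append(split(line, ';'))
    return out

lines = ['a;b;c'] * 1000
# Both versions produce identical results; only the lookup cost differs.
assert with_global_lookup(lines) == with_local_alias(lines)

t_global = timeit.timeit(lambda: with_global_lookup(lines), number=200)
t_local = timeit.timeit(lambda: with_local_alias(lines), number=200)
print("global lookup: %.4fs, local alias: %.4fs" % (t_global, t_local))
```

On most interpreters the local-alias version comes out a little ahead, which matches the modest speedup reported above.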

Here is my version:

======================================
import sys, string, time

def run():
    print "Running..."
    dbname = 'test.dat'
    secStart = time.time()

    db = open(dbname, 'r')

    read = db.readlines   # local aliases avoid repeated attribute
    split = string.split  # and global lookups inside the loop
    
    while 1:
        lines = read(250000)
        if not lines:
            break
        for dbline in lines:
            rs = split(dbline, ';')

            if rs[0] == 'TEST':
                print dbline + "\n"
                break

    print "DONE!\n"
    db.close()
    print "Elapsed time: %f sec." % (time.time() - secStart)

if __name__ == '__main__':
    run()
========================================


