FileInput too slow

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Mon Jan 4 22:27:36 EST 2010


On Mon, 04 Jan 2010 23:35:02 +0100, wiso wrote:

> I'm trying the fileinput module, and I like it, but I don't understand
> why it's so slow... 


Because it is written for convenience, not speed. From the source code:

"Performance: this module is unfortunately one of the slower ways of
processing large numbers of input lines."
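
If you want some of that convenience without going through FileInput,
a plain generator over open() is one option. This is only a minimal
sketch: iter_lines is a hypothetical helper name, not part of the
fileinput module, and it covers only the file name and per-file line
number, not the rest of the FileInput API.

```python
def iter_lines(filenames):
    """Yield (filename, line_number_within_file, line) for every line."""
    for name in filenames:
        # Each file is opened and closed in turn, as in f1() from the question.
        with open(name) as fh:
            for lineno, line in enumerate(fh, 1):
                yield name, lineno, line


def count_lines(filenames):
    """Count lines across all files and collect the names of files seen."""
    total = 0
    seen = []
    for name, lineno, line in iter_lines(filenames):
        if lineno == 1:
            # Rough equivalent of the isfirstline() check, without a method call.
            seen.append(name)
        total += 1
    return total, seen
```

Here count_lines(file) would return the total line count plus the list
of non-empty files, which covers what f2() does with isfirstline() and
filename().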



> look:
> 
> from time import time
> from fileinput import FileInput
> 
> file = ['r1_200907.log', 'r1_200908.log', 'r1_200909.log',
> 'r1_200910.log', 'r1_200911.log']
> 
> def f1():
>   n = 0
>   for f in file:
>     print "new file: %s" % f
>     ff = open(f)
>     for line in ff:
>       n += 1
>     ff.close()
>   return n
> 
> def f2():
>   f = FileInput(file)
>   for line in f:
>     if f.isfirstline(): print "new file: %s" % f.filename()
>   return f.lineno()
> 
> def f3(): # f2 simpler
>   f = FileInput(file)
>   for line in f:
>     pass
>   return f.lineno()
> 
> 
> t = time(); f1(); print time()-t # 1.0
> t = time(); f2(); print time()-t # 7.0 !!! 
> t = time(); f3(); print time()-t # 5.5
> 
> I'm using text files, there are 2563150 lines in total.

The extra second and a half in f2() compared with f3() is probably the
cost of calling f.isfirstline() once per line, 2563150 times in all.
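
One way to check that guess is to time the method call on its own. The
sketch below is an assumption-laden estimate, not a proper benchmark:
isfirstline_overhead is a hypothetical helper name, "placeholder.log"
is a dummy file name, and it relies on FileInput not opening any file
until iteration starts (true of the CPython implementations I have
looked at).

```python
import timeit
from fileinput import FileInput


def isfirstline_overhead(calls):
    """Return the seconds taken by `calls` bare calls to isfirstline()."""
    # The file name is a placeholder: nothing is read here, the method
    # is simply called on a FileInput that has not been iterated.
    f = FileInput(["placeholder.log"])
    try:
        return timeit.timeit(f.isfirstline, number=calls)
    finally:
        f.close()
```

Calling isfirstline_overhead(2563150) gives a rough figure to compare
against the 1.5 second difference; the result will vary with the
machine and Python version.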



-- 
Steven


