FileInput too slow

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Mon Jan 4 20:01:39 EST 2010


On Mon, 04 Jan 2010 19:35:02 -0300, wiso <gtu2003 at alice.it> wrote:

> I'm trying the fileinput module, and I like it, but I don't understand why
> it's so slow... look:
>
> from time import time
> from fileinput import FileInput
>
> file = ['r1_200907.log', 'r1_200908.log', 'r1_200909.log',
>         'r1_200910.log', 'r1_200911.log']
>
> def f1():
>   n = 0
>   for f in file:
>     print "new file: %s" % f
>     ff = open(f)
>     for line in ff:
>       n += 1
>     ff.close()
>   return n
>
> def f2():
>   f = FileInput(file)
>   for line in f:
>     if f.isfirstline(): print "new file: %s" % f.filename()
>   return f.lineno()
>
> def f3(): # f2 simpler
>   f = FileInput(file)
>   for line in f:
>     pass
>   return f.lineno()

Yes, the fileinput module is a lot slower than iterating over plain file
objects. If you only need to read the lines of several files in sequence,
you may use itertools.chain instead:

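import itertools

def f4():
    # chain the lines of every file into a single iterator;
    # each file is opened lazily, only when the previous one is exhausted
    f = itertools.chain.from_iterable(open(fn) for fn in file)
    n = 0
    for line in f:
        n += 1
    return n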

With this I get timings similar to f1() above.
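
For anyone who wants to repeat the comparison, here is a rough sketch of how the three approaches could be timed. It is not the script used for the numbers in this thread: the sample files, the helper names (make_sample_files, count_plain, count_fileinput, count_chain) and the file sizes are all made up for illustration, and the results will depend on your Python version and disk.

```python
import fileinput
import itertools
import os
import tempfile
import timeit


def make_sample_files(directory, n_files=3, n_lines=10000):
    """Create throwaway log files (hypothetical stand-ins for the r1_*.log files)."""
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, "sample_%d.log" % i)
        with open(path, "w") as fh:
            for j in range(n_lines):
                fh.write("line %d of file %d\n" % (j, i))
        paths.append(path)
    return paths


def count_plain(paths):
    """Like f1(): open each file in turn and count its lines."""
    n = 0
    for path in paths:
        with open(path) as fh:
            for line in fh:
                n += 1
    return n


def count_fileinput(paths):
    """Like f3(): let fileinput walk over all the files."""
    f = fileinput.FileInput(paths)
    try:
        for line in f:
            pass
        return f.lineno()
    finally:
        f.close()


def count_chain(paths):
    """Like f4(): chain the file objects into one iterator of lines."""
    f = itertools.chain.from_iterable(open(p) for p in paths)
    return sum(1 for line in f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_sample_files(tmp)
        for func in (count_plain, count_fileinput, count_chain):
            elapsed = timeit.timeit(lambda: func(paths), number=3)
            print("%-16s %.4f s" % (func.__name__, elapsed))
```

All three functions should return the same line count; only the elapsed times are expected to differ.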

Known major issues of this "poor man's" implementation:

- no lineno()/filelineno()/isfirstline() methods
- close() is implicit: each file is closed only when its file object is
  garbage collected
- only for reading; the inplace and backup options don't work
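
If the first two limitations matter, a small generator can get around them without going back to fileinput. The following is only a sketch: iter_lines is a hypothetical helper name, not something from the standard library, and it is still read-only. It yields the file name and both line counters along with each line, and closes each file explicitly through a with block.

```python
import os
import tempfile


def iter_lines(paths):
    """Yield (path, file_lineno, total_lineno, line) for every line of every file."""
    total = 0
    for path in paths:
        # the with block closes each file as soon as it is exhausted
        with open(path) as fh:
            for file_lineno, line in enumerate(fh, 1):
                total += 1
                yield path, file_lineno, total, line


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # two tiny made-up files, just to exercise the generator
        paths = []
        for name, text in [("a.log", "one\ntwo\n"), ("b.log", "three\n")]:
            path = os.path.join(tmp, name)
            with open(path, "w") as fh:
                fh.write(text)
            paths.append(path)
        total = 0
        for path, file_lineno, total, line in iter_lines(paths):
            if file_lineno == 1:  # plays the role of isfirstline()
                print("new file: %s" % os.path.basename(path))
        print("total lines: %d" % total)
```

Here file_lineno == 1 stands in for isfirstline(), and the third item plays the role of lineno(). If the loop is abandoned halfway through a file, that file is closed when the generator itself is closed or collected.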

-- 
Gabriel Genellina



