Generator slower than iterator?

MRAB google at mrabarnett.plus.com
Tue Dec 16 10:25:50 EST 2008


Federico Moreira wrote:
> Hi all,
> 
> I'm parsing a 4.1 GB Apache log to get stats on how many times each IP 
> requests something from the server.
> 
> The first design of the algorithm was
> 
> for line in fileinput.input(sys.argv[1:]):
>     ip = line.split()[0]
>     if ip in match_counter:
>         match_counter[ip] += 1
>     else:
>         match_counter[ip] = 1
> 
> And it took 3 min 58 sec to give me the stats.
> 
> Then I tried a generator solution like
> 
> def generateit():
>     for line in fileinput.input(sys.argv[1:]):
>         yield line.split()[0]
> 
> for ip in generateit():
>     ...the same if statement
> 
> Instead of being faster, it took 4 min 20 sec.
> 
> Should I leave fileinput behind?
> Am I using generators with the wrong approach?
> 
Your first design is already simple to understand, so I don't think a 
generator is necessary here. It probably isn't worth the cost either: 
each yielded value adds a generator resume on top of the same per-line 
work.
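If you want to check that overhead on a small scale, a sketch like the 
following could time both loop shapes on synthetic data. The function 
names (make_lines, count_inline, first_fields, count_with_generator) and 
the sample log lines are made up for illustration, and the code assumes 
Python 3, so treat the numbers it prints as a rough guide only.

```python
import timeit


def make_lines(n):
    # Synthetic stand-in for an Apache log: the IP is the first field.
    return ['10.0.0.%d - - [16/Dec/2008] "GET / HTTP/1.1" 200 512\n' % (i % 50)
            for i in range(n)]


def count_inline(lines):
    # Split and count in the same loop, as in the first design.
    counts = {}
    for line in lines:
        ip = line.split()[0]
        if ip in counts:
            counts[ip] += 1
        else:
            counts[ip] = 1
    return counts


def first_fields(lines):
    # Generator version: same split, plus one generator resume per line.
    for line in lines:
        yield line.split()[0]


def count_with_generator(lines):
    counts = {}
    for ip in first_fields(lines):
        if ip in counts:
            counts[ip] += 1
        else:
            counts[ip] = 1
    return counts


if __name__ == "__main__":
    lines = make_lines(200000)
    for func in (count_inline, count_with_generator):
        seconds = timeit.timeit(lambda: func(lines), number=5)
        print("%s: %.3f s" % (func.__name__, seconds))
```

Both functions build the same dictionary, so any difference in the 
timings comes from the extra generator layer rather than from the 
counting itself.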

You might want to try defaultdict instead of dict to see whether that 
would be faster:

import fileinput
import sys
from collections import defaultdict

# Missing keys start at int() == 0, so no membership test is needed.
match_counter = defaultdict(int)
for line in fileinput.input(sys.argv[1:]):
    ip = line.split()[0]
    match_counter[ip] += 1
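
On the fileinput question: fileinput does some bookkeeping per line 
(it tracks the current file name and line number), so it may be worth 
timing a version that opens each file directly. The sketch below is one 
way that could look, assuming Python 3; count_ips is a hypothetical 
helper name, and whether it is actually faster on your log is something 
only a measurement can tell you.

```python
import sys
from collections import defaultdict


def count_ips(paths):
    # Hypothetical helper: counts the first whitespace-separated field
    # of every non-blank line in the given files.
    match_counter = defaultdict(int)
    for path in paths:
        with open(path) as log:
            for line in log:
                # maxsplit=1 stops after the first field instead of
                # splitting the whole line.
                fields = line.split(None, 1)
                if fields:  # skip blank lines
                    match_counter[fields[0]] += 1
    return match_counter


if __name__ == "__main__":
    for ip, hits in sorted(count_ips(sys.argv[1:]).items()):
        print(ip, hits)
```

Run it with the log files as arguments, the same way as the fileinput 
version. The maxsplit argument is a separate small change from dropping 
fileinput, so time them one at a time if you want to know which one 
matters.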



