Generator slower than iterator?

Lie Ryan lie.1296 at gmail.com
Tue Dec 16 11:10:54 EST 2008


On Tue, 16 Dec 2008 12:07:14 -0300, Federico Moreira wrote:

> Hi all,
> 
> I'm parsing a 4.1GB Apache log to get stats on how many times an IP
> requests something from the server.
> 
> The first design of the algorithm was
> 
> for line in fileinput.input(sys.argv[1:]):
>     ip = line.split()[0]
>     if match_counter.has_key(ip):
>         match_counter[ip] += 1
>     else:
>         match_counter[ip] = 1

Nitpick: dict.has_key() is usually replaced with the "in" operator
(has_key() is removed in Python 3.0):

    if ip in match_counter: ...
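
To make the nitpick concrete, here is a minimal sketch of the same counting
loop written with "in". The function name count_ips and the sample lines are
made up for illustration; your real code reads from
fileinput.input(sys.argv[1:]) instead. The blank-line check is also my
addition, not something in your loop.

```python
def count_ips(lines):
    """Count how often each first field (the client IP) appears."""
    match_counter = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: silently skip blank lines
        ip = fields[0]
        if ip in match_counter:      # replaces match_counter.has_key(ip)
            match_counter[ip] += 1
        else:
            match_counter[ip] = 1
    return match_counter


# Hypothetical sample lines, not taken from the real log.
sample = [
    '10.0.0.1 - - "GET / HTTP/1.0" 200 512',
    '10.0.0.2 - - "GET /a HTTP/1.0" 200 128',
    '10.0.0.1 - - "GET /b HTTP/1.0" 404 0',
]
counts = count_ips(sample)
print(counts)
```

If you want to drop the if/else altogether, collections.defaultdict(int)
(available since Python 2.5) lets you write match_counter[ip] += 1 directly.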

Also, after looking at your code further, I see that you've used
generators unnecessarily. The first version is simpler, and using a
generator this way doesn't avoid creating any huge intermediate list,
since the plain loop never built one in the first place. You won't get
any performance improvement from it; instead you take a performance hit
from the extra function-call overhead and name lookups.
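
Your generator version isn't quoted here, so the following is only a guess at
its general shape, not a copy of your code. It compares a plain loop against
the same loop fed through a pass-through generator. All names (count_plain,
first_fields, count_with_generator) and the synthetic input are hypothetical.
Both variants handle one line at a time, so the generator saves no memory; it
only adds an extra layer that has to be resumed for every line.

```python
import timeit


def count_plain(lines):
    # Plain loop: split each line and count its first field directly.
    counter = {}
    for line in lines:
        ip = line.split()[0]
        counter[ip] = counter.get(ip, 0) + 1
    return counter


def first_fields(lines):
    # Hypothetical generator layer: yields one IP per input line.
    for line in lines:
        yield line.split()[0]


def count_with_generator(lines):
    # Same counting, but every item passes through the generator first.
    counter = {}
    for ip in first_fields(lines):
        counter[ip] = counter.get(ip, 0) + 1
    return counter


# Synthetic stand-in for the log: 100000 lines spread over 50 addresses.
lines = ["10.0.0.%d - - GET /" % (i % 50) for i in range(100000)]

same_result = count_plain(lines) == count_with_generator(lines)
print("same result: %s" % same_result)

# Timings depend on machine and interpreter version, so none are quoted here.
t_plain = timeit.timeit(lambda: count_plain(lines), number=5)
t_gen = timeit.timeit(lambda: count_with_generator(lines), number=5)
print("plain loop:     %.3fs" % t_plain)
print("with generator: %.3fs" % t_gen)
```

A generator pays off when it replaces something that would otherwise build a
large list up front. Iterating over a file (or fileinput.input) already hands
you one line at a time, so there is nothing left to save.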



