Generator slower than iterator?
MRAB
google at mrabarnett.plus.com
Tue Dec 16 10:25:50 EST 2008
Federico Moreira wrote:
> Hi all,
>
> I'm parsing a 4.1 GB Apache log to get stats on how many times each IP
> requests something from the server.
>
> The first design of the algorithm was
>
> for line in fileinput.input(sys.argv[1:]):
>     ip = line.split()[0]
>     if match_counter.has_key(ip):
>         match_counter[ip] += 1
>     else:
>         match_counter[ip] = 1
>
> And it took 3 min 58 s to give me the stats.
>
> Then I tried a generator solution like
>
> def generateit():
>     for line in fileinput.input(sys.argv[1:]):
>         yield line.split()[0]
>
> for ip in generateit():
>     ...the same if sentence
>
> Instead of being faster, it took 4 min 20 s.
>
> Should I leave fileinput behind?
> Am I using generators with the wrong approach?
>
Your first design is already simple to understand, so I don't think a
generator is necessary (and it probably isn't worth the cost: every yielded
value adds a generator resume on top of the same split and dict update).
You might want to try defaultdict instead of dict to see whether that
would be faster:
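import fileinput
import sys
from collections import defaultdict

# Missing keys start at int() == 0, so no membership test is needed.
match_counter = defaultdict(int)
for line in fileinput.input(sys.argv[1:]):
    ip = line.split()[0]  # first whitespace-separated field is the client IP
    match_counter[ip] += 1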
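
If you want to check the generator overhead for yourself without waiting on
the full 4.1 GB file, something like the sketch below might help. It is only
an illustration: SAMPLE_LINES is made-up data standing in for the real log,
and count_inline, first_fields and count_via_generator are hypothetical names
of mine, not anything from the original script. Both functions should give
the same counts; the timings will vary by machine and Python version, so I am
not quoting any numbers.

```python
from collections import defaultdict
import timeit

# Made-up sample data standing in for the real Apache log (an assumption).
SAMPLE_LINES = [
    '10.0.0.1 - - [16/Dec/2008:10:25:50 -0500] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [16/Dec/2008:10:25:51 -0500] "GET /a HTTP/1.1" 200 128',
    '10.0.0.1 - - [16/Dec/2008:10:25:52 -0500] "GET /b HTTP/1.1" 404 0',
]


def count_inline(lines):
    """Count first fields directly in the loop, as in the first design."""
    counts = defaultdict(int)
    for line in lines:
        counts[line.split()[0]] += 1
    return counts


def first_fields(lines):
    """Generator that yields the first whitespace-separated field of each line."""
    for line in lines:
        yield line.split()[0]


def count_via_generator(lines):
    """Same counting, but with the split hidden behind a generator."""
    counts = defaultdict(int)
    for ip in first_fields(lines):
        counts[ip] += 1
    return counts


if __name__ == "__main__":
    # Repeat the sample so the loops run long enough to time.
    data = SAMPLE_LINES * 10000
    for func in (count_inline, count_via_generator):
        elapsed = timeit.timeit(lambda: func(data), number=3)
        print(func.__name__, round(elapsed, 3))
```

Both versions do the same split and dict update per line, so any difference
you measure should come from the extra generator step rather than from the
counting itself.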