Generator slower than iterator?

Gary Herron gherron at islandtraining.com
Tue Dec 16 11:30:00 EST 2008


Lie Ryan wrote:
> On Tue, 16 Dec 2008 12:07:14 -0300, Federico Moreira wrote:
>
>   
>> Hi all,
>>
>> I'm parsing a 4.1GB Apache log to get stats on how many times each IP
>> requests something from the server.
>>
>> The first design of the algorithm was
>>
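>> import fileinput
>> import sys
>>
>> match_counter = {}
>> for line in fileinput.input(sys.argv[1:]):
>>     ip = line.split()[0]  # first field of each log line is the client IP
>>     if ip in match_counter:
>>         match_counter[ip] += 1
>>     else:
>>         match_counter[ip] = 1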
>>
>> And it took 3 min 58 sec to give me the stats.
>>
>> Then I tried a generator solution like
>>
>> def generateit():
>>     for line in fileinput.input(sys.argv[1:]):
>>         yield line.split()[0]
>>
>> for ip in generateit():
>>     ...the same if statement
>>
>> Instead of being faster, it took 4 min 20 sec.
>>
>> Should I leave fileinput behind?
>> Am I using generators with the wrong approach?
>>     
>
> What's fileinput? A file-like object (unlikely)? Also, what's 
> fileinput.input? I guess the reason why you don't see much difference 
> (and why it is in fact slower) lies in what fileinput.input does.
>
>   

Fileinput is a standard module distributed with Python:


From the manual:

11.2 fileinput -- Iterate over lines from multiple input streams

This module implements a helper class and functions to quickly write a
loop over standard input or a list of files.

The typical use is:

import fileinput
for line in fileinput.input():
    process(line)

 ...
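
To make the comparison concrete, here is a minimal sketch of the two approaches from the original post. It is not the poster's actual script: the function names and the sample log lines are made up for illustration, and it takes any iterable of lines so it runs without a log file. Since fileinput.input() already hands back lines one at a time, the generator version does the same split-and-count work and only adds a yield/resume step per line, which is one plausible reason it came out slower.

```python
def count_direct(lines):
    """Plain loop: split each line and count its first field."""
    counts = {}
    for line in lines:
        ip = line.split()[0]
        counts[ip] = counts.get(ip, 0) + 1
    return counts


def first_fields(lines):
    """Generator wrapper: yields the first field of each line."""
    for line in lines:
        yield line.split()[0]


def count_via_generator(lines):
    """Same counting as count_direct, but fed through the generator."""
    counts = {}
    for ip in first_fields(lines):
        counts[ip] = counts.get(ip, 0) + 1
    return counts


# Hypothetical sample data standing in for the Apache log.
sample = [
    '10.0.0.1 - - "GET / HTTP/1.0" 200 512\n',
    '10.0.0.2 - - "GET /a HTTP/1.0" 200 128\n',
    '10.0.0.1 - - "GET /b HTTP/1.0" 404 0\n',
]

direct = count_direct(sample)
via_gen = count_via_generator(sample)
```

With a real log, either function could be passed fileinput.input(sys.argv[1:]) or an open file object in place of sample; timing both on the same input would show how much of the cost comes from fileinput itself.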


> Generators excel at processing huge data sets because they don't have to 
> create huge intermediate lists that eat up memory. Given infinite memory, 
> a generator solution is almost always slower than a straightforward 
> solution using lists. In real life, however, we don't have infinite 
> memory: hogging it with a huge intermediate list makes the system start 
> swapping, and swapping is very slow and a big hit to performance. That is 
> how a generator can end up faster than a list.
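
The memory point can be illustrated with a small sketch. The helper names and the synthetic lines below are assumptions for the example, not anything from the original script. One version builds the whole intermediate list of first fields before counting; the other feeds the counter from a generator expression, so only one item is alive at a time.

```python
import sys


def count_items(items):
    """Count occurrences of each item from any iterable."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


# Synthetic stand-in for log lines: 100000 lines spread over 50 addresses.
lines = ["10.0.0.%d - - GET /\n" % (i % 50) for i in range(100000)]

# List version: every first field is stored before counting starts.
ip_list = [line.split()[0] for line in lines]
# Generator version: first fields are produced on demand while counting.
ip_gen = (line.split()[0] for line in lines)

# getsizeof measures only the container itself (for the list, its array of
# references), not the strings it points to, so it understates the list's
# real footprint.
list_bytes = sys.getsizeof(ip_list)
gen_bytes = sys.getsizeof(ip_gen)

counts_from_list = count_items(ip_list)
counts_from_gen = count_items(ip_gen)
```

Both versions produce the same counts; the difference is only in how much has to sit in memory at once. Note that the loop in the original post never builds such a list in the first place, so there is no intermediate list for a generator to save.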



