Generator slower than iterator?
Gary Herron
gherron at islandtraining.com
Tue Dec 16 11:30:00 EST 2008
Lie Ryan wrote:
> On Tue, 16 Dec 2008 12:07:14 -0300, Federico Moreira wrote:
>
>
>> Hi all,
>>
>> I'm parsing a 4.1GB Apache log to get stats on how many times each IP
>> requests something from the server.
>>
>> The first design of the algorithm was
>>
>> for line in fileinput.input(sys.argv[1:]):
>>     ip = line.split()[0]
>>     if match_counter.has_key(ip):
>>         match_counter[ip] += 1
>>     else:
>>         match_counter[ip] = 1
>>
>> And it took 3 min 58 sec to give me the stats.
>>
>> Then I tried a generator solution like
>>
>> def generateit():
>>     for line in fileinput.input(sys.argv[1:]):
>>         yield line.split()[0]
>>
>> for ip in generateit():
>>     ...the same if statement
>>
>> Instead of being faster, it took 4 min 20 sec.
>>
>> Should I leave fileinput behind?
>> Am I using generators with the wrong approach?
>>
>
> What's fileinput? A file-like object (unlikely)? Also, what's
> fileinput.input? I guess the reason why you don't see much difference
> (and why it is in fact slower) lies in what fileinput.input does.
>
>
Fileinput is a standard module distributed with Python:
From the manual:
11.2 fileinput -- Iterate over lines from multiple input streams
This module implements a helper class and functions to quickly write a
loop over standard input or a list of files.
The typical use is:
import fileinput
for line in fileinput.input():
    process(line)
...
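As a rough sketch of that typical use applied to the log-counting problem quoted above: the helper name count_first_fields is made up for illustration, and it assumes the IP is the first whitespace-separated field on each line. It is written for a current Python 3, not the 2008 interpreter the timings above came from.

```python
import fileinput
from collections import defaultdict


def count_first_fields(paths):
    """Count how often each first whitespace-separated field occurs.

    Hypothetical helper for illustration; not part of the fileinput module.
    """
    counts = defaultdict(int)
    # fileinput.input() chains the named files into one stream of lines.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            fields = line.split()
            if fields:  # skip blank lines instead of raising IndexError
                counts[fields[0]] += 1
    return dict(counts)
```

Calling count_first_fields(sys.argv[1:]) would mirror the loop in the original question. Whether it runs faster than the has_key version is something to measure on the real log, not assume.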
> Generators excel at processing huge data because they don't have to create
> huge intermediate lists that eat up memory. Given infinite memory, a
> generator solution is almost always slower than a straightforward solution
> using lists. However, in real life we don't have infinite memory; hogging
> memory with a huge intermediate list would make the system start
> swapping, and swapping is very slow and a big hit to performance. This is
> how a generator could end up faster than a list.
>
>
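The memory trade-off described in the quoted paragraph can be illustrated with a small sketch. The function names and the synthetic log lines are invented here, and the sizes reported by sys.getsizeof are CPython-specific and cover only the container itself, not the strings it refers to.

```python
import sys


def first_fields_list(lines):
    """Build the whole intermediate list in memory before returning it."""
    return [line.split()[0] for line in lines if line.strip()]


def first_fields_gen(lines):
    """Yield one field at a time; no intermediate list is built."""
    for line in lines:
        if line.strip():
            yield line.split()[0]


# Synthetic stand-in for a log file (assumption: the IP is the first field).
lines = ["10.0.0.%d - - GET /\n" % (i % 256) for i in range(100000)]

as_list = first_fields_list(lines)
as_gen = first_fields_gen(lines)

# The list container grows with the input; the generator object stays small.
print(sys.getsizeof(as_list), sys.getsizeof(as_gen))

# Both approaches produce the same sequence of values.
assert list(as_gen) == as_list
```

In the original question no intermediate list is built in either version, since both loops consume lines one at a time, so the generator there only adds a layer of call overhead without saving any memory.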