generator expressions: performance anomaly?
John Machin
sjmachin at lexicon.net
Sun Jan 16 16:51:49 EST 2005
On Sun, 16 Jan 2005 12:18:23 GMT, "Raymond Hettinger"
<vze4rx4y at verizon.net> wrote:
>"John Machin" <sjmachin at lexicon.net> wrote in message
>news:1105854381.117059.59940 at c13g2000cwb.googlegroups.com...
>> Please consider the timings below, where a generator expression starts
>> out slower than the equivalent list comprehension, and gets worse:
>>
>> >python -m timeit -s "orig=range(100000)" "lst=orig[:];lst[:]=(x for x in orig)"
> . . .
>> >python -m timeit -s "orig=range(200000)" "lst=orig[:];lst[:]=(x for x in orig)"
>
>This has nothing to do with genexps and everything to do with list slice
>assignment.
>
>List slice assignment is an example of a tool with a special case optimization
>for inputs that know their own length -- that enables the tool to pre-allocate
>its result rather than growing and resizing in spurts. Other such tools include
>tuple(), map() and zip().
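>
A small sketch (modern Python 3 syntax, not from the original thread) of the point above: a list comprehension produces a list, which reports its own length, so a consumer can pre-allocate; a generator expression cannot report its length, so consumers like slice assignment, tuple(), map() and zip() must grow their result incrementally.

```python
orig = range(10)
lc = [x for x in orig]    # list comprehension: result supports len()
ge = (x for x in orig)    # generator expression: no len() available

print(len(lc))            # -> 10
try:
    len(ge)               # generators cannot report their length
except TypeError as exc:
    print("no len() for genexp:", exc)
```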
>
My reading of the source: if the input is not a list or tuple, a
(temporary) tuple is built from the input, using PySequence_Tuple() in
abstract.c. If the input cannot report its own length, then that
function resorts to "growing and resizing in spurts", using the
following code:
    if (j >= n) {
        if (n < 500)
            n += 10;
        else
            n += 100;
        if (_PyTuple_Resize(&result, n) != 0) {
Perhaps it could be changed to use a proportional increase, like
list_resize() in listobject.c, which advertises (amortised) linear
time. Alternative: build a temporary list instead?
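To make the "amortised linear" point concrete, here is a hypothetical back-of-envelope simulation (Python, not CPython source) counting how many resize calls each growth policy needs to reach a target size. The +10/+100 policy is the one quoted above; the proportional policy is only roughly modelled on list_resize()'s over-allocation.

```python
def additive_resizes(target, n=10):
    """Growth policy quoted above: +10 below 500, then +100."""
    count = 0
    while n < target:
        n += 10 if n < 500 else 100
        count += 1
    return count

def proportional_resizes(target, n=10):
    """Roughly list_resize()'s policy: over-allocate by ~1/8 plus a pad."""
    count = 0
    while n < target:
        n += (n >> 3) + 6
        count += 1
    return count

for target in (10000, 1000000):
    print(target, additive_resizes(target), proportional_resizes(target))
```

The additive policy needs a number of resizes linear in the target size, while the proportional policy needs only logarithmically many, which is what makes the overall cost amortised linear.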