One Python 2.1 idea

Tim Peters tim.one at home.com
Mon Dec 25 23:18:02 EST 2000


[posted & mailed]

[on speeding listcomps]

[Alex Martelli]
> I might be, given that I think list comprehensions *deserve* wider
> usage and I'd be loath to think of people who *might* use them...
> and won't because of a silly 5%-performance-issue (I guess I might
> 'think of it as evolution in action', but, I'm a softy!-).

Cool!

> If I'm right that the issue is allocation, then I'd need a different
> opcode at the start (a build-list variation) and one at the end to
> say 'ok, listbuilding done now, please ensure it's a normal list
> without too much extra fat'; the former would change the append
> method pointer of the list object to {either reallocate exponentially,
> or, probably better, build the list in disjoint pieces -- as nothing
> BUT 'append' will be used on the list until ok-done-now time},
> the latter would (if needed) recompact it up and restore the normal
> append method for possible future uses.
>
> Am I wildly off-base...?  If not, it seems to me that procuring
> an acceptable diff-program is going to be the largest part of
> making a patch to submit...:-).

Easy one first:  the Cygwin diff works fine on Windows; or if you grab a
copy of the Python source off of SourceForge, cvs diff works OK too.  There
are detailed instructions for setting up cvs and ssh for use with
SourceForge on Windows at:

    http://python.sourceforge.net/winssh.txt


Harder one:  Unless lists grow very large, all evidence to date has said
that one-at-a-time appending displays linear-time behavior on all platforms
examined.  Unfortunately, the platforms Guido uses *never* display worse
behavior than that, so it's never been possible to convince him it can be a
drag in some cases.
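The amortized-linear behavior is easy to see directly: the list allocator
over-allocates on resize, so only a handful of real reallocations happen no
matter how many appends you do.  A quick illustrative sketch (the function
name is made up, and sys.getsizeof is a later addition used here purely to
watch the allocation change):

```python
import sys

def count_resizes(n):
    # Append n items one at a time and count how often the list's
    # underlying allocation (as reported by sys.getsizeof, a CPython
    # detail) actually changes.
    lst = []
    resizes = 0
    last = sys.getsizeof(lst)
    for i in range(n):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != last:
            resizes += 1
            last = size
    return resizes

# Geometric over-allocation means the number of real reallocations
# is tiny compared to the number of appends.
print(count_resizes(10000))
```

On CPython the count comes out in the dozens for 10000 appends, which is
why one-at-a-time append usually looks linear overall.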

That said, the builtin map currently uses PyList_SetItem on a preallocated
list when it can(*), while listcomps use a bound .append method which goes
thru listobject.c's hairier ins1.  Even if resizing is free, ins1 is simply
more involved than PyList_SetItem so costs more.  And it's unclear to me why
builtin_map doesn't use the cheaper-still PyList_SET_ITEM macro.

(*) (Apologies if you already know this) Nothing that's simple, because it's
OK for sequences to lie to map and listcomps about their lengths.  For
example,

>>> class LieLow:
	def __len__(self): return 0
	def __getitem__(self, i): return "abc"[i]

>>> class LieHigh:
	def __len__(self): return 1000000
	def __getitem__(self, i): return "xyz"[i]


>>> x, y = LieLow(), LieHigh()
>>> len(x), len(y)
(0, 1000000)
>>> map(lambda i: i, x) + map(lambda i: i, y)
['a', 'b', 'c', 'x', 'y', 'z']
>>> [i for i in x] + [i for i in y]
['a', 'b', 'c', 'x', 'y', 'z']
>>>

It's actually the raising of IndexError that controls how long the results
are.  In the "map" example, the map over x falls back to using one-at-a-time
append because len(x) guessed too low about the true length, while the map
over y grossly overallocates a result list and cuts it back at the end.
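For what it's worth, that session also runs as a plain script; the map
calls are wrapped in list() here so the same code works on Pythons where
map returns an iterator rather than a list:

```python
class LieLow:
    def __len__(self): return 0                 # claims to be empty...
    def __getitem__(self, i): return "abc"[i]   # ...but serves up 3 items

class LieHigh:
    def __len__(self): return 1000000           # claims a million items...
    def __getitem__(self, i): return "xyz"[i]   # ...but IndexError stops it at 3

x, y = LieLow(), LieHigh()
print(list(map(lambda i: i, x)) + list(map(lambda i: i, y)))
print([i for i in x] + [i for i in y])
# Both lines print ['a', 'b', 'c', 'x', 'y', 'z']
```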

In short, len() can't be trusted, so preallocation can't rely on it
(nevertheless, it's *usually* correct, and trusting it as far as is possible
pays off big for map).

Curiously, listcomps were implemented without any additions to the PVM; in
effect, [f(x) for x in s] is macro expanded to:

    anonymous1 = []
    __1__ = anonymous1.append
    for anonymous2 in s:
        __1__(f(anonymous2))
    # result is now anonymous1
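The expansion is easy to check by hand.  A sketch with concrete stand-ins
for f and s (which are arbitrary here), confirming the rewrite matches the
listcomp:

```python
def f(x):
    return x * 2      # arbitrary stand-in for the listcomp's expression

s = [1, 2, 3]         # arbitrary stand-in sequence

# The expansion, with real names for the invisible temps:
anonymous1 = []
__1__ = anonymous1.append
for anonymous2 in s:
    __1__(f(anonymous2))

assert anonymous1 == [f(x) for x in s]   # same result as the listcomp
```

Binding the .append method once up front (rather than looking it up on
each iteration) is the one optimization the expansion already performs.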

One way to speed that is map-like (where var names are really invisible
temps):

    guess = len(s)
    result = [None] * guess
    i = 0
    try:
        while 1:
            elt = s[i]  # this may raise IndexError
            val = f(elt)
            if i < guess:
                result[i] = val
            else:
                result.append(val)
            i += 1
    except IndexError:
        pass
    if i < guess:
        del result[i:]
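That sketch runs as-is once wrapped in a function (the name map_like is
made up for illustration):

```python
def map_like(f, s):
    # Preallocate from len(s), but let IndexError -- not len() --
    # decide where the result really ends.
    guess = len(s)
    result = [None] * guess
    i = 0
    try:
        while 1:
            elt = s[i]              # this may raise IndexError
            val = f(elt)
            if i < guess:
                result[i] = val     # fill a preallocated slot
            else:
                result.append(val)  # len() guessed too low
            i += 1
    except IndexError:
        pass
    if i < guess:
        del result[i:]              # len() guessed too high
    return result

print(map_like(str, [1, 2, 3]))     # -> ['1', '2', '3']
```

Fed the LieLow and LieHigh classes from earlier, it produces the same
results as map does: the too-low guess falls back to appending, and the
too-high guess gets cut back at the end.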

There are of course many ways to introduce opcodes to speed that.  It's
curious that map() requires that sequences implement __len__ (although it
tolerates them lying), while listcomps currently don't.

Another way to speed it would be to leave the listcomp implementation alone
but make one-at-a-time append faster.  That may be more attractive!
One-at-a-time append is very common in Python programs.  Kevin O'Connor was
interested in this back when we were all too busy finishing 2.0 to play
along; unfortunately, he appears to have vanished.  A DejaNews search should
turn up a patch of his that sped listcomps by 30% for him this way.  That's
both easier and harder <0.5 wink -- see the DejaNews thread for why>.





More information about the Python-list mailing list