Pre-PEP discussion material: in-place equivalents to map and filter

Steve D'Aprano steve+python at pearwood.info
Sat Nov 5 04:42:55 EDT 2016


I've been giving your proposal a bit more thought, and while I can't say I'm
really keen on the idea, I have warmed slightly to it.


On Fri, 4 Nov 2016 07:29 am, Arthur Havlicek wrote:

> I understand that the cost of change is such that it's very unlikely
> something like this will ever go into Python, but I feel the interest of
> the proposal is being underestimated here, so I'm going to argue a few
> points and give a bit more context as needed.
> 
>> While mapping and filtering are common operations, I'm not sure mapping
>> and filtering and then reassigning back to the original sequence is
>> especially common.
> 
> It depends on your context. In the last 3 months, I stumbled across this
> at least 3 times,

I don't know who you are quoting there. It is considered impolite to quote
people without giving attribution, and makes it harder to respond. But for
what it's worth, I agree with this person.

In my code, it is quite common for me to map back to the same *variable*,
for example I might write something like this:

def process(alist):
    alist = [float(x) for x in alist]
    ...


But that is not the same as *in-place* modification of the list. I would
very rarely modify the list in place, but if I did, I would write it like
this:

    alist[:] = [float(x) for x in alist]
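
The distinction matters: rebinding only changes what the local name refers
to, while slice assignment mutates the list object the caller passed in. A
quick illustration (the function names are just for demonstration):

def rebind(alist):
    alist = [float(x) for x in alist]     # rebinds the local name only

def mutate(alist):
    alist[:] = [float(x) for x in alist]  # overwrites the caller's list

data = [1, 2, 3]
rebind(data)
print(data)  # [1, 2, 3] -- the caller's list is unchanged
mutate(data)
print(data)  # [1.0, 2.0, 3.0] -- changed in place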


In my experience, wanting to modify a list in-place without creating a
temporary version first is unusual. And *needing* to do so (rather than
doing so as premature optimization) is even rarer.

That's my experience. But if you can demonstrate the need for in-place
modifications, that changes the situation somewhat.


> which is 3 times more than I used a lambda or a
> metaclass or a decorator or other fancy language features that we simply
> avoid whenever possible.

You aren't helping your case by describing "lambda or ... decorator" as
fancy language features to be avoided.

The impression that gives (hopefully this is the wrong impression!) is that
you are a barely competent Python developer, fearful and suspicious of some
rather simple, powerful and widely-used features, stuck in an ancient 1980s
programming paradigm.

I trust that isn't actually the case, but by describing decorators and
lambda as too fancy to use, you're giving a good impression of somebody who
doesn't know what they're doing.
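
For instance, a lambda is nothing more exotic than an unnamed function
written inline, as in this everyday sorting idiom:

py> pairs = [("b", 2), ("a", 3), ("c", 1)]
py> sorted(pairs, key=lambda pair: pair[1])
[('c', 1), ('b', 2), ('a', 3)]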


[...]
> The reason I'm especially impacted by this is that I am the
> maintainer of a decent-sized Python application (~50-100K lines of code)
> that extensively uses lists and dictionaries. We value "low level" data
> manipulation and efficiency a lot more than complex, non-obvious
> constructs.

And what do you consider "complex, non-obvious"?


[...]
> Like most Python programmers, I don't need a performance boost very
> badly, but that does not mean I disregard performance entirely. The need
> for performance is not so binary that it either doesn't matter at all or
> is enough to motivate a rewrite.

Indeed.

But you're arguing for a new feature on the basis that it will boost
performance, without giving any reason to think that it actually will lead
to a performance improvement!

I know that there is a chicken-and-egg problem here. Nobody wants to spend
time writing a new feature if there's no possible chance of it being
accepted, but when your sole argument for a new feature is that it will be
more efficient (in both memory and time!), then you need to prove that this
is the case!

In other words, in my opinion:

- you would need to prove that there is a widespread need for in-place
modification of lists, where the lists are big enough that using a
temporary list is not practical;

- you would have to demonstrate a good reason to think that this new feature
will actually be more efficient and faster;

before this feature would be seriously considered for addition to the
language.

You might think that it is obvious that in-place modification MUST be faster
since it avoids making a temporary copy, but that's not obviously true at
all! Not for a high-level language like Python. It's often the case that
algorithms can exchange space for time (use more memory to save time, or
use more time to save memory). It may be that this is one of those times.

Even when doing (nearly) everything in pure Python, making a new list and
then doing a slice assignment is virtually as fast as writing directly to
the original list:

py> from timeit import Timer
py> setup = "data = list(range(10000))"
py> t1 = Timer("""for i, a in enumerate(data):
...     data[i] = a+1
... """, setup=setup)
py> 
py> t2 = Timer("""tmp = [None]*len(data)
... for i, a in enumerate(data):
...     tmp[i] = a+1
... data[:] = tmp
... """, setup=setup)
py>
py> min(t1.repeat(number=10000, repeat=7))
36.63875983003527
py>
py> min(t2.repeat(number=10000, repeat=7))
37.70047474466264


Taking the plain Python for-loop that modifies the list directly in place
as the baseline, we see that making a temporary list and then copying it
over the original list is just 3% slower. (And that's probably just random
noise.)


But moving the heavy work into a list comprehension before copying over the
original: 

py> t3 = Timer("data[:] = [a+1 for a in data]", setup=setup)
py> min(t3.repeat(number=10000, repeat=7))
21.755106460303068

is 40% faster! Based on these results, my conclusion is that there's no
speed advantage to writing directly into the list, compared to writing to a
temporary list and then copying the lot in one go.
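
For anyone who wants to reproduce these measurements, here is the same
benchmark consolidated into a single runnable script (the absolute numbers
will of course vary with machine and Python version):

from timeit import Timer

setup = "data = list(range(10000))"

# Write each result directly back into the original list.
t1 = Timer("""for i, a in enumerate(data):
    data[i] = a+1
""", setup=setup)

# Build a temporary list, then copy it over the original in one go.
t2 = Timer("""tmp = [None]*len(data)
for i, a in enumerate(data):
    tmp[i] = a+1
data[:] = tmp
""", setup=setup)

# List comprehension plus slice assignment.
t3 = Timer("data[:] = [a+1 for a in data]", setup=setup)

for name, t in [("in-place", t1), ("temp list", t2), ("comprehension", t3)]:
    print(name, min(t.repeat(number=10000, repeat=7)))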

(I find these results unintuitive, but I've seen them often enough that I'm
confident they are real.)
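
For concreteness, the sort of in-place equivalents under discussion can
already be prototyped in pure Python. A minimal sketch (the names
map_inplace and filter_inplace are hypothetical, not taken from the
proposal):

def map_inplace(func, alist):
    # Overwrite each element with func(element), reusing the existing
    # list object instead of building a new one.
    for i, x in enumerate(alist):
        alist[i] = func(x)

def filter_inplace(pred, alist):
    # Keep only elements satisfying pred, compacting the list in place
    # with a write index, then trimming the leftover tail.
    write = 0
    for x in alist:
        if pred(x):
            alist[write] = x
            write += 1
    del alist[write:]

Any built-in version of these would have to beat not just the plain loops
above but also the list comprehension plus slice assignment, which is the
benchmark to measure against.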



> About Readability & Redundancy
> 
> I have misused the terms here, but I wasn't expecting so much nitpicking.

You must be new here :-)





-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



