Multi-dimensional list initialization

Shambhu Rajak shambhu.rajak at calsoftinc.com
Tue Nov 6 08:27:27 EST 2012


Well said Steve, I agree with you...
-Shambhu

-----Original Message-----
From: Steven D'Aprano [mailto:steve+comp.lang.python at pearwood.info] 
Sent: Tuesday, November 06, 2012 2:35 PM
To: python-list at python.org
Subject: Re: Multi-dimensional list initialization

On Mon, 05 Nov 2012 21:51:24 -0800, Andrew Robinson wrote:

> The most compact notation in programming really ought to reflect the 
> most *commonly* desired operation.  Otherwise, we're really just 
> making people do extra typing for no reason.

There are many reasons not to put minimizing of typing ahead of all other
values:

* Typically, code is written once and read many times. Minimizing
  typing might save you a second or two once, and then cost you many
  seconds every time you read the code. That's why we tell people to
  choose meaningful variable names, instead of naming everything "a" 
  and "b".

* Consistency of semantics is better than a plethora of special
  cases. Python has a very simple and useful rule: objects should
  not be copied unless explicitly requested to be copied. This is
  much better than having to remember whether this operation or
  that operation makes a copy. The answer is consistent: 

  (pardon me for belabouring the point here)

    Q: Does [0]*10 make ten copies of the integer object?
    A: No, list multiplication doesn't make copies of elements.

    Q: How about [0.0]*10?
    A: No, the elements are never copied.

    Q: What if I use a singleton? Does [None]*10 try to copy?
    A: No, the elements are never copied.

    Q: How about things like file objects that can't be copied?
    A: No, the elements are never copied.

    Q: What about [[]]*10?
    A: No, the elements are never copied.

    Q: How about if the elements are subclasses of list?
    A: No, the elements are never copied.

    Q: What about other mutable objects like sets or dicts?
    A: No, the elements are never copied.

    Q: What about instances of custom classes?
    A: No, the elements are never copied.

    Q: What if they are old-style Classic classes?
    A: No, the elements are never copied.

    Q: What if I do some funny tricks with the metaclass?
    A: No, the elements are never copied.

    Q: How about on Tuesdays? I bet they're copied on Tuesdays.
    A: No, the elements are never copied.



Your proposal throws away consistency for a trivial benefit on a rare use- case, and replaces it with a bunch of special cases:

    Q: What about [[]]*10?
    A: Oh yeah, I forgot about lists, they're copied.

    Q: How about if the elements are subclasses of list?
    A: Hmmm, that's a good one, I'm not actually sure.

    Q: How about if I use delegation to proxy a list?
    A: Oh no, they definitely won't be copied.

    Q: What about other mutable objects like sets or dicts?
    A: No, definitely not. Unless people complain enough.

    Q: What about instances of custom classes?
    A: That's a definite maybe.

    Q: How about on Tuesdays? I bet they're copied on Tuesdays.
    A: Only if you're in Belgium.


Losing consistency in favour of saving a few characters for something as uncommon as list multiplication is a poor tradeoff. That's why this proposal has been rejected again and again and again every time it has been suggested.

List multiplication [x]*n is conceptually equivalent to:

newlist = []
for i in range(n):
    newlist.append(x)

or if you prefer a list comp:

[x for i in range(n)]

This is nice and simple and efficient. Some objects cannot be copied at all. Copying other objects is slow and inefficient. Keeping list multiplication consistent, and fast, is MUCH more important than making it work as expected for the rare case of 2D arrays:

[[0]*n]*m

where there are other alternatives.


> Further, list comprehensions take quite a bit longer to run than low 
> level copies; by a factor of roughly 10. SO, it really would be worth 
> implementing the underlying logic -- even if it wasn't super easy.

Copying those elements does not come for free.

It is true that list multiplication can be much faster than a list comp. 
But that's because the list multiply doesn't have to inspect the elements, copy them, or engage the iteration machinery. Avoiding copying gives you a big saving:


[steve at ando ~]$ python3.3 -m timeit -s "x = range(1000)" 
"[x for _ in range(100)]"  # not copied
100000 loops, best of 3: 11.9 usec per loop

[steve at ando utilities]$ python3.3 -m timeit -s "x = range(1000)" 
"[x[:] for _ in range(100)]"  # copied
10000 loops, best of 3: 103 usec per loop

So there's a factor of ten difference right there. If list multiplication had to make copies, it would lose much of its speed advantage. For large enough lists, or complicated enough objects, it would become slower than a list comprehension.

It would be even slower if list multiplication had to inspect each element first and decide whether or not to copy.



> I really don't think doing a shallow copy of lists would break anyone's
> program.

Anyone who is currently using list multiplication with mutable objects is 
expecting that they will be the same object, and relying on that fact. 
Otherwise they wouldn't be using list multiplication.

You're suggesting a semantic change. Therefore they will be expecting 
something different from what actually happens. Result: broken code.

It's not just mutable objects. It's also objects that can't be copied. 
Result: mylist*3 used to work, now it raises an exception. And 
performance issues: what used to be fast is now slow.

Even if this change was allowed, it would have to go through a multi-year 
process. Python 3.3 is too late -- the absolute earliest would be Python 
3.4, which is scheduled for about 18 months from now. So in Python 3.4 
you could write:

from __future__ import list_multiplication_copying

to get the behaviour you want, and then in Python 3.5 it would become the 
default. That's three years until it becomes the standard. Meanwhile, 
there will still be millions of people using Python 2.7 or 3.2, and their 
code will behave differently from your code.

Conservatively, if you write code to support three previous releases, 
that means you can't use this feature until Python 3.7. So that's about 
six years before it can be used widely.

If the problem being solved was big enough, this would be worth doing. 
But it's not.


> The non-list elements, whatever they are, can be left as reference
> copies -- but any element which is a list ought to be shallow copied.

That's even worse than "list multiplication always copies". At least that 
is simple and consistent, even if it isn't consistent with the rest of 
the language, at least it is self-consistent. You are proposing something 
much worse: special cases to remember. "Objects aren't copied, except for 
lists, which are copied."

And then people will wonder why sets aren't copied, and dicts. People 
will make a 2D array like so:

[[0]*5]*10

and it will work. Then they'll write this:

[{}]*5

and wonder why it doesn't work the way they expect. Consistency is *much* 
more valuable than ad hoc DWIM semantics. Languages that try to Do What I 
Mean somehow end up Doing What Somebody Else Meant, But Not Me.



-- 
Steven





More information about the Python-list mailing list