Efficient way to break up a list into two pieces

Sat Feb 20 22:40:42 EST 2010

On Sat, 20 Feb 2010 17:55:18 -0800, marwie wrote:

> On 21 Feb., 02:30, Steven D'Aprano <st... at REMOVE-THIS-
> cybersource.com.au> wrote:
>> Python lists are arrays of pointers to objects, so copying a slice is
>> fast: it doesn't have to copy the objects, just pointers. Deleting from
>> the end of the list is also quick, because you don't have to move
>> memory, just clear some pointers and change the length field.
>>
>> Splitting such an array without copying data is, essentially,
>> impossible. Python lists aren't linked lists.
> 
> Well, to split a C array I would simply set l2 to point to l1[10] and
> then change the length of l1 (which I store somewhere else). No copying
> of elements needed. I would have assumed that python can do something
> like this with its internal arrays of pointers, too.

Python lists aren't C arrays either.

Python lists are *objects*. Everything in Python is an object. 
Consequently, Python lists have a header, which includes the len. You 
don't store the length somewhere else, you store it in the object: this 
makes for less complexity. You can't just point l2 at an arbitrary index 
in an array and expect it to work, it needs a header.

Additionally, Python lists are over-allocated so that appends are fast. A 
list of (say) 1000 items might be over-allocated to (say) 1024 items, so 
that you can do 24 appends before the array is full and the array needs 
to be resized. This means that, on average, Python list appending is O(1) 
instead of O(N). You can't just change the length blindly, you need to 
worry about the over-allocation.

I'm sympathetic to your concern: I've often felt offended that doing 
something like this:

x = SomeReallyBigListOrString
for item in x[1:]:
    process(item)

has to copy the entire list or string (less the first item). But 
honestly, I've never found a situation where it actually mattered.

> Anyway, this was more a question about coding style. I use l1.extend(l2)
> or l1 += l2 rather than l1 = l1 + l2 because it's as readable and
> possibly faster.

I really, really hate augmented assignment for everything other than ints 
and floats because you can't predict what it will do. Take this for 
example:

>>> mylist = [1, 2, 3, 4]
>>> same_again = mylist
>>> mylist.append(5)
>>> same_again
[1, 2, 3, 4, 5]

mylist and same_again are the same object, so append and extend behave 
predictably and simply. So is simple too:

>>> mylist = mylist + [-1, -2, -3]
>>> mylist
[1, 2, 3, 4, 5, -1, -2, -3]
>>> same_again
[1, 2, 3, 4, 5]

Because you have re-bound the name mylist to a new list, same_again 
doesn't get modified. Nice.

But what about mylist += something else? Is it an in-place modification, 
like extend and append, or a rebinding, like +? Who can remember? The 
answer will depend on whether mylist is a builtin list, or a subclass.

And then there's this mystery:

>>> mylist = [1, 2, 3]
>>> mytuple = (mylist, None)
>>> mytuple[0] += [4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> mylist
[1, 2, 3, 4]

So in fact, += is *always* a rebinding, but *sometimes* it is an in-place 
modification as well. Yuck.

-- 
Steven