[Tutor] Copy and Deepcopy (List aliasing)

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Sat, 14 Jul 2001 22:28:45 -0700 (PDT)


On Sat, 14 Jul 2001, Christopher Smith wrote:

> I create a list
>
> 	>>> l=[0,0]
>
> and make a copy with a new id (by using l+[] on the rhs instead of
> just l)
>
> 	>>> n=l+[]
>

[... some text cut]

>
> I know that using n=deepcopy(l) fixes the problem, but can anyone help
> me understand what is going on?  Is there any other simple way to make
> a copy of l without using deepcopy?  The "=" is a little more tricky
> than I have suspected.



[warning: if you haven't played around with lists yet, this message might
be confusing.]


Using list addition is a good approach to create a new list.  Another way
that I've seen people do it is to take a slice of the whole list:

    n = l[:]

Slices, too, create a new "shallow" copy of the list.  I think that the
subject of shallow vs. deep copying is at the heart of your question.  To
see what the difference between shallow and deep copying, we will need to
draw some pictures.


First, let's talk about what happens when we do something like this:

###
>>> mylist_1 = [1, 2, 3, 4]
>>> mylist_2 = mylist_1
>>> mylist_2[-1] = 'four'
>>> mylist_1
[1, 2, 3, 'four']
>>> mylist_2
[1, 2, 3, 'four']
###


What we end up holding in our hands might look something like this:


Variables                    List Instance
=========                    =============

mylist_1 --------------     +----+----+----+----+
                       \    |    |    |    |    |
                        +-->|  . |  . |  . |  . |
mylist_2 --------------/    +--+-+--+-+--+-+--+-+
                               |    |    |    |
                               |    |    |    |
                               v    v    v    v
                               1    2    3    4


(*pant* There just HAS to be an easier way to do ASCII graphics...)


Whenever we use '=', we are always direct a "name", the variable, off in
the direction of some "thing".  In this example, the incantation:

    mylist_1 = [1, 2, 3, 4]

causes Python to create a list with four numbers.  Except that, in truth,
the list doesn't actually "contain" those numbers!  Instead, we get four
boxes that act like variables: they too can direct toward things, like
numbers.  That's why the numbers aren't drawn within the smaller boxes
above.  It seems like a needless distinction, but give it a moment.



What happens when we do a shallow copy of a simple list?

###
>>> mylist = [1, 2, 3, 4]
>>> shallow_copy = mylist[:]
>>> mylist[0] = 'one'
>>> mylist
['one', 2, 3, 4]
>>> shallow_copy
[1, 2, 3, 4]
###

Again, pictures will help a lot:


mylist_1 --------------     +----+----+----+----+
                       \    |    |    |    |    |
                        +-->|  . |  . |  . |  . |
                            +--+-+--+-+--+-+--+-+
                               |    |    |    |
                               |    |    |    |
                               V    V    V    V
                          1  'one'  2    3    4
                          ^         ^    ^    ^
                          +----+    |    |    |
                               |    |    |    |
                            +--+-+--+-+--+-+--+-+
                            |  | |  | |  | |  | |
                        +-->|  . |  . |  . |  . |
shallow_copy ----------/    +----+----+----+----+
                               
                            

This is a "shallow" copy: Python will create a new list with the length
that we request.  However, each of the list elements will get directed
toward the same thing that the original list pointed to.  Although
'shallow_copy' is somewhat separate from 'mylist_1', there's still some
potential interference possible between the two.



And that's precisely the behavior we see with nested lists:

###
>>> books = [['molecular biology of the cell'],
...          ['the prydain chronicles']]
>>> shallow_books = books[:]
###


books -----------------     +----+----+
                       \    |    |    |
                        +-->|  . |  . |
                            +--+-+--+-+
                               |    |
                               |    +-------+
                               V            V
                            +----+        +----+
    "molecular bio..." <----+-.  |        |  .-+---> "the pry..."
                            |    |        |    |
                            +----+        +----+  
                               ^            ^
                               |    +-------+
                               |    |
                            +--+-+--+-+
                            |  | |  | |
                        +-->|  . |  . |
shallow_books ---------/    +----+----+



This model predicts that if we make a change to the large lists, we
shouldn't see that change in its shallow "clone".  However, if we make a
change in one of the small inner lists, we'll see that change from both
'books' and 'shallow books'.  Let's see:

###
>>> books[0] = ['core python programming'] 
>>> books[1][0] = 'the return of the king'
>>> books
[['core python programming'], ['the return of the king']]
>>> shallow_books
[['molecular biology of the cell'], ['the return of the king']]
###

If we draw out the picture, what we'll see will be complicated, but it
should also make sense.

Hopefully, this shows some of the motivations for deep copying: deep
copying not only makes a top-level copy of the list, but it delves into
the structure, making copies of every inner list too.  But I'd better stop
here, or else the next picture won't fit in 23 rows...

Hope this helps!