Deep vs. shallow copy?

Zachary Ware zachary.ware+pylist at gmail.com
Wed Mar 12 11:00:09 EDT 2014


On Wed, Mar 12, 2014 at 9:25 AM, Alex van der Spek <zdoor at xs4all.nl> wrote:
> I think I understand the difference between deep vs. shallow copies but
> I was bitten by this:
>
> with open(os.path.join('path', 'foo.txt', 'rb') as txt:
>      reader = csv.reader(txt)
>      data = [row.append(year) for row in reader]
>
> This does not work although the append does complete. The below works:
>
> with open(os.path.join('path', 'foo.txt', 'rb') as txt:
>      reader = csv.reader(txt)
>      data = [row + [year] for row in reader]
>
> However in this context I am baffled. If someone can explain what is
> going on here, I would be most grateful.

Deep/shallow copying doesn't really come into this.  row.append()
mutates the list (row), it doesn't return a new list.  Like most
in-place/mutating methods in Python, it returns None instead of self
to show that mutation was done, so your listcomp fills `data` with
Nones; there is no copying done at all.  The second example works as
you expected because `row + [year]` results in a new list, which the
listcomp is happy to append to `data`--which does mean that `row` is
copied.

To avoid the copy that the second listcomp is doing (which really
shouldn't be necessary anyway, unless your rows are astronomically
huge), you have a couple of options.  First, you can expand your
listcomp and use append:

   with open(os.path.join('path', 'foo.txt'), 'rb') as txt: # with
your typo fixed ;)
       reader = csv.reader(txt)
       data = []
       for row in reader:
           row.append(year)
           data.append(row)

To me, that's pretty readable and pretty clear about what it's doing.
Then there's this option, which I don't recommend:

   import operator
   with open(os.path.join('path', 'foo.txt'), 'rb') as txt:
       reader = csv.reader(txt)
       data = [operator.iadd(row, [year]) for row in reader]

This works because operator.iadd is basically shorthand for
row.__iadd__([year]), which does return self (otherwise, the
assignment part of `row += [year]` couldn't work).  But, it's not as
clear about what's happening, and only saves a whole two lines (maybe
3 if you already have operator imported).

Hope this helps,
-- 
Zach



More information about the Python-list mailing list