So what's happening here?

Steven D'Aprano steve at pearwood.info
Fri Jun 5 09:51:55 EDT 2015


On Fri, 5 Jun 2015 11:11 pm, Paul Appleby wrote:

> On Fri, 05 Jun 2015 14:55:11 +0200, Todd wrote:
> 
>> Numpy arrays are not lists, they are numpy arrays. They are two
>> different data types with different behaviors.  In lists, slicing is a
>> copy.  In numpy arrays, it is a view (a data structure representing some
>> part of another data structure).  You need to explicitly copy the numpy
>> array using the "copy" method to get a copy rather than a view:
> 
> OK, thanks.  I see.
> 
> (I'd have thought that id(a[1]) and id(b[1]) would be the same if they
> were the same element via different "views", but the id's seem to change
> according to rules that I can't fathom.)

They're the same element, but not the same object.
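
Here is a rough sketch of what that means in practice. It is my own illustration rather than anything from the earlier posts, it assumes numpy is installed, and the names a and b are just placeholders:

```python
import numpy as np

a = np.array([1, 2, 3])
b = a[:]                       # slicing a numpy array gives a view, not a copy

b[1] = 99                      # write through the view...
print(a[1])                    # ...and the original array sees the change
print(np.shares_memory(a, b))  # both arrays sit on the same underlying buffer
print(a[1] is b[1])            # yet each lookup hands back a separate object
```

So the element is shared, even though the objects you get back when you look at it are not.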

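The id() function operates on an object, and returns some arbitrary ID
number for that object. The only things promised about that ID number are
that it stays the same for the lifetime of the object, and that any two
distinct objects existing at the same time will have distinct IDs. Once an
object has been destroyed, its ID may be reused by a later object.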
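
A minimal sketch of that promise, using two throwaway objects (the names first and second are made up for the illustration):

```python
# Two distinct objects that are both alive at the same time.
first = object()
second = object()

# Distinct live objects are guaranteed to have distinct IDs.
ids_differ = id(first) != id(second)

# Asking twice about the same live object gives the same ID.
id_is_stable = id(first) == id(first)

print(ids_differ, id_is_stable)
```

The actual numbers are an implementation detail, so don't read anything into their values.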

Now, let's see what happens when we extract elements from a regular Python
list:

py> a = [1, 2, 3]
py> x = a[0]
py> y = a[0]
py> x is y
True
py> id(x) == id(y)
True

This tells us that extracting the first (or zeroth, if you prefer) element
from the list gives us the same object each time.

Now let's try it with a numpy array:

py> import numpy as np
py> b = np.array([1, 2, 3])
py> x = b[0]
py> y = b[0]
py> x is y
False
py> id(x), id(y)
(149859472, 151810312)


The IDs are clearly different, therefore they are different objects. What's
going on?

The secret is that lists contain objects, so when you extract the zeroth
item using a[0], you get the same object each time. But a numeric numpy
array does not contain objects; it is a wrapper around a C array of machine
ints.

(The numpy array itself is an object, but the elements of that array are
not.)
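
You can see this for yourself by inspecting the array. This is only a sketch, assuming numpy is installed; the exact integer width you get depends on your platform and numpy version:

```python
import numpy as np

b = np.array([1, 2, 3])

# The element type is a fixed-size machine integer (int32 or int64,
# depending on the platform), not a Python object.
print(b.dtype, b.itemsize)

# The data buffer holds exactly itemsize bytes per element.
raw = b.tobytes()
print(len(raw) == b.size * b.itemsize)
```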

This is one of the reasons why numpy is so fast: it can bypass all the
high-level Python object-oriented machinery, and perform calculations using
high-speed, low-level C code taking advantage of unboxed machine primitive
values.

But when you extract an element using b[0], numpy has to give you an object.
(Python itself has no concept of low-level machine values.) So it grabs the
machine integer at offset 0 (32 bits on my system; the width depends on the
platform), converts it into an object, and returns that. When you do it
again, numpy goes through the same process, and returns a second object with
the same numeric value. Hence, the IDs are different.
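
If you want to check the boxing step yourself, something like the following should do it. Again this is a sketch that assumes numpy is installed; the concrete scalar type printed will vary by platform:

```python
import numpy as np

b = np.array([1, 2, 3])
x = b[0]
y = b[0]

print(type(x))                    # a numpy integer scalar type
print(isinstance(x, np.integer))  # it is a wrapper object created on demand
print(x is y)                     # two lookups, two separate objects
print(x == y)                     # but they carry the same numeric value
```

Comparing with == rather than "is" (or id) is the right way to ask whether two array elements hold the same value.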



-- 
Steven



