[Numpy-discussion] dimension alignment

Anne Archibald peridot.faceted at gmail.com
Tue May 20 14:04:46 EDT 2008


2008/5/20 Thomas Hrabe <thrabe at burnham.org>:

> given a 3d array
> a =
> numpy.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]],[[13,14,15],[16,17,18]],[[19,20,21],[22,23,24]]])
> a.shape
> returns (4,2,3)
>
> so I assume the first digit is the 3rd dimension, second is 2nd dim and
> third is the first.
>
> how is the data aligned in memory now?
> according to the strides it should be
> 1,2,3,4,5,6,7,8,9,10,...
> right?
>
> if I had an array of more dimensions, the first digit returned by shape
> should always be the highest dim.

You are basically right, but this is a surprisingly subtle issue for numpy.

A numpy array is a block of memory plus some description of it. One
piece of that description is the type of data it contains (i.e., how
to interpret each chunk of memory), for example int32 or float64.
Another is the shape, the size of each dimension. A third piece,
which makes many of the things numpy does possible, is the "strides":
for each dimension, the number of bytes to step through memory to
reach the next element along that dimension. numpy translates

A[i,j,k]

into a lookup of the item in the memory block at byte offset

i*strides[0]+j*strides[1]+k*strides[2]
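
As a rough illustration of that formula, here is a small sketch using the
array from the question. The dtype is fixed to int32 so the strides are
predictable, and lookup() is a hypothetical helper written for this
example, not a numpy function. It assumes the array is C contiguous, so
that ravel() walks the elements in memory order.

```python
import numpy as np

# The array from the question; dtype pinned so each item is 4 bytes.
a = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]],
              [[13, 14, 15], [16, 17, 18]],
              [[19, 20, 21], [22, 23, 24]]], dtype=np.int32)

print(a.shape)    # (4, 2, 3)
print(a.strides)  # (24, 12, 4) -- bytes per step along each dimension


def lookup(arr, i, j, k):
    """Hypothetical helper: find arr[i, j, k] from the strides by hand.

    Assumes arr is C contiguous, so arr.ravel() is in memory order.
    """
    byte_offset = (i * arr.strides[0]
                   + j * arr.strides[1]
                   + k * arr.strides[2])
    # Convert the byte offset into an item index in the flat memory block.
    return arr.ravel()[byte_offset // arr.itemsize]


print(lookup(a, 2, 1, 0), a[2, 1, 0])  # 16 16
```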

This means that if you have an array A and you want every second element
(A[::2]), all numpy needs to do is hand you back a new array pointing
to the same data block, with strides[0] doubled and the length of that
dimension halved (rounded up). Similarly, if you want to transpose a
two-dimensional array, all it needs to do is exchange strides[0] and
strides[1] (and the two entries of the shape); no data need be moved.
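
A short sketch of both cases follows. The variable names and the int64
dtype are choices made for this example; np.shares_memory is used only to
check that the results are views on the original data rather than copies.

```python
import numpy as np

# A small 2-d array with 8-byte items, so the strides are easy to read.
m = np.arange(12, dtype=np.int64).reshape(3, 4)

every_other = m[::2]   # rows 0 and 2
transposed = m.T

print(m.strides)            # (32, 8)
print(every_other.strides)  # (64, 8) -- strides[0] doubled
print(transposed.strides)   # (8, 32) -- strides exchanged

# Both results point at the same data block; nothing was copied.
print(np.shares_memory(m, every_other))  # True
print(np.shares_memory(m, transposed))   # True

# Writing through the view changes the original array.
every_other[0, 0] = 100
print(m[0, 0])  # 100
```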

This means, though, that if you are handed a numpy array, the elements
can be arranged in memory in quite a complicated fashion. Sometimes
this is no problem - you can always use the strides to find it all.
But sometimes you need the data arranged in a particular way. numpy
defines two particular ways: "C contiguous" and "FORTRAN contiguous".

"C contiguous" arrays are what you describe, and they're what numpy
produces by default; they are arranged so that the rightmost index has
the smallest stride. "FORTRAN contiguous" arrays are arranged the
other way around; the leftmost index has the smallest stride. (This is
how FORTRAN arrays are arranged in memory.)
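
To make the two layouts concrete, here is a minimal sketch that compares
the strides and contiguity flags of the same shape in each order, and
shows one way to get a C contiguous copy of a view that is neither. The
array names and the float64 dtype are assumptions for the example.

```python
import numpy as np

c = np.zeros((4, 2, 3), dtype=np.float64)  # numpy's default: C order
f = np.asfortranarray(c)                   # same values, FORTRAN order

print(c.strides)  # (48, 24, 8) -- rightmost index has the smallest stride
print(f.strides)  # (8, 32, 64) -- leftmost index has the smallest stride

print(c.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])  # True False
print(f.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])  # False True

# A strided view can be neither C nor FORTRAN contiguous.
v = c[:, :, ::2]
print(v.flags['C_CONTIGUOUS'], v.flags['F_CONTIGUOUS'])  # False False

# ascontiguousarray copies only when needed to get a C contiguous array.
fixed = np.ascontiguousarray(v)
print(fixed.flags['C_CONTIGUOUS'])  # True
```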

There is also a special case: the reshape() function changes the shape
of the array. Its "order" argument describes not how the elements are
actually arranged in memory but the index order in which the reshape
operation reads the elements and places them into the new shape: 'C'
means the rightmost index varies fastest, 'F' the leftmost.
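
A small sketch of the difference, reshaping the same six elements with
each order (the names x, as_c and as_f are just for this example):

```python
import numpy as np

x = np.arange(6)  # 0, 1, 2, 3, 4, 5

as_c = x.reshape((2, 3), order='C')  # fill with the last index fastest
as_f = x.reshape((2, 3), order='F')  # fill with the first index fastest

print(as_c)
# [[0 1 2]
#  [3 4 5]]
print(as_f)
# [[0 2 4]
#  [1 3 5]]

# In this case neither reshape copied anything; only the strides differ.
print(np.shares_memory(x, as_c), np.shares_memory(x, as_f))  # True True
```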

Anne


