[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Sun Mar 31 01:38:09 EDT 2013

On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Sat, Mar 30, 2013 at 9:37 PM,  <josef.pktd at gmail.com> wrote:
>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>> Hi,
>>>
>>> On Sat, Mar 30, 2013 at 7:02 PM,  <josef.pktd at gmail.com> wrote:
>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd at gmail.com> wrote:
>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>>>>>> <brad.froehle at gmail.com> wrote:
>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd at gmail.com> wrote:
>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd at gmail.com> wrote:
>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>>>>>> >> <matthew.brett at gmail.com> wrote:
>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd at gmail.com> wrote:
>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>>>>>> >>>> <matthew.brett at gmail.com> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index
>>>>>>>> >>>>> ordering.
>>>>>>>> >>>>>
>>>>>>>> >>>>> This is very confusing.  We think the index ordering and memory
>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>>>>>>>> >>>>> avoid
>>>>>>>> >>>>> using "C" and "F" to refer to index ordering.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Proposal
>>>>>>>> >>>>> -------------
>>>>>>>> >>>>>
>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>>>>>> >>>>> index ordering for ravel, reshape
>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling
>>>>>>>> >>>>> in
>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>>>>>> >>>>> naming idea by Paul Ivanov)
>>>>>>>> >>>>>
>>>>>>>> >>>>> What do y'all think?
>>>>>>>> >>>>
>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought
>>>>>>>> >>>> about
>>>>>>>> >>>> the content and never about the memory when using it.
>>>>>>>> >>
>>>>>>>> >> changing the names doesn't make it easier to understand.
>>>>>>>> >> I think the confusion is because the new A and K refer to existing
>>>>>>>> >> memory
>>>>>>>> >>
>>>>>>>>
>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is
>>>>>>>> that four out of four of us tested ourselves and got it wrong.
>>>>>>>>
>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>>>>>>> rash to assert there is no problem here.
>>>>>>
>>>>>> I think you are overcomplicating things or phrased it as a "trick question"
>>>>>
>>>>> I don't know what you mean by trick question - was there something
>>>>> over-complicated in the example?  I deliberately didn't include
>>>>> various much more confusing examples in "reshape".
>>>>
>>>> I meant making the "candidates" think about memory instead of just
>>>> column versus row stacking.
>>>
>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D
>>> array, it was an image, with time as the 4th dimension (N time
>>> points).   Raveling and reshaping 3D and 4D arrays is a common thing
>>> to do in neuroimaging, as you can imagine.
>>>
>>> A student asked what he would get back from raveling this array, a
>>> concatenated time series, or something spatial?
>>>
>>> We showed (I'd worked it out by this time) that the first N values
>>> were the time series given by [0, 0, 0, :].
>>>
>>> He said - "Oh - I see - so the data is stored as a whole lot of time
>>> series one by one, I thought it would be stored as a series of
>>> images".
>>>
>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
>>>
>>> So, I think the idea of memory ordering and index ordering is very
>>> easy to confuse, and comes up naturally.
>>>
>>> I would like, as a teacher, to be able to say something like:
>>>
>>> This is what C memory layout is (it's the memory layout  that gives
>>> arr.flags.C_CONTIGUOUS=True)
>>> This is what F memory layout is (it's the memory layout  that gives
>>> arr.flags.F_CONTIGUOUS=True)
>>> It's rather easy to get something that is neither C nor F memory layout
>>> Numpy does many memory layouts.
>>> Ravel and reshape and numpy in general do not care (normally) about C
>>> or F layouts, they only care about index ordering.
>>>
>>> My point, that I'm repeating, is that my job is made harder by
>>> 'arr.ravel('F')'.
>>
>> But once you know that ravel and reshape don't care about memory, the
>> ravel is easy to predict (maybe not easy to visualize in 4-D):
>
> But this assumes that you already know that there's such a thing as
> memory layout, and there's such a thing as index ordering, and that
> 'C' and 'F' in ravel refer to index ordering.  Once you have that,
> you're golden.  I'm arguing it's markedly harder to get this
> distinction, and keep it in mind, and teach it, if we are using the
> 'C' and 'F' names for both things.

No, I think you are still missing my point.
I think explaining ravel and reshape F and C is easy (kind of) because the
students don't need to know at that stage about memory layouts.

All they need to know is that we look at n-dimensional objects in
C-order or in  F-order
(whichever index runs fastest)
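
A minimal sketch of the two index orders, using a small made-up 2-D
array (the names `a`, `a_f`, `c_order` and `f_order` are only for
illustration). It also checks that the raveled values do not depend on
the memory layout of the input:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# order='C': the last index runs fastest (row by row)
c_order = a.ravel(order='C')     # [0, 1, 2, 3, 4, 5]

# order='F': the first index runs fastest (column by column)
f_order = a.ravel(order='F')     # [0, 3, 1, 4, 2, 5]

# Same values stored with Fortran memory layout: the raveled
# result is unchanged, because only the index order matters.
a_f = np.asfortranarray(a)
assert (a_f.ravel(order='C') == c_order).all()
assert (a_f.ravel(order='F') == f_order).all()
```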

>
>> order=C: stack the last dimension, N, time series of one 3d pixels,
>> then stack the time series of the next pixel...
>>     process pixels by depth and then row by row (like old TVs)
>>
>> I assume you did this because your underlying array is C contiguous.
>> so your ravel('C') is a c-contiguous view (instead of some weird
>> strides or a copy)
>
> Sorry - what do you mean by 'this' in 'did this'?  Reshape?   Why
> would it matter what my underlying array memory layout was?

`this` was: use ravel('C') and have the time series as the last index.
Because if we have a few gigabytes of video recordings, we better
match the ravel order with the memory order.
I thought you picked time N in the last axis, so you can have
fast access to time series (assuming you didn't specify F-contiguous).
(it's not confusing: we have two orders, index/iterator and memory,
and to get a nice view, the two should match)

rereading: since you had F-ordered memory, ravel('F') gives the nice
view (a picture at a time instead of a timeseries at a time)
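
A rough sketch of that distinction, assuming a small made-up (I, J, K, N)
array held in Fortran-ordered memory (the shape and variable names are
hypothetical). The index order decides which values come first; the
memory order only decides whether ravel can return a view or has to copy:

```python
import numpy as np

I, J, K, N = 2, 3, 4, 5          # hypothetical tiny image, N time points
arr = np.asfortranarray(np.arange(I * J * K * N).reshape(I, J, K, N))

# Index order: ravel('C') starts with the time series of the first
# voxel, even though the memory is Fortran-ordered.
first_c = arr.ravel(order='C')[:N]
assert (first_c == arr[0, 0, 0, :]).all()

# ravel('F') starts by running along the first axis instead.
first_f = arr.ravel(order='F')[:I]
assert (first_f == arr[:, 0, 0, 0]).all()

# Memory order: only the ravel order that matches the memory layout
# shares memory with the original; the other one is a copy.
f_is_view = np.shares_memory(arr, arr.ravel(order='F'))
c_is_view = np.shares_memory(arr, arr.ravel(order='C'))
```

Here `f_is_view` comes out True and `c_is_view` False, which is the
"nice view" point: matching the two orders avoids copying the data.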

>
>> I usually prefer time in the first dimension, and stack order=F, then
>> I can start at the front, stack all time periods of the first pixel,
>> keep going and work pixels down the columns, first page, next page,
>> ...
>> (and I hope I have an F-contiguous array, so my raveled array is also
>> F-contiguous.)
>>
>> (note: I'm bringing memory back in as optimization, but not to predict
>> the stacking)
>>
>> Josef
>> (I think brains are designed for Fortran order and C-ordering in numpy
>> is an accident,
>> except, reading a Western language book is neither)
>
> Yes, I find first axis fastest changing easier to think about, and I
> came from MATLAB (about 8 years ago mind), so that also made it more
> natural.
>
> I had (until yesterday) simply assumed that numpy unraveled that way,
> because it seemed more obvious to me, and knew that the unravel index
> order need have nothing to do with the memory order, or the fact that
> arrays are C contiguous by default.   Not so of course.  That's not my
> complaint as you know - it's just a convention, I guessed the
> convention wrong.

Numpy was written by C developers, and one of the first things I learned
about numpy is the ``order``:
Default is always C     (except for linalg)
and axis=None (except in scipy.stats), and dimensions disappear in reduce
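
A short sketch of those defaults, using a made-up 2x3 array (it does not
cover the linalg or scipy.stats exceptions mentioned above):

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

# New arrays are C-contiguous unless asked otherwise
is_c = a.flags.c_contiguous            # True

# ravel() with no argument uses order='C'
same_as_c = (a.ravel() == a.ravel(order='C')).all()

# Reductions default to axis=None: reduce over every axis
total = a.sum()                        # 15.0

# With an explicit axis, that dimension disappears from the result
col_sums_shape = a.sum(axis=0).shape   # (3,)
```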

Cheers,

Josef

>
> Cheers,
>
> Matthew