[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

josef.pktd at gmail.com josef.pktd at gmail.com
Sun Mar 31 00:37:50 EDT 2013


On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Sat, Mar 30, 2013 at 7:02 PM,  <josef.pktd at gmail.com> wrote:
>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>> Hi,
>>>
>>> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd at gmail.com> wrote:
>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>>>> <brad.froehle at gmail.com> wrote:
>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd at gmail.com> wrote:
>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd at gmail.com> wrote:
>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>>>> >> <matthew.brett at gmail.com> wrote:
>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd at gmail.com> wrote:
>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>>>> >>>> <matthew.brett at gmail.com> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index
>>>>>> >>>>> ordering.
>>>>>> >>>>>
>>>>>> >>>>> This is very confusing.  We think the index ordering and memory
>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should avoid
>>>>>> >>>>> using "C" and "F" to refer to index ordering.
>>>>>> >>>>>
>>>>>> >>>>> Proposal
>>>>>> >>>>> -------------
>>>>>> >>>>>
>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>>>> >>>>> index ordering for ravel, reshape
>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling in
>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>>>> >>>>> naming idea by Paul Ivanov)
>>>>>> >>>>>
>>>>>> >>>>> What do y'all think?
>>>>>> >>>>
>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought about
>>>>>> >>>> the content and never about the memory when using it.
>>>>>> >>
>>>>>> >> changing the names doesn't make it easier to understand.
>>>>>> >> I think the confusion is because the new A and K refer to existing
>>>>>> >> memory
>>>>>> >>
>>>>>>
>>>>>> I disagree, I think it's confusing, but I have evidence, and that is
>>>>>> that four out of four of us tested ourselves and got it wrong.
>>>>>>
>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>>>>> rash to assert there is no problem here.
>>>>
>>>> I think you are overcomplicating things or phrased it as a "trick question"
>>>
>>> I don't know what you mean by trick question - was there something
>>> over-complicated in the example?  I deliberately didn't include
>>> various much more confusing examples in "reshape".
>>
>> I meant making the "candidates" think about memory instead of just
>> column versus row stacking.
>
> To be specific, we were teaching about reshaping a (I, J, K, N) 4D
> array, it was an image, with time as the 4th dimension (N time
> points).   Raveling and reshaping 3D and 4D arrays is a common thing
> to do in neuroimaging, as you can imagine.
>
> A student asked what he would get back from raveling this array, a
> concatenated time series, or something spatial?
>
> We showed (I'd worked it out by this time) that the first N values
> were the time series given by [0, 0, 0, :].
>
> He said - "Oh - I see - so the data is stored as a whole lot of time
> series one by one, I thought it would be stored as a series of
> images".
>
> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
>
> So, I think the idea of memory ordering and index ordering is very
> easy to confuse, and comes up naturally.
>
> I would like, as a teacher, to be able to say something like:
>
> This is what C memory layout is (it's the memory layout  that gives
> arr.flags.C_CONTIGUOUS=True)
> This is what F memory layout is (it's the memory layout  that gives
> arr.flags.F_CONTIGUOUS=True)
> It's rather easy to get something that is neither C nor F memory layout.
> Numpy does many memory layouts.
> Ravel and reshape and numpy in general do not care (normally) about C
> or F layouts, they only care about index ordering.
>
> My point, that I'm repeating, is that my job is made harder by
> 'arr.ravel('F')'.
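
The layout flags in the list above can be checked directly. This is a
minimal sketch on small made-up arrays (the names and shapes are
illustrative only): it builds a C layout, an F layout, and a strided
view that is neither, and then shows that the values returned by
ravel in 'C' index order do not depend on the layout.

```python
import numpy as np

c_arr = np.zeros((3, 4))               # default allocation: C memory layout
f_arr = np.zeros((3, 4), order='F')    # Fortran memory layout
neither = c_arr[:, ::2]                # strided view: neither C nor F layout

c_flags = (c_arr.flags.c_contiguous, c_arr.flags.f_contiguous)
f_flags = (f_arr.flags.c_contiguous, f_arr.flags.f_contiguous)
neither_flags = (neither.flags.c_contiguous, neither.flags.f_contiguous)

# Same values stored in two different memory layouts.
c_vals = np.arange(12).reshape(3, 4)
f_vals = np.asfortranarray(c_vals)

# ravel(order='C') walks the indices with the last axis fastest,
# whatever the memory layout, so both results hold the same values.
index_order_same = np.array_equal(c_vals.ravel(order='C'),
                                  f_vals.ravel(order='C'))
```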

But once you know that ravel and reshape don't care about memory, the
ravel is easy to predict (maybe not easy to visualize in 4-D):

order=C: stack the last dimension first, i.e. the N-point time series
of one 3-D pixel, then stack the time series of the next pixel...
    process pixels by depth and then row by row (like old TVs)
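
A small sketch of the order=C case described above, assuming a toy
(I, J, K, N) array with made-up sizes. It checks that the first N
raveled values are the time series at [0, 0, 0, :] and the next N are
the one at [0, 0, 1, :], and that this holds even when the array is
Fortran-ordered in memory.

```python
import numpy as np

# Hypothetical sizes: a tiny 3-D "image" with N time points on the last axis.
I, J, K, N = 2, 3, 4, 5
arr_c = np.arange(I * J * K * N).reshape(I, J, K, N)  # C-contiguous in memory
arr_f = np.asfortranarray(arr_c)                      # same values, F-contiguous

flat_c = arr_c.ravel(order='C')
flat_f = arr_f.ravel(order='C')

# Index order 'C': the last axis (time) varies fastest, so the first N
# values are the time series of the first pixel in both cases.
first_series_ok = (np.array_equal(flat_c[:N], arr_c[0, 0, 0, :]) and
                   np.array_equal(flat_f[:N], arr_f[0, 0, 0, :]))

# The next N values are the time series of the next pixel along axis 2.
next_series_ok = np.array_equal(flat_f[N:2 * N], arr_f[0, 0, 1, :])

# The raveled values are identical regardless of memory layout.
layout_independent = np.array_equal(flat_c, flat_f)
```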

I assume you did this because your underlying array is C-contiguous,
so your ravel('C') is a C-contiguous view (instead of some weird
strides or a copy).
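
The view-versus-copy point can be checked with np.shares_memory. This
is only a sketch on made-up arrays: the values returned by ravel are
determined by the index order, while the memory layout decides whether
NumPy can hand back a view or has to copy.

```python
import numpy as np

arr_c = np.zeros((2, 3, 4, 5))      # C-contiguous
arr_f = np.asfortranarray(arr_c)    # Fortran-contiguous copy of the same values

# Index order matches the memory layout: ravel can return a view.
c_on_c_is_view = np.shares_memory(arr_c.ravel(order='C'), arr_c)
f_on_f_is_view = np.shares_memory(arr_f.ravel(order='F'), arr_f)

# Index order does not match the memory layout: ravel has to copy.
c_on_f_is_view = np.shares_memory(arr_f.ravel(order='C'), arr_f)
```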

I usually prefer time in the first dimension, and stack order=F, then
I can start at the front, stack all time periods of the first pixel,
keep going and work pixels down the columns, first page, next page,
...
(and I hope I have an F-contiguous array, so my raveled array is also
F-contiguous.)
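
The same idea with time on the first axis and order=F, again as a
sketch with made-up sizes: the first N raveled values are the time
series of the first pixel, and the next N belong to the pixel one step
down the first spatial axis.

```python
import numpy as np

# Hypothetical sizes: N time points on the first axis, then a 3-D image.
N, I, J, K = 5, 2, 3, 4
arr = np.asfortranarray(np.arange(N * I * J * K).reshape(N, I, J, K))

flat = arr.ravel(order='F')   # index order 'F': the first axis varies fastest

# All time periods of the first pixel come first ...
first_series_ok = np.array_equal(flat[:N], arr[:, 0, 0, 0])
# ... followed by the next pixel down the column (axis 1).
second_series_ok = np.array_equal(flat[N:2 * N], arr[:, 1, 0, 0])

# The array is F-contiguous, so no copy was needed.
is_view = np.shares_memory(flat, arr)
```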

(note: I'm bringing memory back in as optimization, but not to predict
the stacking)

Josef
(I think brains are designed for Fortran order and C-ordering in numpy
is an accident,
except, reading a Western language book is neither)


>
> Cheers,
>
> Matthew


