[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Ralf Gommers ralf.gommers at gmail.com
Sun Mar 31 17:03:05 EDT 2013


On Sun, Mar 31, 2013 at 10:43 PM, <josef.pktd at gmail.com> wrote:

> On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> > Hi,
> >
> > On Sat, Mar 30, 2013 at 10:38 PM,  <josef.pktd at gmail.com> wrote:
> >> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> >>> Hi,
> >>>
> >>> On Sat, Mar 30, 2013 at 9:37 PM,  <josef.pktd at gmail.com> wrote:
> >>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On Sat, Mar 30, 2013 at 7:02 PM,  <josef.pktd at gmail.com> wrote:
> >>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd at gmail.com> wrote:
> >>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
> >>>>>>>> <brad.froehle at gmail.com> wrote:
> >>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd at gmail.com> wrote:
> >>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd at gmail.com> wrote:
> >>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
> >>>>>>>>>> >> <matthew.brett at gmail.com> wrote:
> >>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd at gmail.com> wrote:
> >>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
> >>>>>>>>>> >>>> <matthew.brett at gmail.com> wrote:
> >>>>>>>>>> >>>>>
> >>>>>>>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of
> >>>>>>>>>> >>>>> index ordering.
> >>>>>>>>>> >>>>>
> >>>>>>>>>> >>>>> This is very confusing.  We think the index ordering and
> >>>>>>>>>> >>>>> memory ordering ideas need to be separated, and specifically,
> >>>>>>>>>> >>>>> we should avoid using "C" and "F" to refer to index ordering.
> >>>>>>>>>> >>>>>
> >>>>>>>>>> >>>>> Proposal
> >>>>>>>>>> >>>>> -------------
> >>>>>>>>>> >>>>>
> >>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and
> >>>>>>>>>> >>>>> forwards index ordering for ravel, reshape
> >>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of
> >>>>>>>>>> >>>>> unraveling in 2 dimensions, axis1 first and axis0 first
> >>>>>>>>>> >>>>> respectively (excellent naming idea by Paul Ivanov)
> >>>>>>>>>> >>>>>
> >>>>>>>>>> >>>>> What do y'all think?
> >>>>>>>>>> >>>>
> >>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand; I always
> >>>>>>>>>> >>>> thought about the content and never about the memory when
> >>>>>>>>>> >>>> using it.
> >>>>>>>>>> >>
> >>>>>>>>>> >> Changing the names doesn't make it easier to understand.
> >>>>>>>>>> >> I think the confusion is because the new A and K refer to
> >>>>>>>>>> >> existing memory
> >>>>>>>>>> >>
> >>>>>>>>>>
> >>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that
> >>>>>>>>>> is that four out of four of us tested ourselves and got it wrong.
> >>>>>>>>>>
> >>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think
> >>>>>>>>>> it's rash to assert there is no problem here.
> >>>>>>>>
> >>>>>>>> I think you are overcomplicating things or phrased it as a
> >>>>>>>> "trick question"
> >>>>>>>
> >>>>>>> I don't know what you mean by trick question - was there something
> >>>>>>> over-complicated in the example?  I deliberately didn't include
> >>>>>>> various much more confusing examples in "reshape".
> >>>>>>
> >>>>>> I meant making the "candidates" think about memory instead of just
> >>>>>> column versus row stacking.
> >>>>>
> >>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D
> >>>>> array, it was an image, with time as the 4th dimension (N time
> >>>>> points).   Raveling and reshaping 3D and 4D arrays is a common thing
> >>>>> to do in neuroimaging, as you can imagine.
> >>>>>
> >>>>> A student asked what he would get back from raveling this array, a
> >>>>> concatenated time series, or something spatial?
> >>>>>
> >>>>> We showed (I'd worked it out by this time) that the first N values
> >>>>> were the time series given by [0, 0, 0, :].
> >>>>>
> >>>>> He said - "Oh - I see - so the data is stored as a whole lot of time
> >>>>> series one by one, I thought it would be stored as a series of
> >>>>> images".
> >>>>>
> >>>>> Ironically, this was a Fortran-ordered array in memory, and he was
> >>>>> wrong.
> >>>>>
> >>>>> So, I think the idea of memory ordering and index ordering is very
> >>>>> easy to confuse, and comes up naturally.
> >>>>>
> >>>>> I would like, as a teacher, to be able to say something like:
> >>>>>
> >>>>> This is what C memory layout is (it's the memory layout  that gives
> >>>>> arr.flags.C_CONTIGUOUS=True)
> >>>>> This is what F memory layout is (it's the memory layout  that gives
> >>>>> arr.flags.F_CONTIGUOUS=True)
> >>>>> It's rather easy to get something that is neither C nor F memory
> >>>>> layout
> >>>>> Numpy does many memory layouts.
> >>>>> Ravel and reshape and numpy in general do not care (normally) about C
> >>>>> or F layouts, they only care about index ordering.
> >>>>>
> >>>>> My point, that I'm repeating, is that my job is made harder by
> >>>>> 'arr.ravel('F')'.
> >>>>
> >>>> But once you know that ravel and reshape don't care about memory, the
> >>>> ravel is easy to predict (maybe not easy to visualize in 4-D):
> >>>
> >>> But this assumes that you already know that there's such a thing as
> >>> memory layout, and there's such a thing as index ordering, and that
> >>> 'C' and 'F' in ravel refer to index ordering.  Once you have that,
> >>> you're golden.  I'm arguing it's markedly harder to get this
> >>> distinction, and keep it in mind, and teach it, if we are using the
> >>> 'C' and 'F' names for both things.
> >>
> >> No, I think you are still missing my point.
> >> I think explaining ravel and reshape F and C is easy (kind of) because
> >> the students don't need to know at that stage about memory layouts.
> >>
> >> All they need to know is that we look at n-dimensional objects in
> >> C-order or in  F-order
> >> (whichever index runs fastest)
> >
> > Would you accept that it may or may not be true that it is desirable
> > or practical not to mention memory layouts when teaching numpy?
>
> I think they should be in two different sections.
>
> basic usage:
> ravel, reshape in pure index order, and indexing, broadcasting, ...
>
> advanced usage:
> memory layout and some ability to predict when you get a view and
> when you get a copy.
>
> And I still think words can mean different things in different context
> (with a qualifier maybe)
> indexing in fortran order
> memory in fortran order
>
> Disclaimer: I never tried to teach numpy
> and with GSOC students my explanations only went a little bit
> beyond what they needed to know for the purpose at hand (I hope)
>
> >
> > You believe it is desirable, I believe that it is not - that teaching
> > numpy naturally involves some discussion of memory layout.
> >
> > As evidence:
> >
> > * My student, without any prompting about memory layouts, is asking
> > about it
> > * Travis' numpy book has a very early section on this (section 2.3 -
> > memory layout)
> > * I often think about memory layouts, and from your discussion, you do
> > too.  It's uncommon that you don't have to teach something that
> > experienced users think about often.
>
> I'm mentioning memory layout because I'm talking to you.
> I wouldn't talk about memory layout if I were trying to explain ravel,
> reshape and indexing for the first time to a student.
>
> > * The most common use of 'order' only refers to memory layout.  For
> > example np.array "order" doesn't refer to index ordering but to memory
> > layout.
>
> No, as I tried to show with the statsmodels example.
> I don't require GSOC students (that are relatively new to numpy) to
> understand much about memory layout.
> The only use of ``order`` in statsmodels refers to *index* order in
> ravel and reshape.
>
> > * The current docstring of 'reshape' cannot be explained without
> > referring to memory order.
>
> really ?
> I thought reshape only refers to *index* order for "F" and "C"
>
> I don't think I can express my preference for reshape order="F" any
> better than I did, so maybe it's time for some additional users/developers
> to chime in.


My 2 cents: while I can't go back and un-read the earlier emails in this
thread, I don't see what's ambiguous in the case of ravel. For reshape,
though, I can see that it's possible to interpret it in two ways. In such
cases I open up IPython and play with a 2x3 array to check my understanding.
That's OK, and certainly better than adding duplicate names for C/F now,
even if that would solve the issue (which it probably wouldn't). Therefore
I'm -1 on the initial proposal.
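
A minimal sketch of that kind of 2x3 check, assuming a reasonably recent
NumPy. The variable names are illustrative and not taken from the thread.
It suggests that order='C' and order='F' in ravel and reshape select an
index order, and that the result does not depend on how the array happens
to be laid out in memory:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]

# 'C' index order: the last axis varies fastest.
c_flat = a.ravel(order='C')
print(c_flat)                        # [0 1 2 3 4 5]

# 'F' index order: the first axis varies fastest.
f_flat = a.ravel(order='F')
print(f_flat)                        # [0 3 1 4 2 5]

# Same values, but stored with Fortran memory layout.
f_mem = np.asfortranarray(a)
print(f_mem.flags.f_contiguous)      # True

# The raveled values are the same as before: ravel follows index order,
# not the memory layout of its input.
same_c = np.array_equal(f_mem.ravel(order='C'), c_flat)
same_f = np.array_equal(f_mem.ravel(order='F'), f_flat)
print(same_c, same_f)                # True True

# reshape reads the elements in the given index order and writes them
# into the new shape in that same index order.
r_c = a.reshape((3, 2), order='C')   # [[0, 1], [2, 3], [4, 5]]
r_f = a.reshape((3, 2), order='F')   # [[0, 4], [3, 2], [1, 5]]
print(r_c)
print(r_f)
```

Memory layout still matters for whether ravel can return a view or has to
copy, but not for the values it returns.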

Ralf

