[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Sebastian Berg sebastian at sipsolutions.net
Wed Apr 3 12:52:43 EDT 2013


On Wed, 2013-04-03 at 08:52 -0700, Chris Barker - NOAA Federal wrote:
> On Wed, Apr 3, 2013 at 6:24 AM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> >> the context where it gets applied. So giving the same strategy two
> >> different names is silly; if anything it's the contexts that should
> >> have different names.
> >>
> >
> > Yup, that's how I think about it too...
> 
> me too...
> 
> > But I would really love if someone would try to make the documentation
> > simpler!
> 
> yes, I think this is where the solution lies.
> 
> > There is also never a mention of "contiguity", even though when
> > we refer to "memory order", then having a C/F contiguous array is often
> > the reason why
> 
> good point -- in fact, I have no idea what would happen in many of
> these cases for a discontiguous array (or one with arbitrarily weird
> strides...)
> 
> >  Also 'A' seems often explained not
> > quite correctly (though that does not matter (except for reshape, where
> > its explanation is fuzzy), it will matter more in the future -- even if
> > I don't expect 'A' to be actually used).
> 
> I wonder about having an 'A' option in reshape at all -- what the heck
> does it mean? why do we need it? Again, I come back to the fact that
> memory order is kind-of orthogonal to index order. So for reshape (or
> ravel, which is really just a special case of reshape...) the 'A' flag
> and 'K' flag (huh?) is pretty dangerous, and prone to error. I think
> of it this way:
> 

Actually 'K' + reshape is not even implemented sensibly and in current
master I changed it to an error. I would not even know how to define it,
and even if you find a definition I cannot imagine it being useful...
Deprecating 'A' for reshape would seem OK to me since I doubt anyone
actually uses it. It is currently equivalent to `'F' if input.flags.fnc
else 'C'` (fnc means "fortran not c"), and as such is shaky business.
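
To make that rule concrete, here is a minimal sketch, assuming a reasonably
recent NumPy. The helper resolve_any_order is hypothetical and only spells
out the expression quoted above; it is not a NumPy function.

```python
import numpy as np

# Two arrays with identical values and shape but different memory layouts.
c_arr = np.arange(6).reshape(2, 3)   # C-contiguous
f_arr = np.asfortranarray(c_arr)     # F-contiguous copy, same values


def resolve_any_order(arr):
    """Hypothetical helper: the rule quoted above for order='A'."""
    return 'F' if arr.flags.fnc else 'C'


# Same logical input, same call, but 'A' resolves differently for each.
via_a_c = c_arr.reshape(3, 2, order='A')
via_a_f = f_arr.reshape(3, 2, order='A')

same_input = np.array_equal(c_arr, f_arr)
same_output = np.array_equal(via_a_c, via_a_f)
print(resolve_any_order(c_arr), resolve_any_order(f_arr))
print(same_input, same_output)
```

The two inputs compare equal element by element, yet the reshaped results
differ, which is what makes 'A' shaky for reshape.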

I just realized that 'A' is a bit funny. Basically it means anything
(any order), including discontiguous memory chunks, for np.array with
copy=False. But if you do a copy (or reshape), lacking a freer way to
do it, it means `'F' if input.flags.fnc else 'C'` again.
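
A small sketch of the two behaviours, under the assumption that np.asarray
with order='A' is a fair stand-in for np.array with copy=False (I use
asarray here because the meaning of copy=False is not the same in every
NumPy version). The variable names are only for illustration.

```python
import numpy as np

base = np.arange(8).reshape(2, 4)
strided = base[:, ::2]               # neither C- nor F-contiguous
fortran = np.asfortranarray(base)    # F-contiguous, not C-contiguous

# No copy needed: order='A' accepts the discontiguous layout as it is.
passthrough = np.asarray(strided, order='A')

# Copy forced: 'A' has to pick a concrete layout for the new buffer.
copy_of_strided = strided.copy(order='A')   # falls back to C order
copy_of_fortran = fortran.copy(order='A')   # keeps Fortran order

print(passthrough.flags.c_contiguous, passthrough.flags.f_contiguous)
print(copy_of_strided.flags.c_contiguous, copy_of_fortran.flags.f_contiguous)
```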

Not sure about the history, but it seems to me 'K' basically supersedes
'A' for most stuff, and the use of 'A' to mean Fortran or C is more an
accident, because it is the simplest way to implement "I don't care".

The use of 'K' is very sensible for copies of course. 'K' actually does
make some sense for ravel, since if you don't care, it has the best
chance of no copy. 'A' for ravel could/should in my opinion be
deprecated just like for reshape, since it is pretty unpredictable.
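
As an illustration of the "best chance of no copy" point, a minimal sketch
using a transposed array. It assumes np.shares_memory is available (it is
in current NumPy releases).

```python
import numpy as np

base = np.arange(6).reshape(2, 3)
transposed = base.T                     # F-contiguous view of base

# 'K' walks the elements in memory order, so no copy is needed here.
flat_k = transposed.ravel(order='K')
# 'C' walks them in C index order, which does not match memory: a copy.
flat_c = transposed.ravel(order='C')

k_is_view = np.shares_memory(flat_k, base)
c_is_view = np.shares_memory(flat_c, base)
print(flat_k.tolist(), k_is_view)
print(flat_c.tolist(), c_is_view)
```

Note that the two results hold the elements in a different order, so 'K'
is only appropriate when the element order really does not matter.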


> Much of the beauty of numpy is that it presents a consistent interface
> to various forms of strided data -- that way, folks can write code
> that works the same way for any ndarray, while still being able to
> have internal storage be efficient for the use at hand -- i.e. C order
> for the common case, Fortran order for interaction with libraries that
> expect that order (or for algorithms that are more efficient in that
> order, though that's mostly external libs..), and non-contiguous data
> so one can work on sub-parts of arrays without copying data around.
> 
> In most places, the numpy API hides the internal memory order -- this
> is a good thing, most people have no need to think about it (or most
> code, anyway), and you can write code that works (even if not
> optimally) for any (strided) memory layout. All is good.
> 
> There are times when you really need to understand, or control or
> manipulate the memory layout, to make sure your routines are
> optimized, or the data is in the right form to pass off to an external
> lib, or to make sense of raw data read from a file, or... That's what
> we have .view() and friends for.
> 

Yeah, I somewhat dislike the fact that "view" only works right for
(roughly) C-contiguous arrays; that's another one of those old traps that
is difficult or even impossible to get rid of. Maybe some or all uses of
view should be superseded by a new command...
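
A rough sketch of the trap, restricted to a case I am confident about:
reinterpreting the dtype with a different item size works on a
C-contiguous array but is refused on a strided slice. F-contiguous input
is left out because I am not sure it behaves the same across NumPy
versions.

```python
import numpy as np

c_arr = np.zeros((4, 4), dtype=np.int32)

# C-contiguous: each int32 is reinterpreted as two int16 along the last axis.
halves = c_arr.view(np.int16)

# Strided slice: the last axis is not contiguous, so the view is rejected.
strided = c_arr[:, ::2]
try:
    strided.view(np.int16)
    strided_view_ok = True
except ValueError:
    strided_view_ok = False

print(halves.shape, strided_view_ok)
```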

Regards,

Sebastian

> However, the 'A' and 'K' flags mix and match these concepts -- and I
> think that's dangerous. It would be easy for a user to use the 'A'
> flag, and have everything work fine and dandy with all their test
> cases, only to have it blow up when someone passes in a
> different-than-expected array. So really, they should only be used in
> cases where the code has checked memory order beforehand, or in a
> really well-defined interface where you know exactly what you're
> getting. In those cases, it makes the code far clearer and less
> error-prone to do your rearranging of the memory in a separate step,
> rather than built into a ravel() or reshape() call.
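
One way to read the "separate step" suggestion above, as a sketch only.
flatten_rows is a hypothetical helper, not an existing NumPy function:
it fixes the memory layout first and then reshapes with an explicit
index order, so the result does not depend on how the input was stored.

```python
import numpy as np


def flatten_rows(arr):
    """Hypothetical helper: layout first, then an explicit index order."""
    contiguous = np.ascontiguousarray(arr)      # step 1: memory layout (may copy)
    return contiguous.reshape(-1, order='C')    # step 2: index order, stated


c_arr = np.arange(6).reshape(2, 3)
f_arr = np.asfortranarray(c_arr)

# Same values in, same values out, whatever the input layout was.
out_c = flatten_rows(c_arr)
out_f = flatten_rows(f_arr)
print(out_c.tolist(), out_f.tolist())
```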
> 
> [note] -- I wrote earlier that I wasn't confused by the ravel()
> examples -- true for the 'C' and 'F' flags, but I'm still not at all
> clear what 'A' and 'K' would give me -- particularly for 'A' and
> reshape()
> 
> So I think the cause of the confusion here is not that we use "order"
> in two different contexts, nor the fact that 'C' and 'F' may not mean
> anything to some people, but that we are conflating two different
> processes in one function, and with one flag.
> 
> My (maybe) proposal: we deprecate the 'A' and 'K' flags in ravel() and
> reshape(). (Maybe even deprecate ravel() -- does it add anything to
> reshape?) If not deprecate, at least encourage people in the docs not
> to use them, and rather do their memory-structure manipulations with
> .view or stride manipulation, or...
> 
> I'm still trying to figure out when you'd want the 'A' flag -- it
> seems at the end of your operation you will want:
> 
> The resulting array to be a particular shape, with the elements in a
> particular order
> 
> and
> 
> You _may_ want the in-memory layout a certain way.
> 
> but 'A' can't ensure both of those.
> 
> -Chris
> 
> 




