[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Matthew Brett matthew.brett at gmail.com
Wed Apr 3 14:39:33 EDT 2013


Hi,

On Wed, Apr 3, 2013 at 5:19 AM,  <josef.pktd at gmail.com> wrote:
> On Tue, Apr 2, 2013 at 9:09 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> Hi,
>>
>> On Tue, Apr 2, 2013 at 7:09 PM,  <josef.pktd at gmail.com> wrote:
>>> On Tue, Apr 2, 2013 at 5:52 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>> On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>>>>> This is like observing that if I say "go North" then it's ambiguous
>>>>>> about whether I want you to drive or walk, and concluding that we need
>>>>>> new words for the directions depending on what sort of vehicle you
>>>>>> use. So "go North" means drive North, "go htuoS" means walk North,
>>>>>> etc. Totally silly. Makes much more sense to have one set of words for
>>>>>> directions, and then make clear from context what the directions are
>>>>>> used for -- "drive North", "walk North". Or "iterate C-wards", "store
>>>>>> F-wards".
>>>>>>
>>>>>> "C" and "Z" mean exactly the same thing -- they describe a way of
>>>>>> unraveling a cube into a straight line. The difference is what we do
>>>>>> with the resulting straight line. That's why I'm suggesting that the
>>>>>> distinction should be made in the name of the argument.
>>>>>
>>>>> Could you unpack that for the 'ravel' docstring?  Because these
>>>>> options all refer to the way of unraveling and not the memory layout
>>>>> that results.
>>>>
>>>> Z/C/column-major/whatever-you-want-to-call-it is a general strategy
>>>> for converting between a 1-dim representation and an n-dim
>>>> representation. In the case of memory storage, the 1-dim
>>>> representation is the flat space of pointer arithmetic. In the case of
>>>> ravel, the 1-dim representation is the flat space of a 1-dim indexed
>>>> array. But the 1-dim-to-n-dim part is the same in both cases.
>>>>
>>>> I think that's why you're seeing people baffled by your proposal -- to
>>>> them the "C" refers to this general strategy, and what's different is
>>>> the context where it gets applied. So giving the same strategy two
>>>> different names is silly; if anything it's the contexts that should
>>>> have different names.
>>>
>>> And once we get into memory optimization (and avoiding copies and
>>> preserving contiguity), it is necessary to keep both orders in mind:
>>> is the memory order "F", and am I iterating/raveling in "F" order
>>> (or slicing columns)?
>>>
>>> I think having two separate keywords gives the impression we can
>>> choose two different things at the same time.
>>
>> I guess it would not make sense to do this:
>>
>> np.ravel(a, index_order='C', memory_order='F')
>>
>> It could make sense to do this:
>>
>> np.reshape(a, (3,4), index_order='F', memory_order='F')
>>
>> but that just points out the inherent confusion between the uses of
>> 'order', and in this case, the fact that you can only do:
>>
>> np.reshape(a, (3, 4), index_order='F')
>>
>> correctly distinguishes between the meanings.
>
> So, if index_order and memory_order are never in the same function,
> then the context should be enough. It was always enough for me.

It was not enough for me or the three others who will publicly admit
to the shame of finding it confusing without further thought.

Again, I just can't see a reason not to separate these ideas.  We are
not arguing about backwards compatibility here, only about clarity.  I
guess you do accept that some people, other than yourself, might be
less likely to get tripped up by:

np.reshape(a, (3, 4), index_order='F')

than

np.reshape(a, (3, 4), order='F')

?
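For readers following along in code: `index_order` and `memory_order` are names proposed in this thread, not NumPy keywords; the existing spelling is `order`. A minimal sketch, assuming a current NumPy, showing that `order` in `np.reshape` acts as an index order. Reshaping a C-contiguous and an F-contiguous copy of the same values with `order='F'` gives the same result, because 'F' here only says that the first index varies fastest when elements are read out and placed.

```python
import numpy as np

# A 12-element array viewed as 4 x 3; this copy is C-contiguous in memory.
a_c = np.arange(12).reshape(4, 3)
# Same values, but stored in Fortran (column-major) memory layout.
a_f = np.asfortranarray(a_c)
assert a_c.flags["C_CONTIGUOUS"] and a_f.flags["F_CONTIGUOUS"]

# order="F" says how elements are read and placed *by index*
# (first index varies fastest), not how the result is stored.
r_c = np.reshape(a_c, (3, 4), order="F")
r_f = np.reshape(a_f, (3, 4), order="F")

# The memory layout of the input does not change the result.
assert np.array_equal(r_c, r_f)

# Index order "C" (last index fastest) gives a different arrangement.
r_c_rowwise = np.reshape(a_c, (3, 4), order="C")
print(r_c_rowwise)
print(r_c)
```

Only the index order changes the values here; the input's memory layout does not.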

> np.reshape(a, (3,4), index_order='F', memory_order='F')
> really hurts my head because you mix a function that operates on
> views, indexing and shapes with memory creation (or I have
> no idea what memory_order should do in this case).

Right.   I think you may now be close to my own discomfort when faced
with working out (fast) what:

np.reshape(a, (3,4), order='F')

means, given 'order' means two different things, and both might be
relevant here.
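A minimal sketch of the two meanings side by side, assuming a current NumPy (`np.shares_memory` needs 1.11 or later). In `np.array(..., order='F')`, 'F' is a memory layout, visible in the strides; in `np.ravel(..., order='F')`, 'F' is an index order. For these contiguous arrays the values produced depend only on the index order, but whether `ravel` can return a view rather than a copy depends on whether the two orders agree, which is where both meanings are relevant at once.

```python
import numpy as np

# The same 4 x 3 values stored two ways ("order" as memory layout).
a_c = np.arange(12, dtype=np.int64).reshape(4, 3)   # C (row-major) storage
a_f = np.array(a_c, order="F")                       # F (column-major) storage

# Strides (bytes to step along each axis) expose the memory layout.
print(a_c.strides, a_f.strides)

# "order" as index order: read elements with the first index varying fastest.
flat_from_c = np.ravel(a_c, order="F")
flat_from_f = np.ravel(a_f, order="F")

# The values depend only on the index order ...
assert np.array_equal(flat_from_c, flat_from_f)

# ... but whether ravel can return a view depends on the memory layout too.
print(np.shares_memory(flat_from_f, a_f))  # index order matches storage: view
print(np.shares_memory(flat_from_c, a_c))  # orders disagree: a copy was made
```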

Or are you saying that my brain should have quickly calculated that
that 'order' would be difficult to understand as memory layout,
therefore rejected that reading, and seen immediately that index order
was the meaning? Speaking as a psychologist, I don't think that's the
way it works.
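Nathaniel's point in the quoted thread, that 'C' and 'F' name one rule for mapping between an n-dim index and a 1-dim position, can be checked directly. A minimal sketch, assuming a current NumPy: `np.ravel_multi_index` applies the rule to get a position in the raveled array, and for a contiguous array the byte offset computed from its strides is that same position times the item size. `byte_offset` is a hypothetical helper written only for this illustration.

```python
import numpy as np

shape = (2, 3, 4)
a = np.arange(24, dtype=np.int64).reshape(shape)   # C-contiguous storage
a_f = np.asfortranarray(a)                          # F-contiguous copy

idx = (1, 0, 2)

# One rule, "C" or "F", maps an n-dim index to a 1-dim position.
pos_c = np.ravel_multi_index(idx, shape, order="C")
pos_f = np.ravel_multi_index(idx, shape, order="F")

# Context 1: the flat space of a 1-dim indexed array (ravel).
assert np.ravel(a, order="C")[pos_c] == a[idx]
assert np.ravel(a, order="F")[pos_f] == a[idx]


def byte_offset(arr, index):
    """Hypothetical helper: offset in bytes from arr's first element."""
    return sum(stride * i for stride, i in zip(arr.strides, index))


# Context 2: the flat space of memory, via the strides.
assert byte_offset(a, idx) == pos_c * a.itemsize
assert byte_offset(a_f, idx) == pos_f * a_f.itemsize
```

The same `order` value fixes both mappings; only the flat space they land in differs.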

Cheers,

Matthew


