[Numpy-discussion] Request for enhancement to numpy.random.shuffle
Jaime Fernández del Río
jaime.frio at gmail.com
Sun Oct 12 13:56:13 EDT 2014
On Sun, Oct 12, 2014 at 9:29 AM, Warren Weckesser <
warren.weckesser at gmail.com> wrote:
>
>
> On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser <
> warren.weckesser at gmail.com> wrote:
>
>>
>>
>> On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser <
>> warren.weckesser at gmail.com> wrote:
>>
>>> I created an issue on github for an enhancement
>>> to numpy.random.shuffle:
>>> https://github.com/numpy/numpy/issues/5173
>>> I'd like to get some feedback on the idea.
>>>
>>> Currently, `shuffle` shuffles the first dimension of an array
>>> in-place. For example, shuffling a 2D array shuffles the rows:
>>>
>>> In [227]: a
>>> Out[227]:
>>> array([[ 0,  1,  2],
>>>        [ 3,  4,  5],
>>>        [ 6,  7,  8],
>>>        [ 9, 10, 11]])
>>>
>>> In [228]: np.random.shuffle(a)
>>>
>>> In [229]: a
>>> Out[229]:
>>> array([[ 0,  1,  2],
>>>        [ 9, 10, 11],
>>>        [ 3,  4,  5],
>>>        [ 6,  7,  8]])
>>>
>>>
>>> To add an axis keyword, we could (in effect) apply `shuffle` to
>>> `a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffle
>>> the columns:
>>>
>>> In [232]: a = np.arange(15).reshape(3,5)
>>>
>>> In [233]: a
>>> Out[233]:
>>> array([[ 0,  1,  2,  3,  4],
>>>        [ 5,  6,  7,  8,  9],
>>>        [10, 11, 12, 13, 14]])
>>>
>>> In [234]: axis = 1
>>>
>>> In [235]: np.random.shuffle(a.swapaxes(axis, 0))
>>>
>>> In [236]: a
>>> Out[236]:
>>> array([[ 3,  2,  4,  0,  1],
>>>        [ 8,  7,  9,  5,  6],
>>>        [13, 12, 14, 10, 11]])
>>>
>>> So that's the first part--adding an `axis` keyword.
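A minimal sketch of the `axis` keyword described above, assuming NumPy's `default_rng` Generator (NumPy 1.17+). `shuffle_axis` is a hypothetical name, not an existing NumPy function. Because `swapaxes` returns a view, shuffling the first axis of that view permutes `a` in place along `axis`:

```python
import numpy as np

def shuffle_axis(a, axis=0, rng=None):
    """Hypothetical helper: shuffle the ndarray `a` in place along `axis`.

    The sub-arrays indexed along `axis` move as whole units (rows for
    axis=0, columns for axis=1 of a 2-d array), as with the existing shuffle.
    """
    rng = np.random.default_rng() if rng is None else rng
    # swapaxes returns a view, so shuffling its first axis modifies `a`.
    rng.shuffle(a.swapaxes(axis, 0))

a = np.arange(15).reshape(3, 5)
shuffle_axis(a, axis=1)  # columns are permuted; each column stays intact
```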
>>>
>>> The other part of the enhancement request is to add a shuffle
>>> behavior that shuffles the 1-d slices *independently*. That is,
>>> for a 2-d array, shuffling with `axis=0` would apply a different
>>> shuffle to each column. In the github issue, I defined a
>>> function called `disarrange` that implements this behavior:
>>>
>>> In [240]: a
>>> Out[240]:
>>> array([[ 0,  1,  2],
>>>        [ 3,  4,  5],
>>>        [ 6,  7,  8],
>>>        [ 9, 10, 11],
>>>        [12, 13, 14]])
>>>
>>> In [241]: disarrange(a, axis=0)
>>>
>>> In [242]: a
>>> Out[242]:
>>> array([[ 6,  1,  2],
>>>        [ 3, 13, 14],
>>>        [ 9, 10,  5],
>>>        [12,  7,  8],
>>>        [ 0,  4, 11]])
>>>
>>> Note that each column has been shuffled independently.
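The body of `disarrange` is in the GitHub issue and is not reproduced here. The following is a minimal sketch of the same behavior, an assumption about how it could be written rather than the code from the issue: move the target axis last, then give every 1-d slice along it its own call to the generator (NumPy 1.17+ `default_rng` assumed):

```python
import numpy as np

def disarrange(a, axis=0, rng=None):
    """Sketch: shuffle each 1-d slice of the ndarray `a` along `axis`
    independently, in place."""
    rng = np.random.default_rng() if rng is None else rng
    # View of `a` with the target axis moved to the end.
    b = a.swapaxes(axis, -1)
    # Visit every index over the remaining axes; b[idx] is a 1-d view,
    # so each slice is shuffled in place with its own permutation.
    for idx in np.ndindex(b.shape[:-1]):
        rng.shuffle(b[idx])

a = np.arange(15).reshape(5, 3)
disarrange(a, axis=0)  # each column gets a different permutation
```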
>>>
>>> This behavior is analogous to how `sort` handles the `axis`
>>> keyword. `sort` sorts the 1-d slices along the given axis
>>> independently.
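The analogy with `sort` can be made literal: sorting each 1-d slice by its own random keys yields an independent random permutation of every slice. The sketch below is a swapped-in, vectorized technique, not what the thread proposes; `permuted_copy` is a hypothetical name, and it relies on `argsort` and `np.take_along_axis` (NumPy 1.15+). Unlike `disarrange`, it returns a copy instead of working in place:

```python
import numpy as np

def permuted_copy(a, axis=0, rng=None):
    """Sketch: copy of `a` with each 1-d slice along `axis` shuffled
    independently, by sorting on random keys."""
    rng = np.random.default_rng() if rng is None else rng
    # One uniform random key per element; argsort along `axis` turns the
    # keys into a separate random ordering for every 1-d slice.
    order = np.argsort(rng.random(a.shape), axis=axis)
    return np.take_along_axis(a, order, axis=axis)

a = np.arange(15).reshape(5, 3)
b = permuted_copy(a, axis=0)
# Sorting each column of `b` recovers `a`, mirroring the sort analogy.
assert (np.sort(b, axis=0) == a).all()
```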
>>>
>>> In the github issue, I suggested the following signature
>>> for `shuffle` (but I'm not too fond of the name `independent`):
>>>
>>> def shuffle(a, independent=False, axis=0)
>>>
>>> If `independent` is False, the current behavior of `shuffle`
>>> is used. If `independent` is True, each 1-d slice is shuffled
>>> independently (in the same way that `sort` sorts each 1-d
>>> slice).
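To make the proposed semantics concrete, here is a sketch of a function with that signature. `shuffle_proposed` is a hypothetical stand-in, not NumPy's `shuffle`; it assumes NumPy 1.17+ for `default_rng` and handles only integer `axis` values (the `axis=None` case is taken up in the paragraph that follows):

```python
import numpy as np

def shuffle_proposed(a, independent=False, axis=0, rng=None):
    """Sketch of the proposed signature; shuffles the ndarray `a` in place."""
    rng = np.random.default_rng() if rng is None else rng
    # Move `axis` to the front; this is a view, so `a` is modified in place.
    b = a.swapaxes(axis, 0)
    if not independent:
        # Current behavior: the sub-arrays b[0], b[1], ... move as units.
        rng.shuffle(b)
    else:
        # sort-like behavior: each 1-d slice along `axis` gets its own shuffle.
        for idx in np.ndindex(b.shape[1:]):
            rng.shuffle(b[(slice(None),) + idx])
```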
>>>
>>> Like most functions that take an `axis` argument, `axis=None`
>>> means to shuffle the flattened array. With `independent=True`,
>>> it would act like `np.random.shuffle(a.flat)`, e.g.
>>>
>>> In [247]: a
>>> Out[247]:
>>> array([[ 0,  1,  2,  3,  4],
>>>        [ 5,  6,  7,  8,  9],
>>>        [10, 11, 12, 13, 14]])
>>>
>>> In [248]: np.random.shuffle(a.flat)
>>>
>>> In [249]: a
>>> Out[249]:
>>> array([[ 0, 14,  9,  1, 13],
>>>        [ 2,  8,  5,  3,  4],
>>>        [ 6, 10,  7, 12, 11]])
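A sketch of the `axis=None` case that does not depend on `shuffle` accepting a `flatiter`. The helper name `shuffle_flat` is hypothetical, and NumPy 1.17+ is assumed for `default_rng`. It permutes a flattened copy of the data and writes the result back, which also works when `a` is not contiguous:

```python
import numpy as np

def shuffle_flat(a, rng=None):
    """Sketch: shuffle all elements of the ndarray `a` in place,
    ignoring its shape."""
    rng = np.random.default_rng() if rng is None else rng
    # permutation() returns a shuffled copy of the flattened data (C order);
    # reshaping in the same order and assigning through a[...] writes it back.
    a[...] = rng.permutation(a.ravel()).reshape(a.shape)

a = np.arange(15).reshape(3, 5)
shuffle_flat(a)
```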
>>>
>>>
>>> A small wart in this API is the meaning of
>>>
>>> shuffle(a, independent=False, axis=None)
>>>
>>> It could be argued that the correct behavior is to leave the
>>> array unchanged. (The current behavior can be interpreted as
>>> shuffling a 1-d sequence of monolithic blobs; the axis argument
>>> specifies which axis of the array corresponds to the
>>> sequence index. Then `axis=None` means the argument is
>>> a single monolithic blob, so there is nothing to shuffle.)
>>> Or an error could be raised.
>>>
>>> What do you think?
>>>
>>> Warren
>>>
>>>
>>
>>
>> It is clear from the comments so far that, when `axis` is None, the
>> result should be a shuffle of all the elements in the array, for both
>> methods of shuffling (whether implemented as a new method or with a boolean
>> argument to `shuffle`). Forget I ever suggested doing nothing or raising
>> an error. :)
>>
>> Josef's comment reminded me that `numpy.random.permutation` returns a
>> shuffled copy of the array (when its argument is an array). This function
>> should also get an `axis` argument. `permutation` shuffles the same way
>> `shuffle` does--it simply makes a copy and then calls `shuffle` on the
>> copy. If a new method is added for the new shuffling style, then it would
>> be consistent to also add a new method that uses the new shuffling style
>> and returns a copy of the shuffled array. We would then have four
>> methods:
>>
>>                         In-place       Copy
>> Current shuffle style   shuffle        permutation
>> New shuffle style       (name TBD)     (name TBD)
>>
>> (All of them will have an `axis` argument.)
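Since `permutation` is a copy followed by `shuffle`, giving it an `axis` argument follows directly once `shuffle` has one. A sketch for the current shuffle style, with hypothetical names `shuffle_axis` and `permutation_axis` and NumPy 1.17+ assumed for `default_rng`:

```python
import numpy as np

def shuffle_axis(a, axis=0, rng=None):
    """In place: permute the sub-arrays of the ndarray `a` along `axis` as units."""
    rng = np.random.default_rng() if rng is None else rng
    rng.shuffle(a.swapaxes(axis, 0))  # swapaxes view, so `a` changes

def permutation_axis(a, axis=0, rng=None):
    """Copy: same shuffle style as shuffle_axis, applied to a copy of `a`."""
    b = np.array(a, copy=True)  # `a` itself is left untouched
    shuffle_axis(b, axis=axis, rng=rng)
    return b
```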
>>
>>
>
> That table makes me think that, *if* we go with new methods, the names
> should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix
> that is to be determined. That will ensure that the names appear together
> in alphabetical lists, and should show up together as options in
> tab-completion or code-completion.
>
Just to add some noise to a productive conversation: if you add a 'copy'
flag to shuffle, then all the functionality is in one place, and
'permutation' can either be deprecated, or trivially implemented in terms
of the new 'shuffle'.
Jaime
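A sketch of the `copy` flag idea, combined with the `independent` and `axis` arguments from earlier in the thread and the agreed `axis=None` behavior (all elements shuffled in either mode). `shuffle_sketch` and `permutation_sketch` are hypothetical names, and NumPy 1.17+ is assumed for `default_rng`. Returning the array in both modes is a design assumption here; NumPy's own `shuffle` returns `None`:

```python
import numpy as np

def shuffle_sketch(a, independent=False, axis=0, copy=False, rng=None):
    """Hypothetical unified shuffle; returns the shuffled array.

    copy=False shuffles the ndarray `a` in place; copy=True leaves `a`
    untouched and shuffles a copy instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.array(a, copy=True) if copy else a
    if axis is None:
        # axis=None: shuffle all elements, whatever `independent` is.
        a[...] = rng.permutation(a.ravel()).reshape(a.shape)
        return a
    b = a.swapaxes(axis, 0)  # view: writes go through to `a`
    if independent:
        # Each 1-d slice along `axis` gets its own permutation.
        for idx in np.ndindex(b.shape[1:]):
            rng.shuffle(b[(slice(None),) + idx])
    else:
        # Current style: sub-arrays along `axis` move as units.
        rng.shuffle(b)
    return a

def permutation_sketch(a, independent=False, axis=0, rng=None):
    """`permutation` expressed as a thin wrapper over the unified shuffle."""
    return shuffle_sketch(a, independent=independent, axis=axis,
                          copy=True, rng=rng)
```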
>
>
> Warren
>
>
>> I suspect this will make some folks prefer the approach of adding a
>> boolean argument to `shuffle` and `permutation`.
>>
>> Warren
>>
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.