[Numpy-discussion] numpy.random.shuffle

Wed Nov 22 15:28:20 EST 2006

Robert Kern wrote:
> Tim Hochberg wrote:
>> Robert Kern wrote:
> 
>>> One possibility is to check if the object is an ndarray (or subclass) and use
>>> .copy() if so; otherwise, use the current implementation and hope that you
>>> didn't pass it a Numeric or numarray array (or some other view-based object).
>>>   
>> I think I would invert this test and instead check if the object is a 
>> Python list and *not* copy in that case. Otherwise, use copy.copy to 
>> copy the object whatever it is. This looks like it would be more robust 
>> in that it would work in all sensible case, and just be a tad slower in 
>> some of them.
> 
> I don't want to assume that the only two sequence types are lists and arrays.
> The problem with using copy.copy() on non-arrays is that it, well, makes copies
> of the elements. The objects in the shuffled sequence are not the same objects
> before and after the shuffling. I consider that to be a violation of the spec.
> 
> Views are rare outside of numpy/Numeric/numarray, partially because Guido
> considers them to be evil. I'm beginning to see why.
> 
>> Another possible refinement / complication would be to special case 1D 
>> arrays so that they run fastish.
>>
>> A third possibility involves rewriting this in this form:
>>
>>     indices = arange(len(x))
>>     _shuffle_core(indices) # This just does what current shuffle now does
>>     x[:] = take(x, indices, 0)
> 
> That's problematic since the elements all turn into numpy scalar objects:
> 
> In [1]: from numpy import *
> 
> In [2]: a = range(9,-1,-1)
> 
> In [3]: idx = arange(len(a))
> 
> In [4]: a[:] = take(a, idx, 0)
> 
> In [5]: a
> Out[5]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
> 
> In [6]: type(a[0])
> Out[6]: <type 'numpy.int32'>
> 

What about a[:] = take(asarray(a, object), idx, 0)? It also works correctly with ndarrays, though I haven't dug into why; every element will probably be cast twice.
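
A minimal sketch of that idea as a complete function, in case it helps the discussion. The name shuffle_in_place and the rng argument are made up for illustration and are not part of numpy; it also assumes the elements are not themselves equal-length sequences, since asarray would then build a 2-D object array.

```python
import numpy as np


def shuffle_in_place(seq, rng=None):
    """Shuffle seq in place via take() on shuffled indices (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    idx = np.arange(len(seq))
    rng.shuffle(idx)  # shuffle the indices, not the elements themselves
    # dtype=object stores references to the original elements instead of
    # converting them to numpy scalars.
    boxed = np.asarray(seq, dtype=object)
    seq[:] = np.take(boxed, idx, 0)


a = list(range(9, -1, -1))
shuffle_in_place(a, np.random.default_rng(0))
print(type(a[0]))  # a plain Python int, not a numpy scalar
```

With a list, the same objects come back out in a new order; with an ndarray, the values go through an object array and are cast back on assignment, which is the double cast mentioned above.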

I think take() on shuffled indices is basically the right and natural approach for a numpy shuffler.
The example is arguably just another vote against the default behavior of letting numpy scalar types out of arrays that were set up with a "harmless" type.

>>> array([1,2,3],float)
array([ 1.,  2.,  3.])
>>> type(_[0])
<type 'numpy.float64'>
>>> 

is just wrong, I think.
In (Guido's) Python, objects should ideally come out of a collection with the same type they went in with. Currently numpy scalars "infect" the whole application almost like a virus (and hurt performance, pickles, etc.).
Of course views are essential for an efficient array type, but altering types arguably is not.
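
For what it is worth, plain Python objects can be recovered explicitly today. A short sketch using the existing item() and tolist() methods; the variable names are only illustrative:

```python
import numpy as np

arr = np.array([1, 2, 3], float)

elem = arr[0]          # indexing yields a numpy scalar
print(type(elem))      # numpy.float64

as_py = elem.item()    # explicit conversion of a single element
print(type(as_py))     # float

values = arr.tolist()  # converts every element to a Python object
print([type(v) for v in values])
```

This keeps numpy scalars from spreading further, but only if the caller remembers to convert at the boundary.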
For the rare generalized algorithms (I have to think hard to find even one example) where the array interface is needed on the elements (and an array(obj) cast is too inconvenient), there would still be another possibility:

>>> array([1,2,3],numpy.float64)

then it is natural that numpy.float64, numpy.int32, ... come out, since that is what the programmer would expect.

Thus maybe for array types:  
* float != numpy.float64 (but a common base class, or 'float' itself, maybe)
* int != numpy.intXX
* complex != numpy.complex128
* default array type is (python.)float
* default array type from list of ints is (python.)int
* default array type from list of complex is (python.)complex
* default array type of other lists is always <object>
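
As a point of comparison for the list above, here is a small check of how the types relate in a current numpy on Python 3. This is only a sketch of present behavior, not of the proposal:

```python
import numpy as np

# numpy.float64 and numpy.complex128 already subclass the Python types.
print(isinstance(np.float64(1.0), float))      # True
print(isinstance(np.complex128(1j), complex))  # True

# The fixed-width integer scalars do not subclass int on Python 3.
print(isinstance(np.int32(1), int))            # False

# Default dtypes inferred from lists of Python numbers, by kind:
print(np.array([1, 2, 3]).dtype.kind)    # 'i' (a numpy integer type)
print(np.array([1.0, 2, 3]).dtype.kind)  # 'f'
print(np.array([1j, 2]).dtype.kind)      # 'c'

# A list with no common numeric type falls back to object.
print(np.array([1, None]).dtype)         # object
```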

Currently this is also problematic:
>>> array([1,2,"3",[]])
array(['1', '2', '3', '[]'], 
      dtype='|S4')

and even

>>> array([1,2,"3ef",'wefwfewoiwjefo iwjef'])
array(['1', '2', '3ef', 'wefwfewoiwjefo iwjef'], 
      dtype='|S20')
>>> _[0]='woeifjwo woie pwioef wliuefh lwieufh wleifuh welfiu '
>>> _
array(['woeifjwo woie pwioef', '2', '3ef', 'wefwfewoiwjefo iwjef'], 
      dtype='|S20')

is rarely what a Pythoneer would expect. I guess fixed-width string arrays should only be created explicitly.
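
A small sketch of the truncation and of one way around it, using an explicit dtype=object. The variable names are illustrative only:

```python
import numpy as np

long_text = "woeifjwo woie pwioef wliuefh lwieufh wleifuh welfiu "

# The item width is fixed by the longest string at creation time.
fixed = np.array(["3ef", "wefwfewoiwjefo iwjef"])
fixed[0] = long_text
print(len(fixed[0]))     # 20: the assignment was silently truncated

# With dtype=object the array holds references to ordinary Python strings.
flexible = np.array(["3ef", "wefwfewoiwjefo iwjef"], dtype=object)
flexible[0] = long_text
print(len(flexible[0]))  # the full length of long_text
```

The object array gives up the compact fixed-width storage, so this is a trade-off rather than a free fix.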

Robert