numpy magic: auto-cast scalar returns to Python types float & int?

robert no-spam at no-spam-no-spam.invalid
Fri Nov 17 11:06:45 EST 2006


Tim Hochberg wrote:
> robert wrote:
>> To avoid this you'd need a type cast in Python code everywhere you get 
>> scalars from numpy into a Python variable - an error-prone task. Or 
>> check/re-render your whole object tree.
>> Wouldn't it be much better if numpy returned Python scalars for 
>> float64 (maybe even for float32) and int32, int64, ... where possible 
>> (as numarray and Numeric did)?
>> I suppose numpy knows internally very quickly how to cast. 
> 
> The short answer is no, it would not be better. There are some 
> trade-offs involved here, but overall, always returning numpy scalars is a 
> significant improvement over returning Python scalars some of the time. 
> Which is why numpy does it that way now; it was a conscious choice, it 
> didn't just happen.  Please search the archives of numpy-discussion for 
> previous discussions of this and if that is not enlightening enough 
> please ask on the numpy-discussion list (the address of which just 
> changed and I don't have it handy, but I'm sure you can find it).

I didn't find the relevant reasoning in the time I had. My guess is that the reasoning is isolated-module-centric.
All further computations in Python are much slower, and I cannot even see a speed gain in the (rare) case of putting a numpy scalar back into a numpy array:

>>> import timeit, numpy
>>> from numpy import array
>>> a=array([1.,0,0,0,0])
>>> f=1.0
>>> fn=a[0]
>>> type(fn)
<type 'numpy.float64'>
>>> timeit.Timer("f+f",glbls=globals()).timeit(10000)
0.0048265910890909324
>>> timeit.Timer("f+f",glbls=globals()).timeit(100000)
0.045992158221226376
>>> timeit.Timer("fn+fn",glbls=globals()).timeit(100000)
0.14901307289054877
>>> timeit.Timer("a[1]=f",glbls=globals()).timeit(100000)
0.060825607723899111
>>> timeit.Timer("a[1]=fn",glbls=globals()).timeit(100000)
0.059519575812004177
>>> timeit.Timer("x=a[0]",glbls=globals()).timeit(100000)
0.12302317752676117
>>> timeit.Timer("x=float(a[0])",glbls=globals()).timeit(100000)
0.31556273213496411
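
(For reference, the explicit ways back to plain Python floats - a small sketch; I take it the scalar's .item() method does the same as float() here:)

x = float(a[0])    # explicit per-element cast
x = a[0].item()    # the numpy scalar's .item() method should give the same
xs = a.tolist()    # bulk: the whole array as a list of plain Python floats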

Creation of numpy scalar objects does not seem to be cheap/advantageous anyway:

>>> oa=array([1.0,1.0,1.0,1.0,1], object)
>>> oa
array([1.0, 1.0, 1.0, 1.0, 1], dtype=object)
>>> timeit.Timer("x=a[0]",glbls=globals()).timeit(100000)
0.12025438987348025
>>> timeit.Timer("x=oa[0]",glbls=globals()).timeit(100000)
0.050609225474090636
>>> timeit.Timer("a+a",glbls=globals()).timeit(100000)
1.3081539692893784
>>> timeit.Timer("oa+oa",glbls=globals()).timeit(100000)
1.5201345422392478
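
(Presumably extraction from the object array is faster because no new numpy scalar object has to be constructed - oa just hands back the Python float it already stores:)

type(oa[0])    # -> plain Python float, not a numpy.float64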

> For your particular issue, you might try tweaking pickle to convert 
> int64 objects to int objects. Assuming of course that you have enough of 
> these to matter; otherwise, I suggest just leaving things alone.

(I've not had int64's so far, so I don't know how things go with Python L's - longs.)

The main problem is with hundreds of everyday, normal float (now numpy.float64) and int (now numpy.int32) variables.
There are speed issues and memory consumption... And a pickled tree cannot be read by an app which does not have numpy available, and the pickles are very big.
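
Tim's pickle tweak could look roughly like this, I guess - an untested sketch via copy_reg's dispatch table; the helper names here are mine:

import copy_reg   # named 'copyreg' in Python 3
import numpy

def _reduce_to_float(x):
    # pickle a numpy.float64 as a plain Python float
    return (float, (float(x),))

def _reduce_to_int(x):
    # pickle a numpy int scalar as a plain Python int
    return (int, (int(x),))

copy_reg.pickle(numpy.float64, _reduce_to_float)
copy_reg.pickle(numpy.int32, _reduce_to_int)
copy_reg.pickle(numpy.int64, _reduce_to_int)

# After registering, pickle should write these scalars as plain
# floats/ints, so the pickles shrink and load without numpy installed.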

I still really wonder how all these observations, and the things I can imagine so far, can sum up to an overall advantage for letting numpy.float64 & numpy.int32 scalars out by default - and possibly not even for numpy.float32, which has some importance in practice?
Letting out nan and inf objects, and offering an explicit type cast, is of course OK.

Robert


