[Numpy-discussion] Psyco MA?

Tim Hochberg tim.hochberg at ieee.org
Fri Feb 7 15:09:04 EST 2003


Chris Barker wrote:

>oops, sorry about the blank message.
>
>Paul F Dubois wrote:
>
>>{ CC to GvR just to show why I'm +1 on the if-PEP. I liked this in another
>
>What the heck is the if-PEP ?
>

PEP 308. It's stirring up a bit of a ruckus on CLP as we speak.

>>Perhaps knowledgeable persons could comment on the feasibility of coding MA
>>(masked arrays) in straight Python and then using Psyco on it?
>
>Is there confusion between Psyco and Pyrex? Psyco runs regular old
>Python bytecode, and individually compiles little pieces of it as needed
>into machine code. As I understand it, this should make loops where the
>inner part is a pretty simple operation very fast.
>
>However, Psyco is pretty new, and I have no idea how robust and stable
>it is, and it is certainly not cross-platform. As it generates machine
>code, it needs to be carefully ported to each hardware platform, and it
>currently only works on x86.
>
Psyco seems fairly stable these days. However, it's one of those things
that probably needs a larger cabal of users to shake the bugs out of it.
I still only use it to play around with, because everything that I need
speed from I end up doing in Numeric anyway.
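
A minimal sketch of the kind of loop Chris is describing, i.e. a plain
Python loop whose inner part is a very simple operation. The function name
dot is made up for illustration, and the psyco.bind call is guarded so the
code still runs, just without the speedup, wherever Psyco isn't installed.

```python
from array import array

def dot(a, b):
    # Plain Python loop with a trivial inner operation: the case
    # Psyco is described as handling well.
    total = 0.0
    for k in range(len(a)):
        total += a[k] * b[k]
    return total

try:
    import psyco      # assumption: only importable where Psyco is installed
    psyco.bind(dot)   # ask Psyco to compile just this one function
except ImportError:
    pass              # fall back to the ordinary interpreter

a = array('d', [1.0, 2.0, 3.0])
b = array('d', [4.0, 5.0, 6.0])
print(dot(a, b))
```

The result is the same either way; only the speed would differ.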

>Pyrex, on the other hand, is a "Python-like" language that is translated
>into C, and then the C is compiled. It generates pretty darn platform
>independent C, so it should be usable on all platforms.
>
>
>In regard to your question about MA (and any other similar project): I
>think Psyco has the potential to be the next-generation Python VM, which
>will have much higher performance, and therefore greatly reduce the need
>to write extensions for the sake of performance. I suspect that it could
>do its best with large, multi-dimensional arrays of numbers if there is
>a Python native object of such a type. Psyco, however, is not ready for
>general use on all platforms, so in the foreseeable future, there is a
>need for other ways to get decent performance. My suggestion follows:
>
>
>>It could have been written a lot simpler if performance didn't dictate
>>trying to leverage off Numeric. In straight Python one can imagine an add,
>>for example, that was roughly:
>>    for k in range(len(a.data)):
>>       result.mask[k] = a.mask[k] or b.mask[k]
>>       result.data[k] = (a.data[k] if result.mask[k]
>>                         else a.data[k] + b.data[k])
>
>This looks like it could be written in Pyrex. If Pyrex were suitably
>NumArray aware, then it could work great.
>
>What this boils down to, in both the Pyrex and Psyco options, is that
>having a multi-dimensional homogeneous numeric data type that is "Native"
>Python is a great idea! With Pyrex and/or Psyco, Numeric3 (NumArray2 ?)
>could be implemented by having only the smallest core in C, and the
>rest in Python (or Pyrex).
>
For Psyco at least you don't need a multidimensional type. You can get
good results with a flat array, in particular array.array. The numbers I
posted earlier showed comparable performance for Numeric and a
multidimensional array type written entirely in Python and psycoized.
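
To illustrate what "multidimensional on top of a flat array" can look like,
here is a rough sketch of a 2-D wrapper around array.array using row-major
offsets. The class name Flat2D and its methods are hypothetical, made up for
this example; this is not the code behind the numbers mentioned above.

```python
from array import array

class Flat2D:
    """Hypothetical 2-D wrapper over a flat array.array (row-major)."""

    def __init__(self, rows, cols, typecode='d'):
        self.shape = (rows, cols)
        self.data = array(typecode, [0] * (rows * cols))

    def _offset(self, i, j):
        # Map a (row, column) pair onto a position in the flat buffer.
        rows, cols = self.shape
        if not (0 <= i < rows and 0 <= j < cols):
            raise IndexError("index out of range")
        return i * cols + j

    def __getitem__(self, index):
        i, j = index
        return self.data[self._offset(i, j)]

    def __setitem__(self, index, value):
        i, j = index
        self.data[self._offset(i, j)] = value

m = Flat2D(2, 3)
m[1, 2] = 7.5
print(m[1, 2], m.data.index(7.5))
```

All the element access ends up as simple integer arithmetic plus an index
into array.array, which is the sort of code the thread expects Psyco to
handle well.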

And since I suspect that I'm the mysterious person whose name Paul
couldn't remember, let me say I suspect that MA would be faster in
psycoized Python than what you're doing now, as long as a.data was an
instance of array.array. However, there are at least three problems.
First, Psyco doesn't fully support the floating point type ('f') right
now (although it does support most of the various integral types in
addition to 'd'). Second, I assume that these masked arrays are
multidimensional, so someone would have to build the basic
multidimensional machinery around array.array to make them work. I have
a good start on this, but I'm not sure when I'm going to have time to
work on it more. The biggie, though, is that Psyco only works on x86
machines. What we really need to do is to clone Armin.
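
For concreteness, a minimal runnable sketch of Paul's add with a.data held
in an array.array. The Masked class and masked_add function are hypothetical
stand-ins for illustration, not MA's actual interface, and the sketch is 1-D
only.

```python
from array import array

class Masked:
    """Hypothetical stand-in for a 1-D masked array."""

    def __init__(self, data, mask):
        self.data = array('d', data)   # 'd' = C double
        self.mask = array('b', mask)   # 1 = masked, 0 = valid

def masked_add(a, b):
    n = len(a.data)
    result = Masked([0.0] * n, [0] * n)
    for k in range(n):
        m = a.mask[k] or b.mask[k]
        result.mask[k] = m
        # Where masked, carry a.data through unchanged; otherwise add.
        result.data[k] = a.data[k] if m else a.data[k] + b.data[k]
    return result

x = Masked([1.0, 2.0, 3.0], [0, 1, 0])
y = Masked([10.0, 20.0, 30.0], [0, 0, 1])
z = masked_add(x, y)
print(list(z.data), list(z.mask))
```

Using 'd' for the data sidesteps the 'f' limitation mentioned above.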

>While the Psyco option is the rosy future of Python, Pyrex is here now,
>and maybe adapting it to handle NumArrays well would be easier than
>re-writing a bunch of NumArray in C.
>
This sounds like you're conflating two different issues. The first issue
is that Numarray is relatively slow for small arrays. Pyrex may indeed
be an easier way to attack this, although I wouldn't know; I've only
looked at it, not tried to use it. However, I think that this is
something that can and should wait. Once use cases of numarray being
_too_ slow for small arrays start piling up, then it will be time to
attack the overhead. Premature optimization is the root of all evil and
all that.

The second issue is how to deal with code that does not vectorize well.
Here Pyrex again might help if it were made Numarray aware. However,
isn't this what scipy.weave already does? Again, I haven't used weave,
but as I understand it, it's another Python-C bridge, but one that's
more geared toward numerics stuff.
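
As one illustration of code that does not vectorize well, consider a loop
where each step depends on the previous result through a data-dependent
branch. The function below is a made-up example, not taken from anyone's
real code.

```python
from array import array

def clipped_running_sum(values, limit):
    # Running total that resets to zero whenever it exceeds `limit`.
    # Each output depends on the previous one via a branch, so it does
    # not reduce to a single whole-array operation.
    out = array('d', [0.0] * len(values))
    total = 0.0
    for k in range(len(values)):
        total += values[k]
        if total > limit:
            total = 0.0
        out[k] = total
    return out

print(list(clipped_running_sum(array('d', [1.0, 2.0, 3.0, 1.0]), 4.0)))
```

Loops like this are where a compiled inner loop would be expected to help
most.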


-tim










