[Numpy-discussion] a Numeric.where re-coded for weave.inline: very fast...

Fri Jul 1 14:01:38 EDT 2005

I tried weave for executing a C routine over a Numeric array. 

The Numeric data array holds 16-bit values (c_int16) of which only 14 bits are significant, with bit 13 as the sign bit, and I wanted them as "normal" signed Int16 values.
So:
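        from time import clock
        from Numeric import where, greater

        t1=clock()
        # samples above 8191 have the sign bit (bit 13) set
        m = where(greater(dataArray[:1000], 8191), 
                        dataArray[:1000]-16383, 
                        dataArray[:1000])
        t2 = clock()
        print 'where', round((t2-t1)*1000000, 1), 'us'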
Usually gives: where 634.3 us
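
For reference, the conversion that the where() call performs can be written per sample in plain Python. This is only a sketch: the name to_signed14 is made up for illustration, and the default offset of 16383 is simply the constant used above. If the device delivers 14-bit two's complement, the offset would be 16384 (2**14) instead; which one applies depends on the hardware, and that is an assumption either way.

```python
def to_signed14(raw, offset=16383):
    """Map a raw 14-bit sample (0..16383) to a signed value.

    offset=16383 mirrors the constant used in the post; pass 16384
    for 14-bit two's complement data (an assumption about the device).
    """
    if raw > 8191:          # bit 13 set: treat as a negative sample
        return raw - offset
    return raw              # bit 13 clear: value is already correct

samples = [0, 1, 8191, 8192, 16383]
converted = [to_signed14(s) for s in samples]
```

With the default offset the largest raw value, 16383, maps to 0; with offset=16384 it maps to -1.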

PyInline has about the same calling overhead as ctypes, apparently about 240us for a simple C statement (I had tried a DLL with ctypes:
http://sourceforge.net/mailarchive/forum.php?thread_id=7630224&forum_id=24606 ). It also needs some extra hoops to accept the pointer to short that I need (  void b2int(short *f, int N)   ), so I didn't test it fully.

Testing weave.inline went very well, though it had issues of its own. I found I had to patch msvccompiler.py for MS C 7.1 .NET ( http://www.vrplumber.com/programming/mstoolkit/ ). After the free but massive download/install...

        import weave
        inline = weave.inline # for speed
        N = 1000
        code="int i; for(i = 0; i < N; i++){if (dataArray[i]>8191){dataArray[i]-=16383;}}"
        inline(code, ['dataArray', 'N']) # just pass Python object's names

This created  sc_019a1cf36209cb2dfc688820080541ef0.pyd in C:\Documents and Settings\rays\Local Settings\Temp\rays\python24_compiled\
Calling inline() this way is slow, ~32000 us per call, as the compiler checks or runs each time. However, after much fiddling and many dir()s, I copied the long-named .pyd to C:\Python24\DLLs and just did:

        import sc_019a1cf36209cb2dfc688820080541ef0
        b2iw = sc_019a1cf36209cb2dfc688820080541ef0.compiled_func   # note the exposed function!
        N = 1
        t1=clock()
        b2iw({'dataArray':dataArray}, {'N':N})    # note the dicts!
        t2 = clock()
        print 'weave', round((t2-t1)*1000000, 1), 'us'

25us! (~300us for N=10000.) I think that's as close to C speed as I can expect, although I'm looking at the compiler options in msvccompiler.py for P4 optimization...
I still get "Missing compiler_cxx fix for MSVCCompiler" on the initial compile, but apparently it does no harm.
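
A single clock() reading around one call is noisy at the tens-of-microseconds level. Here is a minimal sketch of a steadier measurement, using the standard timeit module and taking the best of several runs. The helper name best_us and the stand-in workload are hypothetical; the real call (b2iw(...) or the where(...) expression) would take the workload's place.

```python
import timeit

def best_us(func, repeat=5, number=100):
    """Return the best per-call time of func() in microseconds."""
    # timeit.repeat gives one total (in seconds) per run of `number` calls
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number * 1e6

def workload():
    # stand-in for b2iw(...) or where(...): same conversion in pure Python
    data = range(0, 16384, 16)
    return [x - 16383 if x > 8191 else x for x in data]

elapsed = best_us(workload)
print('workload %.1f us' % elapsed)
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during the measurement.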

As a final note, I also found that psyco _slows down_ both ctypes and weave calls. I did psyco.full() at the app's start.
Without psyco:
  b2i 210.8 us
  weave 53.7 us
With psyco:
  b2i 250.0 us
  weave 234.7 us

Comments/criticism?

Ray