Pyrex speed

Sat May 27 23:10:38 EDT 2006

Hi John,

John Machin escribió:

>On 28/05/2006 12:10 AM, Gonzalo Monzón wrote:
>
>[good advice snipped]
>
>  
>
>>Example A:
>>This code is more than 80 times faster than a "easy" Python 
>>implementation. For every call, it does some bitwise operations and does 
>>an array lookup for every string character from argument. Its a lot 
>>faster because in Python approach a list lookup is done and it is a lot 
>>faster to do a C array lookup -thought that in these C loops no Python 
>>type value conversions are needed, if it where the case, C approach 
>>would not be so faster than python. I don't know how would perform an 
>>array based Python code, but I expect it to be a lot faster than using a 
>>list, so Python code can be speed up a lot if you know how to do it.
>>
>>// C code:
>>int CRC16Table[256]; // Filled elsewhere
>>int CalcCRC16(char *str)
>>{
>>   int crc;
>>     for(crc = 0xFFFF; *str != 0; str++) {
>>       crc = CRC16Table [(( crc >> 8 ) & 255 )] ^ ( crc << 8 ) ^ *str;
>>    
>>
>
>Gonzalo, just in case there are any C compilers out there which need to 
>be told:
>
> >      for(crc = 0xFFFF; *str != 0;) {
> >        crc = CRC16Table [(( crc >> 8 ) & 255 )] ^ ( crc << 8 ) ^ *str++;
>  
>
Thank you for the advise! I didn't know you couldn't advance pointer in 
the for in some compilers...

>
>  
>
>>   }
>>     return crc;
>>}
>>
>># Python code
>>gCRC16Table = [] # Filled elsewhere
>>def CalcCRC16(astr):
>>   crc = 0xFFFFL
>>    
>>
>
>Having that L on the end (plus the fact that you are pointlessly 
>maintaining "crc" as an *unsigned* 32-bit quantity) will be slowing the 
>calculation down -- Python will be doing it in long integers. You are 
>calculating a *sixteen bit* CRC! The whole algorithm can be written 
>simply so as to not need more than 16-bit registers, and not to pollute 
>high-order bits in 17-or-more-bit registers.
>
>  
>
Yes I know but I plan to post a quick example for Jim, and got the first 
one file from several versions... :-) The issue was about Jim 
understanding how some code can be speed-up a lot and some other not and 
how that's not a trivial question.

>>   for c in astr:
>>       crc = gCRC16Table[((crc >> 8) & 255)] ^ ((crc & 0xFFFFFF) << 8) ^ 
>>ord(c)
>>    
>>
>
>Note that *both* the C and Python routines still produce a 32-bit result 
>with 16 bits of high-order rubbish -- I got the impression from the 
>previous thread that you were going to fix that.
>  
>
Yes of course! I plan to spend some time on this issue, the last week I 
had not much time to work on this, but  thought it worth the pain to 
setup a compiling environment -ms.evc++ obviously-, and got succesfuly 
compiled Python and some of these own custom Pyrex extensions for the 
PocketPC, easily, only adding the C files to makefile, as Pyrex glue 
code compiles well on ARM, so I have to make some timings and decide 
what version to use for the code that won't be likely to be changed in 
long time. I still have to test the last improved Python array based 
approach and make some timings on the PDA.

>This Python routine never strays outside 16 bits, so avoiding your "& 
>255" and a final "& 0xFFFF" (which you don't have).
>
>def CalcCRC16(astr):
>     crc = 0xFFFF
>     for c in astr:
>         crc = gCRC16Table[crc >> 8] ^ ((crc & 0xFF) << 8) ^ ord(c)
>     return crc
>  
>
Thank you again for your thoughts John! :-)

Regards,
Gonzalo

>==============
>To the OP:
>
>I'd just like to point out that C code and Pyrex code can gain 
>signicantly (as the above example does) by not having to use ord() and 
>chr().
>
>As Gonzalo says, read the generated C code. Look for other cases of 
>using Python built-ins that could be much faster with a minor bit of 
>effort in Pyrex e.g. "max(a, b)" -> "(a) > (b) ? (a) : (b) " or if you 
>don't like that, a cdef function to get the max of 2 ints will be *way* 
>faster than calling Python's max()
>  
>