[Python-Dev] memcmp performance

Antoine Pitrou solipsis at pitrou.net
Fri Oct 21 21:18:58 CEST 2011


On Fri, 21 Oct 2011 18:23:24 +0000 (GMT)
Richard Saunders <richismyname at me.com> wrote:
> 
> If both strings are the same unicode kind, we can add memcmp
> to unicode_compare for an optimization:
>   
>     Py_ssize_t len = (len1 < len2) ? len1 : len2;
> 
>     /* use memcmp if both the same kind */
>     if (kind1 == kind2) {
>         int result = memcmp(data1, data2, ((int)kind1) * len);
>         if (result != 0)
>             return result < 0 ? -1 : +1;
>     }

Hmm, you have to be a bit subtler than that: on a little-endian
machine, you can't compare two characters by comparing their byte
representations in memory order. So memcmp() can only be used for
ordering with the one-byte representation.
(Actually, it can also be used for equality comparisons on any
representation.)
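
To make the pitfall concrete, here is a small standalone program (not
CPython code, it just mimics a 2-byte-per-character buffer) showing
memcmp() getting the ordering wrong on a little-endian machine while
equality stays correct:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void)
    {
        /* Two one-character "strings" in a 2-byte-per-character
         * representation: U+0100 vs U+00FF.  Numerically, 0x0100 > 0x00FF. */
        uint16_t a[1] = { 0x0100 };
        uint16_t b[1] = { 0x00FF };

        /* Correct result: compare the code units by value. */
        int by_value = (a[0] > b[0]) - (a[0] < b[0]);

        /* memcmp() compares raw bytes in memory order.  On a little-endian
         * machine a[] is stored as 00 01 and b[] as FF 00, so memcmp() sees
         * 0x00 < 0xFF and reports a < b -- the wrong ordering. */
        int by_memcmp = memcmp(a, b, sizeof(a));

        printf("by value : %d\n", by_value);                      /* 1 everywhere */
        printf("by memcmp: %d\n", by_memcmp < 0 ? -1 : (by_memcmp > 0));
                                         /* -1 on little-endian, 1 on big-endian */

        /* Equality is safe either way: equal code units have equal bytes. */
        printf("equal?   : %d\n", memcmp(a, a, sizeof(a)) == 0);
        return 0;
    }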

> Rerunning the test with this small change to unicode_compare:
> 
> 17.84 seconds:  -fno-builtin-memcmp 
> 36.25 seconds:  STANDARD memcmp
> 
> The standard memcmp is WORSE than the original unicode_compare
> code, but if we compile with -fno-builtin-memcmp, we get that
> wonderful 2x performance increase again.

The standard memcmp being worse is a bit puzzling. Intuitively, it
should have roughly the same performance as the original function.
I also wonder whether the slowdown could materialize on non-glibc
systems.

> I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...
> (after we put memcmp in unicode_compare)

A patch for unicode_compare would be a good start. Its performance can
then be checked on other systems (such as Windows).
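
For reference, here is a rough, untested sketch of the shape such a
fast path could take, assuming the PEP 393 accessors (PyUnicode_KIND,
PyUnicode_DATA, PyUnicode_READ) and leaving out the argument checking
the real function does; memcmp() is restricted to the case where both
strings use the 1-byte kind:

    #include <string.h>
    #include "Python.h"

    /* Rough sketch (untested): memcmp() is only safe for ordering when
     * both strings use the 1-byte (latin-1) representation. */
    static int
    unicode_compare_sketch(PyObject *str1, PyObject *str2)
    {
        int kind1 = PyUnicode_KIND(str1);
        int kind2 = PyUnicode_KIND(str2);
        void *data1 = PyUnicode_DATA(str1);
        void *data2 = PyUnicode_DATA(str2);
        Py_ssize_t len1 = PyUnicode_GET_LENGTH(str1);
        Py_ssize_t len2 = PyUnicode_GET_LENGTH(str2);
        Py_ssize_t len = (len1 < len2) ? len1 : len2;
        Py_ssize_t i;

        if (kind1 == kind2 && kind1 == PyUnicode_1BYTE_KIND) {
            /* One byte per character: byte order cannot bite us here. */
            int result = memcmp(data1, data2, (size_t)len);
            if (result != 0)
                return result < 0 ? -1 : 1;
        }
        else {
            /* General case: compare code point by code point. */
            for (i = 0; i < len; i++) {
                Py_UCS4 c1 = PyUnicode_READ(kind1, data1, i);
                Py_UCS4 c2 = PyUnicode_READ(kind2, data2, i);
                if (c1 != c2)
                    return (c1 < c2) ? -1 : 1;
            }
        }
        /* Common prefix is equal: the shorter string sorts first. */
        if (len1 < len2)
            return -1;
        if (len1 > len2)
            return 1;
        return 0;
    }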

Regards

Antoine.



