[Python-checkins] r46002 - in python/branches/release24-maint: Misc/ACKS Misc/NEWS Objects/unicodeobject.c

"Martin v. Löwis" martin at v.loewis.de
Tue May 16 08:10:48 CEST 2006


M.-A. Lemburg wrote:
> Could you please make this fix apply only on Solaris,
> e.g. using an #ifdef ?!

That shouldn't be done. The code, as it was before, had
undefined behaviour in C. With the fix, it is now correct.

If you want to drop usage of memcpy on systems where you
think it isn't needed, you should make a positive list of
such systems, e.g. through an autoconf test (although such
a test is difficult to formulate).

> The memcpy is a lot more expensive than a simple memory
> copy via registers and this operation is done per code point
> in the Unicode string, so any change to the inner loop makes
> a difference.

This is a bit too pessimistic. On Linux/x86, with gcc 4.0.4,
this memcpy call is compiled into

        movl    8(%ebp), %eax          ; eax = s
        movzwl  (%eax), %edx           ; (e)dx = *s
        movl    -32(%ebp), %eax        ; eax = p
        movw    %dx, (%eax)            ; *p = dx (= *s)

So it *is* a simple memory copy via registers. Any modern C
compiler should be able to achieve this optimization: it
knows what memcpy does, it can compute the number of bytes to
be moved at compile time, see that this is only two bytes,
and so avoid both calling a function and generating a copy loop.
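
As a minimal sketch of the pattern under discussion (the names
unit16 and copy_unit are hypothetical stand-ins for illustration,
not identifiers from unicodeobject.c), a fixed-size memcpy from a
possibly unaligned byte pointer might look like this:

```c
#include <string.h>

/* Hypothetical stand-in for a two-byte Py_UNICODE. */
typedef unsigned short unit16;

/* Copy one code unit out of a byte buffer that may not be suitably
   aligned for unit16.  The size argument is a compile-time constant,
   so an optimizing compiler is free to emit a plain load/store pair
   instead of an actual call to memcpy. */
static void copy_unit(unit16 *p, const char *s)
{
    memcpy(p, s, sizeof(unit16));
}
```

Calling copy_unit(&u, buf + 1) is well defined even when buf + 1 is
at an odd address, because memcpy makes no alignment demands on its
arguments.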

(If you want to see what your compiler generates, put two
function calls, say, foo() and bar(), around this statement,
and look for these function calls in the assembler output.)
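
The marker trick can be sketched as follows. Only foo() and bar()
come from the text above; unit16, NOINLINE, marker_hits and
copy_between_markers are hypothetical names. The markers are
defined in the same file so that the sketch links on its own, and
they touch a volatile variable (plus a GCC-style noinline attribute
where available) so that an optimizing compiler has a reason to
keep the calls visible:

```c
#include <string.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical stand-in for a two-byte Py_UNICODE. */
typedef unsigned short unit16;

/* Touching a volatile gives each marker a side effect, so the calls
   are not optimized away. */
volatile int marker_hits = 0;

NOINLINE void foo(void) { marker_hits++; }
NOINLINE void bar(void) { marker_hits++; }

void copy_between_markers(unit16 *p, const char *s)
{
    foo();                          /* marker before */
    memcpy(p, s, sizeof(unit16));   /* statement under inspection */
    bar();                          /* marker after */
}
```

With gcc, compiling this file using the -S option (for example
together with -O2) writes the assembler output to a .s file; the
instructions between the calls to foo and bar are the ones
generated for the memcpy.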

If you worry about compilers which cannot do this optimization,
you should use individual char assignments, e.g. through

        ((char*)p)[0] = s[0];
        ((char*)p)[1] = s[1];

(and similarly for Py_UNICODE_WIDE). While this also avoids
the function call, it does generate worse code with gcc 4.0.4:

        movl    8(%ebp), %eax
        movzbl  (%eax), %edx
        movl    -32(%ebp), %eax
        movb    %dl, (%eax)

        movl    8(%ebp), %eax
        movzbl  1(%eax), %edx
        movl    -32(%ebp), %eax
        movb    %dl, 1(%eax)

(Other compilers might be able to compile this into a single
 two-byte move, of course.)
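
For the wide case mentioned above, the same idea extends to four
assignments. This is only a sketch under assumptions: unit32 and
copy_unit32_bytes are hypothetical names, and the code unit is
assumed to be exactly four bytes wide.

```c
/* Hypothetical stand-in for a four-byte (wide) Py_UNICODE;
   assumes unsigned int is four bytes on the target. */
typedef unsigned int unit32;

/* Copy one wide code unit byte by byte.  Like memcpy, this makes no
   alignment demands on s, and it needs no library call even with a
   compiler that does not expand memcpy inline. */
static void copy_unit32_bytes(unit32 *p, const char *s)
{
    ((char *)p)[0] = s[0];
    ((char *)p)[1] = s[1];
    ((char *)p)[2] = s[2];
    ((char *)p)[3] = s[3];
}
```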

Regards,
Martin

