[Python-Dev] [Python-checkins] r46002 - in python/branches/release24-maint: Misc/ACKS Misc/NEWS Objects/unicodeobject.c

Tim Peters tim.peters at gmail.com
Tue May 16 19:21:36 CEST 2006


[M.-A. Lemburg]
>>> Could you please make this fix apply only on Solaris,
>>> e.g. using an #ifdef ?!

[Martin v. Löwis]
>> That shouldn't be done. The code, as it was before, had
>> undefined behaviour in C. With the fix, it is now correct.

[Marc-Andre]
> I don't understand - what's undefined in:
>
> const char *s;
> Py_UNICODE *p;
> ...
> *p = *(Py_UNICODE *)s;

The pointer cast:

    A pointer to an object or incomplete type may be converted to a pointer
    to a different object or incomplete type. If the resulting pointer is not
    correctly aligned for the pointed-to type, the behavior is undefined.

Since Py_UNICODE has a stricter alignment requirement than char,
there's no guarantee that _the content_ of s (the address it holds) is
correctly aligned for Py_UNICODE after the cast.  Indeed, that's why
the code segfaulted on the Solaris box.  On other architectures it may
not segfault but "just" take much longer while the HW and SW hide the
improperly aligned access.
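
For illustration, here is a minimal sketch of the memcpy() approach to
reading one code unit from a char buffer that may not be aligned.  The
names `unicode_unit` and `read_unit` are hypothetical stand-ins, not the
code that was checked in, and unsigned short is only an assumed width
for Py_UNICODE:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Py_UNICODE; the real type depends on the build. */
typedef unsigned short unicode_unit;

/* Copy one unit out of a byte buffer without assuming that the
   address in s is suitably aligned for unicode_unit. */
static unicode_unit read_unit(const char *s)
{
    unicode_unit u;
    assert(s != NULL);
    /* memcpy works byte by byte as far as the C standard is concerned,
       so no alignment requirement applies to the source address. */
    memcpy(&u, s, sizeof u);
    return u;
}
```

Calling read_unit(buf + 1) on a char array is well defined, whereas
*(unicode_unit *)(buf + 1) is not.  Compilers commonly turn a
fixed-size memcpy like this into a plain load on targets that tolerate
unaligned access, but that is an optimization, not a guarantee.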

>> If you want to drop usage of memcpy on systems where you
>> think it isn't needed, you should make a positive list of
>> such systems, e.g. through an autoconf test (although such
>> a test is difficult to formulate).

> I don't want to drop memcpy() - just keep the existing
> working code on platforms where the memcpy() is not
> needed.

There's no clear way I know of to guess which platforms that may be.
Is it possible to fiddle _PyUnicode_DecodeUnicodeInternal's _callers_
so that the char* `s` argument passed to it is always properly aligned
for Py_UNICODE?  Then the pointer cast would be fine.
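
As a rough sketch of what a caller-side check might look like, the
following tests whether an address is suitably aligned before anyone
casts it.  Everything here is an assumption for illustration:
`unicode_unit`, `align_probe`, `UNIT_ALIGN` and `is_aligned_for_unit`
are made-up names, and converting a pointer to uintptr_t to inspect its
low bits is implementation-defined, although it behaves as expected on
mainstream platforms:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_UNICODE. */
typedef unsigned short unicode_unit;

/* The offset of u inside this struct is a multiple of unicode_unit's
   alignment requirement, and in practice equals it.  This trick works
   with pre-C11 compilers that lack _Alignof. */
struct align_probe { char c; unicode_unit u; };
#define UNIT_ALIGN offsetof(struct align_probe, u)

/* Return nonzero if p may safely be cast to unicode_unit * and
   dereferenced, as far as alignment is concerned. */
static int is_aligned_for_unit(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % UNIT_ALIGN) == 0;
}
```

A caller could use such a check to decide between the direct cast and
a copy into a properly aligned temporary buffer.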

> ...
> A modern compiler should know the alignment requirements
> of Py_UNICODE* on the platform and generate appropriate
> code.

The trend in modern compilers and architectures is to be less
forgiving of standard violations, not more.

> AFAICT, only 64-bit platforms are subject to any
> such problems due to their requirement to have pointers
> aligned on 8-byte boundaries.

It's not the alignment of the pointer but of what the pointer points
_at_ that's at issue here.  While the effect of the pointer cast is
undefined, it's not the pointer cast that blows up.  It's
dereferencing the _result_ of the pointer cast that blows up:  it was
trying to read a Py_UNICODE from an address that wasn't properly
aligned for Py_UNICODE.  That can blow up (or be very slow, or return
gibberish -- it's undefined) even if Py_UNICODE has an alignment
requirement of "just" 2 (which I expect was actually the case on the
Solaris box).

