[Python-Dev] Re: [Python-checkins] python/dist/src/Objects unicodeobject.c, 2.197, 2.198

M.-A. Lemburg mal at lemburg.com
Mon Sep 22 15:13:26 EDT 2003


Tim Peters wrote:
> [Tim]
> 
>>>At the moment, it appears there's no identified reason to care about
>>>signedness of a greater-than 16-bit type,
> 
> 
> [M.-A. Lemburg]
> 
>>Sure there is: first of all, having a single type that can
>>be signed on some platforms and unsigned on others is a bad
>>thing per se
> 
> 
> We inherit that from C, though -- it's fine by C if wchar_t is signed or
> unsigned, just as it refused to define the signedness of char.

It may be fine for C, but it is not for the Unicode implementation,
which has always assumed Py_UNICODE to be unsigned. This
is fixed now.

>>and second the 32-bit signed wchar_t value was what triggered this
>>thread in the first place.
> 
> What triggered the thread originally was a segfault due to the code making a
> branch based on the content of uninitialized memory.  The code clearly
> didn't *think* it was reading up random heap bits, so that was a bug
> regardless of wchar_t's signedness. 

True, but the test (unicode->str[0] < 256) is what revealed a
second bug and that's what we've been discussing all along.
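
To make the second bug concrete, here is a minimal sketch of why a
bare "< 256" test is not enough once the code unit type is signed. The
typedefs and function names are hypothetical stand-ins for illustration,
not the actual Py_UNICODE definitions or the code in unicodeobject.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real Py_UNICODE definitions. */
typedef int32_t  signed_unit;    /* e.g. a signed 32-bit wchar_t */
typedef uint32_t unsigned_unit;  /* what the Unicode code assumes */

/* The bare test: also true for every negative value of a signed type. */
int lt256_signed(signed_unit ch) { return ch < 256; }

/* The same test on an unsigned type rejects the same bit pattern. */
int lt256_unsigned(unsigned_unit ch) { return ch < 256; }

/* A range check that does not depend on the signedness of the type. */
int in_latin1_range(signed_unit ch) { return ch >= 0 && ch < 256; }

/* Narrow to a byte only after the full range check has passed. */
unsigned char latin1_byte(signed_unit ch)
{
    assert(in_latin1_range(ch));
    return (unsigned char)ch;
}
```

With these definitions, a value of -1 passes lt256_signed() but is
rejected by both lt256_unsigned() and in_latin1_range(), which is the
difference that matters when the value is then used as a table index.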

> That wchar_t happened to be a signed
> 32-bit type on Jeremy's box is what uncovered the read-uninitialized-memory
> bug.
>
> If there's no other code vulnerable to bad behavior if wchar_t is a signed
> 32-bit type (nobody has identified another case), objections to it being
> signed anyway seem technically groundless. 

There are more comparisons of the above type in the code and,
even worse, it is documented that Py_UNICODE is unsigned,
so it's very likely that code external to the Python distribution,
such as codec packages or applications talking to libraries,
relies on that assumption as well.
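
External code that depends on the documented unsignedness could guard
itself with a compile-time check. This is only a sketch: the macro
IS_UNSIGNED_TYPE and the typedef py_unicode_t are made up for the
example (an extension would test the real Py_UNICODE), and it assumes a
C11 compiler for static_assert:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical stand-in for Py_UNICODE in this sketch. */
typedef uint32_t py_unicode_t;

/* True for unsigned integer types: -1 converted to the type wraps to
   the maximum value, which compares greater than zero. */
#define IS_UNSIGNED_TYPE(t) ((t)-1 > (t)0)

/* Fail the build instead of misbehaving at run time. */
static_assert(IS_UNSIGNED_TYPE(py_unicode_t),
              "this code assumes an unsigned code unit type");
```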

> Martin did give a technical
> reason (efficiency) for wanting to continue to use wchar_t on Jeremy's
> system.

Python won't be using wchar_t on those systems anymore, so
the problem is solved and the original intent restored. If
efficiency matters, programmers are always free to cast Py_UNICODE
to wchar_t on these systems for fast read-only access.
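
A rough sketch of what such a read-only cast might look like follows.
The typedef py_unicode_t and the function unit_length() are invented
for the example and are not part of the Python C API. The cast is only
taken when both types have the same width, and the buffer is never
written through the wchar_t pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for an unsigned 32-bit Py_UNICODE. */
typedef uint32_t py_unicode_t;

/* Length in code units of a 0-terminated buffer. */
size_t unit_length(const py_unicode_t *s)
{
    assert(s != NULL);

    /* Same width: borrow the platform's wide-character routine for
       read-only access. */
    if (sizeof(py_unicode_t) == sizeof(wchar_t)) {
        return wcslen((const wchar_t *)s);
    }

    /* Different width (e.g. a 16-bit wchar_t): plain portable loop. */
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

The fallback loop keeps the function correct on platforms where the
widths differ, so the cast is purely an optimization.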

-- 
Marc-Andre Lemburg
eGenix.com

