[Python-Dev] PEP 393 review
Stefan Behnel
stefan_ml at behnel.de
Fri Aug 26 07:21:11 CEST 2011
Stefan Behnel, 25.08.2011 23:30:
> Stefan Behnel, 25.08.2011 20:47:
>> "Martin v. Löwis", 24.08.2011 20:15:
>>> - issues to be considered (unclarities, bugs, limitations, ...)
>>
>> A problem of the current implementation is the need for calling
>> PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to
>> insufficient memory). Basically, this means that even something as trivial
>> as trying to get the length of a Unicode string can now result in an error.
>
> Oh, and the same applies to PyUnicode_AS_UNICODE() now. I doubt that there
> is *any* code out there that expects this macro to ever return NULL. This
> means that the current implementation has actually broken the old API. Just
> allocate an "80% of your memory" long string using the new API and then
> call PyUnicode_AS_UNICODE() on it to see what I mean.
>
> Sadly, a quick look at a couple of recent commits in the pep-393 branch
> suggested that it is not even always obvious to you as the authors which
> macros can be called safely and which cannot. I immediately spotted a bug
> in one of the updated core functions (unicode_repr, IIRC) where
> PyUnicode_GET_LENGTH() is called without a previous call to
> PyUnicode_FAST_READY().
>
> I find it anything but obvious that calling PyUnicode_DATA() and
> PyUnicode_KIND() is safe as long as the return value is being checked for
> errors, but calling PyUnicode_GET_LENGTH() is not safe unless there was a
> previous call to PyUnicode_Ready().

And, adding to my own mail yet again, the current header file states
this:
"""
/* String contains only wstr byte characters. This is only possible
when the string was created with a legacy API and PyUnicode_Ready()
has not been called yet. Note that PyUnicode_KIND() calls
PyUnicode_FAST_READY() so PyUnicode_WCHAR_KIND is only possible as a
initialized value, not as a result of PyUnicode_KIND(). */
#define PyUnicode_WCHAR_KIND 0
"""
As far as I understand, this is incorrect. If I call PyUnicode_KIND() on
an old-style object and it fails to allocate the string buffer, I would
expect to get PyUnicode_WCHAR_KIND back as the result, because the
SSTATE_KIND_* value in the "state" field has not been initialised at
that point.
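
To make the failure path concrete, here is a small stand-alone toy model in
C. It is only a sketch of my reading of the header comment, not the code in
the pep-393 branch: every toy_* name, the struct layout and the allocator
hook are made up for illustration. It assumes that the "kind" accessor
readies the object first and has no way to report an error to its caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: none of these names are real CPython definitions. */
enum toy_kind { TOY_WCHAR_KIND = 0, TOY_1BYTE_KIND = 1 };

typedef struct {
    enum toy_kind kind;    /* stays TOY_WCHAR_KIND until readying succeeds */
    const wchar_t *wstr;   /* buffer as filled in by a legacy API */
    size_t wstr_length;
    unsigned char *data;   /* canonical buffer, NULL until ready */
} toy_unicode;

/* Hypothetical allocator hook, so that an out-of-memory case can be forced. */
typedef void *(*toy_alloc_fn)(size_t);

static void *toy_failing_alloc(size_t n)
{
    (void)n;
    return NULL;           /* simulate insufficient memory */
}

/* Convert the legacy buffer; returns 0 on success, -1 on failure. */
static int toy_ready(toy_unicode *u, toy_alloc_fn alloc)
{
    assert(u != NULL);
    if (u->kind != TOY_WCHAR_KIND)
        return 0;          /* already ready */
    unsigned char *buf = alloc(u->wstr_length + 1);
    if (buf == NULL)
        return -1;         /* the kind field is left untouched here */
    for (size_t i = 0; i < u->wstr_length; i++)
        buf[i] = (unsigned char)u->wstr[i];
    buf[u->wstr_length] = 0;
    u->data = buf;
    u->kind = TOY_1BYTE_KIND;
    return 0;
}

/* A "kind" accessor that readies first but cannot signal an error. */
static enum toy_kind toy_get_kind(toy_unicode *u, toy_alloc_fn alloc)
{
    (void)toy_ready(u, alloc);
    return u->kind;
}
```

In this model, calling toy_get_kind() with toy_failing_alloc on a freshly
created legacy object hands back TOY_WCHAR_KIND, which is exactly the case
the header comment rules out. Calling it again with malloc succeeds and
yields TOY_1BYTE_KIND.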
Stefan