[Python-Dev] PEP 393 review

Fri Aug 26 18:56:07 CEST 2011

On 26.08.2011 17:55, Stefan Behnel wrote:
> Stefan Behnel, 25.08.2011 23:30:
>> Sadly, a quick look at a couple of recent commits in the pep-393 branch
>> suggested that it is not even always obvious to you as the authors which
>> macros can be called safely and which cannot. I immediately spotted a bug
>> in one of the updated core functions (unicode_repr, IIRC) where
>> PyUnicode_GET_LENGTH() is called without a previous call to
>> PyUnicode_FAST_READY().
> 
> Here is another example from unicodeobject.c, commit 56aaa17fc05e:
> 
> +    switch(PyUnicode_KIND(string)) {
> +    case PyUnicode_1BYTE_KIND:
> +        list = ucs1lib_splitlines(
> +            (PyObject*) string, PyUnicode_1BYTE_DATA(string),
> +            PyUnicode_GET_LENGTH(string), keepends);
> +        break;
> +    case PyUnicode_2BYTE_KIND:
> +        list = ucs2lib_splitlines(
> +            (PyObject*) string, PyUnicode_2BYTE_DATA(string),
> +            PyUnicode_GET_LENGTH(string), keepends);
> +        break;
> +    case PyUnicode_4BYTE_KIND:
> +        list = ucs4lib_splitlines(
> +            (PyObject*) string, PyUnicode_4BYTE_DATA(string),
> +            PyUnicode_GET_LENGTH(string), keepends);
> +        break;
> +    default:
> +        assert(0);
> +        list = 0;
> +    }
> 
> The assert(0) at the end will hit when the system is running out of
> memory while working on a wchar string.

No, that should not happen: it should never get to this point.

I agree with your observation that something should be done about error
handling, and will update the PEP shortly. I propose that
PyUnicode_Ready should be explicitly called on input where raising an
exception is feasible. In contexts where it is not feasible (such
as reading a character, or reading the length or the kind), failing to
ready the string should cause a fatal error.
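
To make the split concrete, here is a rough sketch in plain C. It does not
use the real PEP 393 structures or macros: toy_str, toy_ready, toy_length
and toy_checked_length are names made up for this illustration, and the
simulate_oom flag merely stands in for a failed allocation. It only shows
the proposed division of labour, under those assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a string object; NOT the real PEP 393 layout. */
typedef struct {
    int ready;          /* 1 once the canonical representation exists */
    int simulate_oom;   /* 1 to make readying fail, as if malloc failed */
    size_t length;      /* only meaningful when ready == 1 */
} toy_str;

/* Fallible step: returns 0 on success, -1 on failure.
   A caller that can raise an exception checks this result. */
static int toy_ready(toy_str *s)
{
    assert(s != NULL);
    if (s->ready)
        return 0;
    if (s->simulate_oom)
        return -1;
    s->ready = 1;
    return 0;
}

/* Infallible accessor: it has no way to report an error, so an
   unready string is treated as a programming error and aborts. */
static size_t toy_length(const toy_str *s)
{
    if (!s->ready) {
        fprintf(stderr, "fatal: string not ready\n");
        abort();
    }
    return s->length;
}

/* Entry point where raising is feasible: ready first, then read. */
static int toy_checked_length(toy_str *s, size_t *out)
{
    if (toy_ready(s) < 0)
        return -1;      /* propagate the error; *out is left untouched */
    *out = toy_length(s);
    return 0;
}
```

With this shape, a switch on the kind such as the one quoted above would
only be reached after the readying step has succeeded, so its default
branch stays unreachable.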

What do you think?

Regards,
Martin