[Python-Dev] [Python-checkins] cpython: Change decoders to use Unicode API instead of Py_UNICODE.

Wed Nov 9 11:15:25 CET 2011

First of all, thanks for having upgraded this huge part (codecs) to the new 
Unicode API!

> +static int
> +unicode_widen(PyObject **p_unicode, int maxchar)
> +{
> +    PyObject *result;
> +    assert(PyUnicode_IS_READY(*p_unicode));
> +    if (maxchar <= PyUnicode_MAX_CHAR_VALUE(*p_unicode))
> +        return 0;
> +    result = PyUnicode_New(PyUnicode_GET_LENGTH(*p_unicode),
> +                           maxchar);
> +    if (result == NULL)
> +        return -1;
> +    PyUnicode_CopyCharacters(result, 0, *p_unicode, 0,
> +                             PyUnicode_GET_LENGTH(*p_unicode));
> +    Py_DECREF(*p_unicode);
> +    *p_unicode = result;
> +    return 0;
> +}

The result of PyUnicode_CopyCharacters() must be checked. If you are sure that
the call cannot fail, use copy_characters() instead, which relies on assertions
in debug mode (and performs no check in release mode).

> -#ifndef DONT_MAKE_RESULT_READY
> -    if (_PyUnicode_READY_REPLACE(&v)) {
> -        Py_DECREF(v);
> -        return NULL;
> -    }
> -#endif

Why did you remove this call from PyUnicode_DecodeRawUnicodeEscape(),
_PyUnicode_DecodeUnicodeInternal(), PyUnicode_DecodeASCII() and
PyUnicode_DecodeCharmap()? It can reuse the latin1 character singletons, which
saves a little more memory (there is already a special case for the empty
string).

"_PyUnicode_READY_REPLACE" is maybe not the best name :-)

Victor