[Python-Dev] PEP 393 review

Mon Aug 29 21:20:53 CEST 2011

On 29.08.2011 11:03, Dirkjan Ochtman wrote:
> On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>  result strings. In PEP 393, a buffer must be scanned for the
>>  highest code point, which means that each byte must be inspected
>>  twice (a second time when the copying occurs).
> 
> This may be a silly question: are there things in place to optimize
> this for the case where two strings are combined? E.g. highest
> character in combined string is max(highest character in either of the
> strings).

Yes. PyUnicode_Concat goes like this:

    /* The result's maximum character is the larger of the two inputs'
       maxima, so neither input needs to be rescanned. */
    maxchar = PyUnicode_MAX_CHAR_VALUE(u);
    if (PyUnicode_MAX_CHAR_VALUE(v) > maxchar)
        maxchar = PyUnicode_MAX_CHAR_VALUE(v);

    /* Concat the two Unicode strings */
    w = (PyUnicodeObject *) PyUnicode_New(
                            PyUnicode_GET_LENGTH(u) + PyUnicode_GET_LENGTH(v),
                            maxchar);
    if (w == NULL)
        goto onError;
    PyUnicode_CopyCharacters(w, 0, u, 0, PyUnicode_GET_LENGTH(u));
    PyUnicode_CopyCharacters(w, PyUnicode_GET_LENGTH(u), v, 0,
                             PyUnicode_GET_LENGTH(v));
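To make the contrast concrete, here is a minimal standalone sketch in C of the two cases: building a string from an arbitrary buffer, which needs one pass to find the highest code point and a second pass to copy, versus concatenation, where the result's maximum comes straight from the inputs. Everything here (the Str struct, kind_for, str_from_ucs4, str_concat, str_free) is hypothetical and is not CPython's PyUnicodeObject layout or API. As a simplification, this model caches the exact maximum code point, whereas CPython may only track enough to choose the storage width; the chosen kind (1, 2 or 4 bytes per code point) comes out the same either way.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a PEP 393-style string (not CPython's struct). */
typedef struct {
    size_t length;    /* number of code points */
    uint32_t maxchar; /* highest code point, cached when the string is created */
    int kind;         /* bytes per code point: 1, 2 or 4 */
    void *data;
} Str;

/* Narrowest storage width that can hold maxchar. */
static int kind_for(uint32_t maxchar) {
    if (maxchar < 0x100) return 1;
    if (maxchar < 0x10000) return 2;
    return 4;
}

static uint32_t read_cp(const Str *s, size_t i) {
    switch (s->kind) {
    case 1: return ((const uint8_t *)s->data)[i];
    case 2: return ((const uint16_t *)s->data)[i];
    default: return ((const uint32_t *)s->data)[i];
    }
}

static void write_cp(Str *s, size_t i, uint32_t cp) {
    switch (s->kind) {
    case 1: ((uint8_t *)s->data)[i] = (uint8_t)cp; break;
    case 2: ((uint16_t *)s->data)[i] = (uint16_t)cp; break;
    default: ((uint32_t *)s->data)[i] = cp; break;
    }
}

/* Allocate an uninitialized string whose width is fixed by maxchar. */
static Str *str_new(size_t length, uint32_t maxchar) {
    Str *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->length = length;
    s->maxchar = maxchar;
    s->kind = kind_for(maxchar);
    s->data = malloc(length ? length * (size_t)s->kind : 1);
    if (s->data == NULL) { free(s); return NULL; }
    return s;
}

/* From an arbitrary buffer: every code point is inspected twice. */
static Str *str_from_ucs4(const uint32_t *buf, size_t n) {
    uint32_t maxchar = 0;
    for (size_t i = 0; i < n; i++)   /* pass 1: find the highest code point */
        if (buf[i] > maxchar) maxchar = buf[i];
    Str *s = str_new(n, maxchar);
    if (s == NULL) return NULL;
    for (size_t i = 0; i < n; i++)   /* pass 2: copy into the chosen width */
        write_cp(s, i, buf[i]);
    return s;
}

/* Concatenation: the result's maxchar is max(u->maxchar, v->maxchar); no scan. */
static Str *str_concat(const Str *u, const Str *v) {
    uint32_t maxchar = u->maxchar > v->maxchar ? u->maxchar : v->maxchar;
    Str *w = str_new(u->length + v->length, maxchar);
    if (w == NULL) return NULL;
    for (size_t i = 0; i < u->length; i++)
        write_cp(w, i, read_cp(u, i));
    for (size_t i = 0; i < v->length; i++)
        write_cp(w, u->length + i, read_cp(v, i));
    return w;
}

static void str_free(Str *s) {
    if (s != NULL) { free(s->data); free(s); }
}
```

For example, concatenating a 1-byte string "ab" with a string holding U+20AC (EURO SIGN, which needs the 2-byte kind) yields a 2-byte result of length 3, and str_concat decides that width without looking at any character data.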

> Also, this PEP makes me wonder if there should be a way to distinguish
> between language PEPs and (CPython) implementation PEPs, by adding a
> tag or using the PEP number ranges somehow.

Well, no. This would equally apply to every single patch, and is just
not feasible. Instead, alternative implementations typically target a
CPython version, and then find out what features they need to implement
to claim conformance.

Regards,
Martin