[Python-Dev] PEP 393 Summer of Code Project

Fri Aug 26 19:26:38 CEST 2011

On Fri, Aug 26, 2011 at 10:13 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 26 August 2011 18:02, Guido van Rossum <guido at python.org> wrote:
>
>> Eek. No, please. Those platforms' native string types have length and
>> slicing operations that are O(1) and work in terms of 16-bit code
>> points. Python should use those. It would be awful if Java and Python
>> code doing the same manipulations on the same string would come to
>> different conclusions because Python tried to paper over surrogates.
>
> *That* is actually the erroneous assumption I had made - that the Java
> and .NET native string type had code point semantics (i.e., took
> surrogates into account). As that isn't the case, my comments aren't
> valid - and I agree that having common semantics (and hence exposing
> surrogates) is too important to lose.

Those platforms probably *also* have libraries of operations to
support writing apps that conform to the Unicode standard. But those
apps will have to be aware of the difference between the "naive"
length of a string and the number of code points of characters in it.

> On the other hand, that pretty much establishes that whatever PEP 393
> achieves in terms of allowing all builds of CPython to offer code
> point semantics, the language definition can't mandate it.

The most severe consequence to me seems that the stdlib (which is
reused by those other platforms) cannot assume CPython's ideal world
-- even if specific apps sometimes can.

-- 
--Guido van Rossum (python.org/~guido)