[Python-Dev] UCS2/UCS4 default

Wed Jul 2 20:47:02 CEST 2008

On Wed, Jul 2, 2008 at 11:35 AM, Jeroen Ruigrok van der Werven
<asmodai at in-nomine.org> wrote:
> -On [20080702 20:27], Guido van Rossum (guido at python.org) wrote:
>>I disagree. Instead, I would say that such code needs to be aware of
>>surrogates.
>
> Just to make sure I understood you:
>
> Python's code needs to be made aware of surrogates?

No, Python already is aware of surrogates. I meant applications
processing non-BMP text should beware of them.

> If so, do you want me to log issues for the things encountered?

If you find places where the Python core or standard library is doing
Unicode processing that would break when surrogates are present, you
should file a bug. However, this does not mean that every bit of code
that slices a string at an arbitrary point (and hence risks slicing in
the middle of a surrogate pair) is incorrect -- it all depends on what
is done next with the slice.
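
As a rough illustration of the point about slicing, here is a minimal
sketch. It assumes a current Python 3 interpreter, where str is indexed
by code point, so it models a UCS2 build by working on an explicit list
of UTF-16 code units. The helper names (to_units, from_units,
splits_pair) are hypothetical and are not part of Python or its
standard library.

```python
import struct


def to_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def from_units(units):
    """Strictly decode a list of UTF-16 code units back into a str."""
    return struct.pack("<%dH" % len(units), *units).decode("utf-16-le")


def splits_pair(units, index):
    """True if cutting before units[index] separates a surrogate pair."""
    return (0 < index < len(units)
            and 0xD800 <= units[index - 1] <= 0xDBFF   # high surrogate
            and 0xDC00 <= units[index] <= 0xDFFF)      # low surrogate


# 'a', U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP), 'b'
text = "a\U0001D11Eb"
units = to_units(text)          # four code units for three characters

cut = 2                         # falls between the two surrogates
left, right = units[:cut], units[cut:]

# Harmless use: the halves are simply put back together.
rejoined = from_units(left + right)

# Harmful use: one half is treated as complete text on its own.
try:
    from_units(left)
    left_ok = True
except UnicodeDecodeError:
    left_ok = False
```

The same cut is fine when the pieces are concatenated again and fails
only when one piece is decoded in isolation, which is the sense in which
the correctness of a slice depends on what happens to it afterwards.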

I'd also prefer to receive bug reports about breakages actually
encountered in the wild rather than purely theoretical issues. And in
all cases a fragment of test code to reproduce the problem would be
appreciated.

> --
> Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
> イェルーン ラウフロック ヴァン デル ウェルヴェン
> http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
> Learn from the past -- don't wear it like a yoke around your neck...
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)