[Python-Dev] Support for "wide" Unicode characters

Guido van Rossum guido@digicool.com
Fri, 29 Jun 2001 11:24:56 -0400


> I'd suggest not to use the term character in this PEP at all;
> this is also what Mark Davis recommends in his paper on Unicode.

I like this idea!  I know that I *still* have a hard time not thinking
"C 'char' datatype, i.e. an 8-bit byte" when I read "character"...

> Why not make the codec used by Python to convert Unicode
> literals to Unicode strings an option just like the default
> encoding ?
> 
> That way we could have a version of the unicode-escape codec
> which supports surrogates and one which doesn't.

Smart idea, but how practical is this?  Can you spec this out a bit more?

> +1 on removing knowledge about surrogates from the Unicode
> implementation core (it's also the easiest: there is none :-)

Except for \U currently -- or is that not part of the implementation core?
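
To make the \U point concrete, here is a rough sketch of the arithmetic
that turns a code point above U+FFFF into a surrogate pair when the
storage unit is 16 bits wide.  The function name to_surrogates() is made
up for illustration; this is not the actual codec code.

```python
def to_surrogates(code_point):
    """Split a code point into UTF-16 code units (hypothetical helper).

    Returns a 1-tuple for code points up to U+FFFF and a
    (high, low) surrogate pair for U+10000 through U+10FFFF.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (code_point,))
    if code_point < 0x10000:
        # Fits in a single 16-bit unit; no surrogates needed.
        return (code_point,)
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return (high, low)
```

With this, to_surrogates(0x10000) gives (0xD800, 0xDC00), which is
presumably what a surrogate-aware \U escape has to produce.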

> We should provide a new module which provides a few handy
> utilities though: functions which provide code point-, 
> character-, word- and line- based indexing into Unicode 
> strings.

But its design is outside the scope of this PEP, I'd say.
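
Just to illustrate what code point-based indexing could mean, here is a
minimal sketch that works on a plain sequence of 16-bit code units
(integers).  The names iter_code_points() and code_point_at() are
hypothetical, and this is in no way a proposed API for such a module.

```python
def iter_code_points(units):
    """Yield code points from a sequence of 16-bit code units.

    A high surrogate followed by a low surrogate is combined into one
    code point; unpaired surrogates are passed through unchanged.
    """
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if (0xD800 <= unit <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Well-formed surrogate pair: consumes two code units.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1


def code_point_at(units, index):
    """Return the index-th code point (linear scan, so O(n))."""
    for position, code_point in enumerate(iter_code_points(units)):
        if position == index:
            return code_point
    raise IndexError("code point index out of range")
```

Note the indexing here is linear in the length of the string, which is
one of the trade-offs such a module would have to deal with.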

--Guido van Rossum (home page: http://www.python.org/~guido/)