[Python-Dev] Support for "wide" Unicode characters
Guido van Rossum
guido@digicool.com
Fri, 29 Jun 2001 11:24:56 -0400
> I'd suggest not using the term "character" in this PEP at all;
> this is also what Mark Davis recommends in his paper on Unicode.
I like this idea! I know that I *still* have a hard time not thinking
"C 'char' datatype, i.e. an 8-bit byte" when I read "character"...
> Why not make the codec used by Python to convert Unicode
> literals to Unicode strings an option just like the default
> encoding?
>
> That way we could have a version of the unicode-escape codec
> which supports surrogates and one which doesn't.
Smart idea, but how practical is this? Can you spec this out a bit more?
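One way to picture the two codec variants is the sketch below. It is only an illustration under stated assumptions: decode_u_escapes() and its surrogates flag are hypothetical names, not an existing Python codec API. It also handles only \UXXXXXXXX escapes and ignores escaped backslashes. With surrogates=True, a code point above U+FFFF becomes a UTF-16 high/low surrogate pair, as a "narrow" build would store it. With surrogates=False, such a code point is rejected.

```python
import re

# Hypothetical sketch: decodes only \UXXXXXXXX escapes; other text, including
# escaped backslashes, is left untouched.
_U_ESCAPE = re.compile(r"\\U([0-9A-Fa-f]{8})")


def to_surrogates(cp: int) -> str:
    """Split a supplementary code point (> U+FFFF) into a UTF-16 surrogate pair."""
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return chr(high) + chr(low)


def decode_u_escapes(text: str, surrogates: bool = True) -> str:
    """Hypothetical decoder for the two codec variants discussed above."""

    def replace(match: "re.Match[str]") -> str:
        digits = match.group(1)
        cp = int(digits, 16)
        if cp > 0x10FFFF:
            raise ValueError("\\U" + digits + " is outside the Unicode range")
        if cp <= 0xFFFF:
            return chr(cp)  # fits in a single 16-bit code unit
        if surrogates:
            return to_surrogates(cp)  # surrogate-aware variant
        # Variant without surrogate support: refuse non-BMP code points.
        raise ValueError("\\U" + digits + " needs surrogates, which are disabled")

    return _U_ESCAPE.sub(replace, text)
```

For example, decode_u_escapes(r"\U0001F600") yields the two code units "\ud83d\ude00", while the same call with surrogates=False raises ValueError.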
> +1 on removing knowledge about surrogates from the Unicode
> implementation core (it's also the easiest: there is none :-)
Except for \U currently -- or is that not part of the implementation core?
> We should provide a new module which provides a few handy
> utilities though: functions which provide code point-,
> character-, word- and line- based indexing into Unicode
> strings.
But its design is outside the scope of this PEP, I'd say.
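The kind of helper such a module might offer is illustrated by the sketch below, which does code point-based indexing over a sequence of UTF-16 code units. It is a rough sketch only: code_points() and code_point_at() are hypothetical names, and the input is assumed to be a plain list of 16-bit integers. A high surrogate followed by a low surrogate is combined into one code point. Lone surrogates are passed through unchanged.

```python
from typing import List


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def code_points(units: List[int]) -> List[int]:
    """Combine UTF-16 code units into code points; lone surrogates pass through."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        if (is_high_surrogate(unit) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            # Reassemble the 20-bit offset from the two 10-bit halves.
            result.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            result.append(unit)
            i += 1
    return result


def code_point_at(units: List[int], index: int) -> int:
    """Hypothetical helper: return the index-th code point, not code unit."""
    return code_points(units)[index]
```

With units = [0x61, 0xD83D, 0xDE00, 0x62] (the UTF-16 code units for "a", U+1F600, "b"), code unit index 2 is a low surrogate, but code_point_at(units, 2) returns 0x62. This linear scan is the simplest approach. A real module would probably cache an index to avoid rescanning on every lookup.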
--Guido van Rossum (home page: http://www.python.org/~guido/)