[Python-Dev] Tcl and Unicode
Guido van Rossum
guido@python.org
Sat, 07 Oct 2000 08:51:12 -0500
> Fix for next iteration of SF bug 115690 (Unicode headaches in IDLE). The
> parsing functions in support of auto-indent weren't expecting Unicode
> strings, but text.get() can now return them (although it remains muddy as
> to exactly when or why that can happen). Fixed that with a Big Hammer.
I apologize; I should have explained when text.get() returns Unicode:
Any string returned from Tcl/Tk that contains a byte with the 8th bit
set is translated from UTF-8 into Unicode, unless the translation
fails (in which case the original raw 8-bit string is returned as a
fallback).
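The rule above can be sketched in Python roughly as follows. This is only an illustration written against modern Python (bytes vs. str), not the actual _tkinter code; the helper name from_tcl is hypothetical, and the sketch assumes the raw value arrives as a bytes object.

```python
def from_tcl(raw: bytes):
    """Hypothetical helper illustrating the translation rule described above."""
    # No byte has its 8th bit set: hand back the plain 8-bit string unchanged.
    if all(b < 0x80 for b in raw):
        return raw
    try:
        # At least one high byte: try to interpret the data as UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Translation failed: fall back to the original raw 8-bit string.
        return raw
```

A caller such as the auto-indent parsing code therefore has to cope with both result types: from_tcl(b"abc") gives back the 8-bit string, while from_tcl("é".encode("utf-8")) gives back a Unicode string.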
This *should* be correct because Tcl/Tk always uses UTF-8 internally.
(Even though it is "lenient" when receiving strings -- if a sequence
of characters has no valid Unicode representation, it appears to fall
back to Latin-1; I don't know the details of this algorithm.)
--Guido van Rossum (home page: http://www.python.org/~guido/)