[Python-Dev] Tcl and Unicode
Tim Peters
tim_one@email.msn.com
Sat, 7 Oct 2000 14:43:55 -0400
>> Fix for next iteration of SF bug 115690 (Unicode headaches in
>> IDLE). ...
[Guido]
> I apologize, I should have explained when text.get() returns Unicode:
>
> Any string returned from Tcl/Tk that contains a byte with the 8th bit
> set is translated from UTF-8 into Unicode, unless the translation
> fails (in which case the original raw 8-bit string is returned as a
> fallback).
Except that's *why* it was muddy <wink>: in the specific case that popped
up in the bug, text.get() appeared to return a Unicode string of length 1
containing only a newline. No high-bit byte appeared to be involved.
However, that was an illusion I didn't unmask until later. All is clear
now.
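
For anyone following along, the rule Guido describes can be sketched roughly as below. This is only an illustration written in present-day Python, not the actual _tkinter code: the helper name tcl_result_to_python is made up, and representing the "raw 8-bit string" as bytes is an assumption.

```python
def tcl_result_to_python(raw: bytes):
    """Hypothetical helper mirroring the rule quoted above (not _tkinter's code)."""
    # No byte has the 8th bit set: plain 7-bit data is returned untouched.
    if all(b < 0x80 for b in raw):
        return raw
    try:
        # At least one high-bit byte: try to read the whole thing as UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Translation failed: hand back the original raw 8-bit string.
        return raw


plain = tcl_result_to_python(b"\n")                     # stays an 8-bit string
text = tcl_result_to_python("caf\u00e9\n".encode("utf-8"))  # becomes Unicode
broken = tcl_result_to_python(b"\xe9\n")                # not valid UTF-8, stays raw
```

Under this rule a lone newline would never come back as Unicode by itself, which is consistent with a high-bit byte having been involved somewhere after all.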
> This *should* be correct because Tcl/Tk always uses UTF-8 internally.
> (Even though it is "lenient" when receiving strings -- if a sequence
> of characters has no valid Unicode representation, it appears to fall
> back to Latin-1; I don't know the details of this algorithm.)
Dunno, but wouldn't be surprised if they had a notion of default encoding,
and that it simply appears to be Latin-1 to us because American Windows uses
a superset of Latin-1. If BeOpen would like to buy me a version of Chinese
Windows, happy to lend it to you <wink>.
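
To make the guess concrete, here is a rough sketch of a "lenient" decoder with a configurable default encoding. It is not Tcl's algorithm, whose details nobody in this thread knows; lenient_decode and its fallback parameter are invented for illustration. It shows why Latin-1 and the US Windows code page (cp1252) are hard to tell apart from ordinary accented text.

```python
def lenient_decode(raw: bytes, fallback: str = "latin-1") -> str:
    """Hypothetical: try UTF-8 first, then fall back to a default encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# A lone 0xE9 byte is not valid UTF-8, so the fallback encoding decides.
# Latin-1 and cp1252 agree on this byte, so the result looks the same.
as_latin1 = lenient_decode(b"\xe9", "latin-1")
as_cp1252 = lenient_decode(b"\xe9", "cp1252")

# The two encodings only disagree in the 0x80-0x9F range.
euro_cp1252 = lenient_decode(b"\x80", "cp1252")    # the euro sign, U+20AC
ctrl_latin1 = lenient_decode(b"\x80", "latin-1")   # a C1 control character
```

Bytes in the 0x80-0x9F range, or a machine with a different default code page, would be needed to tell the two explanations apart.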
as-american-as-they-come-ly y'rs - tim