idle6.0 german umlauts (ascii >128 Exception)

jepler epler jepler.lnk at lnk.ispi.net
Sun Nov 19 16:11:20 EST 2000


On Sun, 19 Nov 2000 18:23:59 GMT, Nick Bensema
 <nickb at fnord.io.com> wrote:
>In article <4kpd0t8lpk6g37qrqg5tejqh6l82c086dq at 4ax.com>,
>Steve Horne  <sh at ttsoftware.co.uk> wrote:
>>Characters with umlauts are not defined in the ASCII character set.
>>You were probably getting away with something before that you
>>shouldn't have been - though to be honest, use of the unicode
>>character set rather than ASCII would be better these days.
>
>Correct me if I'm wrong, but isn't that still Idle's problem?
>
>In fact, if I'm not mistaken, characters 128 through 255 of Unicode
>conform exactly to the high ASCII in common use today.

In the UTF-8 encoding, the bytes with values 0..127 are identical to ASCII.
Bytes with values 128..255 are used in UTF-8 for multibyte encodings of
characters not present in ASCII.  This is not compatible with the use
of byte values 128..255 to store the ISO 8859-1 characters.
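A small illustration of the difference, as a sketch in modern Python 3 (which postdates this thread; the IDLE of the day ran on Python 2.0, whose string handling differed). It encodes one umlaut character, U+00FC, both ways. It also shows that the ASCII range is shared, that a lone ISO 8859-1 byte is not valid UTF-8, and that ASCII cannot hold the character at all.

```python
# Sketch only: compare how U+00FC ('u' with umlaut) is stored as bytes.
text = "\u00fc"

utf8 = text.encode("utf-8")          # two bytes, both >= 128
latin1 = text.encode("iso-8859-1")   # one byte, 0xFC

print(utf8)    # b'\xc3\xbc'
print(latin1)  # b'\xfc'

# Bytes 0..127 (plain ASCII text) are identical in both encodings.
assert "abc".encode("utf-8") == "abc".encode("iso-8859-1") == b"abc"

# A lone ISO 8859-1 byte is not valid UTF-8: 0xFC never starts a UTF-8 sequence.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# ASCII has no code for the character at all (the "ascii > 128" complaint).
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("not ASCII:", exc.reason)
```

The point is that "byte value 252" means U+00FC only under ISO 8859-1. Under UTF-8 the same character takes two bytes, which is why mixing the two conventions fails.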

It is correct that the UCS-2 (16-bit) characters with values 0..255 correspond
to the ISO 8859-1 character set.  But since I don't work with Windows with any
regularity, UTF-8 is the only encoding of Unicode with which I am familiar.
It offers a reasonable trade-off between the ability to use Unicode and
compatibility with old (but 8-bit-clean) code.  For instance, the Linux
filesystem was designed to be 8-bit-clean, and as a happy coincidence
you can choose to use the UTF-8 encoding for filenames, except that you'll be
unable to read filenames that were written in the ISO 8859-1 character set.
I think that UTF-8 is also the internal encoding used in Tcl 8.x.
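Both points can be checked quickly. This is a sketch in Python 3, not anything from the original thread, and the filename "Müller.txt" is a hypothetical example. The code confirms that code points 0..255 coincide with ISO 8859-1 and that a 16-bit encoding stores them with the same value. It then shows what happens when a filename written as ISO 8859-1 bytes is read back under a UTF-8 convention.

```python
# Code points 0..255 coincide with ISO 8859-1: byte b decodes to chr(b).
same = all(bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256))
print(same)  # True

# A 16-bit encoding stores such a character with the same numeric value
# (little-endian here, so the low byte comes first).
print("\u00fc".encode("utf-16-le"))  # b'\xfc\x00'

# Hypothetical filename written to disk as ISO 8859-1 bytes.
legacy_name = "M\u00fcller.txt".encode("iso-8859-1")

# Strict UTF-8 decoding rejects the 0xFC byte...
try:
    legacy_name.decode("utf-8")
except UnicodeDecodeError as exc:
    print("undecodable as UTF-8:", exc.reason)

# ...and lenient decoding substitutes U+FFFD, losing the original character.
print(legacy_name.decode("utf-8", errors="replace"))
```

This is the trade-off described above. Pure-ASCII names survive under either convention, but any name using bytes 128..255 as ISO 8859-1 characters does not round-trip through UTF-8.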

Jeff



More information about the Python-list mailing list