[I18n-sig] Re: [Python-Dev] Unicode debate

Moshe Zadka <moshez@math.huji.ac.il>
Wed, 3 May 2000 16:55:37 +0300 (IDT)


On Wed, 3 May 2000, Tim Peters wrote:

[Moshe Zadka]
> ...
> I'd much prefer Python to reflect a fundamental truth about Unicode,
> which at least makes sure binary-goop can pass through Unicode and
> remain unharmed, than to reflect a nasty problem with UTF-8 (not
> everything is legal).

[Tim Peters]
> Then you don't want Unicode at all, Moshe.  All the official encoding
> schemes for Unicode 3.0 suffer illegal byte sequences

Of course I don't, and of course you're right. But what I do want is for
my binary goop to pass unharmed through the evil Unicode forest. That is
why I don't want Unicode to interpret my goop as a sequence of bytes it
tries to decode; I want the numeric values of my bytes to pass through to
Unicode unharmed. That means Latin-1, because of the second design
decision of the horribly Western-specific Unicode: the first 256
characters are the same as Latin-1. If it were up to me, I'd use Latin-3,
but it wasn't, so it's not.
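To make the Latin-1 argument concrete, here is a small sketch in modern Python 3 notation (the `bytes` type postdates this thread, so this is an illustration of the idea, not the code under discussion). It decodes all 256 byte values as Latin-1 and checks that every byte value becomes the code point with the same number and survives the round trip. The helper `utf8_accepts` is hypothetical, added only to contrast with UTF-8, which rejects many byte sequences.

```python
# Every possible byte value, 0x00 through 0xff: arbitrary "binary goop".
goop = bytes(range(256))

# Latin-1 maps byte N straight to code point U+00NN, so decoding never fails...
text = goop.decode("latin-1")

# ...each character's code point equals the original byte value...
assert [ord(ch) for ch in text] == list(goop)

# ...and encoding back restores the exact bytes.
assert text.encode("latin-1") == goop


def utf8_accepts(data: bytes) -> bool:
    """Hypothetical helper: True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# UTF-8, by contrast, rejects the goop (0x80 at offset 128 is a stray
# continuation byte), while plain ASCII is fine.
print(utf8_accepts(goop))            # False
print(utf8_accepts(b"plain ASCII"))  # True
```

The point is that Latin-1 never has to "interpret" the bytes: there is no invalid input, which is exactly the property UTF-8 lacks.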

> (for example, 0xffff
> is illegal in UTF-16 (whether BE or LE)

Tim, one of us must have cracked a chip. 0xffff is the same in BE and LE,
isn't it?
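A quick way to see the byte-order point, sketched with Python's standard `struct` module: packing the 16-bit value 0xffff big-endian and little-endian gives identical bytes, because both bytes are 0xff. This shows only the byte-level observation; whether U+FFFF is a legal character (Unicode designates it a noncharacter) is a separate question that byte order cannot affect.

```python
import struct

value = 0xFFFF

big = struct.pack(">H", value)     # big-endian 16-bit unit
little = struct.pack("<H", value)  # little-endian 16-bit unit

# Both bytes are 0xff, so byte order makes no difference for this value.
print(big, little, big == little)  # b'\xff\xff' b'\xff\xff' True

# A value whose two bytes differ does change with byte order
# (0xFEFF is the byte order mark, which is why BE and LE are distinguishable).
print(struct.pack(">H", 0xFEFF), struct.pack("<H", 0xFEFF))  # b'\xfe\xff' b'\xff\xfe'
```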

--
Moshe Zadka <moshez@math.huji.ac.il>
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com