[I18n-sig] Re: [Python-Dev] Unicode debate

Peter Funk pf@artcom-gmbh.de
Wed, 3 May 2000 15:43:29 +0200 (MEST)


Hi!

[me]:
> >> I agree with Just, Paul, Fredrik and Ping.

> At 8:30 AM -0400 02-05-2000, Guido van Rossum wrote:
> >
> >Sorry, this is not a democracy. :-)  I'm not counting votes, I'm
> >looking for contributions to the discussion.
 
Just van Rossum:
> Of course it's not, and of course you shouldn't be counting votes. However,
> the fact that more and more people chime in on the Latin-1 side (even
> non-western oriented people like Ping and Moshe!) should ring a bell.

Just: Thank you for trying to defend me... ;-)  But Guido was right that I
didn't contribute any new argument to the discussion.  In the meantime
it has become really hard to come up with something really new.
Nevertheless I will try:

Maybe the situation will become clearer and easier to understand
if we simply rename the new Unicode string objects to "wide string
objects".  From this POV wide string objects are simply members of
a family of string objects, in the same sense as integers, arbitrarily
long ints and floats are members of the family of number types.

The whole encoding debate then becomes pointless, since the
interpretation of the content of a wide string object doesn't have
to be Unicode at all (although there might be no other useful 16-bit
wide encoding scheme available today).  The interpretation of
the encoding will be left to the application, in the same way that
applications interpret the meaning of 8-bit strings as they like
(usually as Latin-1 here, but that's not the point).

So when mixing normal 8-bit strings with wide strings, the expected
behaviour should be similar to what happens when mixing floats, long ints
and plain integers:  the value range is extended to fit the largest
operand.  Any other behaviour would be very surprising.
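
To make the analogy concrete, here is a minimal sketch in present-day
Python 3 syntax (not the interpreter under discussion).  The helpers
widen() and concat() are hypothetical names invented for illustration;
they only show what "extending the value range" could mean if every
8-bit value simply keeps its number in the wider string.

```python
# Numeric analogy: mixing an int with a float widens the result to float.
mixed_number = 1 + 2.5
print(type(mixed_number).__name__)


def widen(narrow: bytes) -> str:
    """Hypothetical helper: promote an 8-bit string to a wide string.

    Each 8-bit value keeps its numeric value, just as an int keeps its
    value when it is promoted to a float.
    """
    return "".join(chr(value) for value in narrow)


def concat(narrow: bytes, wide: str) -> str:
    """Hypothetical mixed-type concatenation: widen first, then join."""
    return widen(narrow) + wide


# An 8-bit string (byte 0xFC) mixed with a wide string (code 0x0416).
result = concat(b"Viel Gl\xfcck ", "\u0416")
print([hex(ord(ch)) for ch in result[-4:]])
```

Under this assumption the promotion never fails and never loses
information.  It happens to coincide with reading the 8-bit string as
Latin-1, but nothing in the sketch depends on that interpretation.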

[ascii:]
Please don't drop the 8-bit transparency we have achieved during the
last decade:  I still remember the late 80s, when mailers, news transports
and other pieces of software tended to drop or truncate the eighth bit.
So going back to ASCII won't do any good:  it will bother people in the same
way as the octal escape in
>>> "Viel Glück"
'Viel Gl\374ck'
which doesn't make much sense on an otherwise 8-bit clean system.
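
For readers unfamiliar with the escape: \374 is simply the octal
spelling of the 8-bit value for "ü" in Latin-1.  The following small
sketch uses present-day Python 3, which is an assumption on my side and
behaves differently from the interpreter quoted above; it only shows
that both spellings denote the same character.

```python
# The octal escape \374 and the character "ü" are the same one-character string.
text = "Viel Gl\374ck"
print(text == "Viel Gl\u00fcck")

# 0o374 == 0xFC == 252, the Latin-1 value of "ü".
print(oct(ord("\u00fc")), hex(ord("\u00fc")), ord("\u00fc"))

# ascii() forces an ASCII-only representation, comparable in spirit to the
# escaped repr shown above (Python 3 uses hex rather than octal escapes).
escaped = ascii(text)
print(escaped)
```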

Regards, Peter