[I18n-sig] Re: [Python-Dev] Unicode debate

Guido van Rossum guido@python.org
Tue, 02 May 2000 10:15:50 -0400


[me]
> >Why not Latin-1?  Because it gives us Western-alphabet users a false
> >sense that our code works, where in fact it is broken as soon as you
> >change the encoding.

[Just]
> Yeah, and? At least it'll *show* it's broken instead of *silently* doing
> the wrong thing with utf-8.
> 
> It's like using Python ints all over the place, and suddenly a user of the
> application enters data that causes an integer overflow. Boom. Program
> needs to be fixed. What's the big deal?

The big deal is that in some cultures, 8-bit strings with non-ASCII
bytes are unlikely to be Latin-1.  Under the Latin-1 convention, users
there would get garbage when mixing Unicode and regular strings.  This
is more like ignoring overflow in integer arithmetic (so that
2000000000*2 yields -294967296 with 32-bit ints).  I am against
silently allowing erroneous results like this if I can help it.
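
To make the "garbage" point concrete, here is a minimal sketch in
present-day Python, where the decode step is spelled out explicitly.
The KOI8-R sample text and the variable names are assumptions chosen
for illustration; they do not come from the proposal under discussion.

```python
# Bytes as a program in a KOI8-R (Russian) environment might hold them.
# The sample word is an illustrative assumption.
original = "привет"
raw = original.encode("koi8_r")

# Latin-1 convention: every byte maps to some character, so the decode
# never fails, but the result is not the text the user typed.
as_latin1 = raw.decode("latin-1")

# ASCII convention: the same non-ASCII bytes are rejected outright.
try:
    raw.decode("ascii")
    ascii_rejected = False
except UnicodeDecodeError:
    ascii_rejected = True

print(repr(as_latin1), ascii_rejected)
```

The Latin-1 result has the right length and looks like ordinary text
to the program, which is why the mistake goes unnoticed; the ASCII
decode stops at the first non-ASCII byte.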

[Just, in a different message]
> Of course it's not, and of course you shouldn't be counting votes. However,
> the fact that more and more people chime in on the Latin-1 side (even
> non-western oriented people like Ping and Moshe!) should ring a bell.

Significantly, neither Ping nor Moshe cares for Latin-1 at all: they
don't have a use for a default encoding.  This is because they have no
hope that their preferred encoding would be elected as the default
encoding.

Note that I think that the ASCII default encoding is essential --
ASCII is the character set used by the Python language for
identifiers, and any 8-bit source encoding should always be a superset
of ASCII.  Essentially, Python has always made the (implicit)
guarantee that programs using only the ASCII character set are
portable w.r.t. character encodings -- I think this is important.
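
The portability guarantee can be checked with a small sketch in
present-day Python.  The list of encodings below is an assumption
picked for illustration; any encoding that is a superset of ASCII
should behave the same way.

```python
# An ASCII-only piece of program text (illustrative sample).
source = b"def spam(eggs): return eggs + 1"

# A few ASCII-superset encodings; the selection is an assumption.
encodings = ["ascii", "latin-1", "utf-8", "koi8_r", "euc_jp", "cp1252"]

# Decode the same bytes under each encoding and collect the results.
decoded = {name: source.decode(name) for name in encodings}

# If the guarantee holds, every encoding yields the identical text.
all_same = len(set(decoded.values())) == 1
print(all_same)
```

As long as the bytes stay within ASCII, the choice of 8-bit encoding
makes no difference to what the program text means.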

Having no default encoding would be like having no automatic coercion
between ints and long ints -- I tried this in very early Python
versions (around 0.9.1 I believe) but Tim Peters and/or Steve Majewski
quickly talked me out of this bad idea.

--Guido van Rossum (home page: http://www.python.org/~guido/)