[Python-Dev] unicodeobject.c,2.139,2.140 checkin
Jack Jansen
Jack.Jansen@oratrix.com
Thu, 25 Apr 2002 23:40:51 +0200
On Thursday, April 25, 2002, at 08:59, Guido van Rossum wrote:
>> I don't know why it is, but Unicode always seems to unnecessarily
>> heat up any discussion involving it. I would really like to know
>> what is causing this: is it a religious issue, does it have to do
>> with the people involved or is Unicode inherently controversial ?
>
[...]
> Another issue is that adding Unicode was probably the most invasive
> set of changes ever made to the Python code base. It has complicated
> many parts of the code, and added at least a proportional share of
> bugs. (I found 166 source files in CVS containing some variation on
> the string "unicode", and 110 bug reports mentioning "unicode" in the
> SF bug tracker.)
Another thing that bothers me is that it retroactively changed
the interpretation of other Python objects. For me it's
perfectly logical that a character string is a character string,
unless there's a very good reason to treat it differently (a
framebuffer scanline, a binary blob, etc). And so if I have an
API OpenFileWithUnicodeName() that accepts a Unicode filename, I
expect that if I pass an 8-bit filename it will be converted on
the fly. Other people focus on different sets of APIs, however,
and think there's nothing more logical than interpreting the
string object as a binary buffer containing UTF-16 values or
what-have-you.
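To make the two readings concrete, here is a minimal sketch in present-day Python, where bytes stands in for the 8-bit string. The helper open_file_with_unicode_name() is a hypothetical stand-in for the API named above, and the latin-1 default encoding is purely an assumption for illustration.

```python
# Hypothetical stand-in for OpenFileWithUnicodeName(); not a real API.
def open_file_with_unicode_name(name, encoding="latin-1"):
    """Return the filename as text, converting 8-bit input on the fly."""
    if isinstance(name, bytes):
        # Assumption: the caller's 8-bit strings are latin-1 encoded.
        name = name.decode(encoding)
    return name


raw = b"caf\xe9.txt"  # an 8-bit filename, 8 bytes long

# Reading 1: the string is a character string, converted on the fly.
as_text = open_file_with_unicode_name(raw)

# Reading 2: the same bytes are a binary buffer of UTF-16 code units.
as_utf16 = raw.decode("utf-16-le")

print(repr(as_text))
print(len(as_utf16))
```

The same eight bytes give an eight-character filename under the first reading and four unrelated code points under the second, which is why the two camps cannot both get their default.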
Scanlines or binary blobs were hardly ever mixed with filenames, so
there wasn't an issue before Unicode raised its pretty/ugly head.
(Of course it could be argued that Unicode has exposed a
design flaw in Python, namely that a single data type was used
to store both binary data of unknown interpretation and
character arrays, and that there's now little more to be done
about that.)
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -