why isn't Unicode the default encoding?

Mon Mar 20 15:00:49 EST 2006

John Salerno wrote:
> Forgive my newbieness, but I don't quite understand why Unicode is still 
> something that needs special treatment in Python (and perhaps 
> elsewhere). I'm reading Dive Into Python right now, and it constantly 
> refers to a 'regular string' versus a 'Unicode string' and how you need 
> to convert back and forth. But why isn't Unicode considered a regular 
> string by now? Is it for historical reasons that we still use ASCII and 
> Latin-1?

Well, *I* use UTF-8, but that's neither here nor there.

> Why can't Unicode replace them so we no longer need the 'u' 
> prefix or the encoding tricks?

It would break a hell of a lot of code. Try using the -U command-line option
to the Python interpreter. That makes plain string literals Unicode by default.

[~]$ python -U
Python 2.4.1 (#2, Mar 31 2005, 00:05:10)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 'foo'
u'foo'
>>>

Python tries very hard to remain backwards compatible. Python 3.0 is the
designated "break compatibility so we can remove all of the cruft that's built
up" release. It is still several years away, although Guido is starting to work
on it now.
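For comparison, here is a minimal sketch of the "converting back and forth" the question asks about. It assumes Python 3 semantics, which postdate this message: text is a Unicode `str`, and encoded data is a separate `bytes` type that you get only by explicitly choosing a codec. The sample string and the UTF-8/Latin-1 codec choices are illustrative assumptions.

```python
# Python 3 (assumed): 'str' is Unicode text, 'bytes' is encoded data.
text = "caf\u00e9"                # Unicode text: 'café' (4 characters)

# Encoding turns text into bytes; the result depends on the codec chosen.
utf8 = text.encode("utf-8")       # 5 bytes: 'é' becomes two bytes
latin1 = text.encode("latin-1")   # 4 bytes: 'é' becomes one byte

print(utf8)     # prints b'caf\xc3\xa9'
print(latin1)   # prints b'caf\xe9'

# Decoding with the matching codec round-trips back to the original text.
assert utf8.decode("utf-8") == text
assert latin1.decode("latin-1") == text

# Decoding with the wrong codec silently yields different text (mojibake).
print(utf8.decode("latin-1"))     # prints cafÃ©
```

The mismatch at the end is the reason the encoding has to be tracked explicitly: the bytes alone do not say which codec produced them.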

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco