unicode encoding usability problem

"Martin v. Löwis" martin at v.loewis.de
Fri Feb 18 15:16:01 EST 2005


aurora wrote:
> Java
> has a much more usable model, with Unicode used internally and
> encoding/decoding decisions needed only twice: when dealing with input and
> output.

In addition to Fredrik's comment (that you should use the same model
in Python) and Walter's comment (that you can enforce it by setting
the default encoding to "undefined"), I'd like to point out the
historical reason: Python predates Unicode, so the byte string type
has many convenience operations that you would only expect of
a character string.
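
To make that model concrete, here is a minimal sketch: decode once
on input, work on character strings, encode once on output. The sample
bytes, the variable names, and the choice of UTF-8 and Latin-1 are
assumptions for illustration only, and the b"..." spelling assumes a
Python version that supports byte string literals.

```python
# Input boundary: bytes arrive from a file, socket, or terminal.
raw_input_bytes = b"L\xc3\xb6wis"          # UTF-8 encoded, 6 bytes

# Decode exactly once, at the edge of the program.
name = raw_input_bytes.decode("utf-8")     # character string, 5 characters

# Internal processing works on characters, not bytes.
shouted = name.upper()

# Output boundary: encode exactly once, in the encoding the consumer expects.
raw_output_bytes = shouted.encode("latin-1")
```

Everything between the two boundaries never needs to know which
encodings were involved.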

We have come up with a transition strategy, allowing existing
libraries to widen their support from byte strings to character
strings. This isn't a simple task, so many libraries still expect
and return byte strings, when they should process character strings.
Instead of breaking the libraries right away, we have defined
a transitional mechanism, which allows Unicode support to be
added to libraries as the need arises. This transition is still in
progress.
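
As a rough sketch of what "widening" one library function might look
like: the function accepts either string type and always returns a
character string. This is not the interpreter's own mechanism; the
names to_text and greet are hypothetical, and the UTF-8 default is an
assumption.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a character string, decoding byte strings."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def greet(name):
    # Hypothetical library function: accepts byte strings or
    # character strings, and always returns a character string.
    return u"Hello, " + to_text(name)
```

Callers that still pass byte strings keep working, while callers that
already use character strings get them back unchanged.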

Eventually, the primary string type should be the Unicode
string. If you are curious how far we still are from that goal,
just try running your program with the -U option.

Regards,
Martin


