unicode encoding usability problem
aurora
aurora00 at gmail.com
Fri Feb 18 19:21:02 EST 2005
On Fri, 18 Feb 2005 21:16:01 +0100, Martin v. Löwis <martin at v.loewis.de>
wrote:
> I'd like to point out the
> historical reason: Python predates Unicode, so the byte string type
> has many convenience operations that you would only expect of
> a character string.
>
> We have come up with a transition strategy, allowing existing
> libraries to widen their support from byte strings to character
> strings. This isn't a simple task, so many libraries still expect
> and return byte strings, when they should process character strings.
> Instead of breaking the libraries right away, we have defined
> a transitional mechanism, which allows to add Unicode support
> to libraries as the need arises. This transition is still in
> progress.
I understand, and I wasn't yelling "why can't Python be more like Java". On
the other hand, I also want to point out that making an individual decision
for each string isn't practical and is very error prone. The fact that
unicode and 8-bit strings look alike and work alike in common situations, and
only run into problems with non-ASCII data, is very confusing for most people.
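
To make the confusion concrete, here is a minimal sketch. It assumes Python 2's rule that mixing an 8-bit string with a unicode string decodes the 8-bit string as ASCII behind the scenes. The helper name mix is made up for illustration, and the decode is written out explicitly so the snippet also runs on interpreters without the implicit conversion.

```python
# Sketch only: Python 2 performed this ASCII decode implicitly whenever an
# 8-bit string met a unicode string. Here the decode is spelled out by hand.
def mix(byte_string, text):
    """Hypothetical helper: join bytes and text via an ASCII decode."""
    return byte_string.decode("ascii") + text

# Pure ASCII bytes: the conversion succeeds, so the two types look alike.
ascii_result = mix(b"cafe", u" au lait")

# Same code, but the bytes hold UTF-8 encoded non-ASCII text: it blows up.
try:
    mix(b"caf\xc3\xa9", u" au lait")
    non_ascii_failed = False
except UnicodeDecodeError:
    non_ascii_failed = True
```

The same line of code passes every test written with ASCII data and fails only once real-world input arrives, which is why the problem tends to surface late.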
> Eventually, the primary string type should be the Unicode
> string. If you are curious how far we are still off that goal,
> just try running your program with the -U option.
Lots of errors. Among them are gzip (binary?!) and strftime??
I actually quite appreciate Python's power in processing binary data as
8-bit strings. But perhaps we should transition to using unicode as the text
string and treat the binary string as the exception. Right now we have
'' - 8-bit string; u'' - unicode string
How about
b'' - 8-bit string; '' - unicode string
and no automatic conversion. Perhaps this can be activated by something
like the encoding declarations, so that the transition can happen module by
module.
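
A rough sketch of how the proposed split might behave, assuming an interpreter where b'' is the 8-bit literal, '' is the unicode literal, and nothing is converted automatically (Python 3 works this way). The variable names are only for illustration.

```python
text = "caf\u00e9"        # '' literal: a unicode (character) string
data = b"caf\xc3\xa9"     # b'' literal: an 8-bit string with UTF-8 bytes

# With no automatic conversion, mixing the two types is rejected outright
# instead of failing only when non-ASCII data happens to show up.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The programmer states the encoding explicitly, once, at the boundary.
explicit = text + data.decode("utf-8")
```

The error now depends on the types rather than on the data, so it would show up on the first run rather than on the first non-ASCII input.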
> Regards,
> Martin