[I18n-sig] Unicode strings: an alternative

Just van Rossum just@letterror.com
Fri, 5 May 2000 19:32:23 +0100


[Tom Emerson]
> Hmmmm... how often do you expect to compare narrow vs. wide strings,
> using default comparison (i.e. = or !=)? What if I'm using Latin 3 and
> use the byte comparison? I may very well have two strings (one narrow,
> one wide) that compare equal, even though they're not. Not exactly
> what I would expect.

True enough. The reason I don't mind this behavior is that I believe
it's largely unavoidable, since in many cases the encoding is unknown (to
the Python internals). E.g. I may very well have two _narrow_ strings that
compare equal byte for byte, even though they were written in different
encodings and so are not the same text... Not exactly what you would
expect, but there's nothing you can do about it. What I don't like about
the 7-bit proposal is that it tries to protect me from something that
should be my own responsibility. Imagine if the 7-bit proposal were used
for narrow strings:

>>> "\377" == "\377"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EncodingError: Not sure about that encoding, dude!

;-)
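
To make the narrow-string case concrete, here is a minimal sketch. It
assumes present-day Python 3, where a narrow string corresponds roughly to
a bytes object; the variable names and the choice of Latin-1 versus Latin-3
are purely illustrative and not part of any proposal in this thread. The
two byte strings compare equal, yet they stand for different characters
once each is decoded with the encoding its (hypothetical) author had in
mind:

```python
# Illustrative only: raw_a and raw_b are hypothetical names, and the two
# encodings are assumptions chosen to show the ambiguity.
raw_a = b"\xff"   # imagine this byte was produced by a Latin-1 program
raw_b = b"\xff"   # imagine this byte was produced by a Latin-3 program

# Byte-wise comparison has no idea which encoding was intended.
same_bytes = raw_a == raw_b

# Decoding each with its intended encoding yields different characters.
text_a = raw_a.decode("iso8859_1")
text_b = raw_b.decode("iso8859_3")
same_text = text_a == text_b

print("bytes equal:", same_bytes)
print("text equal: ", same_text)
```

The comparison itself cannot detect the mismatch, because the encoding is
known only to the programmer and is not stored with the bytes.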

I just saw Guido's latest idea (triggered by Peter Funk I suppose, who had
some very good points): using the locale may indeed be a better compromise.

Just