[Python-Dev] Unicode and comparisons

M.-A. Lemburg mal@lemburg.com
Tue, 04 Apr 2000 11:26:53 +0200


Fredrik's bug report made me dive a little deeper into compares
and contains tests.

Here is a snapshot of what my current version does:

>>> '1' == None
0
>>> u'1' == None
0
>>> '1' == 'aäöü'
0
>>> u'1' == 'aäöü'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data

>>> '1' in ('a', None, 1)
0
>>> u'1' in ('a', None, 1)
0
>>> '1' in (u'aäöü', None, 1)
0
>>> u'1' in ('aäöü', None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data

The decoding errors occur because 'aäöü' is not a valid
UTF-8 string (Unicode comparisons coerce both arguments
to Unicode by interpreting normal strings as UTF-8
encodings of Unicode).
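Here is a minimal sketch of that coercion rule. It is written in modern Python 3 rather than the interpreter used in the session above, so str stands in for a Unicode object and bytes stands in for a normal 8-bit string. The names to_unicode, unicode_eq and unicode_contains are hypothetical helpers made up for illustration; they are not the actual C implementation. The sketch shows why the containment test fails as well: "in" compares element by element, and the first element that cannot be decoded raises before any later element is looked at.

```python
# Hypothetical model of the comparison rule described above (not the real
# implementation): an 8-bit string (bytes) is coerced to Unicode (str) by
# decoding it as UTF-8, and decoding errors are NOT masked.

def to_unicode(obj):
    """Decode bytes as strict UTF-8; return every other object unchanged."""
    if isinstance(obj, bytes):
        return obj.decode("utf-8")  # invalid data raises UnicodeDecodeError
    return obj


def unicode_eq(u, other):
    """Compare a Unicode string u with other, coercing 8-bit strings."""
    if isinstance(other, (str, bytes)):
        return u == to_unicode(other)
    return False  # non-strings such as None or 1 simply compare unequal


def unicode_contains(u, seq):
    """'u in seq' built on unicode_eq; a decoding error aborts the scan."""
    return any(unicode_eq(u, item) for item in seq)


if __name__ == "__main__":
    latin1 = "aäöü".encode("latin-1")  # b'a\xe4\xf6\xfc': not valid UTF-8
    print(unicode_eq("1", None))       # False
    print(unicode_eq("1", b"1"))       # True
    try:
        unicode_contains("1", (latin1, None, 1))
    except UnicodeDecodeError as exc:
        print("UnicodeDecodeError:", exc.reason)
```

Strict decoding is what makes u'1' == 'aäöü' raise rather than return 0: the Latin-1 bytes for 'äöü' do not form a valid UTF-8 sequence.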

Question: is this behaviour acceptable, or should I go
even further and mask decoding errors during compares
and contains tests too?
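For comparison, here is a rough sketch of the "mask the errors" option, again in modern Python 3 with str standing in for Unicode and bytes for 8-bit strings. The names masked_eq and masked_contains are hypothetical. Under this assumption an 8-bit string that is not valid UTF-8 simply compares unequal, so neither == nor "in" can raise. The trade-off is that data in an unexpected encoding would then silently fail to match instead of being reported.

```python
# Hypothetical variant: decoding errors during a comparison are swallowed
# and treated as "not equal" instead of propagating to the caller.

def masked_eq(u, other):
    """Compare Unicode u with other; undecodable 8-bit strings never match."""
    if isinstance(other, bytes):
        try:
            other = other.decode("utf-8")
        except UnicodeDecodeError:
            return False  # mask the error: treat as unequal
    if isinstance(other, str):
        return u == other
    return False  # non-strings compare unequal


def masked_contains(u, seq):
    """'u in seq' built on masked_eq; never raises for bad UTF-8 data."""
    return any(masked_eq(u, item) for item in seq)


if __name__ == "__main__":
    latin1 = "aäöü".encode("latin-1")             # not valid UTF-8
    print(masked_eq("1", latin1))                  # False
    print(masked_contains("1", (latin1, None, 1)))  # False
    print(masked_contains("1", (latin1, b"1")))     # True
```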

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/