Very strange unicode behaviour

Fri Jan 16 11:35:30 EST 2004

Syver Enstad <syver at inout.no> writes:

> Here's the interactive session
>
> Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> ord('\xe5')
> 229
> >>> '\xe5'.find(u'')
> -1
> >>> 'p\xe5'.find(u'')
> UnicodeError: ASCII decoding error: ordinal not in range(128)
> >>> 'p\xe4'.find(u'')
> -1
> >>> 'p\xe5'.find(u'')
> UnicodeError: ASCII decoding error: ordinal not in range(128)
> >>> print '\xe5'
> Õ
> >>> print 'p\xe5'
> pÕ
> >>> 'p\xe5'
> 'p\xe5'
> >>> def func():
> ...     try:
> ...         '\xe5'.find(u'')
> ...     except UnicodeError:
> ...         pass
> ...
> >>> func()
> >>> for each in range(1):
> ...     func()
> ...
> UnicodeError: ASCII decoding error: ordinal not in range(128)
> >>>
>
> It's weird that \xe5 throws and \xe4 does not, but even weirder that
> the exception is not cleared, so the loop ends up reporting it.
>
> Is this behaviour the same on Python 2.3?

No, Python 2.3 seems to behave correctly:

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ord('\xe5')
229
>>> '\xe5'.find(u'')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
>>> '\xe4'.find(u'')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>>
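
The error itself comes from mixing a byte string with a unicode object: Python 2 implicitly decodes the byte string with the default ASCII codec, and 0xe5 is outside that range. A minimal sketch of the same idea follows, written in Python 3 syntax (not the 2.x interpreters shown above), where the decoding step has to be spelled out. The helper name try_ascii and the choice of Latin-1 as the real encoding of the bytes are assumptions made only for illustration.

```python
# Sketch only: Python 3 syntax, where bytes are never decoded implicitly.
raw = b'p\xe5'  # the same two bytes as 'p\xe5' in the sessions above


def try_ascii(data):
    """Hypothetical helper: return the decoded text, or the error raised."""
    try:
        return data.decode('ascii')
    except UnicodeDecodeError as exc:
        return exc


# Decoding as ASCII fails on the non-ASCII byte at index 1.
result = try_ascii(raw)
print(result)

# Assumption: the bytes are Latin-1; the original post does not say.
text = raw.decode('latin-1')
print(text.find(''))  # the empty string is found at index 0
```

Decoding explicitly with a known codec before calling find() avoids relying on the ASCII default.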

Thomas