unicode codecs

Mon Feb 9 17:45:55 EST 2004

Ivan Voras wrote:

> When concatenating strings (actually, a constant and a string...) i get
> the following error:
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1:
> ordinal not in range(128)
> 
> Now I don't think either string is unicode, but I'm working with
> win32api so it might be... :) The point is: I know all values will fit
> in a particular code page (iso-8859-2), so how do I change the 'ascii'
> codec in the above error into something that will work?

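You can convert all strings either to unicode or to byte strings in a
single encoding (iso-8859-2 in your case) before concatenating them.
A hands-on approach (the session uses iso-8859-1; substitute iso-8859-2
for your data):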

>>> u,s
(u'R\xfcbe', 'R\xfcbe')
>>> u+s
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1:
ordinal not in range(128)
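
The cause is that Python, when it mixes the two types, first decodes the
byte string with the default 'ascii' codec. A minimal sketch that triggers
the same failure explicitly (the names raw and message are made up for
illustration, and the syntax assumes Python 2.6 or later):

```python
# A byte string holding 0xfc, which is outside the ASCII range.
raw = b'R\xfcbe'

message = None
try:
    # This is the decoding step that is applied implicitly in u + s.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```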

This error is prevented by an explicit conversion:

>>> u.encode("iso-8859-1") + s
'R\xfcbeR\xfcbe'

or 

>>> u + s.decode("iso-8859-1")
u'R\xfcbeR\xfcbe'

If you aren't sure which string is unicode and which is not:

>>> def toiso(s):
...     # Encode unicode objects; byte strings pass through unchanged.
...     if isinstance(s, unicode):
...         return s.encode("iso-8859-1")
...     return s
...
>>> toiso(u) + toiso(s)
'R\xfcbeR\xfcbe'
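
The opposite direction, converting everything to unicode, could look
roughly like this. It is only a sketch: to_text is a hypothetical helper,
iso-8859-2 is assumed as the code page from the original question, and the
bytes name and b'' literals assume Python 2.6 or later.

```python
def to_text(value, encoding="iso-8859-2"):
    """Return value as unicode text, decoding byte strings if necessary."""
    if isinstance(value, bytes):
        # Byte string: decode it with the code page we assume it uses.
        return value.decode(encoding)
    # Already unicode: leave it alone.
    return value

u = u'R\xfcbe'   # unicode string
s = b'R\xfcbe'   # byte string in the assumed code page

combined = to_text(u) + to_text(s)
```

Doing the decoding once, where the data comes in (for example right after
the win32api call), means the rest of the program only ever handles
unicode.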

Peter