unicode codecs

Ivan Voras ivoras at __geri.cc.fer.hr
Mon Feb 9 17:01:08 EST 2004


Christopher Koppler wrote:

> To get a real solution, you should also post the offending code, but
> you might try to convert your values to unicode with the built-in
> unicode() and the string method decode(). See the library reference
> sections 2.1 and 2.2.6.

I tried that, without luck. It is somewhat difficult to reproduce the 
problem, but here's how I see it:

- a win32api function returns an 8-bit string containing some characters 
from the upper half of the code page; let's call it s1
- a statement such as a='x'+s1 fails with the above error.

I don't really know why concatenation should check whether characters are 
7-bit clean (or indeed whether they represent anything in whatever code page).

Since the win32api functions also exist in a unicode version, I tried this:

- call the unicode version of the function. It returns a unicode string 
(checked, it really is unicode) like u'R\xfcgenwald.txt'; let's call it s2
- a statement a='x'+s2.encode('iso-8859-2') also fails with the exact 
same error.
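
For reference, a minimal sketch of what the encode step produces, assuming s2 holds the value quoted above. The result of encode() is again an 8-bit string that is not 7-bit clean, so it would behave like s1 wherever it later meets a unicode object. Keeping both operands unicode avoids any implicit conversion. The variable names are illustrative only:

```python
# s2 as returned by the unicode version of the function (value from the post)
s2 = u'R\xfcgenwald.txt'

# Encoding gives back an 8-bit string; the u-umlaut becomes the single byte 0xfc
encoded = s2.encode('iso-8859-2')

# Concatenating unicode with unicode needs no implicit decode at all
all_unicode = u'x' + s2

# Decoding with the codec named explicitly recovers the original text
round_trip = encoded.decode('iso-8859-2')
```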

It is strange that if I execute similar code in Idle (e.g. manually 
assigning string constants to variables and concatenating), everything 
works!

The exact error is:
   File "E:\develop\pynetdb\netdbcreate.py", line 32, in walkdirs
     fullname = root+'\\'+filename
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: 
ordinal not in range(128)

The filename variable contains (in my latest effort) the utf-8 encoded value 
'R\xc3\xbcgenwald.mp3', and the root variable contains a normal non-unicode 
string.
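
A minimal sketch that reproduces the same message, under the assumption (not confirmed here) that one of the operands in the failing line is actually a unicode object. When an 8-bit string is mixed with unicode, the 8-bit side is decoded with the default 'ascii' codec first. The sketch performs that decode explicitly so the failure is visible on its own. implicit_decode is a hypothetical helper, not part of any library:

```python
# The utf-8 encoded filename from the post, as raw bytes
filename = b'R\xc3\xbcgenwald.mp3'

def implicit_decode(raw):
    """Do explicitly what mixing str and unicode does behind the scenes."""
    return raw.decode('ascii')

try:
    implicit_decode(filename)
    message = None
except UnicodeDecodeError as exc:
    # Byte 0xc3 at index 1 is outside the 7-bit ASCII range
    message = str(exc)

print(message)
```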

I tried various combinations of unicode and non-unicode types, and they 
all fail sooner or later when they meet a non-unicode string that 
is not 7-bit clean.
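
One possible way around this, sketched under assumptions: decode every 8-bit string to unicode once, with the codec named explicitly, before any concatenation. to_text is a hypothetical helper, the root value is made up for illustration, and the `bytes` name assumes a Python version that provides it:

```python
def to_text(value, encoding='utf-8'):
    """Return value as unicode, decoding 8-bit strings with an explicit codec."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

root = b'E:\\music'                    # hypothetical directory, 7-bit clean
filename = b'R\xc3\xbcgenwald.mp3'     # utf-8 encoded value from the post

# Every operand is unicode by the time + runs, so no ascii decode is attempted
fullname = to_text(root) + u'\\' + to_text(filename)
```

The encoding passed to to_text has to match whatever actually produced the bytes (utf-8 here; a Windows code page for the 8-bit win32api results).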


