unicode codecs

"Martin v. Löwis" martin at v.loewis.de
Mon Feb 9 17:39:05 EST 2004


Ivan Voras wrote:
> - win32api function returns a string (8bit) with some of the characters 
> from the upper half of code page, let's call it s1

Are you absolutely certain that type(s1) is str?

> - a statement such as a='x'+s1 fails with the above error.

Are you absolutely certain the constant is the literal string 'x'?

> I don't really know why should concatenation check if characters are 
> 7-bit clean (or indeed if they represent anything in whatever code page).

As you have shown, there would be no need, and indeed, Python will not
check code pages in this case. So you must be doing something else.
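
To illustrate, here is a minimal sketch in current Python syntax, where
bytes plays the role of the 8-bit str type discussed here. The sample value
of s1 is an assumption, not what your win32api call returned.

```python
# Sketch only: bytes stands in for the 8-bit str type of this thread.
s1 = b'R\xfcgenwald.txt'   # assumed sample value with a byte above 0x7f
a = b'x' + s1              # plain byte concatenation, no codec is consulted
print(type(s1), repr(a))
```

If both operands really are byte strings, the concatenation cannot raise a
UnicodeDecodeError, whatever bytes they contain.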

> - call the unicode version of function. Returned is a unicode string 
> (checked, it really is unicode) like u'R\xfcgenwald.txt', let's call it s2
> - a statement a='x'+s2.encode('iso-8859-2') also fails with the exact 
> same error.

How do you know it is the concatenation that causes the exception?

> The exact error is:
>   File "E:\develop\pynetdb\netdbcreate.py", line 32, in walkdirs
>     fullname = root+'\\'+filename
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: 
> ordinal not in range(128)
> 
> The filename variable contains (in my latest effort) utf-8 encoded value 
> 'R\xc3\xbcgenwald.mp3', and root variable contains a normal non-unicode 
> string.

Which string precisely (what is its repr())?
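
The message is what the ascii codec reports when asked to decode those UTF-8
bytes. In this Python version that decoding happens implicitly when a str is
joined to a unicode object, so a likely (but unconfirmed) explanation is
that root is unicode after all. The sketch below uses current Python syntax,
where the decode has to be written out; the directory name is hypothetical.

```python
# Assumption: root is really a unicode object, so the byte string in
# filename is decoded with the ascii codec during the concatenation.
filename = b'R\xc3\xbcgenwald.mp3'   # UTF-8 bytes quoted in the report
root = u'E:\\music'                  # hypothetical directory name

message = None
try:
    filename.decode('ascii')         # what the implicit conversion amounts to
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding explicitly with the codec the bytes were written in avoids it.
fullname = root + u'\\' + filename.decode('utf-8')
print(repr(fullname))
```

Printing repr(root) and type(root) just before the failing line would settle
whether this is what happens in your program.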

Regards,
Martin



