Problem with sets and Unicode strings

Laurent Pointal laurent.pointal at limsi.fr
Wed Jun 28 04:02:12 EDT 2006


Dennis Benzinger wrote:
> No, byte strings contain characters which are at least 8-bit wide
> <http://docs.python.org/ref/types.html>. But I don't understand what
> Python is trying to decode and why the exception says something about
> the ASCII codec, because my file is encoded with UTF-8.

[addendum to the other replies]

Python uses the source-file encoding directive (e.g. "# -*- coding: utf-8 -*-")
when compiling the code, to convert u"xxx" literals into unicode objects
with the right decoding rules. A literal written simply as "xxx" is an
8-bit string with NO associated encoding information. When such a string
must be converted to unicode, Python assumes it uses
sys.getdefaultencoding() [generally ascii; forced to ascii in Python 2.5].
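A minimal sketch of that conversion rule, written for Python 3 (an assumption: Python 3 no longer converts byte strings implicitly, so the ascii decode that Python 2 performed behind your back is made explicit here). The helper name decode_like_py2 is hypothetical.

```python
# Sketch for Python 3: bytes and str are separate types and nothing is
# decoded implicitly. decode_like_py2 (a hypothetical helper) mimics the
# implicit Python 2 conversion that used the default ascii encoding.

def decode_like_py2(raw: bytes) -> str:
    """Decode raw bytes the way Python 2 did implicitly: assume ascii."""
    return raw.decode("ascii")

# The UTF-8 encoding of "\u00e9" (e acute) is two bytes. The bytes object
# records no encoding at all; it is just a sequence of byte values.
raw = "\u00e9".encode("utf-8")
print(raw)  # b'\xc3\xa9'

try:
    decode_like_py2(raw)
except UnicodeDecodeError as exc:
    # The ascii codec rejects any byte >= 128, whatever the file directive says.
    print("ascii codec failed:", exc.reason)  # ordinal not in range(128)

# Decoding with the encoding the bytes were actually written in works.
print(ascii(raw.decode("utf-8")))  # '\xe9'
```

The point is that the "utf-8" knowledge lives only in the decode call you choose, never in the byte string itself.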

So the short answer: the utf-8 directive has no effect on 8-bit strings;
use unicode strings to handle non-ascii text correctly.
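Applied to sets, "use unicode strings" means decoding every byte string once, at the boundary, before it goes into the set. A sketch under the assumption of Python 3 and UTF-8 input; the helper name to_text is hypothetical. In Python 3 bytes and str never compare equal, so mixing them gives silent duplicates rather than an exception.

```python
# Sketch (Python 3): keep only text (str) in a set by decoding bytes at
# the boundary. to_text and its UTF-8 default are illustrative assumptions.

def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with an explicit encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

word_text = "caf\u00e9"                 # text: "cafe" with e acute
word_bytes = word_text.encode("utf-8")  # the same word as raw UTF-8 bytes

# Mixing the two types: bytes and str never compare equal in Python 3,
# so the set keeps both entries.
mixed = {word_text, word_bytes}
print(len(mixed))  # 2

# Normalising everything to text first gives the expected single entry.
clean = {to_text(w) for w in (word_text, word_bytes)}
print(len(clean))  # 1
```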

See you,

Laurent.