[issue7649] "u'%c' % char" broken for chars in range '\x80'-'\xFF'

Thu Feb 25 17:43:17 CET 2010

Marc-Andre Lemburg <mal at egenix.com> added the comment:

Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti at gmail.com> added the comment:
> 
> The latest patch (issue7649v4.diff) checks whether the char is ASCII. If it is, the patch converts it directly to Unicode; otherwise it tries to decode the char using the default encoding and raises a UnicodeDecodeError if the decoding fails.

Thanks. The patch looks good now... but doesn't apply cleanly anymore,
since your first version has already made it into trunk and the 2.6 branch.
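
For reference, the behavior described above can be modeled roughly as
follows. This is only a sketch in Python 3, where a one-byte bytes object
stands in for the 2.x str argument; the name format_char and its
default_encoding parameter are made up for illustration and are not taken
from the patch.

```python
def format_char(char, default_encoding="ascii"):
    """Rough model of u'%c' % char for a one-byte string argument.

    Hypothetical helper: `char` is a bytes object of length 1 standing in
    for a Python 2 str, and `default_encoding` stands in for the
    interpreter's default encoding.
    """
    if len(char) != 1:
        raise TypeError("expected a single byte")
    code = char[0]
    if code < 0x80:
        # ASCII: the byte value is already the Unicode code point.
        return chr(code)
    # Non-ASCII: decode with the default encoding; this raises
    # UnicodeDecodeError if the byte is not valid in that encoding.
    return char.decode(default_encoding)
```

With the default of "ascii", format_char(b"\xe9") raises UnicodeDecodeError,
while format_char(b"\xe9", "iso-8859-1") returns u"\xe9", which matches what
"%s" does with the same input.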

> I tested it with iso-8859-1 and utf-8 set as the default encoding, and the behavior was consistent with "%s". However, the tests assume that the default encoding is always ASCII, so they failed (both the tests included in the patch and others in test_unicode). I'm not sure whether they should be changed or skipped in this case.

I think that's fine. While we do still allow setting the default
encoding to something other than ASCII in 2.x, we don't support such
tricks, so there's no need to test for them.

> (Also http://docs.python.org/c-api/unicode.html#built-in-codecs says that "Setting encoding to NULL causes the default encoding to be used which is ASCII.", but this is not always true. If you think it should be fixed I'll do it in a separate commit.)

The last part of that sentence ("which is ASCII") should be removed.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7649>
_______________________________________