python 2.7 and unicode (one more time)

Thu Nov 20 09:59:13 EST 2014

On Fri, Nov 21, 2014 at 12:59 AM,  <random832 at fastmail.us> wrote:
> On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>> >>> "%s nötig %s" % (u"üblich", u"ähnlich")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
>> ordinal not in range(128)
>
> This is surprising to me - why is it trying to decode the format string,
> rather than encode the arguments?

Why should it encode to bytes? It makes much better sense to work in
Unicode. But mainly, it has to do one of them, and be predictable. If
you add a float and an int, you have to predictably get back one of
those two types, and since neither is a perfect superset of the other,
one just has to be picked. (And that's float, because it's more likely
to be the better option.) In this case, picking Unicode to meet on is
easily the better option, because you'll often have pure-ASCII string
literals as format strings, with Unicode data being interpolated into
them. So using an ASCII codec is far more likely to succeed if you
decode the format string than if you encode the data.
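Python 3 no longer performs this implicit coercion, but both directions can be emulated to see why decoding the format string is the safer bet. This is a minimal sketch, assuming the format string arrives as UTF-8 bytes (as a Python 2 byte-string literal typed into a UTF-8 terminal would). The helper names `decode_format` and `encode_args` are hypothetical, not part of any library:

```python
def decode_format(fmt: bytes, *args: str) -> str:
    """Python 2's choice: decode the byte-string format with ASCII, then interpolate text."""
    return fmt.decode("ascii") % args


def encode_args(fmt: bytes, *args: str) -> bytes:
    """The rejected alternative: encode each text argument with ASCII, then interpolate bytes."""
    return fmt % tuple(a.encode("ascii") for a in args)


# Pure-ASCII format string, non-ASCII data: decoding the format succeeds...
print(decode_format(b"%s and %s", "üblich", "ähnlich"))  # üblich and ähnlich

# ...while encoding the data fails on the first non-ASCII character.
try:
    encode_args(b"%s and %s", "üblich", "ähnlich")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)

# Peter's case: the format string itself holds non-ASCII UTF-8 bytes, so decoding fails too.
try:
    decode_format("%s nötig %s".encode("utf-8"), "üblich", "ähnlich")
except UnicodeDecodeError as exc:
    print("decode failed at byte", hex(exc.object[exc.start]), "position", exc.start)
```

The last case reproduces the traceback above: byte 0xc3 at position 4 is the first byte of the UTF-8 encoding of "ö", so the ASCII codec rejects the format string before any argument is looked at.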

Personally, I'd much rather be very clear about what's text and what's
bytes, and not have any automatic encoding or decoding at all. That's
why I use Python 3.
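For comparison, here is roughly how the same mix of types behaves in Python 3: combining bytes and text is an error rather than a silent conversion, so the caller has to name the codec. A short illustration, assuming the bytes are the UTF-8 encoding of the format string from Peter's example (the exact TypeError message varies between Python 3 versions):

```python
fmt_bytes = "%s nötig %s".encode("utf-8")  # bytes, e.g. as read from a file or socket

# Mixing bytes and text raises immediately instead of guessing a codec.
try:
    fmt_bytes + "üblich"
except TypeError as exc:
    print("TypeError:", exc)

# The conversion has to be spelled out, with the codec named explicitly.
fmt_text = fmt_bytes.decode("utf-8")
print(fmt_text % ("üblich", "ähnlich"))  # üblich nötig ähnlich
```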

ChrisA