python 2.7 and unicode (one more time)

Thu Nov 20 13:26:53 EST 2014

random832 at fastmail.us wrote:

> On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
>> On Fri, Nov 21, 2014 at 12:59 AM,  <random832 at fastmail.us> wrote:
>> > On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>> >> >>> "%s nötig %s" % (u"üblich", u"ähnlich")
>> >> Traceback (most recent call last):
>> >>   File "<stdin>", line 1, in <module>
>> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>> >> 4: ordinal not in range(128)
>> >
>> > This is surprising to me - why is it trying to decode the format
>> > string, rather than encode the arguments?
>> 
>> Why should it encode to bytes?
> 
> Because a bytes format string suggests a bytes result. Why does unicode
> always "win", rather than the type of the format string always winning?

My guess is that when unicode was introduced, the decision to promote str 
to unicode in some mixed operations was made because the developers expected 
that more old code that was unaware of unicode would continue to work that 
way.

The old methods __mod__(), replace(), and join() that conceptually deal with 
strings propagate, while those that deal with characters -- center(), 
ljust()/rjust(), translate() -- don't.

The newer format() method doesn't propagate, which is probably due to a 
change in attitude rather than an oversight.
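
To make the promotion step visible, here is a rough sketch that does by hand
what the % operator appears to do in the traceback quoted above: decode the
byte format string with the default codec (ASCII on Python 2) before
formatting. promote() is a made-up helper for illustration, not something the
interpreter exposes, and the format string is assumed to be UTF-8 encoded, as
the 0xc3 byte in the traceback suggests.

```python
# Sketch only: promote() is a hypothetical helper that imitates the implicit
# str -> unicode step; it is not an interpreter or library function.

def promote(byte_string, encoding="ascii"):
    """Decode a byte string, defaulting to ASCII like Python 2's implicit step."""
    return byte_string.decode(encoding)

# The format string from the quoted traceback, as UTF-8 bytes (assumed encoding).
fmt = u"%s n\xf6tig %s".encode("utf-8")

try:
    promote(fmt)            # fails on the first non-ASCII byte
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding explicitly with the right codec first avoids the problem.
result = promote(fmt, "utf-8") % (u"\xfcblich", u"\xe4hnlich")
```

The failure happens in the decode of the format string, before any argument
is looked at, which is why the arguments are never encoded.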