python 2.7 and unicode (one more time)

random832 at fastmail.us
Thu Nov 20 12:42:59 EST 2014


On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 12:59 AM,  <random832 at fastmail.us> wrote:
> > On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
> >> >>> "%s nötig %s" % (u"üblich", u"ähnlich")
> >> Traceback (most recent call last):
> >>   File "<stdin>", line 1, in <module>
> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
> >> ordinal not in range(128)
> >
> > This is surprising to me - why is it trying to decode the format string,
> > rather than encode the arguments?
> 
> Why should it encode to bytes?

Because a bytes format string suggests a bytes result. Why does unicode
always "win", rather than the type of the format string always winning?
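
For what it's worth, the implicit step behind that traceback can be
reproduced explicitly. This is only a sketch, assuming that 2.7 promotes
the byte format string to unicode with the default ASCII codec as soon as
it meets a unicode argument, and that the original literal was typed on a
UTF-8 terminal. The variable names are mine, and the explicit decode is a
stand-in for what the interpreter does internally:

```python
# -*- coding: utf-8 -*-
# Byte form of the format string, assuming the terminal sent UTF-8.
fmt = u"%s nötig %s".encode("utf-8")

try:
    # Stand-in for the implicit promotion to unicode (assumption: ASCII codec).
    fmt.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# error.start is the index of the first byte that ASCII could not decode.
bad_byte = fmt[error.start:error.start + 1]
print("codec %s failed on %r at position %d" % (error.encoding, bad_byte, error.start))
```

The failing byte is the first half of the UTF-8 encoding of "ö" in the
format string itself, which is why the arguments never get a chance to be
encoded.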

> Makes much better sense to work in
> Unicode. But mainly, it has to do one of them, and be predictable.

Yeah, but string % is not a symmetrical operator. People's mental model
of it is likely to be that it acts like str.format (which does use the
type of the format string) or C sprintf/wsprintf (both of which use the
same type for the format string and the result). And literally every
other type is converted to the type of the format string when used with
%s. Having unicode be special adds cognitive load, and it means you
can't safely use %s on an unknown object without checking its type first.
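
The "format string's type wins" behaviour described above could be
approximated with a small wrapper. A sketch only: bytes_format and its
encoding parameter are hypothetical names, not anything in the standard
library, and UTF-8 is an assumed default rather than anything % itself
would pick. It is written to run on 2.7 and also on 3.5+, where % on
bytes is available:

```python
# -*- coding: utf-8 -*-
text_type = type(u"")  # unicode on Python 2, str on Python 3


def bytes_format(fmt, args, encoding="utf-8"):
    """Hypothetical helper: apply % to a byte format string, encoding text args first."""
    encoded = tuple(
        # Text arguments are encoded; everything else is passed through unchanged.
        a.encode(encoding) if isinstance(a, text_type) else a
        for a in args
    )
    return fmt % encoded


fmt = u"%s nötig %s".encode("utf-8")
result = bytes_format(fmt, (u"üblich", u"ähnlich"))
print(repr(result))
```

The result has the same type as the format string, which is the property
being argued for. The cost is that the caller has to choose an encoding,
which the bare % operator has no way to know.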


