python 2.7 and unicode (one more time)

Chris Angelico rosuav at gmail.com
Thu Nov 20 18:41:34 EST 2014


On Fri, Nov 21, 2014 at 4:42 AM,  <random832 at fastmail.us> wrote:
> On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
>>
>> Why should it encode to bytes?
>
> Because a bytes format string suggests a bytes result. Why does unicode
> always "win", rather than the type of the format string always winning?

For the same reason that float always "wins":

>>> 1.0 + 2
3.0
>>> 1 + 2.0
3.0

>> Makes much better sense to work in
>> Unicode. But mainly, it has to do one of them, and be predictable.
>
> Yeah, but string % is not a symmetrical operator. People's mental model
> of it is likely to be that it acts like format (which does use the type
> of the format string) or C sprintf/wsprintf (both of which use the same
> type for the format string and result). And literally every other type
> is converted to the type of the format string when used with %s - having
> unicode be special adds cognitive load, and it means you can't safely
> blindly use %s with an unknown object.

True, but Python 2 deliberately lets you conflate byte strings and
Unicode strings, so you get a bit of convenience at the expense of
complexity when things go wrong. Python 3, on the other hand, is much
more careful about the difference:

>>> "asdf %s qwer" % b"zxcv"
"asdf b'zxcv' qwer"
>>> b"asdf %s qwer" % "zxcv"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'str'

So your complaint *has* been resolved... but only in Python 3, because
making the same change in Python 2 would break existing code.
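
In practice, being careful about the difference means you pick the
result type and convert explicitly at the boundary. Here is a rough
sketch, not a recommendation of any particular API: the helper names
format_text and format_bytes are made up for illustration, UTF-8 is an
assumed default encoding, and %-formatting on bytes assumes Python 3.5
or later (PEP 461).

```python
def format_text(template, value, encoding="utf-8"):
    """Hypothetical helper: str template in, str result out."""
    # Decode bytes explicitly instead of letting %s insert repr(b'...').
    if isinstance(value, (bytes, bytearray)):
        value = bytes(value).decode(encoding)
    # Wrap in a 1-tuple so a tuple value is not unpacked as arguments.
    return template % (value,)


def format_bytes(template, value, encoding="utf-8"):
    """Hypothetical helper: bytes template in, bytes result out."""
    # Encode text explicitly; a bytes template will not accept str.
    if isinstance(value, str):
        value = value.encode(encoding)
    # bytes % ... assumes Python 3.5+ (PEP 461).
    return template % (value,)


text_result = format_text("asdf %s qwer", b"zxcv")
bytes_result = format_bytes(b"asdf %s qwer", "zxcv")
print(repr(text_result))
print(repr(bytes_result))
```

Either way the type of the format string decides the type of the
result, and the encoding is stated rather than guessed.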

ChrisA


