[Python-Dev] PEP 460 reboot

Guido van Rossum guido at python.org
Tue Jan 14 19:52:05 CET 2014


On Tue, Jan 14, 2014 at 9:45 AM, Chris Barker <chris.barker at noaa.gov> wrote:
> On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov <yselivanov.ml at gmail.com>
> wrote:
>>
>>  - Try str(), and do ".encode('ascii', 'strict')" on the result.
>
>
> please no -- that's the source of a lot of pain in py2 now.
>
> having a failure as a result of the value, rather than the type, of an
> object just makes for hard-to-test bugs. Everything will be hunky dory for
> development and testing, then in deployment some idiot ( ;-) ) will pass in
> some non-ASCII-compatible string and you get a failure. And the person who
> gets the failure doesn't understand why, or they wouldn't have passed in
> non-ASCII values in the first place...
>
> Ease of porting is nice, but let's not make it easy to port bug-prone code.

Right. This is a big red flag to me as well.
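A minimal sketch of the quoted fallback (the function name `str_then_ascii` is hypothetical) shows the objection: two arguments of the same type, str, succeed or fail depending only on their contents.

```python
def str_then_ascii(value) -> bytes:
    """The quoted fallback: str() followed by a strict ASCII encode (hypothetical name)."""
    return str(value).encode('ascii', 'strict')


print(str_then_ascii('hello'))   # b'hello'  -- passes in development and testing
try:
    str_then_ascii('héllo')      # same type, different value
except UnicodeEncodeError:
    print('fails only at run time, and only for this particular value')
```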

I think there is some inherent conflict between the extensible design
of str.format() and the practical needs of people who are actually
going to use formatting operations (either % or .format()) with bytes.

The *practical* needs are mostly limited to supporting basic number
formatting (decimal, hex, padding) and interpolation of anything that
supports the buffer interface. It would also be nice if you didn't
have to specify the type at all in the format string, i.e. {} should
do the right thing for numbers and (all sorts of) bytes.
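As a rough sketch of these practical needs, assume a hypothetical `bytes_interpolate()` helper (not an existing or proposed API) that fills `{}` or `{:spec}` slots in a bytes template. Numbers go through ordinary text formatting, which only emits ASCII; anything supporting the buffer interface is copied as raw bytes; everything else, including every str, is rejected by type rather than by value.

```python
import re

# Matches b'{}' or b'{:SPEC}'; the group captures SPEC (None for a bare b'{}').
_SLOT = re.compile(rb'\{(?::([^{}]*))?\}')


def _slot_bytes(value, spec: str) -> bytes:
    """Convert one argument for a slot (hypothetical helper, not a real API)."""
    if isinstance(value, (int, float)):
        # Number formatting (decimal, hex, padding, ...) only emits ASCII.
        return format(value, spec).encode('ascii')
    if spec:
        raise ValueError(f"format spec {spec!r} is only supported for numbers")
    try:
        # Anything supporting the buffer interface: bytes, bytearray, memoryview, ...
        return bytes(memoryview(value))
    except TypeError:
        # Rejected by type, never by value: every str fails, ASCII or not.
        raise TypeError(f"cannot interpolate {type(value).__name__} into bytes") from None


def bytes_interpolate(template: bytes, *args) -> bytes:
    """Fill each b'{}' / b'{:spec}' slot in template with the next argument."""
    parts = _SLOT.split(template)
    texts, specs = parts[0::2], parts[1::2]  # literal text alternates with captured specs
    if len(specs) != len(args):
        raise ValueError(f"{len(specs)} slots but {len(args)} arguments")
    out = [texts[0]]
    for value, spec, text in zip(args, specs, texts[1:]):
        out.append(_slot_bytes(value, spec.decode('ascii') if spec else ''))
        out.append(text)
    return b''.join(out)


print(bytes_interpolate(b'id={:04x} n={:>5} data={}', 255, 42, bytearray(b'\x00\xff')))
# b'id=00ff n=   42 data=\x00\xff'
```

The point of the design is that a bare `{}` "does the right thing" for both numbers and bytes-like objects, while a str argument fails on the first test run instead of in deployment.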

But the way to arrive at this behavior without duplicating a whole lot
of code seems to be to call the existing text-based __format__ API and
convert the result to bytes -- for numbers this should be safe (their
formatting produces just ASCII digits and a selected few other ASCII
characters) but leads to an undesirable outcome for other types -- not
just str but also e.g. lists or dicts containing str instances, since
those call __repr__ on the contained items, and repr() may produce
non-ASCII characters.
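A small sketch of that "reuse the text __format__ API and encode" route (the helper name `via_text_format` is hypothetical) makes the asymmetry concrete: numbers are always safe, while containers succeed or fail depending on the strings inside them.

```python
def via_text_format(value, spec: str = '') -> bytes:
    """Hypothetical: reuse the text __format__ protocol, then encode as ASCII."""
    return format(value, spec).encode('ascii')


print(via_text_format(255, '#06x'))    # b'0x00ff'  -- numbers are safe
print(via_text_format([1, 'abc']))     # b"[1, 'abc']"  -- works, by luck of the data
try:
    via_text_format({'name': 'José'})  # the dict's str() calls repr() on its items
except UnicodeEncodeError as exc:
    print('failed:', exc.reason)       # failed: ordinal not in range(128)
```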

This is why my earlier proposal used ascii(), which is a "nerfed"(*)
version of repr(). This does the right thing for numbers as well as
for many other types (e.g. None, bool) and does something unpleasant
for text strings that is perhaps better than the alternative.
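Swapping in ascii() removes the value-dependent failure, because its result is always pure ASCII. A minimal sketch, again with a hypothetical helper name:

```python
def via_ascii(value) -> bytes:
    """Hypothetical: format with ascii() instead of str()/repr(); always encodable."""
    return ascii(value).encode('ascii')


print(via_ascii(42), via_ascii(None), via_ascii(True))  # b'42' b'None' b'True'
print(via_ascii({'name': 'José'}))                     # b"{'name': 'Jos\\xe9'}"
print(via_ascii('naïve'))                              # b"'na\\xefve'"
```

The last line is the "something unpleasant" for text: quotes and backslash escapes appear in the output, but nothing raises.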

Which reminds me. Quite a few people have spoken out in favor of loud
failures rather than silent "wrong" output. But I think that in the
specific context of formatting output, there is a long and IMO good
tradition of producing (slightly) wrong output in favor of more strict
behavior. Consider for example what to do when a number doesn't fit in
the given width. Would you rather raise an exception, truncate the
value, or mess up the formatting? All languages newer than Fortran
that I've used have chosen the latter, and I still agree it's a good
idea. Similarly with infinities, NaN, or None. (Yes, it's embarrassing
to have a website displaying 'null'. But isn't a 500 even *more*
embarrassing?)
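For comparison, Python's existing text formatting already takes the lenient route described above; a few standard-library calls illustrate it (nothing here is specific to the bytes proposal):

```python
import json

# A number wider than its field is not truncated and does not raise:
# the field simply grows, "messing up" the column layout.
print(repr(format(1234567, '5d')))          # '1234567'

# Infinities and NaN format to placeholder text instead of raising.
print(repr(format(float('inf'), '8.2f')))   # '     inf'
print(repr(format(float('nan'), '.3f')))    # 'nan'

# None is rendered as text too, e.g. the 'null' a website might end up showing.
print(repr('{}'.format(None)))              # 'None'
print(json.dumps([None, float('inf')]))     # [null, Infinity]
```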

This doesn't mean I'm insensitive to the argument in favor of loud and
early failure. It's just that I can see both sides of the coin, and
I'm still deciding which argument is more important.

(*) Gamer slang for a weapon made less dangerous. :-)

-- 
--Guido van Rossum (python.org/~guido)

