[Python-ideas] format specifier for "not bytes"

Daniel Holth dholth at gmail.com
Fri Aug 24 22:21:48 CEST 2012


On Fri, Aug 24, 2012 at 4:03 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 24 August 2012 20:21, Daniel Holth <dholth at gmail.com> wrote:
>> I was merely surprised by the implicit bytes to
>> "b'string'" conversion, and would like to be able to turn it off.
>
> The conversion is not really "implicit". It's precisely what the %s
> (or {!s}) conversion format *explicitly* requests - insert the str()
> of the supplied argument at this point in the output string. See
> library reference 6.1.3 "Format String Syntax" (I don't know if
> there's an equivalent description for % formatting).
>
> If you want to force an argument to be a string, you could always do
> something like this:
>
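> def must_be_str(s):
>     # Pass str through unchanged; reject bytes, numbers, etc.
>     if isinstance(s, str):
>         return s
>     raise ValueError("expected str, got %s" % type(s).__name__)
>
> x = "The value is {}".format(must_be_str(s))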
>
> There's no "only insert a string here, raise an error for other types"
> format specifier, largely because formatting is in principle about
> *formatting* - converting other types to strings. In practice, most of
> my uses of formatting (and I suspect many other people's) are more
> about interpolation - inserting chunks of text into templates. For
> that application, a stricter form could be more useful, I guess.
>
> I could see value in a {!S} conversion specifier (in the terminology
> of library reference 6.1.3 "Format String Syntax") which overrode
> __format__ with a conversion function equivalent to must_be_str above.
> But I don't know if it would get much use (anyone careful enough to
> use it is probably careful enough about their types not to need it).
>
> Also, is it *really* what you want? Did your code accidentally pass
> bytes to a {!s} formatter, and yet *never* pass a number and get the
> right result? Or conversely, would you be willing to audit all your
> conversions to be sure that numbers were never passed, and yet *still*
> not be willing to ensure you have no bytes/str confusion? (Although as
> your use case was encode/decode dances, maybe bytes really are
> sufficiently special in your code - but I'd argue that needing to
> address this issue implies that you have some fairly subtle bugs in
> your encoding process that you should be fixing before worrying about
> this).
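
The {!S} conversion discussed above does not exist in Python's format-string syntax. As a rough sketch, assuming Python 3, the same check can be emulated with a string.Formatter subclass that overrides convert_field(). The class name StrictFormatter and the 'S' conversion letter are hypothetical, borrowed from the proposal rather than from any real API. The first call also shows the existing {!s} behaviour: bytes are rendered as their str(), "b'abc'".

```python
import string


class StrictFormatter(string.Formatter):
    """Hypothetical formatter adding a '!S' conversion that only accepts str.

    '!S' is NOT part of Python's format-string syntax; it is the conversion
    proposed in the thread, emulated here by overriding convert_field().
    """

    def convert_field(self, value, conversion):
        if conversion == "S":
            # Insert str values unchanged; refuse bytes, int, etc.
            if not isinstance(value, str):
                raise TypeError(
                    "!S conversion expects str, got %s" % type(value).__name__
                )
            return value
        # '!s', '!r', '!a' and "no conversion" keep the standard behaviour.
        return super().convert_field(value, conversion)


fmt = StrictFormatter()
print(fmt.format("The value is {!s}", b"abc"))  # The value is b'abc'
print(fmt.format("The value is {!S}", "abc"))   # The value is abc
```

Calling fmt.format("{!S}", b"abc") raises TypeError instead of silently producing "b'abc'".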

Hi Paul! You could probably guess that this is the wheel digital
signatures package. All the string formatting arguments (I hope) are
now passed to binary() or native() string conversion functions that do
less on Python 2.7 than on Python 3.
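
The post does not show binary() or native(). A minimal Python 3-only sketch of what such helpers might look like follows, assuming UTF-8 and assuming anything other than str or bytes should be rejected. The function bodies are hypothetical, not the wheel package's actual code. (On Python 2.7 the native str type is already bytes, which is why such helpers can do less there.)

```python
def native(s, encoding="utf-8"):
    """Hypothetical helper: return the native str type (Python 3 only)."""
    if isinstance(s, str):
        return s
    if isinstance(s, bytes):
        return s.decode(encoding)
    # Refuse ints and other objects instead of silently calling str() on them.
    raise TypeError("expected str or bytes, got %s" % type(s).__name__)


def binary(s, encoding="utf-8"):
    """Hypothetical helper: return bytes, encoding str if necessary."""
    if isinstance(s, bytes):
        return s
    if isinstance(s, str):
        return s.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(s).__name__)


# Every formatting argument goes through native() first, so bytes never
# reach the implicit str() call that produces "b'...'".
line = "key={}".format(native(b"abc"))
```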

Yes, I would be willing to audit my code to ensure that numbers were
never passed. I am already calling .encode() and .decode() on most
objects in this pipeline. In my opinion, getting an int where a str is
usually expected is in most cases as likely to be a bug as getting bytes
where you expect str. Python even has the -bb option, which turns str()
on bytes (almost never the right thing to do) into an error. How often
does anyone who is not writing a REPL expect "%s" % bytes() to produce
"b''"?
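
To illustrate what -bb does, the sketch below runs the same formatting expression in a child interpreter with and without the flag. It assumes only that the current Python 3 interpreter is reachable as sys.executable. Without the flag the bytes are silently rendered as "b'abc'"; with -bb the implicit str() call raises BytesWarning and the child exits with an error.

```python
import subprocess
import sys

SNIPPET = 'print("%s" % b"abc")'


def run(*flags):
    """Run SNIPPET in a fresh interpreter with the given command-line flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


plain = run()
strict = run("-bb")

print(plain.stdout.strip())  # b'abc'
print(strict.returncode != 0, "BytesWarning" in strict.stderr)  # True True
```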

In this particular case I could also make my life a lot easier by
extending the JSON serializer to accept bytes(), but I suppose I would
lose the string formatting operations.
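
One way to extend the serializer, sketched with the standard library's json module: subclass json.JSONEncoder and override default(), which json calls for any object it cannot serialize natively. The class name is hypothetical, and decoding the bytes as UTF-8 is an assumption that suits values already encoded as text (for example base64url strings); arbitrary binary data would need an explicit encoding such as base64 instead.

```python
import json


class BytesTolerantEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes bytes by decoding them as UTF-8."""

    def default(self, obj):
        if isinstance(obj, bytes):
            # Assumption: the bytes hold UTF-8 (e.g. ASCII base64url) text.
            return obj.decode("utf-8")
        # Anything else unsupported still raises TypeError as usual.
        return super().default(obj)


payload = json.dumps({"key": b"value"}, cls=BytesTolerantEncoder)
print(payload)  # {"key": "value"}
```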


