[Python-Dev] PEP 461 Final?

Ethan Furman ethan at stoneleaf.us
Sat Jan 18 23:01:03 CET 2014


On 01/18/2014 05:48 AM, Nick Coghlan wrote:
> On 18 Jan 2014 11:52, "Ethan Furman" wrote:
>>
>> I'll admit to being somewhat on the fence about %a.
>>
>> It seems there are two possibilities with %a:
>>
>>   1) have it be ascii(repr(obj))
>>
>>   2) have it be str(obj).encode('ascii', 'strict')
>
> This gets very close to crossing the line into implicit encoding of text again. Binary interpolation is being added back
> for the specific use case of working with ASCII compatible segments in binary formats, and it's at best arguable that
> supporting %a will help with that use case.

Agreed.


> However, without it, there may be a greater temptation to inappropriately define __bytes__ just to support binary
> interpolation, rather than because a type truly has an appropriate translation directly to bytes.

True.


> By allowing %a, we avoid that temptation. This is also potentially useful specifically in the case of binary logging
> formats and as a quick way to request backslash escaping of non-ASCII characters in text.
>
> Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I think it will head off a fair bit of potential
> misuse of __bytes__.

So, if %a is added it would act like:

---------
   b"%a" % some_obj
---------
   tmp = str(some_obj)
   res = b''
   for ch in tmp:
       if ord(ch) < 256:
           res += bytes([ord(ch)])
       else:
           res += unicode_escape(ch)
---------

where 'unicode_escape' would yield something like "\u0440"?
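
For concreteness, here is a minimal runnable sketch of the loop above. It assumes that 'unicode_escape' (a hypothetical helper, not an existing builtin) produces the same escapes as the 'backslashreplace' error handler, and 'percent_a' is just an illustrative name, not a proposed API:

```python
def unicode_escape(ch):
    # Hypothetical helper (an assumption): backslash-escape one character
    # and return the escape sequence as ASCII bytes.
    return ch.encode('ascii', 'backslashreplace')


def percent_a(some_obj):
    # Sketch of the loop above; not the actual implementation of %a.
    tmp = str(some_obj)
    res = b''
    for ch in tmp:
        if ord(ch) < 256:
            # Code points 0-255 are copied through as a single byte.
            res += bytes([ord(ch)])
        else:
            # Everything else becomes a backslash escape.
            res += unicode_escape(ch)
    return res
```

Under that assumption, percent_a('\u0440') gives the six ASCII bytes b'\\u0440'. Note that with the "< 256" test, characters in the range 128-255 pass through as single non-ASCII bytes; testing against 128 instead would keep the result pure ASCII.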

--
~Ethan~

