[Python-Dev] PEP 461 Final?
Ethan Furman
ethan at stoneleaf.us
Sat Jan 18 23:01:03 CET 2014
On 01/18/2014 05:48 AM, Nick Coghlan wrote:
> On 18 Jan 2014 11:52, "Ethan Furman" wrote:
>>
>> I'll admit to being somewhat on the fence about %a.
>>
>> It seems there are two possibilities with %a:
>>
>> 1) have it be ascii(repr(obj))
>>
>> 2) have it be str(obj).encode('ascii', 'strict')
>
> This gets very close to crossing the line into implicit encoding of text again. Binary interpolation is being added back
> for the specific use case of working with ASCII compatible segments in binary formats, and it's at best arguable that
> supporting %a will help with that use case.

Agreed.

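
To make the two readings of %a quoted above concrete, here is a minimal sketch of how each would treat non-ASCII text. The function names option_1 and option_2 are illustrative labels only and appear nowhere in the proposal; reading 1 is taken literally as written, which means the value is repr'd twice.

```python
def option_1(obj):
    # Reading 1 from the post, taken literally: ascii(repr(obj)).
    # Note that ascii() already applies repr(), so this quotes a str twice.
    return ascii(repr(obj))


def option_2(obj):
    # Reading 2: strict ASCII encoding of str(obj).
    # Raises UnicodeEncodeError as soon as a non-ASCII character appears.
    return str(obj).encode('ascii', 'strict')


text = "caf\u00e9"

print(option_1(text))   # escaped, always succeeds, returns str
print(ascii(text))      # single repr with escapes, for comparison

try:
    option_2(text)
except UnicodeEncodeError as exc:
    print("option 2 failed:", exc.reason)

print(option_2("cafe"))  # pure ASCII input encodes cleanly to bytes
```

The difference that matters for the discussion is that reading 1 never fails but returns text with escapes, while reading 2 returns bytes but fails on any non-ASCII input, which is why it sits close to implicit encoding.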
> However, without it, there may be a greater temptation to inappropriately define __bytes__ just to support binary
> interpolation, rather than because a type truly has an appropriate translation directly to bytes.

True.

> By allowing %a, we avoid that temptation. This is also potentially useful specifically in the case of binary logging
> formats and as a quick way to request backslash escaping of non-ASCII characters in text.
>
> Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I think it will head off a fair bit of potential
> misuse of __bytes__.

So, if %a is added it would act like:

---------
"%a" % some_obj
---------
tmp = str(some_obj)
res = b''
for ch in tmp:
    if ord(ch) < 256:
        res += bytes([ord(ch)])
    else:
        res += unicode_escape(ch)
---------

where 'unicode_escape' would yield something like "\u0440"?
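
A runnable sketch of the pseudocode above, for experimentation. The helper unicode_escape is hypothetical in the post; here it is assumed to behave like the 'backslashreplace' error handler, and percent_a_sketch is an illustrative name, not a proposed API.

```python
def unicode_escape(ch):
    # Assumption: the hypothetical helper emits a backslash escape,
    # e.g. U+0440 becomes the six ASCII bytes \u0440.
    return ch.encode('ascii', 'backslashreplace')


def percent_a_sketch(some_obj):
    # Mirrors the pseudocode: code points below 256 are copied through
    # as a single byte, everything else is backslash-escaped.
    tmp = str(some_obj)
    res = bytearray()
    for ch in tmp:
        if ord(ch) < 256:
            res.append(ord(ch))
        else:
            res += unicode_escape(ch)
    return bytes(res)


print(percent_a_sketch("a\u00e9\u0440"))
```

Note that with the < 256 cutoff, characters in the range 128 to 255 come out as raw Latin-1 bytes rather than escapes, so the result is not guaranteed to be ASCII. For comparison, in Python 3.5 and later b'%a' % obj behaves like repr(obj).encode('ascii', 'backslashreplace').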
--
~Ethan~