[Python-Dev] PEP 460: allowing %d and %f and mojibake

Sun Jan 12 02:10:37 CET 2014

On 1/11/2014 1:50 PM, Ethan Furman wrote:
> Perhaps that's the problem.  According to the docs:
> ========================================================================
>  object.__bytes__(self)
>
>     Called by bytes() to compute a byte-string representation of an 
> object. This should return a bytes object.
> ========================================================================
>
> Obviously, with the plethora of different binary possibilities for 
> representing a number (how many bytes? endianness? which complement?), 
> we would be well within our rights to decide that the "byte-string 
> representation" of the numeric types is the ASCII equivalent of their 
> __repr__ or __str__, and implement __bytes__ appropriately for them.  
> Any other object that wants to be represented easily in a byte stream 
> would also have to implement __bytes__.   If necessary we could add 
> __bytes__ to str for /strict/ ASCII conversion (even latin-1 would 
> have to be explicitly encoded)[1]. 
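
To make the quoted proposal concrete, here is a minimal sketch that emulates its semantics with a helper rather than changing the built-in types (bytes(42) already means forty-two zero bytes in Python 3, so bytes() itself can't simply be reused for numbers). The names ascii_bytes and Point are hypothetical, not part of any existing or proposed API: numbers become the ASCII form of their str(), str is converted with strict ASCII (anything else must be encoded explicitly), and other objects are left to bytes(), which calls __bytes__ when a type defines it.

```python
import numbers


def ascii_bytes(obj) -> bytes:
    """Hypothetical helper emulating the quoted proposal."""
    if isinstance(obj, numbers.Number):
        # Numbers: the ASCII equivalent of their str() form.
        return str(obj).encode("ascii")
    if isinstance(obj, str):
        # str: strict ASCII only; raises UnicodeEncodeError otherwise,
        # so latin-1 and friends must be encoded explicitly.
        return obj.encode("ascii")
    # Everything else: defer to bytes(), which calls __bytes__ if defined.
    return bytes(obj)


class Point:
    """Hypothetical user type that opts in by implementing __bytes__."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __bytes__(self) -> bytes:
        return ascii_bytes(self.x) + b"," + ascii_bytes(self.y)


print(ascii_bytes(42), ascii_bytes(2.5), bytes(Point(3, 4)))
# b'42' b'2.5' b'3,4'
```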

In spite of Victor's explanation of the internals, which I didn't
understand, this sounds like a very interesting idea, conceptually: any
object could implement its own __bytes__ representation.

On the other hand, it would probably have to be parameterized in the
general case. For binary data values, one protocol or format may want
the data big-endian and another little-endian; for str, one protocol or
format may require one encoding and another a different encoding, even
(as for email) for different parts of the same message. So it could be
somewhat complex, yet it would be very powerful: complex objects made up
of other objects, some of which might have a variety of potential byte
formats (think TIFF files, for example), could convert themselves into a
stream of bytes that fits the standard. In the other direction, one
would want to convert the stream of bytes back into the set of objects,
which is a parsing problem.
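
A sketch of what that parameterization might look like, using only existing methods (int.to_bytes, int.from_bytes, str.encode, bytes.decode). The Record class, its field layout, and its to_bytes/from_bytes signatures are hypothetical; the point is only that one object has several valid byte forms depending on byte order and encoding, and that parsing needs the same parameters as serializing.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical composite object: a numeric ID plus a text label."""

    ident: int
    label: str

    def to_bytes(self, byteorder="big", encoding="ascii"):
        # Hypothetical layout: 4-byte ID, 2-byte label length, label bytes.
        label = self.label.encode(encoding)
        return (self.ident.to_bytes(4, byteorder)
                + len(label).to_bytes(2, byteorder)
                + label)

    @classmethod
    def from_bytes(cls, data, byteorder="big", encoding="ascii"):
        # Parsing is the inverse problem and needs the same parameters.
        ident = int.from_bytes(data[0:4], byteorder)
        size = int.from_bytes(data[4:6], byteorder)
        return cls(ident, data[6:6 + size].decode(encoding))


r = Record(258, "hi")
print(r.to_bytes("big"))     # b'\x00\x00\x01\x02\x00\x02hi'
print(r.to_bytes("little"))  # b'\x02\x01\x00\x00\x02\x00hi'
```

Calling bytes(r) with no arguments could only ever pick one of these forms.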

This is a bit beyond what can be done automatically, just by calling 
__bytes__ with no parameters, though.

What it may be, though, is a meta-operation from which the needed bytes
operations can be determined. It may also not be an easy "compatible
with existing Python 2 code with minor tweaks" solution. It would be
more like a pickle protocol, but pickle defines its own formats and is
thus useless for creating standard formats.
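
One way to read "meta-operation" is a single declarative layout from which both directions are derived. This sketch uses the standard struct module; LAYOUT and make_codec are hypothetical names, and struct is just one possible backing, not part of any proposal. Unlike pickle, whose output is pickle's own format, the bytes produced here are exactly the fixed layout a protocol specification would describe.

```python
import struct

# Hypothetical layout: (field name, struct format code).
# B = unsigned 8-bit, H = unsigned 16-bit, I = unsigned 32-bit.
LAYOUT = [("version", "B"), ("length", "H"), ("checksum", "I")]


def make_codec(layout, byteorder):
    """Derive matching pack/unpack functions from one layout description."""
    # '>' and '<' select big/little endian with standard sizes and no padding.
    prefix = {"big": ">", "little": "<"}[byteorder]
    codec = struct.Struct(prefix + "".join(code for _, code in layout))
    names = [name for name, _ in layout]

    def pack(values):
        return codec.pack(*(values[name] for name in names))

    def unpack(data):
        return dict(zip(names, codec.unpack(data)))

    return pack, unpack


pack_be, unpack_be = make_codec(LAYOUT, "big")
wire = pack_be({"version": 1, "length": 2, "checksum": 3})
print(wire)  # b'\x01\x00\x02\x00\x00\x00\x03'
print(unpack_be(wire))
```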

I guess it would belong on python-ideas.