[Python-Dev] PEP 460: allowing %d and %f and mojibake
Glenn Linderman
v+python at g.nevcal.com
Sun Jan 12 02:10:37 CET 2014
On 1/11/2014 1:50 PM, Ethan Furman wrote:
> Perhaps that's the problem. According to the docs:
> ========================================================================
> object.__bytes__(self)
>
> Called by bytes() to compute a byte-string representation of an
> object. This should return a bytes object.
> ========================================================================
>
> Obviously, with the plethora of different binary possibilities for
> representing a number (how many bytes? endianness? which complement?),
> we would be well within our rights to decide that the "byte-string
> representation" of the numeric types is the ASCII equivalent of their
> __repr__ or __str__, and implement __bytes__ appropriately for them.
> Any other object that wants to be represented easily in a byte stream
> would also have to implement __bytes__. If necessary we could add
> __bytes__ to str for /strict/ ASCII conversion (even latin-1 would
> have to be explicitly encoded)[1].
In spite of Victor's explanation of internals, which I didn't
understand, this sounds like a very interesting idea, conceptually: that
any object could implement its own __bytes__ representation.
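As a minimal sketch of that idea: the Temperature class below is purely
hypothetical (it is not from the PEP or from Ethan's message), and returning
the ASCII form of str() is just one possible choice of "byte-string
representation".

```python
class Temperature:
    """Hypothetical example type, used only to illustrate __bytes__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __bytes__(self):
        # Assumption: the "byte-string representation" is the ASCII
        # encoding of the value's str() form.
        return str(self.degrees).encode("ascii")


reading = Temperature(21)

# bytes() calls reading.__bytes__(), so the object can be spliced
# into a byte stream without an explicit encode step at the call site.
payload = b"temp=" + bytes(reading)
```

Note that bytes(obj) takes no extra arguments for this protocol, so the
object itself has to pick the one representation it returns.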
On the other hand, it would probably have to be parameterized in the
general case: for binary data values, one protocol or format may wish
the data to be big-endian, and another may wish the data to be
little-endian; for str, one protocol or format may require one encoding
and another may require a different encoding, even (as for email) for
different parts of the message. So it could be somewhat complex, yet
would be very powerful in allowing complex objects, made up of other
objects, some of which might have a variety of potential bytes formats
(think TIFF files, for example) to convert themselves into a stream of
bytes that fits the standard. On the flip side, one would want to
convert the stream of bytes into the set of objects, which is a parsing
problem.
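To make the parameterization concrete, here is a rough sketch. The to_wire
function and its keyword names (byteorder, width, encoding) are invented for
illustration and are not a proposed API; it only relies on the existing
int.to_bytes() and str.encode() methods.

```python
def to_wire(value, *, byteorder="big", width=4, encoding="ascii"):
    """Hypothetical helper: serialize a value under format-specific rules."""
    if isinstance(value, int):
        # The same integer yields different bytes depending on the
        # byte order and width the target format requires.
        return value.to_bytes(width, byteorder)
    if isinstance(value, str):
        # The same text yields different bytes depending on the encoding.
        return value.encode(encoding)
    if isinstance(value, (list, tuple)):
        # A composite object serializes its parts with the same parameters.
        return b"".join(
            to_wire(item, byteorder=byteorder, width=width, encoding=encoding)
            for item in value
        )
    raise TypeError("no wire format known for " + type(value).__name__)


big = to_wire(1, byteorder="big", width=2)
little = to_wire(1, byteorder="little", width=2)
latin = to_wire("\u00e9", encoding="latin-1")
utf8 = to_wire("\u00e9", encoding="utf-8")
record = to_wire([1, "ab"], width=2)
```

A real format such as TIFF would need per-field parameters rather than one
set for the whole object, which is where the complexity comes in.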
This is a bit beyond what can be done automatically, just by calling
__bytes__ with no parameters, though.
What it may be, though, is a meta-operation from which the needed bytes
operations can be determined. It may also not be an easy "compatible
with existing Python 2 code with minor tweaks" solution, either. It
would be more like a pickle protocol, but pickle defines its own
formats, and thus is useless for creating standard formats.
I guess it would belong on python-ideas.