[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Juraj Sukop juraj.sukop at gmail.com
Sat Jan 11 13:56:56 CET 2014


On Sat, Jan 11, 2014 at 6:36 AM, Steven D'Aprano <steve at pearwood.info> wrote:

>
> I'm sorry, I don't understand what you mean here. I'm honestly not
> trying to be difficult, but you sound confident that you understand what
> you are doing, but your description doesn't make sense to me. To me, it
> looks like you are conflating bytes and ASCII characters, that is,
> assuming that characters "are" in some sense identical to their ASCII
> representation. Let me explain:
>
> The integer that in English is written as 100 is represented in memory
> as bytes 0x0064 (assuming a big-endian C short), so when you say "an
> integer is written down AS-IS" (emphasis added), to me that says that
> the PDF file includes the bytes 0x0064. But then you go on to write the
> three character string "100", which (assuming ASCII) is the bytes
> 0x313030. Going from the C short to the ASCII representation 0x313030 is
> nothing like inserting the int "as-is". To put it another way, the
> Python 2 '%d' format code does not just copy bytes.
>

Sorry, I should have included an example: when I said "as-is" I meant the
characters "1", "0", "0", so that would be your 0x313030.
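To make the distinction concrete, here is a small Python 3 sketch (an
editorial illustration, not code from the thread) contrasting the two
representations of the integer 100 discussed above: the big-endian C short
0x0064 and the ASCII digits 0x313030. It uses only the standard `struct`
module; the variable names are illustrative.

```python
import struct

n = 100

# Binary representation: a big-endian C short, as in the quoted explanation.
as_c_short = struct.pack(">h", n)
print(as_c_short.hex())  # 0064

# Textual representation: the ASCII digits "1", "0", "0".
as_ascii_digits = str(n).encode("ascii")
print(as_ascii_digits.hex())  # 313030
```

A PDF writer wants the second form, which is what Python 2's '%d' produced
and what Python 3 code has to spell out with str() plus encode().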


> If you consider PDF as binary with occasional pieces of ASCII text, then
> working with bytes makes sense. But I wonder whether it might be better
> to consider PDF as mostly text with some binary bytes. Even though the
> bulk of the PDF will be binary, the interesting bits are text. E.g. your
> example:
>
> Even though the binary image data is probably much, much larger in
> length than the text shown above, it's (probably) trivial to deal with:
> convert your image data into bytes, decode those bytes into Latin-1,
> then concatenate the Latin-1 string into the text above.
>

This is similar to what Chris Barker suggested. I'm not trying to be
difficult here either, but please explain one thing to me. Treating bytes as
if they were Latin-1 is a bad idea; that's why "%f" got dropped in the first
place, right? How is it then all right to put an image inside a Unicode
string?
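A minimal sketch of the Latin-1 approach from the quoted suggestion, assuming
the image data is already available as a `bytes` object (here a hypothetical
stand-in, `image_data`) and using a simplified, hypothetical PDF stream
dictionary. What it demonstrates is that Latin-1 maps every byte value 0-255
to the code point with the same number, so decoding arbitrary bytes and
re-encoding them is lossless.

```python
# Hypothetical stand-in for image bytes: every possible byte value once.
image_data = bytes(range(256))

# Latin-1 maps each byte 0-255 to the code point with the same value,
# so decoding and re-encoding round-trips any byte string exactly.
as_text = image_data.decode("latin-1")
assert as_text.encode("latin-1") == image_data

# Build a PDF-like stream object as text, then encode the whole thing once.
# The dictionary keys here are a simplified, hypothetical example.
width = 100
header = "<< /Length %d /Width %d >>\nstream\n" % (len(image_data), width)
pdf_text = header + as_text + "\nendstream\n"
pdf_bytes = pdf_text.encode("latin-1")
```

The catch is at the final encode("latin-1"): if any character above U+00FF
ends up in the text, it raises UnicodeEncodeError rather than silently
producing wrong bytes.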

Also, apart from the in/out conversions, do any other difficulties come to
mind?

> Please also take note that in Python 3.3 and better, the internal
> representation of Unicode strings containing only code points up to 255
> (i.e. pure ASCII or pure Latin-1) is very efficient, using only one byte
> per character.
>

I guess you meant [C]Python...
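The memory claim can be checked empirically, assuming CPython 3.3 or later
(the flexible string representation from PEP 393); sys.getsizeof reports
implementation-specific sizes, so other interpreters may differ. The
hypothetical helper bytes_per_char compares the sizes of two strings of
different lengths so that the fixed object header cancels out.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate per-character storage for strings made of `ch` (CPython-specific)."""
    # Both strings use the same internal kind, so their headers match and
    # the size difference comes only from the n extra characters.
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    return (large - small) // n


for label, ch in [
    ("ASCII", "a"),
    ("Latin-1", "\xff"),
    ("BMP, above U+00FF", "\u0100"),
    ("astral plane", "\U0001F600"),
]:
    print(f"{label}: {bytes_per_char(ch)} byte(s) per character")
```

On CPython this should report one byte per character for the ASCII and
Latin-1 cases, two for U+0100, and four for the astral character.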

In any case, thanks for the detailed reply.

