[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Nick Coghlan ncoghlan at gmail.com
Sun Jan 12 14:16:37 CET 2014


On 12 Jan 2014 21:53, "Juraj Sukop" <juraj.sukop at gmail.com> wrote:
>
> On Sun, Jan 12, 2014 at 2:35 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>>
>> On Sat, Jan 11, 2014 at 08:13:39PM -0200, Mariano Reingart wrote:
>>
>> > AFAIK (and just for the record), there could be both Latin1 text and UTF-16
>> > in a PDF (and other encodings too), depending on the font used:
>> [...]
>> > In Python2, txt is just a str, but in Python3 handling everything as latin1
>> > string obviously doesn't work for TTF in this case.
>>
>> Nobody is suggesting that you use Latin-1 for *everything*. We're
>> suggesting that you use it for blobs of binary data that represent
>> arbitrary bytes. First you have to get your binary data in the first
>> place, using whatever technique is necessary.
>
>
> Just to check I understood what you are saying. Instead of writing:
>
>     content = b'\n'.join([
>         b'header',
>         b'part 2 %.3f' % number,
>         binary_image_data,
>         utf16_string.encode('utf-16be'),
>         b'trailer'])
>
> it should now look like:
>
>     content = '\n'.join([
>         'header',
>         'part 2 %.3f' % number,
>         binary_image_data.decode('latin-1'),
>         utf16_string.encode('utf-16be').decode('latin-1'),
>         'trailer']).encode('latin-1')

Why are you proposing to do the *join* in text space? Encode all the parts
separately, concatenate them with b'\n'.join() (or whatever separator is
appropriate). It's only the *text formatting operation* that needs to be
done in text space and then explicitly encoded (and this example doesn't
even need latin-1; ASCII is sufficient):

    content = b'\n'.join([
        b'header',
        ('part 2 %.3f' % number).encode('ascii'),
        binary_image_data,
        utf16_string.encode('utf-16be'),
        b'trailer'])

> Correct?

My updated version above is the reasonable way to do it in Python 3, and
the one I consider clearly superior to reintroducing implicit encoding to
ASCII as part of the core text model.
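
For anyone who wants to run it, here is a self-contained sketch of the same pattern. The values bound to number, binary_image_data and utf16_string are made up purely for illustration, and build_content is a hypothetical helper name, not something from the PEP or from any PDF library:

```python
# Hypothetical stand-ins for the values used in the example above.
number = 3.14159
binary_image_data = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00])  # arbitrary bytes
utf16_string = 'h\u00e9llo'  # text that ASCII cannot represent


def build_content(number, binary_image_data, utf16_string):
    """Join already-encoded parts in the binary domain."""
    return b'\n'.join([
        b'header',
        # Only the formatting step happens in text space; the result is
        # encoded explicitly before it joins the other parts.
        ('part 2 %.3f' % number).encode('ascii'),
        # Binary data is passed through untouched, with no decode/encode trip.
        binary_image_data,
        # Text destined for a specific encoding is encoded with that encoding.
        utf16_string.encode('utf-16be'),
        b'trailer'])


content = build_content(number, binary_image_data, utf16_string)
```

Every element handed to b'\n'.join() is already bytes, so there is exactly one explicit encode per piece of text and none at all for the image data.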

This is why I *don't* have a problem with PEP 460 as it stands - it's just
syntactic sugar for something you can already do with b''.join(), and thus
not particularly controversial.

It's only proposals that add any form of implicit encoding
that silently switches from the text domain to the binary domain that
conflict with the core Python 3 text model (although third party types
remain largely free to do whatever they want).
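
To make the boundary concrete, here is a small sketch of what the current text model does when str and bytes are mixed without an explicit encode. The helper names are hypothetical and the values are arbitrary:

```python
def mix_with_concat():
    """Try to append formatted text to bytes without encoding it."""
    try:
        return b'part 2 ' + ('%.3f' % 1.5)
    except TypeError as exc:
        return exc


def mix_with_join():
    """Try to join a str element into a bytes sequence."""
    try:
        return b'\n'.join([b'header', 'part 2'])
    except TypeError as exc:
        return exc


concat_result = mix_with_concat()
join_result = mix_with_join()

# The explicit version works because the caller names the encoding.
explicit_result = b'part 2 ' + ('%.3f' % 1.5).encode('ascii')
```

Both mixed attempts raise TypeError rather than guessing an encoding, and that refusal is the behaviour an implicit-encoding proposal would have to change.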

Cheers,
Nick.
