[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Ethan Furman ethan at stoneleaf.us
Sat Jan 11 20:05:36 CET 2014


On 01/11/2014 10:36 AM, Steven D'Aprano wrote:
> On Sat, Jan 11, 2014 at 08:20:27AM -0800, Ethan Furman wrote:
>>
>>    unicode to bytes
>>    bytes to unicode using latin1
>>    unicode to bytes
>
> Where do you get this from? I don't follow your logic. Start with a text
> template:
>
> template = """\xDE\xAD\xBE\xEF
> Name:\0\0\0%s
> Age:\0\0\0\0%d
> Data:\0\0\0%s
> blah blah blah
> """
>
> data = template % ("George", 42, blob.decode('latin-1'))
>
> Only the binary blobs need to be decoded. We don't need to encode the
> template to bytes, and the textual data doesn't get encoded until we're
> ready to send it across the wire or write it to disk.
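
Steven's approach works because latin-1 maps every byte value 0-255 to the code point with the same number. Decoding arbitrary bytes and re-encoding them is therefore lossless. Below is a minimal Python 3 sketch of that property. The shortened template and the `blob` payload are hypothetical stand-ins for the ones in the quoted example.

```python
# Latin-1 maps byte value N to code point N for every N in 0..255,
# so a decode/encode round trip through latin-1 is lossless.
all_bytes = bytes(range(256))
as_text = all_bytes.decode('latin-1')
assert all(ord(ch) == b for ch, b in zip(as_text, all_bytes))
assert as_text.encode('latin-1') == all_bytes

# Steven's approach: only the binary blob is decoded, the ASCII fields stay text.
blob = b'\x00\xff\x80binary'   # hypothetical binary payload
template = "Name:\0\0\0%s\nAge:\0\0\0\0%d\nData:\0\0\0%s\n"   # shortened, hypothetical
data = template % ("George", 42, blob.decode('latin-1'))

# Encode once, when the data is ready to go across the wire or to disk.
wire = data.encode('latin-1')
print(wire)
```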

And what if your name field has data not representable in latin-1?

--> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8')
u'\u0441\u0440\u0403'

--> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)
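
The session above is Python 2. In Python 3 the same problem does not surface when the template is formatted. It only shows up later, at the final latin-1 encode. A minimal sketch, assuming the name arrives as the same UTF-8 bytes and using a shortened, hypothetical one-field template:

```python
# The name arrives as UTF-8 bytes (same bytes as in the Python 2 session).
raw = b'\xd1\x81\xd1\x80\xd0\x83'
name = raw.decode('utf-8')
assert name == '\u0441\u0440\u0403'

# Formatting the text template succeeds...
data = "Name:\0\0\0%s\n" % name

# ...but encoding for the wire fails: these characters are outside latin-1.
error_message = None
try:
    data.encode('latin-1')
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)
```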

So really your example should be:

data = template % ("George".encode('some_non_ascii_encoding_such_as_cp1251').decode('latin-1'), 42, blob.decode('latin-1'))

Which is a mess.
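
To make the workaround concrete, here is a minimal Python 3 sketch. It assumes cp1251 is the encoding the receiver expects for the name field. The shortened template and the `blob` payload are hypothetical. For comparison, it also builds the same bytes by plain concatenation, with no latin-1 round trip.

```python
name = '\u0441\u0440\u0403'      # the Cyrillic name from the session above
blob = b'\x00\xff\x80binary'     # hypothetical binary payload
template = "Name:\0\0\0%s\nAge:\0\0\0\0%d\nData:\0\0\0%s\n"   # shortened, hypothetical

# Each non-latin-1 text field must first be encoded with its real codec
# (cp1251 here, an assumption) and then decoded as latin-1, so that the
# final latin-1 encode turns it back into exactly those cp1251 bytes.
smuggled_name = name.encode('cp1251').decode('latin-1')
data = template % (smuggled_name, 42, blob.decode('latin-1'))
wire = data.encode('latin-1')

# The same bytes built directly, without any decode/encode round trip.
direct = (b'Name:\x00\x00\x00' + name.encode('cp1251')
          + b'\nAge:\x00\x00\x00\x00' + str(42).encode('ascii')
          + b'\nData:\x00\x00\x00' + blob + b'\n')
print(wire == direct)
```

Both versions produce identical bytes. The concatenation version is wordier, which is part of why a bytes formatting operation is attractive here.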

--
~Ethan~


More information about the Python-Dev mailing list