[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Sun Jan 12 02:35:00 CET 2014

On Sat, Jan 11, 2014 at 08:13:39PM -0200, Mariano Reingart wrote:

> AFAIK (and just for the record), there could be both Latin1 text and UTF-16
> in a PDF (and other encodings too), depending on the font used:
[...]
> In Python2, txt is just a str, but in Python3 handling everything as latin1
> string obviously doesn't work for TTF in this case.

Nobody is suggesting that you use Latin-1 for *everything*. We're 
suggesting that you use it for blobs of binary data that represent 
arbitrary bytes. First you have to get your binary data, using 
whatever technique is necessary. Here's one way to get a blob of 
binary data:

import struct
# pack four C shorts into a fixed-width, big-endian struct
struct.pack(">hhhh", 23, 42, 17, 99)

Here's another way:

# encode a text string as big-endian UTF-16 (no byte-order mark)
"My name is Steven".encode("utf-16be")

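Both examples return a bytes object containing arbitrary bytes. How do 
you combine those arbitrary bytes with a string template while still 
keeping all code points under U+0100? By decoding them to Latin-1, 
which maps each byte 0x00-0xFF to the code point with the same value 
(U+0000 to U+00FF). The decode can never fail, and encoding back to 
Latin-1 recovers the original bytes exactly.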
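A minimal sketch of that round trip, reusing the two blobs from the 
examples above. The template string is a hypothetical placeholder, not 
part of any real file format; the point is only that decode-format-encode 
through Latin-1 reproduces the bytes unchanged.

```python
import struct

# Two blobs of arbitrary binary data, built as in the examples above.
shorts = struct.pack(">hhhh", 23, 42, 17, 99)
name = "My name is Steven".encode("utf-16be")

# Hypothetical text template; any str works here.
template = "header %s middle %s trailer"

# Latin-1 maps byte 0xNN to code point U+00NN, so any bytes object
# decodes without error and every resulting code point is below U+0100.
text = template % (shorts.decode("latin-1"), name.decode("latin-1"))
assert all(ord(c) < 0x100 for c in text)

# Encoding back with Latin-1 yields exactly the bytes we started with,
# interleaved with the (ASCII) template text.
data = text.encode("latin-1")
assert data == b"header " + shorts + b" middle " + name + b" trailer"

# The mapping is lossless for every possible byte value.
every_byte = bytes(range(256))
assert every_byte.decode("latin-1").encode("latin-1") == every_byte
```

This only works because the template itself contains nothing outside 
Latin-1; a template character such as U+20AC would make the final 
encode("latin-1") raise UnicodeEncodeError.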

-- 
Steven