[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
Steven D'Aprano
steve at pearwood.info
Sun Jan 12 02:35:00 CET 2014
On Sat, Jan 11, 2014 at 08:13:39PM -0200, Mariano Reingart wrote:
> AFAIK (and just for the record), there could be both Latin1 text and UTF-16
> in a PDF (and other encodings too), depending on the font used:
[...]
> In Python2, txt is just a str, but in Python3 handling everything as latin1
> string obviously doesn't work for TTF in this case.
Nobody is suggesting that you use Latin-1 for *everything*. We're
suggesting that you use it for blobs of binary data that represent
arbitrary bytes. First you have to obtain your binary data, using
whatever technique is necessary. Here's one way to get a
blob of binary data:
    # encode four C shorts into a fixed-width struct
    struct.pack(">hhhh", 23, 42, 17, 99)
Here's another way:
    # encode a text string into UTF-16
    "My name is Steven".encode("utf-16be")
Both examples return a bytes object containing arbitrary bytes. How do
you combine those arbitrary bytes with a string template while still
keeping all code-points under U+0100? By decoding to Latin-1.
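To make the round trip concrete, here is a minimal sketch of the idea under Python 3. The template text and the variable names are made up for illustration and do not come from any real PDF library. It relies only on the fact that Latin-1 maps each byte 0x00..0xFF to the code point with the same number, so decoding and re-encoding loses nothing.

```python
import struct

# Two blobs of arbitrary bytes, built the same way as in the examples above.
blob = struct.pack(">hhhh", 23, 42, 17, 99)
name = "My name is Steven".encode("utf-16be")

# Hypothetical text template; a real one would hold the file format's markup.
template = "HEADER {} BODY {} END"

# Decode each blob as Latin-1 so it can be interpolated into the str template.
text = template.format(blob.decode("latin-1"), name.decode("latin-1"))

# Every code point stays below U+0100, so encoding back cannot fail.
assert all(ord(ch) < 0x100 for ch in text)

# Encode the whole thing as Latin-1 to recover the original bytes unchanged.
combined = text.encode("latin-1")
assert combined == b"HEADER " + blob + b" BODY " + name + b" END"
```

This only holds as long as the template itself contains nothing above U+00FF. If it does, the final encode raises UnicodeEncodeError instead of silently corrupting the data.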
--
Steven