base64.b64encode(data)

Random832 random832 at fastmail.com
Mon Jun 13 09:36:06 EDT 2016


On Mon, Jun 13, 2016, at 06:35, Steven D'Aprano wrote:
> But this is a Python forum, and Python 3 is a language that tries
> very, very hard to keep a clean separation between bytes and text,

Yes, but that doesn't mean that you're right about which side of that
divide base64 output belongs on.

> where text is understood to mean Unicode, not a subset of ASCII-
> encoded bytes.

Sure. But let's not pretend that U+0020 through U+007E *aren't* Unicode
characters. Base64's output is characters. Those characters could be
encoded as ASCII, as UTF-32, or as EBCDIC, and they would still be the
same characters.
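A small Python 3 sketch of this point. The helper name `b64_text` is hypothetical, and cp500 stands in for "EBCDIC". `base64.b64encode` hands back bytes, but those bytes are just the ASCII codes of base64 characters. Once they are decoded to a str, the same characters can be stored in any codec that covers them and decoded back unchanged:

```python
import base64


def b64_text(data: bytes) -> str:
    """Return the base64 encoding of *data* as text (str) rather than bytes."""
    # b64encode returns bytes, but every byte is the ASCII code of a character
    # from the base64 alphabet, so decoding as ASCII recovers those characters.
    return base64.b64encode(data).decode("ascii")


text = b64_text(b"Hello")
print(text)  # SGVsbG8=

# The same characters can be stored in any encoding that covers them:
# ASCII, UTF-32, or an EBCDIC code page such as cp500.
for codec in ("ascii", "utf-32", "cp500"):
    stored = text.encode(codec)
    # The stored bytes differ per codec, but decoding with the same codec
    # gives back exactly the same characters.
    assert stored.decode(codec) == text
    print(codec, len(stored), "bytes")

# b64decode accepts an ASCII-only str as well as bytes.
assert base64.b64decode(text) == b"Hello"
```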

At
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/uuencode.html
you can see, in the rationale section, a specific mention of using
base64 with EBCDIC. It also gives the fact that these characters are
invariant across all EBCDIC encodings as part of the reason base64 uses
the characters it does (as opposed to the historical uuencode
algorithm's 0x20 through 0x5F, or to non-alphanumeric characters other
than + / =).

The fact that many historical standards do mix text with ASCII-encoded
bytes and treat them interchangeably, as you said, does mean that you
have to read carefully to see which one they mean. The problem with your
argument, though, is that in base64's case it clearly *is* text. For
example, from the original privacy-enhanced mail standards (the very
first application of base64):

RFC 989:

"1.   (Local_Form) The message text is created (e.g., via an editor)
          in the system's native character set, with lines delimited in
          accordance with local convention."

RFC 1421:

"A plaintext message
   is accepted in local form, using the host's native character set and
   line representation."

And specifically in its description of base64 ("printable encoding"):

"Proceeding from
   left to right, the bit string resulting from step 3 is encoded into
   characters which are universally representable at all sites, though
   not necessarily with the same bit patterns (e.g., although the
   character "E" is represented in an ASCII-based system as hexadecimal
   45 and as hexadecimal C5 in an EBCDIC-based system, the local
   significance of the two representations is equivalent)."
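The RFC's "E" example (0x45 in ASCII, 0xC5 in EBCDIC) can be checked directly, and the same idea carried through a full round trip. In this sketch the EBCDIC host is hypothetical and cp037 is assumed as its code page. The host holds the base64 characters in its own bit patterns, decodes them back to characters, and only then base64-decodes, recovering the original binary payload:

```python
import base64

payload = b"\x00\x10\x83 binary data"

# Sender on an ASCII host: the base64 characters, held here as a str.
text = base64.b64encode(payload).decode("ascii")

# Hypothetical EBCDIC host: it stores the same characters in its own code page
# (cp037 here), so the bit patterns change even though the characters do not.
ebcdic_bytes = text.encode("cp037")
print("E".encode("ascii").hex(), "E".encode("cp037").hex())  # 45 c5

# The EBCDIC host decodes its local bytes back to characters, then base64-decodes.
recovered = base64.b64decode(ebcdic_bytes.decode("cp037"))
assert recovered == payload
```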
