[Python-Dev] Why does base64 return bytes?

Greg Ewing greg.ewing at canterbury.ac.nz
Wed Jun 15 01:40:26 EDT 2016


Stephen J. Turnbull wrote:
> it does refer to *encoded* characters as the output of
> the encoding process:
> 
>  >     The encoding process represents 24-bit groups of input bits 
>  >     as output strings of 4 encoded characters.

The "encoding" being referred to there is the encoding
from input bytes to output characters, not an encoding
of the output characters as bytes.

Nowhere does RFC 4648 refer to the output as being
made up of "bytes" or "octets". It's always described
in terms of "characters".
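
As a rough illustration of the distinction (a sketch only; the
input value and variable names are made up, not taken from the
RFC), here is one 24-bit group going through Python's base64
module. The module hands back bytes, and it takes a separate
decoding step to get the 4 characters the RFC talks about:

```python
import base64

raw = b"Man"                        # 3 octets = one 24-bit input group
encoded = base64.b64encode(raw)     # Python returns a bytes object here

# The RFC's "4 encoded characters" only appear once we decode as ASCII.
chars = encoded.decode("ascii")

print(type(encoded).__name__, encoded)   # bytes b'TWFu'
print(type(chars).__name__, chars)       # str TWFu
```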

> As I understand it, the intention of the standard
> in using "character" to denote the code unit is similar to that of RFC
> 3986: BASE encodings are intended to be printable and recognizable to
> humans.

Hmmm... so why then does it say, in section 4:

    The Base 64 encoding is designed to represent arbitrary sequences of
    octets in a form that ... need not be human readable.

> If you're using a non-ASCII-superset encoding such as EBCDIC
> for text I/O, then you should translate from ASCII to that encoding
> for display,

What about the channel you're sending the encoded data over?

Suppose I'm on Windows and I'm embedding the base64-encoded
data in a text message that I'm sending through a mail client
that accepts text in UTF-16.

I hope you would agree that, in that situation, encoding the
base64 output in ASCII and giving those bytes directly to
the mail client would be very much the wrong thing to do?
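
To make that scenario concrete, here is a small Python sketch.
The payload and names are hypothetical, and "utf-16-le" is just
an assumption standing in for whatever the mail client actually
expects:

```python
import base64

payload = bytes([0, 1, 2, 253, 254, 255])     # arbitrary binary data

# Treat the base64 output as text (characters), per the RFC's wording.
b64_text = base64.b64encode(payload).decode("ascii")

# Two different byte encodings of the same 8 characters.
ascii_bytes = b64_text.encode("ascii")
utf16_bytes = b64_text.encode("utf-16-le")    # assumed channel encoding

# What a UTF-16 consumer would see if handed the ASCII bytes directly:
# each pair of ASCII bytes is read as a single, unrelated code unit.
misread = ascii_bytes.decode("utf-16-le")

print(len(ascii_bytes), len(utf16_bytes))     # 8 16
print(misread == b64_text)                    # False
```

Only the text encoded in the channel's own encoding round-trips
to the original characters.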

-- 
Greg

