[Python-Dev] Why does base64 return bytes?

Random832 random832 at fastmail.com
Tue Jun 14 14:02:02 EDT 2016


On Tue, Jun 14, 2016, at 13:19, Paul Sokolovsky wrote:
> Well, it's easy to remember the conclusion - it was decided to return
> bytes. The reason also wouldn't be hard to imagine - regardless of the
> fact that base64 uses ASCII codes for digits and letters, it's still
> essentially binary data.

Only in the sense that all text is binary data. There's nothing in the
definition of base64 specifying ASCII codes. It specifies *characters*
that all happen to be in ASCII's character repertoire.
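
For concreteness, here's what the current API does on CPython 3.x: the
encoder hands back bytes, and getting the "characters" view takes an
explicit decode, which is always safe because the output is guaranteed
to be pure ASCII.

>>> import base64
>>> encoded = base64.b64encode(b"\x00\xff\x10")
>>> encoded
b'AP8Q'
>>> encoded.decode('ascii')
'AP8Q'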

> And the most natural step for it is to send
> it down the socket (socket.send() accepts bytes), etc.

How is that more natural than sending it to a text buffer that is
ultimately encoded (maybe not even to an ASCII-compatible encoding,
though probably) and then sent down a socket or written to a file by a
layer outside your control? Yes, everything eventually ends up as
bytes. That doesn't mean we should obsessively convert things to bytes
as early as possible.
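
To make that concrete, here's a small sketch of the "text layer" case,
using the stdlib json module as one example of an API that only takes
str: the bytes result has to be decoded anyway before it can go into
the text payload.

>>> import base64, json
>>> payload = base64.b64encode(b"binary\x00data")
>>> # json.dumps({"payload": payload}) raises TypeError; str is required
>>> json.dumps({"payload": payload.decode('ascii')})
'{"payload": "YmluYXJ5AGRhdGE="}'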

I mean, if we were going to do that, why bother even having a unicode
string type at all?

> I'd find it a bit more surprising that binascii.hexlify() returns
> bytes, but I personally got used to it, and consider it a
> consistency thing on binascii module.
> 
> Generally, with Python3 by default using (inefficient) Unicode for
> strings, 

Why is it inefficient?
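
If the concern is memory, a quick check with sys.getsizeof (numbers
from a 64-bit CPython 3 build; they will vary) suggests an ASCII-only
str is stored one byte per character under PEP 393, with only a
slightly larger header than the equivalent bytes object:

>>> import sys
>>> data = b'x' * 1000
>>> sys.getsizeof(data)
1033
>>> sys.getsizeof(data.decode('ascii'))
1049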

> any efficient data processing would use bytes, and then one
> appreciates the fact that data encoding/decoding routines also return
> bytes, avoiding implicit expensive conversion to strings.

