[Python-3000] base64 - bytes and strings

Mon Jul 30 02:43:19 CEST 2007

On 7/29/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Martin v. Löwis wrote:
> > The point that proponents of "base64 encoding should
> > yield strings" miss is that US-ASCII is *both* a character set,
> > and an encoding.
>
> Last time we discussed this, I went and looked at the
> RFC where base64 is defined. According to my reading of
> it, nowhere does it say that base64 output must be
> encoded as US-ASCII, nor any other particular encoding.
>
> It *does* say that the characters used were chosen because
> they are present in a number of different character sets
> in use at the time, and explicitly mentions EBCDIC as one
> of those character sets.
>
> To me this quite clearly says that base64 is defined at
> the level of characters, not encodings.

I think it's all beside the point. We should look at the use cases. I
recall finding out once that a Java base64 implementation was much
slower than Python's -- turns out that the Java version was converting
everything to Strings; then we needed to convert back to bytes in
order to output them. My suspicion is that in the end using bytes is
more efficient *and* more convenient; it might take some looking
through the email package to confirm or refute this. (The email
package hasn't been converted to work in the struni branch; that
should happen first. Whoever does that might well be the one who tells
us how they want their base64 APIs.)
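
To make the comparison concrete, here is a minimal sketch of the two
paths, assuming an encoder that returns bytes (as base64.b64encode does
in present-day Python 3). The variable names are for illustration only,
and this is not a benchmark.

```python
import base64
import io

# Assumption: arbitrary binary payload standing in for an attachment.
payload = bytes(range(256))

# Bytes path: the encoder's output goes straight to a binary stream.
out_bytes = io.BytesIO()
out_bytes.write(base64.b64encode(payload))

# String path: turn the result into a str, then encode it again
# before it can be written to the same kind of stream.
as_text = base64.b64encode(payload).decode("ascii")
out_via_str = io.BytesIO()
out_via_str.write(as_text.encode("ascii"))
```

Both streams end up holding the same octets; the string path simply
takes an extra decode/encode round trip to get there.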

An alternative might be to provide both string- and bytes-based APIs,
although that doesn't help with deciding what the default one (the one
that uses the same names as 2.x) should do.
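
A rough sketch of what offering both flavors could look like. The names
b64encode_bytes, b64encode_str and b64decode_any are hypothetical,
chosen only for illustration, and the sketch leans on a bytes-returning
base64.b64encode as found in present-day Python 3. It does not settle
which flavor should get the 2.x name.

```python
import base64


def b64encode_bytes(data: bytes) -> bytes:
    """Hypothetical bytes-based API: bytes in, ASCII-encoded bytes out."""
    return base64.b64encode(data)


def b64encode_str(data: bytes) -> str:
    """Hypothetical string-based API: bytes in, text out."""
    # The base64 alphabet is pure ASCII, so this decode cannot fail.
    return base64.b64encode(data).decode("ascii")


def b64decode_any(encoded) -> bytes:
    """Decode either flavor; b64decode accepts bytes or an ASCII str."""
    return base64.b64decode(encoded)
```

Either encoder's output decodes back to the same bytes, so the open
question is only which return type callers find more convenient.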

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)