[Python-Dev] bytes.from_hex()

Just van Rossum just at letterror.com
Thu Mar 2 09:57:57 CET 2006


Ron Adam wrote:

> Josiah Carlson wrote:
> > Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> >>    u = unicode(b)
> >>    u = unicode(b, 'utf8')
> >>    b = bytes['utf8'](u)
> >>    u = unicode['base64'](b)   # encoding
> >>    b = bytes(u, 'base64')     # decoding
> >>    u2 = unicode['piglatin'](u1)   # encoding
> >>    u1 = unicode(u2, 'piglatin')   # decoding
> > 
> > Your provided semantics feel cumbersome and confusing to me, as
> > compared with str/unicode.encode/decode() .
> > 
> >  - Josiah
> 
> This uses syntax to determine the direction of encoding.  It would be 
> easier and clearer to just require two arguments or a tuple.
> 
>       u = unicode(b, 'encode', 'base64')
>       b = bytes(u, 'decode', 'base64')
> 
>       b = bytes(u, 'encode', 'utf-8')
>       u = unicode(b, 'decode', 'utf-8')
> 
>       u2 = unicode(u1, 'encode', 'piglatin')
>       u1 = unicode(u2, 'decode', 'piglatin')
> 
> 
> 
> It looks somewhat cleaner if you combine them in a path style string.
> 
>       b = bytes(u, 'encode/utf-8')
>       u = unicode(b, 'decode/utf-8')

It gets from bad to worse :(

I always liked the asymmetry between

    u = unicode(s, "utf8")

and

    s = u.encode("utf8")

which I think was the original design of the unicode API. Kudos to
whoever came up with that.
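
For readers on Python 3, where str took over the role of unicode and bytes the role of the old 8-bit str, the same asymmetry can be sketched as below. The sample string is an arbitrary illustration, not something from the thread.

```python
# Python 3 spelling of the asymmetry: the constructor decodes,
# a method on the text object encodes.
s = "caf\u00e9".encode("utf8")   # text -> bytes via a method
u = str(s, "utf8")               # bytes -> text via the constructor

assert isinstance(s, bytes)
assert u == "caf\u00e9"
```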

When I saw

    b = bytes(u, "utf8")

mentioned for the first time, I thought: why on earth must the bytes
constructor be coupled to the unicode API?!?! It makes no sense to me
whatsoever. Bytes have so many more uses besides encoded text.

I believe (please correct me if I'm wrong) that the encoding argument of
bytes() was invented to make it easier to write byte literals. Perhaps a
true bytes literal notation is in order after all?
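
As a rough illustration of what a bytes literal buys, here is how it reads in Python 3, which did later gain a b"..." notation. The PNG signature is only an example of non-text binary data chosen for this sketch.

```python
# A bytes literal needs no encoding argument: the data is not text at all.
png_signature = b"\x89PNG\r\n\x1a\n"

# The same value built from integers, without going through a text API.
from_ints = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

assert png_signature == from_ints
```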

My preference for the bytes -> unicode -> bytes API would be this:

    u = unicode(b, "utf8")  # just like we have now
    b = u.tobytes("utf8")   # like u.encode(), but being explicit
                            # about the resulting type
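
A minimal sketch of that shape in Python 3 terms follows. Note that tobytes() here is a hypothetical free function written for illustration; str has no such method, and the helper merely wraps str.encode() so that the name states the result type.

```python
def tobytes(text: str, encoding: str) -> bytes:
    """Hypothetical stand-in for the proposed u.tobytes().

    Not a real str method; it only wraps str.encode() so that the
    name says what type comes back.
    """
    return text.encode(encoding)


u = "na\u00efve"
b = tobytes(u, "utf8")       # text -> bytes, named after the result type
assert str(b, "utf8") == u   # bytes -> text through the constructor
```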

As to base64, while it works as a codec ("Why a base64 codec? Because we
can!"), I don't find it a natural API at all for such conversions.

(I do however agree with Greg Ewing that base64 encoded data is text,
not ascii-encoded bytes ;-)
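
To make the contrast concrete, here is a sketch in Python 3 terms comparing a dedicated function API with the codec route. It relies only on the standard base64 and codecs modules; the variable names are illustrative.

```python
import base64
import codecs

raw = b"hello"

# Dedicated function API: bytes in, ASCII-only bytes out.
encoded = base64.b64encode(raw)
as_text = encoded.decode("ascii")   # explicit step to treat the result as text

# Codec route: the same transformation reached through the codec registry.
via_codec = codecs.encode(raw, "base64")

# Both routes round-trip back to the original bytes.
assert base64.b64decode(as_text) == raw
assert codecs.decode(via_codec, "base64") == raw
```

In Python 3 both routes hand back bytes, so treating the encoded form as text still takes the explicit decode("ascii") step shown above.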

Just-my-2-cts

