[Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

M.-A. Lemburg mal at egenix.com
Wed Feb 15 22:07:02 CET 2006


Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
> 
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
>                              131, 79, 229, 201, 46, 106])
> 
> It's just a nicety; the former fits my brain a little better.  This would
> work fine both in 2.5 and in 3.0.
> 
> I thought about unicode.encode('hex'), but obviously it will continue to
> return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
> 'zip', 'uu', etc.) generally scare me. 

Those are not pseudo-encodings; they are regular codecs.

It's a common misunderstanding that codecs exist only to convert
between Unicode and strings.
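For illustration only: present-day Python 3 (not the 2.x line discussed in this thread) still ships several such non-Unicode codecs, reachable through codecs.encode() and codecs.decode(). A minimal sketch, assuming Python 3.4 or later, where the 'hex', 'zlib' and 'rot13' aliases are available:

```python
import codecs

# bytes -> bytes: the 'hex' alias of hex_codec turns each byte into two hex digits
hexed = codecs.encode(b'\x5c\x53\x50', 'hex')

# bytes -> bytes: the 'zlib' alias of zlib_codec compresses; decode() reverses it
packed = codecs.encode(b'abc' * 10, 'zlib')
unpacked = codecs.decode(packed, 'zlib')

# str -> str: rot13 maps text to text, with no bytes involved at all
rotated = codecs.encode('hello', 'rot13')

print(hexed)                    # b'5c5350'
print(unpacked == b'abc' * 10)  # True
print(rotated)                  # uryyb
```

None of these three calls converts between Unicode and bytes, yet all of them go through the same codec registry.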

The codec system is deliberately designed to be general enough
to work with many other types as well. For example, it is easy to
write a codec that converts between the hex literal string you
have above and a list of ordinals:

""" Hex string codec

    Converts between a list of ordinals and a string of two-digit
    hex literals (one pair of hex digits per ordinal).

    Usage:
    >>> codecs.encode([1,2,3], 'hexstring')
    '010203'
    >>> codecs.decode(_, 'hexstring')
    [1, 2, 3]

    (c) 2006, Marc-Andre Lemburg.

"""
import codecs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):

        """ Convert hex ordinal list to hex literal string.
        """
        if not isinstance(input, list):
            raise TypeError('expected list of integers')
        return (
            ''.join(['%02x' % x for x in input]),
            len(input))

    def decode(self, input, errors='strict'):

        """ Convert hex literal string to a list of ordinals.
        """
        if not isinstance(input, str):
            raise TypeError('expected string of hex literals')
        size = len(input)
        if size % 2:
            # Each ordinal is encoded as exactly two hex digits.
            raise ValueError('input string has odd length')
        return (
            [int(input[i:i+2], 16) for i in range(0, size, 2)],
            size)

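class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

def getregentry():
    # Registry entry as a 4-tuple: (encoder, decoder, stream reader,
    # stream writer), the form the encodings package looks for.
    return (Codec().encode, Codec().decode, StreamReader, StreamWriter)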
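The module above relies on being installed into the encodings package, which finds a codec by importing a module of that name and calling its getregentry(). As a sketch only, here is the same idea in present-day Python 3, assuming the codec is registered at runtime with codecs.register() and a search function returning a codecs.CodecInfo; the names _hexstring_encode, _hexstring_decode and _search are hypothetical helpers for this illustration. It then decodes Jason's MD5 example and compares the result with bytes.fromhex(), the classmethod Python 3 eventually provided (spelled without the underscore).

```python
import codecs

def _hexstring_encode(input, errors='strict'):
    """Hypothetical helper: list of ordinals -> hex literal string."""
    if not isinstance(input, list):
        raise TypeError('expected list of integers')
    return ''.join('%02x' % x for x in input), len(input)

def _hexstring_decode(input, errors='strict'):
    """Hypothetical helper: hex literal string -> list of ordinals."""
    if not isinstance(input, str):
        raise TypeError('expected string of hex literals')
    if len(input) % 2:
        raise ValueError('input string has odd length')
    ordinals = [int(input[i:i + 2], 16) for i in range(0, len(input), 2)]
    return ordinals, len(input)

def _search(name):
    # Search functions receive the normalized (lower-case) codec name.
    if name == 'hexstring':
        return codecs.CodecInfo(_hexstring_encode, _hexstring_decode,
                                name='hexstring')
    return None  # leave every other name to the remaining search functions

codecs.register(_search)

ordinals = codecs.decode('5c535024cac5199153e3834fe5c92e6a', 'hexstring')
roundtrip = codecs.encode(ordinals, 'hexstring')
md5 = bytes(ordinals)
print(md5 == bytes.fromhex('5c535024cac5199153e3834fe5c92e6a'))  # True
```

codecs.encode() and codecs.decode() do not insist on str or bytes, which is why a list can pass through the registry unchanged.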

> And now that bytes and text are
> going to be two very different types, they're even weirder than before.
> Consider:
> 
>   text.encode('utf-8') ==> bytes
>   text.encode('rot13') ==> text
>   bytes.encode('zip') ==> bytes
>   bytes.encode('uu') ==> text (?)
> 
> This state of affairs seems kind of crazy to me.

Really?

It all depends on what you use the codecs for. Calling them
through the .encode() and .decode() methods, as above, is
not the only way you can make use of them.

To get full access to the codecs, you'll have to use
the codecs module.
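Later Python versions made this split explicit. A short sketch, assuming Python 3.4 or later: str.encode() refuses codecs that are not marked as text encodings, while the codecs module still reaches every registered codec, including the incremental interfaces that the methods never expose.

```python
import codecs

# The method interface refuses a codec that is not a text encoding ...
try:
    'hello'.encode('rot13')
    refused = False
except LookupError:
    refused = True

# ... but the codecs module reaches the same codec without complaint.
via_module = codecs.encode('hello', 'rot13')

# codecs.lookup() returns the full CodecInfo registry entry.
info = codecs.lookup('hex')
encoded, consumed = info.encode(b'\x01\x02')  # stateless encoder: (result, length)

# The incremental encoder accepts the input in chunks.
incremental = info.incrementalencoder()
chunks = incremental.encode(b'\x01') + incremental.encode(b'\x02', final=True)
print(refused, via_module, encoded, chunks)
```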

> Actually users trying to figure out Unicode would probably be better served
> if bytes.encode() and text.decode() did not exist.

You're missing the point: the .encode() and .decode() methods
are merely interfaces to the registered codecs. Whether they
make sense for a certain codec depends on the codec, not on the
methods that interface to it; and again, codecs do not exist
only to convert between Unicode and strings.
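The delegation can be sketched in a few lines. method_style_encode below is a hypothetical helper, not how CPython implements str.encode(), and it leaves out the text-encoding check that Python 3's str.encode() performs; it only shows that a method-style call reduces to a registry lookup, so what the call means is decided by the codec it finds.

```python
import codecs

def method_style_encode(obj, encoding, errors='strict'):
    """Hypothetical stand-in for obj.encode(encoding, errors)."""
    # Look the codec up in the registry and call its stateless encoder,
    # which returns a (result, length consumed) tuple.
    result, _consumed = codecs.lookup(encoding).encode(obj, errors)
    return result

# With a Unicode codec the result matches the real method ...
utf8 = method_style_encode('caf\u00e9', 'utf-8')
# ... and the same helper works unchanged with a bytes -> bytes codec.
hexed = method_style_encode(b'\xca\xfe', 'hex')
print(utf8, hexed)  # b'caf\xc3\xa9' b'cafe'
```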

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 15 2006)

