[Python-Dev] Reintroduce or drop completely hex, bz2, rot13, ... codecs

Wed Jun 9 10:41:29 CEST 2010

Victor Stinner wrote:
> There are two opposite issues in the bug tracker:
> 
>    #7475: codecs missing: base64 bz2 hex zlib ...
>    -> reintroduce the codecs removed from Python3
> 
>    #8838: Remove codecs.readbuffer_encode()
>    -> remove the last part of the removed codecs
> 
> If I understood correctly, the question is: should the codecs module contain
> only encoding codecs, or also other kinds of codecs?

Sorry, but I can only repeat what I've already mentioned
a few times on the tracker items: this is a misunderstanding.

The codec system does not mandate a specific type combination
(and that's by design). Only the helper methods .encode() and
.decode() on bytes and str objects in Python3 do, in order to
provide type safety.
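
To illustrate that split, here is a minimal sketch. It assumes a current
CPython 3 in which the binary codecs are reachable through the module-level
functions codecs.encode() and codecs.decode(); that availability is an
assumption of the sketch, not something stated in this message.

```python
import codecs

# The module-level functions do not enforce a type combination:
# each codec decides which input and output types it handles.
hex_bytes = codecs.encode(b"abc", "hex")      # bytes -> bytes
rot_text = codecs.encode("abc", "rot13")      # str -> str
utf8_bytes = codecs.encode("abc", "utf-8")    # str -> bytes

# The str.encode() helper is restricted to str -> bytes text encodings.
# The exception type has varied between releases, so both are caught here.
try:
    "abc".encode("hex")
    helper_rejected = False
except (LookupError, TypeError):
    helper_rejected = True
```

Only the helper method refuses the hex codec; the codec machinery underneath
accepts it.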

> Encoding codec API is now strict (encode: str->bytes, decode: bytes->str), 
> it's not possible to reuse str.encode() or bytes.decode() for the other 
> codecs. Marc-Andre Lemburg proposed to add .transform() and .untransform()
> methods to str, bytes and bytearray types. If I understood correctly, it would 
> look like:
> 
>    >>> b'abc'.transform("hex")
>    '616263'
>    >>> '616263'.untransform("hex")
>    b'abc'

No, .transform() and .untransform() will be the interface to same-type
codecs, i.e. ones that convert bytes to bytes or str to str. As with
.encode()/.decode(), these helper methods also enforce type safety
of the return type.

The above example will read:

    >>> b'abc'.transform("hex")
    b'616263'
    >>> b'616263'.untransform("hex")
    b'abc'

> I suppose that each codec will have a different list of accepted input and 
> output types. Example:
> 
>    bz2: encode:bytes->bytes, decode:bytes->bytes
>    rot13: encode:str->str, decode:str->str
>    hex: encode:bytes->str, decode: str->bytes

hex will do bytes->bytes in both directions, just like it does
in Python2.

The methods to be used will be .transform() for the encode direction
and .untransform() for the decode direction.
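
The return-type check these methods are meant to perform can be sketched
with two free functions. The names transform() and untransform() below are
hypothetical stand-ins for the proposed methods, built on the existing
codecs.encode() and codecs.decode(); this is an illustration under those
assumptions, not the actual implementation.

```python
import codecs


def transform(data, name):
    """Hypothetical stand-in for the proposed .transform() method."""
    result = codecs.encode(data, name)
    # Type safety: a same-type codec must return the input's type.
    if type(result) is not type(data):
        raise TypeError(
            f"{name!r} is not a same-type codec for {type(data).__name__}"
        )
    return result


def untransform(data, name):
    """Hypothetical stand-in for the proposed .untransform() method."""
    result = codecs.decode(data, name)
    if type(result) is not type(data):
        raise TypeError(
            f"{name!r} is not a same-type codec for {type(data).__name__}"
        )
    return result


hex_data = transform(b"abc", "hex")        # bytes in, bytes out
original = untransform(hex_data, "hex")    # back to the original bytes
rotated = transform("abc", "rot13")        # str in, str out
```

With this check in place, transform("abc", "utf-8") raises a TypeError,
because utf-8 maps str to bytes and so belongs to .encode() instead.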

> And so "abc".encode("bz2") would raise a TypeError.

Yes.

> --
> 
> In my opinion, we should not mix codecs of different kinds (compression, 
> cipher, etc.) because the input and output types are different. It would make
> more sense to create a standard API for each kind of codec. Existing examples
> of standard APIs in Python: hashlib, shutil.make_archive(), database API, etc.

If you want, you can have those as well, but then you'd
have to introduce new APIs or modules, whereas the codec
interface has existed for quite a while in Python2 and
is in regular use.

For most applications, the very simple-to-use codec interface
to these codecs is all that is needed, so I don't see a strong
case for adding new interfaces, e.g.

    hex_data = data.transform('hex')

looks clean and neat.
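
As a rough illustration of that uniformity, the sketch below runs the same
two calls over several bytes-to-bytes codecs. It uses the existing
codecs.encode() and codecs.decode() functions rather than the proposed
methods, and assumes the short codec names "hex", "base64", "zlib" and
"bz2" are registered in the Python being used.

```python
import codecs

data = b"hello codecs " * 4

# The same two calls serve every bytes -> bytes codec; only the name changes.
round_trips = {}
for name in ("hex", "base64", "zlib", "bz2"):
    encoded = codecs.encode(data, name)
    round_trips[name] = codecs.decode(encoded, name) == data
```

Each entry in round_trips records whether decoding the encoded data gave
back the original bytes.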

-- 
Marc-Andre Lemburg
eGenix.com
