[Python-Dev] transform() and untransform() methods, and the codec registry

Fri Dec 3 16:11:29 CET 2010

On Fri, 03 Dec 2010 10:16:04 +0100, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> On Thursday 02 December 2010 19:06:51 georg.brandl wrote:
> > Author: georg.brandl
> > Date: Thu Dec  2 19:06:51 2010
> > New Revision: 86934
> >
> > Log:
> > #7475: add (un)transform method to bytes/bytearray and str, add back codecs
> > that can be used with them from Python 2.
> 
> Oh no, someone did it. Was it really needed to reintroduce rot13 and friends?
> 
> I'm not strongly opposed to .transform()/.untransform() if it can be completely
> separated from text encodings (ascii, latin9, utf-8 & co.). But str.encode() and
> bytes.decode() do accept transform codec names and raise strange error
> messages. Quote of Martin von Löwis (#7475):
> 
> "If the codecs are restored, one half of them becomes available to
> .encode/.decode methods, since the codec registry cannot tell which
> ones implement real character encodings, and which ones are other
> conversion methods. So adding them would be really confusing."
> 
> >>> 'abc'.transform('hex')
> TypeError: 'str' does not support the buffer interface
> >>> b'abc'.transform('rot13')
> TypeError: expected an object with the buffer interface

I find these 'buffer interface' error messages to be the most confusing
error messages I get out of Python 3, no matter what context they show up
in.  I have no idea what they are telling me.  That issue is more
general than transform/untransform, but perhaps it could be fixed
for transform/untransform in particular.

> >>> b'abcd'.decode('hex')
> TypeError: decoder did not return a str object (type=bytes)
> >>> 'abc'.encode('rot13')
> TypeError: encoder did not return a bytes object (type=str)

These error messages make perfect sense to me.  I think it
is called "duck typing" :)
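
To make the duck-typing point concrete: a codec is just a pair of
functions, and str.encode()/bytes.decode() only check the type of the
result after the codec has run.  Here is a minimal sketch, assuming a
CPython in which the rot_13 and hex_codec codecs are present and reachable
through codecs.encode()/codecs.decode(), which do not insist on any
particular result type (the variable names are only illustrative):

```python
import codecs

# rot_13 maps str -> str, so its "encoder" hands back a str, not bytes.
rot = codecs.encode('abc', 'rot_13')

# hex_codec maps bytes -> bytes in both directions.
hexed = codecs.encode(b'abcd', 'hex_codec')
unhexed = codecs.decode(hexed, 'hex_codec')

print(type(rot).__name__, rot)      # the result type is str
print(type(hexed).__name__, hexed)  # the result type is bytes
print(unhexed)
```

The codec does its job either way; the TypeError quoted above comes from
the method noticing afterwards that the result is not the type it promised.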

> I don't like transform() and untransform() because I think that we should not
> add too many operations to the base types (bytes and str), and they do
> implicit module import. I prefer explicit module import (e.g. import binascii;
> binascii.hexlify(b'to hex')). It reminds me of PHP and its ugly namespace with
> 5000+ functions. I prefer Python because it uses smaller and more namespaces
> which are more specific and well defined. If we add email and compression
> functions to bytes, why not add a web browser to str?

As MAL says, the codec machinery is a general purpose tool.  I think
it, and the transform methods, are a useful level of abstraction over
a general class of problems.
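
As a rough illustration of what I mean by "general purpose", the sketch
below puts the explicit binascii call next to a lookup by codec name, and
then registers a toy codec through the same registry.  The codec name
'x_reverse' and the helpers _reverse and _search are made up for this
example; I am also assuming a CPython where hex_codec is available via
codecs.encode():

```python
import binascii
import codecs

def _reverse(data, errors='strict'):
    # Codec functions return (result, number_of_input_items_consumed).
    return data[::-1], len(data)

def _search(name):
    # Hypothetical codec name, used only for this example.
    if name == 'x_reverse':
        return codecs.CodecInfo(encode=_reverse, decode=_reverse,
                                name='x_reverse')
    return None

# The registry asks each registered search function for unknown names.
codecs.register(_search)

payload = b'to hex'
explicit = binascii.hexlify(payload)              # explicit module import
by_name = codecs.encode(payload, 'hex_codec')     # same work, found by name
flipped = codecs.encode(payload, 'x_reverse')     # toy codec, same machinery

print(explicit, by_name, flipped)
```

The caller needs only a name and an object; which module does the work is
the registry's business.  Whether that indirection is a feature or a
hazard is, of course, exactly what this thread is about.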

Please also recall that transform/untransform was discussed before
the release of Python 3.0 and was approved at the time, but it just
did not get implemented before the 3.0 release.

--
R. David Murray                                      www.bitdance.com