[Python-Dev] transform() and untransform() methods, and the codec registry

Guido van Rossum guido at python.org
Thu Dec 9 19:42:27 CET 2010


On Mon, Dec 6, 2010 at 3:39 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> Guido van Rossum wrote:
>> The moratorium is intended to freeze the state of the language as
>> implemented, not whatever was discussed and approved but didn't get
>> implemented (that'd be a hole big enough to drive a truck through, as
>> the saying goes :-).
>
> Sure, but those two particular methods only provide interfaces
> to the codecs sub-system without actually requiring any major
> implementation changes.
>
> Furthermore, they "help ease adoption of Python 3.x" (quoted from
> PEP 3003), since the functionality they add back was removed from
> Python 3.0 in a way that makes it difficult to port Python 2
> applications to Python 3.
>
>> Regardless of what I or others may have said before, I am not
>> currently a fan of adding transform() to either str or bytes.
>
> How should I read this? Do you want the methods to be removed again
> and added back in 3.3?

Given that it's in 3.2b1 I'm okay with keeping it. That's at best a
+0. I'd be -0 if it wasn't already in. But anyway this should suffice
to keep it in unless there are others strongly opposed.

> Frankly, I'm a bit tired of constantly having to argue against
> cutting down the Unicode and codec support in Python3.

But transform() isn't really about Unicode or codec support -- it is
about string-to-string and bytes-to-bytes transformations. At least
the transform() API is clear about the distinction between codecs
(which translate between bytes and string) and transforms (which keep
the type unchanged) -- though I still don't like that the registries
for transforms and codecs use the same namespace. Also bytes-bytes and
string-string transforms use the same namespace even though the
typical transform only supports one or the other. E.g. IMO all of the
following should raise LookupError:

>>> b'abc'.transform('rot13')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/guido/p3/Lib/encodings/rot_13.py", line 16, in encode
    return (input.translate(rot13_map), len(input))
TypeError: expected an object with the buffer interface

>>> b'abc'.decode('rot13')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/guido/p3/Lib/encodings/rot_13.py", line 19, in decode
    return (input.translate(rot13_map), len(input))
AttributeError: 'memoryview' object has no attribute 'translate'

>>> 'abc'.encode('rot13')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoder did not return a bytes object (type=str)

>>> b''.decode('rot13')
''

The last one may be a separate bug; b''.decode('anything') does not
even seem to attempt to look up the codec.
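
To make the point about namespaces concrete, here is a minimal sketch
of what separate registries could look like. It is not how 3.2b1 is
implemented; the names register_transform(), transform(),
_STR_TRANSFORMS and _BYTES_TRANSFORMS are hypothetical and exist only
for illustration. It assumes a Python 3 where codecs.encode() accepts
the 'rot_13' codec for str input. The idea is that a lookup for a
name that is not registered for the input's type fails with
LookupError before any transform code runs:

```python
import base64
import codecs

# Hypothetical registries: one namespace per input type, so a name
# registered for str is simply unknown when looked up for bytes.
_STR_TRANSFORMS = {}
_BYTES_TRANSFORMS = {}


def register_transform(name, kind, func):
    """Register func under name for inputs of type kind (str or bytes)."""
    if kind is str:
        _STR_TRANSFORMS[name] = func
    elif kind is bytes:
        _BYTES_TRANSFORMS[name] = func
    else:
        raise TypeError("kind must be str or bytes")


def transform(data, name):
    """Apply the named same-type transform, or raise LookupError."""
    if isinstance(data, str):
        registry = _STR_TRANSFORMS
    elif isinstance(data, bytes):
        registry = _BYTES_TRANSFORMS
    else:
        raise TypeError("transform() expects str or bytes")
    try:
        func = registry[name]
    except KeyError:
        # The failure happens at lookup time, not inside the transform.
        raise LookupError("no %s transform named %r"
                          % (type(data).__name__, name)) from None
    return func(data)


# rot13 only makes sense for text; base64 only for bytes.
register_transform('rot13', str, lambda s: codecs.encode(s, 'rot_13'))
register_transform('base64', bytes, base64.b64encode)
```

With these registrations, transform('abc', 'rot13') returns 'nop' and
transform(b'abc', 'base64') returns b'YWJj', while
transform(b'abc', 'rot13') raises LookupError instead of the TypeError
shown above.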

-- 
--Guido van Rossum (python.org/~guido)

