[Python-Dev] Add transform() and untransform() methods

Sat Nov 16 14:05:49 CET 2013

On 16 November 2013 21:49, M.-A. Lemburg <mal at egenix.com> wrote:
> On 16.11.2013 01:47, Victor Stinner wrote:
>> Adding transform()/untransform() method to bytes and str is a non
>> trivial change and not everybody likes them. Anyway, it's too late for
>> Python 3.4.
>
> Just to clarify: I still like the idea of adding those methods.
>
> I just don't see what this addition has to do with the codecs.encode()/
> .decode() functions.

Part of the interest here is in making Python 3 better compete with
the ease of the following in Python 2:

>>> "68656c6c6f".decode("hex")
'hello'
>>> "hello".encode("hex")
'68656c6c6f'

Until recently, I (and others) thought the best Python 3 had to offer was:

>>> import codecs
>>> codecs.getencoder("hex")(b"hello")[0]
b'68656c6c6f'
>>> codecs.getdecoder("hex")(b"68656c6c6f")[0]
b'hello'

In reality, though, Python 3 has always supported the following. It
just wasn't documented, so I (and others) didn't know it had been
available as an alternative interface to the codecs machinery since
Python 2.4:

>>> from codecs import encode, decode
>>> encode(b"hello", "hex")
b'68656c6c6f'
>>> decode(b"68656c6c6f", "hex")
b'hello'

That's almost as clean as the Python 2 version; it just requires the
initial import of the convenience functions from the codecs module.
The fact it is supported in Python 2 means that 2/3 compatible code
can also use it.
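
As a minimal sketch of what such 2/3 compatible code might look like
(the helper names to_hex and from_hex are hypothetical, and this
assumes an interpreter where the "hex" alias is registered; as far as
I know "hex_codec" is the full codec name if it is not):

```python
import codecs


def to_hex(data):
    """Hex-encode a bytes object; the result is also bytes."""
    return codecs.encode(data, "hex")


def from_hex(data):
    """Decode hex digits (given as bytes) back to the raw bytes."""
    return codecs.decode(data, "hex")


# Round trip: the same calls work on a Python 2 str and a Python 3 bytes.
encoded = to_hex(b"hello")
decoded = from_hex(encoded)
```

The b"" prefix is accepted by Python 2.6+ as well, so the literals
above mean the same 8-bit string type on both lines of development.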

Accordingly, I now see ensuring that everyone has a common
understanding of *what is already available* as an essential next
step, and only then considering significant changes in the codecs
mechanisms*. I know I learned a hell of a lot about the distinction
between the type agnostic codec infrastructure and the Unicode text
model over the past several months, and I think this thread shows
clearly that there's still a lot of confusion over the matter, even
amongst core developers. That's a problem, and something we need to
fix before giving further consideration to the transform/untransform
idea.

*(Victor's proposal in issue 19619 is actually relatively modest, now
that I understand it properly, and entails taking the existing output
type checks and making it possible to do them in advance, without
touching input type checks)
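
To illustrate the general idea of checking in advance, here is a rough
sketch. It is not the mechanism proposed in issue 19619: the
TEXT_ENCODINGS allow-list and the encode_text helper are hypothetical,
invented only to show a codec being rejected before it is ever run.

```python
import codecs

# Hypothetical allow-list of text encodings. Names are normalized via
# codecs.lookup() so that aliases such as "utf8" compare equal.
TEXT_ENCODINGS = {
    codecs.lookup(name).name for name in ("utf-8", "ascii", "latin-1")
}


def encode_text(text, encoding):
    """Encode str to bytes, refusing non-text codecs up front."""
    info = codecs.lookup(encoding)
    if info.name not in TEXT_ENCODINGS:
        # Rejected before the codec runs, so no input is processed.
        raise LookupError(
            "%r is not a text encoding; use codecs.encode()" % encoding
        )
    # CodecInfo.encode returns (output, length_consumed).
    return info.encode(text)[0]


utf8_bytes = encode_text("hello", "utf8")
```

Calling encode_text("hello", "hex") raises LookupError without invoking
the hex codec at all, while codecs.encode() remains available for
arbitrary codecs.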

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia