[Python-Dev] Add transform() and untransform() methods

Nick Coghlan ncoghlan at gmail.com
Sat Nov 16 01:26:02 CET 2013


On 16 Nov 2013 02:36, "Antoine Pitrou" <solipsis at pitrou.net> wrote:
>
> On Sat, 16 Nov 2013 00:46:15 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
> > On 16 November 2013 00:04, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > >> Rather than the more useful:
> > >>
> > >> >>> b"abcdef".decode("hex")
> > >> Traceback (most recent call last):
> > >>   File "<stdin>", line 1, in <module>
> > >> TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use
> > >> codecs.decode() to decode to arbitrary types
> > >
> > > I think this may be confusing.  TypeError seems to suggest that the
> > > parameter type sent by the user to the method is wrong, which is not
> > > the actual cause of the error.
> >
> > The TypeError isn't new,
>
> Really? That's not what your message said.

The second example in my post assumed that the "hex" alias for "hex_codec"
had been restored (its absence is the reason for the current "unknown
encoding" error). In 3.2 and 3.3, the error message with a restored alias
would have been "TypeError: 'hex' decoder returned 'bytes' instead of
'str'", which I agree is confusing and uninformative. That's why I added
the reference to the module level functions to the output type errors
*before* proposing the restoration of the aliases.

So you can already use "codecs.decode(s, 'hex_codec')" in Python 3, you
just won't get a useful error leading you there if you use the more common
'hex' alias instead.
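
As a minimal sketch of that point (an illustration, not code from the
original post), the module level functions accept the bytes-to-bytes
codec under its full "hex_codec" name, while the bytes.decode() method
rejects it. The sample value and variable names are assumptions chosen
for the example, and the exact exception type and message raised by the
method vary between Python 3 releases, so both are caught here.

```python
import codecs

# Hex digits for the ASCII text "abc" (sample data for this sketch).
hex_text = b"616263"

# The module level function is type agnostic: bytes in, bytes out.
decoded = codecs.decode(hex_text, "hex_codec")

# The reverse direction goes through codecs.encode().
round_trip = codecs.encode(decoded, "hex_codec")

# The method form only supports text encodings, so it fails here.
# Depending on the Python 3 version this is a TypeError or a LookupError.
method_error = None
try:
    hex_text.decode("hex_codec")
except (TypeError, LookupError) as exc:
    method_error = exc

print(decoded)
print(round_trip)
print(type(method_error).__name__, method_error)
```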

To address Serhiy's security concerns with the compression codecs (which
are technically independent of the question of restoring the aliases), I
also plan to document how to systematically blacklist particular codecs in
an application by setting attributes on the encodings module and/or
appropriate entries in sys.modules.
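
One way the sys.modules half of that might look is sketched below. It is
an assumption about CPython's behaviour, not documented API: the
encodings package imports "encodings.<name>" on first lookup and treats
an ImportError as "unknown encoding", and a None entry in sys.modules
makes that import fail. The helper name block_codec is hypothetical.
Successful lookups are cached, so the block has to be installed before
the codec is first used in the process.

```python
import codecs
import sys


def block_codec(module_name):
    """Hypothetical helper: prevent 'encodings.<module_name>' from loading.

    A None entry in sys.modules makes any later import of that module
    raise ImportError, which the codec search function treats as a miss.
    """
    sys.modules["encodings." + module_name] = None


# Block the zlib compression codec for the whole process.
block_codec("zlib_codec")

# Lookups of the blocked codec now fail with LookupError.
try:
    codecs.lookup("zlib_codec")
except LookupError:
    zlib_blocked = True
else:
    zlib_blocked = False

# Codecs that were not blocked keep working.
still_works = codecs.decode(b"616263", "hex_codec")

print(zlib_blocked, still_works)
```

The block is process wide and is not easily undone, because the failed
lookup is itself cached by the encodings package.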

Finally, I now plan to write a documentation PEP that suggests clearly
splitting the codecs module docs into two layers: the type-agnostic core
infrastructure and the specific application of that infrastructure to the
implementation of the text encoding model.

The only functional *change* I'd still like to make for 3.4 is to restore
the shorthand aliases for the non-Unicode codecs (to ease the migration for
folks coming from Python 2), but this thread has convinced me I likely need
to write the PEP *before* doing that, and I still have to integrate
ensurepip into pyvenv before the beta 1 deadline.

So unless you and Victor are prepared to +1 the restoration of the codec
aliases (closing issue 7475) in anticipation of that codecs infrastructure
documentation PEP, the change to restore the aliases probably won't be in
3.4. (I *might* get the PEP written in time regardless, but I'm not betting
on it at this point).

Cheers,
Nick.

>
> Regards
>
> Antoine.