[Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

Nick Coghlan ncoghlan at gmail.com
Tue Sep 21 15:38:08 CEST 2010


On Tue, Sep 21, 2010 at 3:03 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> On the other hand, it is dangerous to provide a polymorphic API which
> does that more extensive parsing, because a less than paranoid
> programmer will have very likely allowed the parsed components to
> escape from the context where their encodings can be reliably
> determined.  Remember, *it is unlikely that they will ever be punished
> for their own lack of caution.*  The person who is doomed is somebody
> who tries to take that code and reuse it in a different context.

Yeah, that's the original reasoning that had me leaning towards the
parallel API approach. If I seem to be changing my mind a lot in this
thread it's because I'm genuinely torn between the desire to make it
easier to port existing 2.x code to 3.x by making the current API
polymorphic and the fear that doing so will reintroduce some of the
exact same bytes/text confusion that the bytes/str split is trying to
get rid of.

There's no real way for 2to3 to help with the porting issue either,
since it has no way to determine the original intent of the 2.x code.

I *think* avoiding the quote/unquote precedent and applying the rule
"bytes in -> bytes out" will help avoid the worst of the potential
encoding confusion problems, though. At some point the
programmer is going to have to invoke decode() if they want a string
to pass to display functions and the like (or vice versa with
encode()) so there are still limits to how far any poorly handled code
will get before blowing up. (Basically, while the issue of programmers
assuming 'latin-1' or 'utf-8' or similar ASCII-friendly encodings when
they shouldn't is real, I don't believe a polymorphic API here will
make things any *worse* than what would happen with a parallel API.)
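
To illustrate the "bytes in -> bytes out" rule, here is a minimal sketch
using urllib.parse.urlsplit as it behaves in Python 3.2 and later, where
the function accepts either str or bytes. The example URL and the variable
names are assumptions made purely for illustration. The point is that the
result type follows the input type, so bytes components cannot be mixed
silently with text and an explicit decode() is needed first.

```python
from urllib.parse import urlsplit

# Illustrative inputs (not from the original message).
url_text = "http://example.com/path?q=1"
url_bytes = b"http://example.com/path?q=1"

text_parts = urlsplit(url_text)    # str in  -> str components out
bytes_parts = urlsplit(url_bytes)  # bytes in -> bytes components out

assert isinstance(text_parts.path, str)
assert isinstance(bytes_parts.path, bytes)

# Mixing the bytes result with text fails right away instead of
# quietly producing mojibake further down the line.
try:
    label = "Path: " + bytes_parts.path
    mixing_failed = False
except TypeError:
    mixing_failed = True

# The programmer has to choose an encoding explicitly to get text.
label = "Path: " + bytes_parts.path.decode("ascii")
```

The TypeError on concatenation is the "blowing up" mentioned above: code
that never decodes cannot get far once it reaches anything expecting text.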

And if this turns out to be a disaster in practice:
a) on my head be it; and
b) we still have the option of the DeprecationWarning dance for bytes
inputs to the existing functions and moving to a parallel API.
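
As a rough sketch of what option (b) could look like, the wrapper below
still accepts bytes but emits a DeprecationWarning when it does. The name
urlsplit_with_bytes_warning and the warning text are hypothetical and
exist only for this illustration; nothing like this is part of
urllib.parse.

```python
import warnings
from urllib.parse import urlsplit


def urlsplit_with_bytes_warning(url):
    """Hypothetical wrapper: keep accepting bytes, but warn about it."""
    if isinstance(url, bytes):
        warnings.warn(
            "bytes input is deprecated here; a separate bytes API "
            "would be used instead (hypothetical message)",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
    # Delegate to the real function, which still handles both types.
    return urlsplit(url)


# Usage: capture the warning to show that bytes input still works.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = urlsplit_with_bytes_warning(b"http://example.com/a")
```

Existing callers keep working during the deprecation period, and the
warning gives them time to move to whatever parallel API replaces it.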

Still-trying-to-figure-out-what-moment-of-insanity-prompted-me-to-volunteer-to-tackle-this'ly,
Nick.

