[Python-Dev] urllib.quote and unquote - Unicode issues

Brett Cannon brett at python.org
Sat Jul 12 20:46:48 CEST 2008


On Sat, Jul 12, 2008 at 10:27 AM, Matt Giuca <matt.giuca at gmail.com> wrote:
> Hi all,
>
> My first post to the list. In fact, first time Python hacker, long-time
> Python user though. (Melbourne, Australia).
>

Welcome!

> Some of you may have seen for the past week or so my bug report on Roundup,
> http://bugs.python.org/issue3300
>
> I've spent a heap of effort on this patch now so I'd really like to get some
> more opinions and have this patch considered for Python 3.0.
>

Hopefully we can get to it in the near future. Since we are having two
more betas (one of them is next week), there should be enough time
before hitting a release candidate to have this looked at.

> Basically, urllib.quote and unquote seem not to have been updated since
> Python 2.5, and because of this they implicitly perform Latin-1 encoding and
> decoding (with respect to percent-encoded characters). I think they should
> default to UTF-8 for a number of reasons, including that's what other
> software such as web browsers use.
>
> I've submitted a patch which fixes quote and unquote to use UTF-8 by
> default. I also added extra arguments allowing the caller to choose the
> encoding (after discussion, there was some consensus that this would be
> beneficial). I have now completed updating the documentation, writing
> extensive test cases, and testing the rest of the standard library for code
> breakage - with the result being there wasn't really any, everything seems
> to just work nicely with UTF-8. You can read the sordid details of my
> investigation in the tracker.
>
> Firstly, it'd be nice to hear if people think this is desirable behaviour.

Based on what is said in this email, it sounds reasonable.
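
For anyone who wants to see the difference concretely, here is a minimal
sketch. It assumes the Python 3 spelling of these functions
(urllib.parse.quote / unquote with an "encoding" keyword), which is not
necessarily identical to the patch on the tracker; the sample string is
made up for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical sample string: 'café' with a single non-ASCII character.
text = "caf\u00e9"

# UTF-8 (the default in Python 3's urllib.parse) encodes U+00E9 as two bytes.
utf8_quoted = quote(text)

# Latin-1 encodes the same character as a single byte.
latin1_quoted = quote(text, encoding="latin-1")

print(utf8_quoted)    # caf%C3%A9
print(latin1_quoted)  # caf%E9

# Each form round-trips only when decoded with the matching encoding.
utf8_roundtrip = unquote(utf8_quoted)
latin1_roundtrip = unquote(latin1_quoted, encoding="latin-1")

# Decoding the Latin-1 form as UTF-8 does not recover the original text;
# with the default errors="replace" the bad byte becomes U+FFFD.
mismatched = unquote(latin1_quoted)
```

The mismatched case is the interoperability problem in a nutshell: the
two ends have to agree on the encoding, and browsers generally use UTF-8.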

> Secondly, if it's feasible to get this patch in Python 3.0. (I think if it
> were delayed to Python 3.1, the code breakage wouldn't justify it).

If what you are saying is true, then it can probably go in as a bug
fix (unless someone else knows something about Latin-1 on the Net that
makes this not true).

> And
> thirdly, if the first two are positive, if anyone would like to review this
> patch and check it in.
>

That I can't say I can necessarily do; I have my own bug reports to
work through this weekend. =)

-Brett

