[Python-Dev] urllib.quote and unquote - Unicode issues

Guido van Rossum guido at python.org
Wed Aug 6 23:44:35 CEST 2008


On Wed, Aug 6, 2008 at 9:09 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Nobody's been
>> assigned to look at it and it hasn't been given a priority, even though
>> we all agree it's a bug (though we disagree on how to fix it).
>
> This I can explain (I think). Nobody is assigned to look: we usually
> don't do assignments of bugs or patches, except when there is a specific
> maintainer for the code in question. urllib has no maintainer.

I'm somehow strangely attracted to this issue, and have posted a bit
of a code review.

> It hasn't been given priority: There are currently 606 patches in the
> tracker, many fixing bugs of some sort. It's not clear (to me, at least)
> why this should be given priority over all the other things such as
> interpreter crashes.

Well, it's an API change, and those are impossible after beta3 is
released (or at least have to wait for 3.1).

> We all agree it's a bug: no, I don't. I think it's a missing feature,
> at best, but I'm staying out of the discussion. As-is, urllib only
> supports ASCII in URLs, and that is fine for most purposes. URLs
> are just not made for non-ASCII characters. Implement IRIs if you
> want non-ASCII characters; the rules are much clearer for these.

Wikipedia's use of urlencoded UTF-8 (and other examples) suggests
that, whether we like it or not, the trend is towards this. I'd like to
support such usage rather than fight it (standards or no standards).
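
To make "urlencoded UTF-8" concrete, here is a minimal sketch. It assumes
the Python 3 urllib.parse.quote() and unquote() functions with an
explicit encoding argument; the sample string is only an illustration,
not taken from the patch under review.

```python
from urllib.parse import quote, unquote

# A non-ASCII string: "ö" is U+00F6, which is the two bytes C3 B6 in UTF-8.
title = "Löwis"

# Encode the string as UTF-8, then percent-escape each non-ASCII byte.
encoded = quote(title, encoding="utf-8")
print(encoded)  # L%C3%B6wis

# Reverse the process: turn the escapes back into bytes, decode as UTF-8.
decoded = unquote(encoded, encoding="utf-8")
print(decoded == title)  # True
```

The ASCII letters pass through untouched; only the non-ASCII character
turns into percent escapes.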

> As it stands, a committer would have
> - to agree it's an important problem
> - to agree the patch is correct
> - to judge it is not a new feature, as we are in beta already
> - implicitly accept maintenance of that change, and take all
>  the blame that it might produce in the coming years

It could be a small enough new feature. Also note that for URLs that
only use ASCII there is no behavior change. (And certainly the ASCII
case *is* an important use case, e.g. to encode occurrences of + or &
in query strings).
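
A small sketch of that ASCII-only case, again assuming Python 3's
urllib.parse.quote(); the query value and the example.com URL are made
up for illustration. Because every character is ASCII, the choice of
encoding does not affect the result.

```python
from urllib.parse import quote

# A query value containing characters that are special in query strings.
value = "a+b&c=d"

# safe="" asks quote() to escape every reserved character.
encoded = quote(value, safe="")
print(encoded)  # a%2Bb%26c%3Dd

# For pure-ASCII input the encoding argument makes no difference.
same = quote(value, safe="", encoding="latin-1") == quote(
    value, safe="", encoding="utf-8"
)
print(same)  # True

# Hypothetical URL built from the escaped value.
url = "http://example.com/search?q=" + encoded
print(url)
```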

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

