[issue3300] urllib.quote and unquote - Unicode issues

Thu Aug 7 22:49:26 CEST 2008

Bill Janssen <bill.janssen at gmail.com> added the comment:

Just to reply to Antoine's comments on my patch:

- it would be nice to have more unit tests, especially for the various
bytes/unicode possibilities, and perhaps also roundtripping (Matt's
patch has a lot of tests)

Yes, I completely agree.

- quote_as_bytes() should return a bytes object, not a bytearray

Good point.

- using the "%02X" format looks clearer to me than going through the
_hextable lookup table...

Really?  I see it the other way.

- when the argument is of the wrong type, quote_as_bytes() should raise
a TypeError rather than a ValueError

Good point.

- why is quote_as_string() hardwired to utf8 while unquote_as_string()
provides a charset parameter? wouldn't it be better for them to be
consistent with each other?

To encourage the use of UTF-8.  The caller can alway explicitly encode
to some other character set, then pass in the bytes that result from
that encoding.  Remember that the RFC for percent-encoding really takes
bytes in, and produces bytes out.  The string-in and string-out versions
are to support naive programming (what a nice way of putting it!).

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3300>
_______________________________________