[issue3300] urllib.quote and unquote - Unicode issues

Wed Aug 6 07:59:44 CEST 2008

Bill Janssen <bill.janssen at gmail.com> added the comment:

Here's my version of how quote and unquote should be implemented in
Python 3.0.  I haven't looked at how they are used in the library, but
I'd expect improper uses (and there are lots of them) to break, so they
can be found and fixed.

Basically, percent-quoting is about creating an ASCII string that can be
safely used in a URI from an arbitrary sequence of octets.  So, my
version of quote() takes either a byte sequence or a string,
percent-quotes the unsafe octets, and returns a str.  If a str is
supplied on input, it is first encoded as UTF-8, then the octets of that
encoding are percent-quoted.
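
As a rough illustration of the quoting behaviour described above (this
is a sketch, not the attached myunquote.py): the name quote_sketch, the
safe parameter, and the choice of the RFC 3986 "unreserved" characters
as the always-safe set are all assumptions made for this example.

```python
# Hypothetical sketch of the described quote() behaviour.
# Assumption: "safe" octets are the RFC 3986 unreserved characters
# plus whatever the caller passes in `safe`.
_UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def quote_sketch(s, safe="/"):
    """Percent-quote a str or byte sequence; always return a str."""
    if isinstance(s, str):
        # A str is first encoded as UTF-8, then its octets are quoted.
        s = s.encode("utf-8")
    safe_octets = _UNRESERVED | frozenset(safe.encode("ascii"))
    return "".join(
        chr(octet) if octet in safe_octets else "%{:02X}".format(octet)
        for octet in bytes(s)
    )
```

With this sketch, quote_sketch("é") and quote_sketch(b"\xc3\xa9") both
give "%C3%A9", and arbitrary non-UTF-8 octets such as b"\xff" are
accepted as well, since the input is treated purely as octets.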

For unquote, there's no way to tell what the octets of the quoted
sequence mean, so it takes the percent-quoted ASCII string and returns
a byte sequence with the unquoted bytes.  For convenience, since the
unquoted bytes are often a string in some particular character set
encoding, I've also supplied unquote_as_string(), which takes an
optional character set, unquotes the bytes, decodes them to a str using
that encoding, and returns the resulting string.
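
The unquoting side can be sketched in the same spirit (again, not the
attached myunquote.py).  The names unquote_sketch and
unquote_as_string_sketch are hypothetical, and two details are
assumptions of this example rather than anything stated above: the
character set defaults to UTF-8, and a "%" that is not followed by two
hex digits is passed through unchanged.

```python
# Hypothetical sketch of the described unquote() / unquote_as_string().
_HEX_DIGITS = frozenset(b"0123456789ABCDEFabcdef")


def unquote_sketch(s):
    """Turn a percent-quoted ASCII str into the bytes it represents."""
    parts = s.encode("ascii").split(b"%")
    out = bytearray(parts[0])
    for part in parts[1:]:
        if len(part) >= 2 and part[0] in _HEX_DIGITS and part[1] in _HEX_DIGITS:
            # Valid escape: emit the octet, then the rest of the chunk.
            out.append(int(part[:2].decode("ascii"), 16))
            out += part[2:]
        else:
            # Assumption: leave a malformed escape untouched.
            out += b"%" + part
    return bytes(out)


def unquote_as_string_sketch(s, charset="utf-8"):
    """Unquote to bytes, then decode with the given character set."""
    return unquote_sketch(s).decode(charset)
```

For example, unquote_sketch("%C3%A9") returns b"\xc3\xa9", which
unquote_as_string_sketch decodes to "é" under UTF-8, while
unquote_as_string_sketch("%E9", "latin-1") yields the same character
from a different octet sequence.  That difference is why the plain
unquote returns bytes and leaves the interpretation to the caller.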

----------
nosy: +janssen
Added file: http://bugs.python.org/file11062/myunquote.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3300>
_______________________________________