[issue3300] urllib.quote and unquote - Unicode issues
Bill Janssen
report at bugs.python.org
Wed Aug 6 07:59:44 CEST 2008
Bill Janssen <bill.janssen at gmail.com> added the comment:
Here's my version of how quote and unquote should be implemented in
Python 3.0. I haven't looked at how they are used in the library, but I'd
expect improper uses (and there are lots of them) to break, so that they
can be fixed.
Basically, percent-quoting is about creating an ASCII string that can be
safely used in a URI from an arbitrary sequence of octets. So, my version
of quote() takes either a byte sequence or a string, percent-quotes
the unsafe octets, and returns a str. If a str is supplied on input,
it is first converted to UTF-8, then the octets of that encoding are
percent-quoted.
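To make the quote() behaviour above concrete, here is a minimal sketch. It is not the attached myunquote.py; the name quote_sketch, the safe parameter with its default of "/", and the exact set of always-safe characters are assumptions made for illustration.

```python
# Illustrative sketch only; names and the safe set are assumptions,
# not the contents of the attached file.
ALWAYS_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_.-"
)


def quote_sketch(s, safe="/"):
    """Percent-quote a str or byte sequence and return an ASCII str."""
    if isinstance(s, str):
        # A str is first converted to UTF-8; its octets are then quoted.
        s = s.encode("utf-8")
    allowed = ALWAYS_SAFE | frozenset(safe.encode("ascii"))
    # Iterating over bytes yields integers, one per octet.
    return "".join(
        chr(b) if b in allowed else "%{:02X}".format(b) for b in bytes(s)
    )


print(quote_sketch("a/b c"))     # str input, encoded as UTF-8 first
print(quote_sketch(b"\xff a"))   # raw octets, no encoding step
```

Either kind of input yields a str containing only ASCII characters, so the caller never has to guess which encoding the result is in.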
For unquote, there's no way to tell what the octets of the quoted
sequence may mean, so my unquote() takes the percent-quoted ASCII string
and returns a byte sequence with the unquoted bytes. For convenience, since
the unquoted bytes are often a string in some particular character set
encoding, I've also supplied unquote_as_string(), which takes an
optional character set, first unquotes the bytes, then converts them
to a str using that character set encoding, and returns the resulting
string.
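The two unquoting functions described above can be sketched as follows. Again this is not the attached myunquote.py; the _sketch names, the UTF-8 default for the character set, and the choice to pass a malformed "%" through unchanged are assumptions made for illustration.

```python
import string

# Illustrative sketch only; names, defaults and error handling are
# assumptions, not the contents of the attached file.
HEX_DIGITS = frozenset(string.hexdigits)


def unquote_sketch(s):
    """Turn a percent-quoted ASCII str back into the bytes it encodes."""
    out = bytearray()
    i = 0
    while i < len(s):
        pair = s[i + 1:i + 3]
        if s[i] == "%" and len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            out.append(int(pair, 16))  # "%41" becomes the octet 0x41
            i += 3
        else:
            # Unquoted characters must already be ASCII; a stray "%" is kept.
            out += s[i].encode("ascii")
            i += 1
    return bytes(out)


def unquote_as_string_sketch(s, charset="utf-8"):
    """Unquote to bytes first, then decode them with the given charset."""
    return unquote_sketch(s).decode(charset)


print(unquote_sketch("%C3%A9"))                       # the raw octets
print(unquote_as_string_sketch("%C3%A9"))             # decoded as UTF-8
print(unquote_as_string_sketch("%E9", "latin-1"))     # caller-chosen charset
```

Keeping the byte-level function separate means the choice of character set is made only by a caller who actually knows what the octets mean.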
----------
nosy: +janssen
Added file: http://bugs.python.org/file11062/myunquote.py
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3300>
_______________________________________