[Python-Dev] urllib.quote and unquote - Unicode issues

Bill Janssen janssen at parc.com
Wed Jul 30 18:52:26 CEST 2008


> On Wed, Jul 30, 2008 at 8:09 AM, André Malo <nd at perlig.de> wrote:
> > I'm actually in favour of encoding bytes only back and forth. A useful
> > extension would be *another* function which wraps quote/unquote and encodes
> > and decodes characters.
> 
> I'd reverse this. By all means, add a new pair of functions that is
> bytes in / bytes out. But keep the existing functions purely string in
> / string out, hardcoded to UTF-8. People wanting another encoding can
> use the bytes functions and explicit encode / decode calls.

Actually (as I pointed out before) the existing functions are not
string-in/string-out.  They are something-in and bytes-out.  They just look
like string-in/string-out because of the confusion between byte
strings and Unicode strings in Python 1 and 2.
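
To make the bytes point concrete, here is a minimal sketch.  It assumes
the bytes-oriented helpers that later Python 3 releases expose in
urllib.parse (quote_from_bytes, unquote_to_bytes); those names are not
part of the API under discussion in this thread and are used purely for
illustration, as is the sample string.

```python
from urllib.parse import quote, quote_from_bytes, unquote_to_bytes

# Hypothetical sample text containing one non-ASCII character.
text = "caf\u00e9"

# Percent-encoding operates on octets, so the quoted form depends on
# which encoding turned the characters into bytes first.
utf8_quoted = quote_from_bytes(text.encode("utf-8"))
latin1_quoted = quote_from_bytes(text.encode("latin-1"))

# Unquoting yields octets again; the %XX escapes do not say which
# character encoding produced them.
raw = unquote_to_bytes(latin1_quoted)

# Decoding only works if the caller knows the encoding out of band.
try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# The string-level quote() assumes UTF-8 unless told otherwise.
default_quoted = quote(text)

print(utf8_quoted, latin1_quoted, raw, decoded_as_utf8, default_quoted)
```

The same four characters produce two different quoted forms, and the
unquoted octets from the Latin-1 form are not valid UTF-8, so a
string-out function has to guess an encoding where a bytes-out function
does not.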

Look, Matt's suggestion is a degradation of the integrity of the
stdlib, because it enthrones a broken understanding, a misreading of
the RFC, in a very prominent place.  I'd prefer not to have Python
contribute to that breakage.  Keep the functions the way they are now:
bytes-in and bytes-out.

Bill

