[Python-Dev] urllib.quote and unquote - Unicode issues

Antoine Pitrou solipsis at pitrou.net
Wed Jul 30 17:08:29 CEST 2008


Facundo Batista <facundobatista <at> gmail.com> writes:

> 
> 2008/7/30 Matt Giuca <matt.giuca <at> gmail.com>:
> 
> > 2. Default to UTF-8.
> > In favour: Matt Giuca, Brett Cannon, Jeroen Ruigrok van der Werven
> > Pros: Fully working and tested solution is implemented; recommended by
> > RFC 3986 for all future schemes; recommended by W3C for use with HTML;
> > UTF-8 used by all major browsers; supports all characters; most
> > existing code compatible by default; unquote is inverse of quote.
> > Cons: By default, URIs may have invalid octet sequences (not possible
> > to reverse).
> 
> +1, assuming that if you have a different encoding in the URI you can
> pass it as a parameter.

+1 from me as well, with an optional encoding parameter to override the default.
Also, your "con" is a "pro" to me: invalid octet sequences get reported as
errors instead of silently producing garbage (as would be the case with latin1).
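
To make the difference concrete, here is a minimal sketch. It assumes the
quote/unquote signatures found in Python 3's urllib.parse (with "encoding" and
"errors" keyword arguments), which is not necessarily the exact API proposed
in this thread. Strict error handling is requested explicitly, purely for
illustration.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"  # 'café'

# Quoting defaults to UTF-8; the encoding argument overrides that default.
quoted_utf8 = quote(text)
quoted_latin1 = quote(text, encoding="latin-1")

# With the UTF-8 default, unquote is the inverse of quote.
round_trip = unquote(quoted_utf8)

# Decoding Latin-1 octets as strict UTF-8 reports an error.
# errors="strict" is passed explicitly here (an assumption for illustration).
try:
    unquote(quoted_latin1, errors="strict")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# Latin-1 maps every octet to some character, so decoding UTF-8 octets
# with it never fails and silently yields mojibake instead.
garbage = unquote(quoted_utf8, encoding="latin-1")
```

With latin1, the last call cannot fail no matter what octets it receives,
which is exactly why a wrong guess goes unnoticed.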

Regards

Antoine.



