[Baypiggies] urllib.urlencode and encoding
Keith Dart ♂
keith at dartworks.biz
Thu Apr 19 02:08:13 CEST 2007
Tung Wai Yip wrote the following on 2007-04-18 at 16:51 PDT:
===
> urllib.urlencode() cannot encode a unicode string by itself. RFC 2396 has
> not taken unicode into consideration, so there is no rule on what to do
> with unicode in a URI. It is up to the application to decide on the
> encoding, e.g. UTF-8 first, URL encoding next. Others might very well
> choose to use UTF-16 instead.
===
Nope, see RFC 3986:
Network Working Group                                     T. Berners-Lee
Request for Comments: 3986                                       W3C/MIT
STD: 66                                                      R. Fielding
Updates: 1738                                               Day Software
Obsoletes: 2732, *2396*, 1808
Section 2.5:
When a new URI scheme defines a component that represents textual
data consisting of characters from the Universal Character Set [UCS],
the data should first be encoded as octets according to the UTF-8
character encoding [STD63]; then only those octets that do not
correspond to characters in the unreserved set should be percent-
encoded. For example, the character A would be represented as "A",
the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
as "%C3%80", and the character KATAKANA LETTER A would be represented
as "%E3%82%A2".
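
A minimal sketch of that two-step rule in Python, reproducing the three examples from the RFC text. It assumes Python 3, where the quoting functions live in urllib.parse; the helper name percent_encode_utf8 is made up for illustration. On the Python 2 discussed in this thread, the rough equivalent would be to call .encode('utf-8') on the unicode string yourself before handing it to urllib.quote() or urllib.urlencode().

```python
from urllib.parse import quote, urlencode


def percent_encode_utf8(text):
    """Hypothetical helper: apply the RFC 3986 section 2.5 recipe to text."""
    # Step 1: encode the characters as UTF-8 octets.
    octets = text.encode("utf-8")
    # Step 2: percent-encode every octet that is not an unreserved character.
    # safe="" stops quote() from also leaving "/" unescaped.
    return quote(octets, safe="")


# The three characters used as examples in the RFC text.
print(percent_encode_utf8("A"))       # A
print(percent_encode_utf8("\u00c0"))  # %C3%80  (LATIN CAPITAL LETTER A WITH GRAVE)
print(percent_encode_utf8("\u30a2"))  # %E3%82%A2  (KATAKANA LETTER A)

# urlencode() can do the same for query parameters when told which encoding to use.
query = urlencode({"q": "\u00c0\u30a2"}, encoding="utf-8")
print(query)                          # q=%C3%80%E3%82%A2
```

Encoding to UTF-8 explicitly, rather than relying on a default, keeps the choice of character encoding visible at the call site.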
--
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Keith Dart <keith at dartworks.biz>
public key: ID: 19017044
<http://www.dartworks.biz/>
=====================================================================