[Web-SIG] Python 3.0 and WSGI 1.0.

Bill Janssen janssen at parc.com
Thu Apr 2 04:00:53 CEST 2009


Alan Kennedy <alan at xhaus.com> wrote:

> Hi Bill,
> 
> [Bill]
> > I think the controlling reference here is RFC 3875.
> 
> I think the controlling references are RFC 2616, RFC 2396 and RFC 3987.

I see what you're saying, but it's darn near impossible, as a practical
matter, to get any guidance on encoding matters from those.

The question is where those names come from, and they come from CGI, and
that is (practically speaking) defined these days by RFC 3875, as much as
anything.

> I think the question is "are people using IRIs in the wild"? If so,
> then we must decide how best to deal with the problems of
> recognising iso-8859-1+rfc2047 versus utf-8, or whatever
> server-configured encoding the user has chosen.

See http://bugs.python.org/issue3300, where we went around and around
that question.  The answer seems to be, yes.
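For concreteness, the "iso-8859-1+RFC 2047" reading of a header value (RFC 2616 treats header TEXT as ISO-8859-1, with other charsets only via RFC 2047 encoded words) might look roughly like the sketch below. It assumes the server hands the application the raw header bytes; `decode_http_header` is a hypothetical helper, not part of any WSGI API. It uses only the standard `email.header` module.

```python
from email.header import decode_header, make_header


def decode_http_header(raw: bytes) -> str:
    """Hypothetical helper: interpret a raw header value per RFC 2616.

    Assumes the bytes are ISO-8859-1, possibly containing RFC 2047
    encoded words such as =?utf-8?q?...?=.
    """
    # ISO-8859-1 maps every octet to a code point, so this step cannot fail.
    text = raw.decode("iso-8859-1")
    # Split out any encoded words, then join the decoded chunks into a str.
    return str(make_header(decode_header(text)))


print(decode_http_header(b"=?utf-8?q?caf=C3=A9?="))  # café
```

The point of the sketch is that nothing in this path ever treats the raw octets as UTF-8; a client that simply sends UTF-8 bytes in a header gets mojibake, which is exactly the ambiguity being asked about.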

There are lots of useful fragments in that discussion, for instance:

``For the authority (server name) portion of a URI, RFC 3986 is
pretty clear that UTF-8 must be used for non-ASCII values (assuming, for
a moment, that IDNA addresses are not Punycode encoded already). For
the path portion of URIs, a large-ish proportion of them are, indeed,
UTF-8 encoded because that has been the de facto standard in Web browsers
for a number of years now. For the query and fragment parts, however,
the encoding is determined by context and often depends on the encoding
of some page that contains the form from which the data is taken. Thus,
a large number of URIs contain non-UTF-8 percent-encoded octets.''
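Following the quoted advice, one common heuristic for the path is to try UTF-8 and fall back to ISO-8859-1. A minimal sketch, assuming the application can get at the still-percent-encoded path (not every server exposes one); `decode_path` is a hypothetical name used only for illustration:

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path: str) -> str:
    """Hypothetical helper: decode a percent-encoded URI path.

    Tries UTF-8 first (the de facto browser convention for paths) and
    falls back to ISO-8859-1 when the octets are not valid UTF-8.
    """
    # Undo the percent-encoding, keeping the result as raw octets.
    octets = unquote_to_bytes(raw_path)
    try:
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 accepts any octet sequence, so this never raises,
        # though it may not be the encoding the client actually used.
        return octets.decode("iso-8859-1")


print(decode_path("/caf%C3%A9"))  # /café (valid UTF-8)
print(decode_path("/caf%E9"))     # /café (ISO-8859-1 fallback)
```

This is only a guess: some ISO-8859-1 octet sequences happen to be valid UTF-8 and will be misread. For the query part, where the quote says the encoding depends on the page that served the form, `urllib.parse.parse_qs` takes an `encoding=` argument, so an application that knows its form page's charset can pass it explicitly rather than guessing.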

Bill

