[Web-SIG] Move to bless Graham's WSGI 1.1 as official spec
Manlio Perillo
manlio_perillo at libero.it
Thu Dec 3 21:15:06 CET 2009
And Clover wrote:
> Manlio Perillo wrote:
>
>> However what about URI (that is, for PATH_INFO and the like)?
>> For URI (if I remember correctly) the suggested encoding is UTF-8, so
>> URLS should be decoded using
>
>> url.decode('utf-8', 'surrogateescape')
>
>> Is this correct?
>
> The currently-discussed proposal is ISO-8859-1, allowing the real bytes
> to be trivially extracted. This is consistent with the other headers and
> would be my preferred approach.
>
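For reference, here is a minimal sketch of the two decodings being compared, assuming Python 3.1 or later (where the `surrogateescape` error handler is available). The sample bytes are made up for illustration; they are not taken from any real request.

```python
# Hypothetical raw PATH_INFO bytes: a valid UTF-8 "é" followed by a stray 0xFF byte.
raw = b"/caf\xc3\xa9/\xff"

# ISO-8859-1 maps each byte to the code point with the same value, so
# decoding never fails and re-encoding restores the original bytes.
as_latin1 = raw.decode("iso-8859-1")
assert as_latin1.encode("iso-8859-1") == raw

# UTF-8 with surrogateescape: valid sequences become real characters,
# undecodable bytes become lone surrogates, and the round trip is still exact.
as_utf8 = raw.decode("utf-8", "surrogateescape")
assert as_utf8.encode("utf-8", "surrogateescape") == raw

# With the ISO-8859-1 form, the application extracts the bytes first
# and then applies whatever encoding it actually expects.
text = as_latin1.encode("iso-8859-1").decode("utf-8", "replace")
```

Both forms are lossless; the difference is whether the string handed to the application already looks like text (UTF-8) or is only a transparent carrier for the bytes (ISO-8859-1).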
There is something that I don't understand.
Some HTTP headers, like Accept-Language, contain data described as
`token`, where:

    token = 1*<any CHAR except CTLs or separators>

So a token, IMHO, is an opaque string, and it SHOULD NOT be decoded.
In Python 3.x it SHOULD be a byte string.
Text content is described as `TEXT`, where:

    The TEXT rule is only used for descriptive field contents and values
    that are not intended to be interpreted by the message parser. Words
    of *TEXT MAY contain characters from character sets other than
    ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
    [14].

    TEXT = <any OCTET except CTLs,
            but including LWS>

The only type of data where TEXT can be used is `quoted-string`.
A `quoted-string` only appears in well-specified portions of a header.

So, IMHO, it is *not* correct for a WSGI middleware to return all HTTP
headers as Unicode strings.
This is up to the application/framework, which must parse each header,
split it into components, and handle each one as appropriate (as a byte
string, a Unicode string, or an instance of some other data type).
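
To illustrate what I mean, here is a rough sketch of a framework parsing Accept-Language from bytes and keeping the tokens as byte strings. The function name `parse_accept_language` is made up for this example, and the parsing is simplified (it does not validate the grammar); it is not part of WSGI or of any existing library.

```python
def parse_accept_language(value: bytes):
    """Split an Accept-Language value into (language-range, q) pairs.

    Language ranges stay as byte strings, since they are opaque tokens;
    only the q parameter is converted, to a float.
    """
    result = []
    for item in value.split(b","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(b";")
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(b";"):
            name, _, val = param.strip().partition(b"=")
            if name.strip() == b"q":
                try:
                    q = float(val.strip().decode("ascii"))
                except ValueError:
                    # Malformed or non-ASCII q value: treat as not acceptable.
                    q = 0.0
        result.append((lang.strip(), q))
    # Highest quality first; the sort is stable, so ties keep header order.
    result.sort(key=lambda pair: pair[1], reverse=True)
    return result


prefs = parse_accept_language(b"da, en-gb;q=0.8, en;q=0.7")
```

Nothing here needs a Unicode string until the application decides to display or compare a value as text.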
> [...]
Regards Manlio