[Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

Benoit Chesneau bchesneau at gmail.com
Wed Jan 6 04:13:01 EST 2016


On Tue, Jan 5, 2016 at 3:17 PM Aymeric Augustin <
aymeric.augustin.2003 at polytechnique.org> wrote:

> Hello Benoît,
>
>
> On Tuesday, January 5, 2016 at 14:13:48 UTC+1, Benoit Chesneau wrote:
>>
>> Header formats, which btw are US-ASCII in the HTTP spec now, could
>> already be solved if only the frameworks complied with the spec instead of
>> trying to impose their own rules.
>>
>
> That's just a detail, but either I misunderstood you or you blamed the
> wrong side here.
>
> Non-ASCII data in request headers isn't a problem created by frameworks,
> it's a problem created by (possibly non-compliant) user-agents.
>

I had in mind this ticket:
https://github.com/benoitc/gunicorn/issues/1151

As of today, because some applications still send responses in a
non-compliant way, we try to recode the headers on the server side so we
can send them. Today, like Apache 2 (and I think nginx), we just ignore
headers that can't be encoded in US-ASCII. If all applications/frameworks
gave us the headers as Latin-1 it wouldn't be a major problem, but that's
not the case.
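
To make that behaviour concrete, here is a minimal sketch of the "ignore
what can't be encoded" approach. It is not gunicorn's actual code:
filter_ascii_headers is a hypothetical helper, and it assumes the
application hands over response headers as a list of (name, value)
native-string pairs.

```python
def filter_ascii_headers(headers):
    """Split headers into those that encode as US-ASCII and those that do not.

    `headers` is assumed to be a list of (name, value) str pairs.
    Returns (kept, dropped): kept holds (bytes, bytes) pairs ready to be
    written to the socket, dropped holds the original str pairs.
    """
    kept, dropped = [], []
    for name, value in headers:
        try:
            kept.append((name.encode("ascii"), value.encode("ascii")))
        except UnicodeEncodeError:
            # The header cannot be represented in US-ASCII, so skip it.
            dropped.append((name, value))
    return kept, dropped


headers = [
    ("Content-Type", "text/plain"),
    ("X-Author", "Benoît"),
]
kept, dropped = filter_ascii_headers(headers)
```

In this sketch the X-Author header ends up in dropped, and only
Content-Type would be sent.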




>
> If future-WSGI guaranteed that HTTP header values provided in environ only
> contain ASCII, frameworks would be happy. Servers would likely have to
> respond 400 to requests containing non-ASCII headers, which would likely be
> considered a problematic backwards-incompatibility. It would go against the
> IETF principle of being tolerant in what a system accepts.
>

We should also update the spec to reflect the latest changes in the HTTP
specs, forcing applications to send US-ASCII headers to the gateway.

>
> If future-WSGI provided header values as bytes, frameworks would be happy
> as well. That would be my preference, because the application is in the
> best position to pick a charset for decoding the values (that would be
> UTF-8 in general).
>
> If future-WSGI insists on decoding header values with an arbitrary
> encoding, I believe it should do so with UTF-8 rather than ISO-8859-1. "The
> server is decoding with ISO-8859-1 so the application can reencode to get
> the raw bytes" never sounded like a compelling argument to me. It will
> still be wrong in theory, but it will generally give the right results in
> practice.
>

Hmm, but actually the HTTP spec insists that headers are neither UTF-8
nor Latin-1 (ISO-8859-1) but US-ASCII:

https://github.com/benoitc/gunicorn/issues/1151#issuecomment-158884740

so native strings or bytes are fine with me as long as we make sure that
we are sending and receiving US-ASCII.
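
For illustration, a small sketch of the two points discussed here: the
ISO-8859-1 round trip that lets an application recover the raw bytes, and
a US-ASCII check that works for both native strings and bytes.
is_us_ascii is a hypothetical helper name, not part of WSGI or of any
server, and the sample value is made up.

```python
def is_us_ascii(value):
    """Return True if a header value (str or bytes) only contains US-ASCII."""
    try:
        if isinstance(value, bytes):
            value.decode("ascii")
        else:
            value.encode("ascii")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return True


# Raw bytes as a user-agent might send them: UTF-8 encoded "Benoît".
raw = "Benoît".encode("utf-8")

# A server following the ISO-8859-1 convention decodes the bytes like this.
native = raw.decode("iso-8859-1")

# The application re-encodes to get the raw bytes back, then picks a charset.
recovered = native.encode("iso-8859-1")
text = recovered.decode("utf-8")
```

The round trip is lossless because ISO-8859-1 maps every byte to exactly
one code point, but neither raw nor native passes the is_us_ascii check.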


> Best regards,
>
>
> --
> Aymeric.
>
> PS: if you find Django trying to impose its own rules, I'll do my best to
> correct that. As much as I can speak for the Django team, this isn't our
> intent. Please flag such cases so we can make sure there's no
> misunderstanding. Thanks!
>
>
Thanks! I will if needed :)

- benoît