[Web-SIG] Newline values in WSGI response header values.

Thu Jun 12 10:58:09 CEST 2008

> 2008/6/12 Sylvain Hellegouarch <sh at defuze.org>:
>>
>>> Can anyone confirm for me what the behaviour should be if someone
>>> includes a newline in the value of a WSGI response header?
>>>
>>> The CGI specification would seem to disallow it, and thus a WSGI adapter
>>> should by rights produce an error if user code does it.
>>>
>>> At the moment I know of no WSGI adapter implementation which validates
>>> whether a newline appears in the value of a WSGI response header. For
>>> many WSGI adapters this means that a header of:
>>>
>>>   Key1: "Value1\r\nKey2: Value2"
>>>
>>> will actually translate into two separate headers being sent back to
>>> client.
>>>
>>> For a header of:
>>>
>>>   Key3: "Value3a\r\nValue3b"
>>>
>>> in a WSGI adapter which simply passes things through, the client would
>>> get an invalid header line, which in general it would ignore. If
>>> however this was generated when hosted with a CGI-WSGI adapter, for
>>> Apache at least, Apache would generate a 500 error itself due to
>>> detecting a header line of invalid format.
>>>
>>> Thus, is an embedded newline in value invalid? Would it be reasonable
>>> for a WSGI adapter to flag it as an error?
>>>
>>
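The two cases above can be reproduced with a small sketch. It assumes an
adapter that joins header pairs into raw lines without any validation;
serialize_headers is a hypothetical helper for illustration, not code taken
from any real WSGI adapter.

```python
def serialize_headers(headers):
    """Naively join (name, value) pairs into raw header lines.

    Hypothetical pass-through behaviour: no validation of the values.
    """
    return "".join(f"{name}: {value}\r\n" for name, value in headers)


# Case 1: the embedded CRLF smuggles in a second, well-formed header line.
raw = serialize_headers([("Key1", "Value1\r\nKey2: Value2")])
header_lines = raw.split("\r\n")[:-1]

# Case 2: the embedded CRLF leaves a second line with no "name:" part.
raw3 = serialize_headers([("Key3", "Value3a\r\nValue3b")])
malformed_lines = raw3.split("\r\n")[:-1]
```

Here header_lines holds two separate header lines, while the second entry
of malformed_lines has no colon and so is not a valid header line.
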
>> I might be reading the spec wrong but it doesn't seem to be forbidden by
>> RFC 2616.
>>
>> Section 4.2 says:
>>
>>> Any LWS that occurs between field-content MAY be replaced with a single
>>> SP before interpreting the field value or forwarding the message
>>> downstream.
>>
>> Then a look at the definition of separators shows us that SP is a valid
>> separator.
>>
>> Since section 2.1 says:
>>
>>> Except where noted otherwise, linear white space (LWS) can be included
>>> between any two adjacent words (token or quoted-string), and between
>>> adjacent words and separators, without changing the interpretation of a
>>> field.
>>
>> It sounds to me that this is a valid construct, but a WSGI adapter might
>> consider converting those CRLF into a simple SP, as section 2.2 says:
>>
>>> A recipient MAY replace any linear white space with a single SP before
>>> interpreting the field value or forwarding the message downstream.
>
> A LWS is:
>
>   LWS            = [CRLF] 1*( SP | HT )
>
> I.e., not just a single CRLF, but an optional CRLF followed by at least
> one space or tab.
>
> Thus, one can't just replace a bare CRLF with a space.
>
> Anyway, the wording of my question and reference to CGI was a bit
> wrong, as WSGI response headers are probably governed more by the HTTP
> RFC.
>
> To clarify, what we really have is two cases. The first is the return of
> a value containing a valid LWS as specified by the HTTP RFC.
>
> If the WSGI adapter is mapping directly to HTTP, then it can pass it
> straight through. If however the WSGI adapter is hosted on top of an
> interface with CGI-like semantics, then it should translate the LWS to a
> single space as described.
>
> The second case is an embedded CRLF which isn't followed by a space or
> tab and thus isn't a LWS. This is the case which causes problems, and I
> am asking whether it should be detected and flagged as an erroneous
> response.
>
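If an adapter did choose to flag this second case, the check might look
something like the sketch below. This is only an illustration under stated
assumptions: has_bare_newline and checked_start_response are hypothetical
names, raising ValueError is an arbitrary choice, and nothing here is
mandated by the WSGI specification.

```python
import re

# A valid fold: CRLF followed by at least one SP or HT.
VALID_FOLD = re.compile(r"\r\n[ \t]+")


def has_bare_newline(value):
    """Return True if value contains a CR or LF outside a valid fold."""
    remainder = VALID_FOLD.sub("", value)
    return "\r" in remainder or "\n" in remainder


def checked_start_response(start_response):
    """Wrap start_response so that offending header values raise an error."""
    def wrapper(status, headers, exc_info=None):
        for name, value in headers:
            if has_bare_newline(value):
                raise ValueError(
                    "newline not followed by SP or HT in header %r" % name
                )
        return start_response(status, headers, exc_info)
    return wrapper
```

With this wrapper, a value of "Value3a\r\n Value3b" would be accepted,
while "Value1\r\nKey2: Value2" would be rejected before any header is sent.
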

You might want to take the question to the HTTPbis working group and follow
up on this issue:

http://tools.ietf.org/wg/httpbis/trac/ticket/30

- Sylvain

-- 
Sylvain Hellegouarch
http://www.defuze.org