[Web-SIG] Newline values in WSGI response header values.

Thu Jun 12 10:38:17 CEST 2008

2008/6/12 Sylvain Hellegouarch <sh at defuze.org>:
>
>> Can anyone confirm for me what the behaviour should be if someone
>> includes a newline in the value of a WSGI response header?
>>
>> CGI specification would seem to disallow it and thus WSGI adapter
>> should by rights possibly produce an error if user code does it.
>>
>> At the moment I know of no WSGI adapter implementation which validates
>> whether a newline appears in the value of a WSGI response header. For
>> many WSGI adapters this means that a header of:
>>
>>   Key1: "Value1\r\nKey2: Value2"
>>
>> will actually translate into two separate headers being sent back to
>> client.
>>
>> For a header of:
>>
>>   Key3: "Value3a\r\nValue3b"
>>
>> in a WSGI adapter which simply passes things through, the client would
>> get an invalid header line, which in general it would ignore. If
>> however this was generated when hosted with a CGI-WSGI adapter, for
>> Apache at least, Apache would generate a 500 error itself due to
>> detecting a header line of invalid format.
>>
>> Thus, is an embedded newline in value invalid? Would it be reasonable
>> for a WSGI adapter to flag it as an error?
>>
>
> I might be reading the spec wrong but it doesn't seem to be forbidden by
> RFC 2616.
>
> Section 4.2 says:
>
>> Any LWS that occurs between field-content MAY be replaced with a single
>> SP before interpreting the field value or forwarding the message
>> downstream.
>
> Then a look at the definition of separators shows us that SP is a valid
> separator.
>
> Since section 2.1 says:
>
>> Except where noted otherwise, linear white space (LWS) can be included
>> between any two adjacent words (token or quoted-string), and between
>> adjacent words and separators, without changing the interpretation of a
>> field.
>
> It sounds to me that this is a valid construct but a WSGI adapter might
> consider converting those CRLF into simple SP as said in 2.1 again:
>
>> A recipient MAY replace any linear white space with a single SP before
>> interpreting the field value or forwarding the message downstream.

A LWS is:

  LWS            = [CRLF] 1*( SP | HT )

I.e., not just a single CRLF, but a CRLF (itself optional) followed by
one or more spaces or tabs.

Thus, one can't simply replace a CRLF on its own with a space.
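
To illustrate the distinction, here is a minimal Python sketch of the
LWS rule quoted above. The name collapse_lws is hypothetical and is not
taken from any existing WSGI adapter; it only shows that a CRLF counts
as part of LWS when a space or tab follows it.

```python
import re

# LWS = [CRLF] 1*( SP | HT ): an optional CRLF, then one or more SP/HT.
LWS = re.compile(r"(?:\r\n)?[ \t]+")


def collapse_lws(value):
    """Replace each LWS match in a header value with a single space."""
    return LWS.sub(" ", value)


# CRLF followed by a space is LWS, so it collapses to one space.
print(repr(collapse_lws("Value3a\r\n Value3b")))
# A bare CRLF is not LWS, so it is left in place.
print(repr(collapse_lws("Value3a\r\nValue3b")))
```

Note that the second value still contains the CRLF afterwards, which is
exactly the case that causes trouble.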

Anyway, the wording of my question and the reference to CGI were a bit
wrong, as WSGI response headers are probably governed more by the HTTP
RFC.

To clarify, what we really have is two cases. The first is the return
of a value containing a valid LWS as specified by the HTTP RFC.

If the WSGI adapter maps directly to HTTP, then it can pass the value
straight through. If however the WSGI adapter is hosted on top of an
interface with CGI-like semantics, then it should translate the LWS to
a single space as described.

The second case is an embedded CRLF which isn't followed by a space or
tab and thus isn't a LWS. This is the case which causes problems, and I
am asking whether it should be detected and flagged as an erroneous
response.
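
As a rough sketch of what such a check might look like, assuming the
adapter chooses to reject the value outright: the function name
check_header_value and the use of ValueError are assumptions for
illustration only, not behaviour required by any specification or
implemented by any adapter I know of.

```python
import re

# A valid fold: CRLF followed by at least one space or tab.
FOLD = re.compile(r"\r\n[ \t]+")


def check_header_value(name, value):
    """Raise ValueError if value has a CR or LF outside a valid fold."""
    # Strip the valid folds; any CR or LF left over is not part of a LWS.
    remainder = FOLD.sub("", value)
    if "\r" in remainder or "\n" in remainder:
        raise ValueError("invalid newline in value of header %r" % name)
    return value


check_header_value("Key3", "Value3a\r\n Value3b")  # valid LWS, accepted

try:
    check_header_value("Key1", "Value1\r\nKey2: Value2")
except ValueError as exc:
    print("rejected:", exc)
```

An adapter could run a check like this over each (name, value) pair it
is given and fail the request, rather than sending what the client
would see as two separate headers.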

Graham