[Web-SIG] HTTP header canonicalization?

Mon Aug 23 01:33:05 CEST 2004

At 03:41 PM 8/22/04 -0700, Mark Nottingham wrote:
>On Aug 22, 2004, at 3:30 PM, Phillip J. Eby wrote:
>>Maybe a dictionary of lists would work?  That is, the ``headers`` field 
>>would look like:
>>
>>     {'content-type': ['text/plain'], 'content-length': ['1234'], ...}
>>
>>This would perhaps be annoying for specifying simpler fields, but it 
>>would still be easy to write utility functions to manipulate headers.
>
>Would implementations be required to separate multiple header values into 
>different list items?

No.  Instead, readers would be required to look at all list items, and any 
single item might still carry several comma-separated values.
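
To make the dictionary-of-lists idea concrete, here is a rough sketch of the 
kind of utility functions I have in mind.  The names ``add_header`` and 
``get_header`` are hypothetical, and joining items with ', ' assumes the 
header is one whose repeated values may be combined that way:

```python
def add_header(headers, name, value):
    """Append a value under the lower-cased header name."""
    headers.setdefault(name.lower(), []).append(value)


def get_header(headers, name, default=None):
    """Return all list items for a header as one comma-joined string.

    A reader has to look at every list item, because a writer is not
    required to put each value in its own item.
    """
    items = headers.get(name.lower())
    if not items:
        return default
    return ', '.join(items)


headers = {}
add_header(headers, 'Content-Type', 'text/plain')
add_header(headers, 'Accept', 'text/html')
add_header(headers, 'accept', 'text/plain, text/xml')
combined = get_header(headers, 'ACCEPT')
```

Note that the second 'accept' item holds two values in one string, which is 
exactly the case a reader must be prepared for.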

>>For the content, I'm thinking we should still prohibit embedded control 
>>characters, but note that the server is allowed to "fold" long header 
>>lines if it wishes (by replacing one or more whitespace characters with 
>>'\r\n ').
>
>That *may* get tricky if it does so in the middle of quoted content, e.g.,
>
>Example: foo="bar
>    baz"
>
>if whitespace is significant inside the quotes.

I think I'm going to punt on this by saying that the server can split or 
fold headers only if it can do so *safely*, where "safely" means, "the 
server has sufficient understanding of the header's format or semantics".  :(
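
As an illustration only, a server that knows a header uses double-quoted 
strings might fold like this.  ``fold_header`` and its ``width`` parameter 
are made up for the example, and the sketch assumes there are no 
backslash-escaped quotes inside the quoted strings:

```python
def fold_header(value, width=72):
    """Fold a header value at spaces that lie outside double quotes.

    Hypothetical sketch: a space is replaced by '\\r\\n ' only once the
    current line has reached `width` characters and we are not inside a
    quoted string.  Escaped quotes are not handled.
    """
    out = []
    line_len = 0
    in_quotes = False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ' ' and not in_quotes and line_len >= width:
            out.append('\r\n ')
            line_len = 1  # the continuation line starts with one space
        else:
            out.append(ch)
            line_len += 1
    return ''.join(out)


original = 'a b "c d" e'
folded = fold_header(original, width=1)
```

Because each fold replaces exactly one space, replacing '\r\n ' with ' ' 
recovers the original value, and the space inside the quotes is never 
touched.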

A possible alternative is to allow applications to fold their own headers, 
but I'm reluctant to do this because I fear people will use e.g. '\n' where 
they should use '\r\n', and the like.  Banning control characters means the 
server can easily detect when a supplied header is broken, *and* the server 
knows it always adds a single CRLF to the end of each header.
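
A minimal sketch of that check, assuming "control characters" means the 
ASCII range 0-31 plus DEL (so tabs are rejected too); ``format_header`` is a 
hypothetical name, not part of any proposed interface:

```python
import re

# Assumption: control characters are ASCII 0-31 and 127 (DEL).
_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def format_header(name, value):
    """Reject embedded control characters, then add the one trailing CRLF."""
    if _CONTROL_CHARS.search(name) or _CONTROL_CHARS.search(value):
        raise ValueError('control character in header %r' % (name,))
    # Only the server ever writes the line terminator.
    return '%s: %s\r\n' % (name, value)


line = format_header('Content-Type', 'text/plain')

try:
    format_header('X-Broken', 'first\nsecond')
    rejected = False
except ValueError:
    rejected = True
```

An application that tries to fold its own header with a bare '\n' is caught 
by the check instead of producing a malformed response.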