[Web-SIG] HTTP header canonicalization?
Phillip J. Eby
pje at telecommunity.com
Mon Aug 23 01:33:05 CEST 2004
At 03:41 PM 8/22/04 -0700, Mark Nottingham wrote:
>On Aug 22, 2004, at 3:30 PM, Phillip J. Eby wrote:
>>Maybe a dictionary of lists would work? That is, the ``headers`` field
>>would look like:
>>
>> {'content-type': ['text/plain'], 'content-length': ['1234'], ...}
>>
>>This would be perhaps annoying for specifying simpler fields, but it
>>would still be easy to write utility functions to manipulate headers.
>
>Would implementations be required to separate multiple header values into
>different list items?
No. Instead, readers would be required to look at all list items.
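To make the "utility functions" idea concrete, here is a minimal sketch in Python. The names add_header() and get_all() are hypothetical, and lowercasing the keys is an assumption taken from the example dictionary above, not a settled part of the proposal.

```python
def add_header(headers, name, value):
    """Append one value to the list stored under a header name.

    Keys are lowercased here (an assumption, matching the example dict).
    """
    headers.setdefault(name.lower(), []).append(value)


def get_all(headers, name):
    """Return every list item stored for a header, in insertion order.

    Items are returned as-is: a single item may still hold several
    comma-separated values, because writers are not required to split them.
    """
    return list(headers.get(name.lower(), []))


headers = {}
add_header(headers, 'Content-Type', 'text/plain')
add_header(headers, 'Set-Cookie', 'a=1')
add_header(headers, 'Set-Cookie', 'b=2')
```

A reader that cares about Set-Cookie would then call get_all(headers, 'set-cookie') and process both items, rather than assuming there is only one.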
>>For the content, I'm thinking we should still prohibit embedded control
>>characters, but note that the server is allowed to "fold" long header
>>lines if it wishes (by replacing one or more whitespace characters with
>>'\r\n ').
>
>That *may* get tricky if it does so in the middle of quoted content, e.g.,
>
>Example: foo="bar
> baz"
>
>if whitespace is significant inside the quotes.
I think I'm going to punt on this by saying that the server can split or
fold headers only if it can do so *safely*, where "safely" means, "the
server has sufficient understanding of the header's format or semantics". :(

A possible alternative is to allow applications to fold their own headers,
but I'm reluctant to do this because I fear people will use e.g. '\n' when
they should use '\r\n', and make similar mistakes. Banning control characters
means the server can easily detect when a supplied header is broken, *and*
the server knows it always adds a single CRLF to the end of each header.
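
As a rough illustration of that last point, a server-side check might look something like the sketch below. The function name format_header() is hypothetical, and treating every character below 0x20 (including tab) plus DEL as a banned control character is an assumption on my part, not something decided in this thread.

```python
def format_header(name, value):
    """Return the wire form of one header line, or raise ValueError.

    Assumption: "control character" means code points 0-31 and 127,
    so an application-supplied '\n' or '\r\n' is rejected outright.
    """
    for ch in value:
        if ord(ch) < 32 or ord(ch) == 127:
            raise ValueError(
                "control character %r in header %r" % (ch, name))
    # The server, not the application, adds the single trailing CRLF.
    return "%s: %s\r\n" % (name, value)
```

With this, format_header('Content-Type', 'text/plain') yields a line ending in exactly one CRLF, while a value such as 'foo="bar\n baz"' is refused instead of being sent out with a bare '\n' in it.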