[Web-SIG] HTTP header canonicalization?

Mon Aug 23 00:41:48 CEST 2004

On Aug 22, 2004, at 3:30 PM, Phillip J. Eby wrote:

> At 03:10 PM 8/22/04 -0700, Mark Nottingham wrote:
>> The only problem I'm aware of is Set-Cookie, which can have an 
>> unquoted expires date in it; e.g.,
>>
>>   Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 
>> 09-Nov-99 23:12:40 GMT
>>
>> If you have two of these, the comma after the day (here, "Wednesday") 
>> makes parsing problematic.
>>
>> Note that this is only specified in the original Netscape cookie spec 
>> [1], not the State Management RFC [2]. See section 10.1.2 of [2] for 
>> more discussion of this issue.
>>
>> So, you *shouldn't* see these, especially since WSGI is about the 
>> server side. All the same, I'll ask around to see how often they're 
>> still seen in the wild.
>
> Unfortunately, this seems like something that's awfully likely to be 
> present in Python frameworks "in the wild".

I'm honestly not sure. That was my assumption until recently, but I'm 
hopeful that RFC 2109 may have reduced the need to accommodate this. 
Since WSGI is a server-side framework, it can enforce conformance to the 
RFCs if it so chooses (there are other problems with using Expires on 
cookies anyway, esp. WRT caching), as long as the application 
frameworks are willing to accept that.
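
To make the comma problem concrete, here is a rough sketch. The second 
cookie and the variable names are made up for illustration, and the 
naive split is only there to show the failure, not how any particular 
server parses headers:

```python
# Two Set-Cookie values, combined with ", " the way HTTP allows repeated
# header fields to be merged. The second cookie is hypothetical.
first = ("CUSTOMER=WILE_E_COYOTE; path=/; "
         "expires=Wednesday, 09-Nov-99 23:12:40 GMT")
second = ("PART=ROCKET_LAUNCHER; path=/; "
          "expires=Wednesday, 09-Nov-99 23:12:40 GMT")
combined = first + ", " + second

# A naive split on commas cannot tell the list separator apart from the
# comma inside the Netscape-style expires date.
naive = [part.strip() for part in combined.split(",")]

print(len(naive))  # 4 pieces, although only 2 cookies were sent
print(naive[0])    # the first cookie, cut off right after "expires=Wednesday"
```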

>> Regarding ordering of headers with different names: I don't think so. 
>> Note that HTTP says
>>
>> """it is "good practice" to send general-header fields first, 
>> followed by request-header or response-header fields, and ending with 
>> the entity-header fields."""
>>
>> This isn't very strict, though.
>
> I was thinking that servers that want to follow "good practice" could 
> just have a list of headers in the desirable order, pulling them out 
> of the dictionary first.  In practice, *not* doing this simply means 
> that every application or framework has to know what order headers 
> "belong" in, so this doesn't seem like a terrible thing.

Agreed.
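
Something along these lines might be all a server needs. This is only a 
sketch: PREFERRED_ORDER and ordered_headers() are hypothetical names, 
the list is a small sample rather than a complete one, and sorting the 
leftover headers by name is just one arbitrary choice:

```python
# Hypothetical preferred order: a general header (Date), a response
# header (Server), then entity headers (Content-Type, Content-Length).
PREFERRED_ORDER = ["date", "server", "content-type", "content-length"]


def ordered_headers(headers):
    """Return (name, value) pairs, preferred names first, the rest sorted."""
    remaining = dict(headers)  # copy so the caller's dictionary is untouched
    result = []
    for name in PREFERRED_ORDER:
        if name in remaining:
            result.append((name, remaining.pop(name)))
    # Anything the server has no opinion about goes last, sorted by name.
    result.extend(sorted(remaining.items()))
    return result


app_headers = {
    "content-length": "1234",
    "x-custom": "1",
    "date": "Mon, 23 Aug 2004 00:41:48 GMT",
    "content-type": "text/plain",
}
for name, value in ordered_headers(app_headers):
    print("%s: %s" % (name, value))
```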

> Maybe a dictionary of lists would work?  That is, the ``headers`` 
> field would look like:
>
>     {'content-type': ['text/plain'], 'content-length': ['1234'], ...}
>
> This would be perhaps annoying for specifying simpler fields, but it 
> would still be easy to write utility functions to manipulate headers.

Would implementations be required to separate multiple header values 
into different list items?
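
To illustrate the question, here is a sketch of the two shapes the same 
field could take in a dictionary of lists. The helper names 
add_header() and combined_value() are hypothetical, not part of any 
proposed API:

```python
def add_header(headers, name, value):
    """Append a value under the lower-cased header name."""
    headers.setdefault(name.lower(), []).append(value)


def combined_value(headers, name):
    """Join all values for a header into one comma-separated field value."""
    return ", ".join(headers.get(name.lower(), []))


# One list item per value...
separate = {}
add_header(separate, "Content-Type", "text/plain")
add_header(separate, "Vary", "Accept-Encoding")
add_header(separate, "Vary", "Accept-Language")

# ...versus a single item that already carries both values.
prejoined = {"vary": ["Accept-Encoding, Accept-Language"]}

# Both produce the same field value on the wire, but code that inspects
# the list directly sees two items in one case and one in the other.
print(combined_value(separate, "Vary") == combined_value(prejoined, "Vary"))
print(len(separate["vary"]), len(prejoined["vary"]))
```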

> For the content, I'm thinking we should still prohibit embedded 
> control characters, but note that the server is allowed to "fold" long 
> header lines if it wishes (by replacing one or more whitespace 
> characters with '\r\n ').

That *may* get tricky if it does so in the middle of quoted content, 
e.g.,

Example: foo="bar
    baz"

if whitespace is significant inside the quotes.
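
A sketch of how a server could end up there. naive_fold() is a 
hypothetical helper that breaks at single spaces with no knowledge of 
quoting, and the limit of 18 is chosen only to force the break:

```python
def naive_fold(line, limit):
    """Fold a header line at spaces once it would exceed `limit` characters.

    Quoted strings get no special treatment, which is the point here.
    """
    pieces = []
    current = ""
    for word in line.split(" "):
        if current and len(current) + 1 + len(word) > limit:
            pieces.append(current)
            current = word
        elif current:
            current = current + " " + word
        else:
            current = word
    pieces.append(current)
    # Continuation lines start with a single space, as in '\r\n '.
    return "\r\n ".join(pieces)


folded = naive_fold('Example: foo="bar baz"', 18)
print(repr(folded))  # the break falls between "bar" and "baz", inside the quotes
```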

--
Mark Nottingham     http://www.mnot.net/