[Web-SIG] HTTP header canonicalization?

Phillip J. Eby pje at telecommunity.com
Mon Aug 23 00:30:12 CEST 2004


At 03:10 PM 8/22/04 -0700, Mark Nottingham wrote:
>The only problem I'm aware of is Set-Cookie, which can have an unquoted 
>expires date in it; e.g.,
>
>   Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT
>
>If you have two of these, the comma after the day (here, "Wednesday") 
>makes parsing problematic.
>
>Note that this is only specified in the original netscape cookie spec [1], 
>not the State Management RFC [2]. See section 10.1.2 of [2] for more 
>discussion of this issue.
>
>So, you *shouldn't* see these, especially since WSGI is about the server 
>side. All the same, I'll ask around to see how often they're still seen in 
>the wild.

Unfortunately, this seems like something that's awfully likely to be 
present in Python frameworks "in the wild".
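To make the parsing problem concrete, here is a small Python sketch. The first cookie is the example quoted above; the second cookie is made up for illustration. It assumes the two values have been combined into one comma-separated header, as HTTP permits for most repeated fields, and shows that a naive split on commas no longer recovers them.

```python
# Two Set-Cookie values; the first uses the Netscape-style unquoted
# "expires" date, the second is a hypothetical extra cookie.
cookies = [
    "CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT",
    "PART_NUMBER=ROCKET_LAUNCHER_0001; path=/",
]

# Joining repeated headers with commas works for most HTTP fields,
# but it is not safely reversible for Set-Cookie.
combined = ", ".join(cookies)

# A naive split on commas yields three pieces instead of two, because
# the weekday in the expires date is itself followed by a comma.
naive = [part.strip() for part in combined.split(",")]
print(len(naive))  # 3
```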


>Regarding ordering of headers with different names: I don't think so. Note 
>that HTTP says
>
>"""it is "good practice" to send general-header fields first, followed by 
>request-header or response-header fields, and ending with the 
>entity-header fields."""
>
>This isn't very strict, though.

I was thinking that servers that want to follow "good practice" could just 
have a list of headers in the desired order, and pull those out of the 
dictionary first.  In practice, *not* doing this simply means that every 
application or framework has to know what order headers "belong" in, so 
having the server handle the ordering doesn't seem like a terrible thing.
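A minimal sketch of the server-side ordering described above, assuming headers arrive as a plain dict with lowercase names. The `PREFERRED_ORDER` list and the `ordered_headers()` helper are hypothetical; the list groups a few RFC 2616 general, response, and entity headers in the "good practice" order, and a real server would pick its own list.

```python
# Hypothetical preferred order: general headers, then response headers,
# then entity headers (a subset of each RFC 2616 category).
PREFERRED_ORDER = [
    # general-header fields
    "cache-control", "connection", "date", "pragma", "transfer-encoding",
    # response-header fields
    "etag", "location", "server", "www-authenticate",
    # entity-header fields
    "content-length", "content-type", "expires", "last-modified",
]


def ordered_headers(headers):
    """Return (name, value) pairs, known names first in PREFERRED_ORDER."""
    remaining = dict(headers)  # copy so the caller's dict is untouched
    result = []
    for name in PREFERRED_ORDER:
        if name in remaining:
            result.append((name, remaining.pop(name)))
    # Names the server doesn't know about go last, sorted so that the
    # output is deterministic.
    result.extend(sorted(remaining.items()))
    return result
```

With this approach the application only supplies the dictionary; for example, a dict containing `content-type`, `x-custom`, `date`, and `server` comes out as `date`, `server`, `content-type`, `x-custom`.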



>Overall, I think that modelling headers as a dictionary in the application 
>and passing them in that form to a server is a good thing, as long as the 
>Set-Cookie issue is kept in mind. Servers might have to modify their 
>serialisation on the wire to account for line lengths and aesthetics 
>(generally, the only time you run into line length problems is when you're 
>extending HTTP to do non-browsing things), but that doesn't need to be 
>exposed to the application.

Maybe a dictionary of lists would work?  That is, the ``headers`` field 
would look like:

     {'content-type': ['text/plain'], 'content-length': ['1234'], ...}

This would perhaps be annoying for specifying simple single-valued fields, 
but it would still be easy to write utility functions to manipulate headers.
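A few such utility functions might look like this. The names `add_header`, `set_header`, `get_header`, and `header_lines` are hypothetical, and the sketch assumes field names are stored lowercased, as in the example above, since HTTP field names are case-insensitive.

```python
def add_header(headers, name, value):
    """Append a value, keeping existing ones (e.g. several Set-Cookie)."""
    headers.setdefault(name.lower(), []).append(value)


def set_header(headers, name, value):
    """Replace all existing values for a single-valued field."""
    headers[name.lower()] = [value]


def get_header(headers, name, default=None):
    """Return the first value for a field, or default if it is absent."""
    values = headers.get(name.lower())
    return values[0] if values else default


def header_lines(headers):
    """Serialize to 'name: value' lines, one line per value.

    Multi-valued fields such as Set-Cookie are emitted as separate
    lines rather than comma-joined, which sidesteps the expires-date
    parsing problem.
    """
    lines = []
    for name, values in headers.items():
        for value in values:
            lines.append("%s: %s" % (name, value))
    return lines
```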

For header values, I'm thinking we should still prohibit embedded control 
characters, but note that the server is allowed to "fold" long header lines 
if it wishes (by replacing one or more whitespace characters with '\r\n ').
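A sketch of both rules, assuming "control characters" means 0x00-0x1F and 0x7F (which, as written, also rejects tab; a server might reasonably allow it) and using an arbitrary 78-character fold width. The helpers `check_header_value()` and `fold_header()` are hypothetical.

```python
import re

# Assumed definition of control characters: 0x00-0x1F and 0x7F,
# which includes CR, LF, and tab.
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def check_header_value(value):
    """Reject values containing embedded control characters."""
    if _CONTROL.search(value):
        raise ValueError("control character in header value: %r" % value)
    return value


def fold_header(name, value, width=78):
    """Build a 'Name: value' line, folding at whitespace past `width`.

    Each fold replaces one run of spaces with CRLF plus a single space.
    The 78-character default is an arbitrary assumption.
    """
    check_header_value(value)
    # Alternating words and runs of spaces; tab was rejected above,
    # so spaces are the only whitespace left.
    pieces = re.split(r"( +)", value)
    lines = []
    current = name + ": " + pieces[0]
    for i in range(1, len(pieces), 2):
        sep, word = pieces[i], pieces[i + 1]
        if len(current) + len(sep) + len(word) > width:
            lines.append(current)
            current = " " + word  # this run of spaces becomes "\r\n "
        else:
            current += sep + word
    lines.append(current)
    return "\r\n".join(lines)
```

Unfolding the result (replacing each CRLF-plus-space with a single space) gives back the original line whenever each fold consumed exactly one space.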
