[Web-SIG] FW: Closing #63: RFC2047 encoded words

James Y Knight foom at fuhm.net
Wed Apr 8 20:14:10 CEST 2009


On Apr 8, 2009, at 12:57 PM, Robert Brewer wrote:
> Yes, but parsers need to continue decoding them for many years to come.
> IMO WSGI origin servers should do this so we can write the decoding
> logic once and forget about it (assuming middleware and apps far
> outnumber origin servers).

Decoding RFC 2047 encoded words is rather trivial compared to correctly
parsing all the HTTP headers. Plus, as I said before, you can't even
*do* the RFC 2047 decoding without parsing the headers at the same time
to figure out which pieces need to be decoded! And furthermore, nobody
needs to "continue" decoding them for years to come, *because nobody
decodes them now*!
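
To illustrate why decoding depends on parsing, here is a minimal sketch
using the `email.header.decode_header` function from the Python 3
standard library. The helper name `decode_rfc2047` and the sample
Content-Disposition-style value are assumptions made up for this
example; they are not taken from WSGI or from any server.

```python
from email.header import decode_header


def decode_rfc2047(text):
    """Decode every RFC 2047 encoded-word found in text, ignoring context."""
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            # Undecoded runs come back with charset None; treat them as Latin-1.
            chunk = chunk.decode(charset or "iso-8859-1", errors="replace")
        parts.append(chunk)
    return "".join(parts)


# A bare encoded-word is easy to decode on its own.
bare = decode_rfc2047("=?iso-8859-1?q?caf=E9?=")

# Hypothetical header value with the same token inside a quoted-string.
raw = 'attachment; filename="=?iso-8859-1?q?caf=E9?=.txt"'
naive = decode_rfc2047(raw)

print(bare)
print(naive)
```

The blind pass also rewrites the text inside the quoted-string, even
though RFC 2047 section 5 says an encoded-word must not appear within a
quoted-string. Deciding which tokens are eligible therefore requires
knowing the grammar of the particular header, which is the parsing step
described above.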

WSGI is intentionally exposing a fairly low-level view of the world.
So my opinion is that the headers in the dict should be byte strings
and that anyone who wants decoded headers also probably really wants
(or ought to want!) parsed headers, and thus should be using an HTTP
header parsing library. That library can expose values as unicode
strings if it wants to.

If you want to start a discussion about having a standard
parsed-header object in WSGI, that's another thing, but saying that WSGI
servers should *partially* decode the headers seems rather silly to me.

James


