[Python-3000] should rfc822 accept text io or binary io?

"Martin v. Löwis" martin at v.loewis.de
Fri Aug 17 19:22:18 CEST 2007


> The odd thing here is that RFC 2047 (MIME) seems to be about encoding
> non-ASCII character sets in ASCII.  So the spec is kind of odd here.
> The actual bytes on the wire seem to be ASCII, but they may have an
> interpretation where those ASCII bytes represent a non-ASCII string.

HTTP is fairly confused about usage of non-ASCII characters in headers.
For example, RFC 2617 specifies that, for Basic authentication, userid
and password are *TEXT (excluding : in the userid); it then says that
user-pass is base64-encoded. It nowhere says what the charset of userid
or password should be.
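
To make the ambiguity concrete, here is a minimal Python sketch. The
function name basic_credentials and its charset parameter are
hypothetical, introduced only for illustration; RFC 2617 itself names no
charset, which is exactly the problem.

```python
import base64


def basic_credentials(userid: str, password: str, charset: str) -> str:
    """Build an Authorization header value for Basic authentication.

    The charset argument is an assumption of this sketch: the spec does
    not say which one to use, so the caller has to guess.
    """
    if ":" in userid:
        raise ValueError("userid must not contain ':'")
    # user-pass = userid ":" password, turned into bytes, then base64-encoded
    raw = f"{userid}:{password}".encode(charset)
    return "Basic " + base64.b64encode(raw).decode("ascii")


# The same credentials give different header values depending on the guess.
as_latin1 = basic_credentials("Löwis", "secret", "latin-1")
as_utf8 = basic_credentials("Löwis", "secret", "utf-8")
```

For pure-ASCII credentials every common charset yields the same bytes,
which is probably why the gap goes unnoticed most of the time.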

People now interpret that as saying: it's TEXT, so you need to encode
it according to RFC 2047 before using it in a header. That requires
the userid to be MIME-encoded first (Q encoding, say, or B), and the
result then gets base64-encoded again before it is transmitted. Neither
web browsers nor web servers implement that correctly today.
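
A rough sketch of that two-step reading, using the standard email.header
module. The function name double_encoded_credentials is hypothetical, and
treating only the userid as RFC 2047 text (with an ASCII-only password)
is a simplification of this sketch, not something the RFCs spell out.

```python
import base64
from email.header import Header, decode_header


def double_encoded_credentials(userid: str, password: str,
                               charset: str = "utf-8") -> str:
    """Apply the 'MIME-encode first, then base64' interpretation."""
    # Step 1: RFC 2047 encoded-word; the result is pure ASCII.
    mime_userid = Header(userid, charset).encode()
    # Step 2: the usual Basic-auth base64 over userid ":" password.
    # Assumption: the password is ASCII; encode() raises otherwise.
    user_pass = f"{mime_userid}:{password}".encode("ascii")
    return base64.b64encode(user_pass).decode("ascii")


token = double_encoded_credentials("Löwis", "secret")

# A receiver following the same interpretation reverses both steps.
inner = base64.b64decode(token).decode("ascii")
user_part, _, pass_part = inner.partition(":")
user_bytes, user_charset = decode_header(user_part)[0]
recovered_userid = user_bytes.decode(user_charset)
```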

But in short, the intention seems to be that the HTTP headers are
strict ASCII on the wire, with non-ASCII encoded using MIME header
encoding.

A library implementing that in Python should certainly use bytes
at the network (stream) side, and strings at the application side.
Even though the format is human-readable, the protocol is byte-oriented,
not character-oriented.
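
As a sketch of that split, assuming headers are strict ASCII on the wire
with RFC 2047 encoded-words for anything else: the helper names
parse_header_line and format_header_line are made up for this example,
and UTF-8 as the outgoing charset is likewise an assumption.

```python
from email.header import Header, decode_header, make_header
from typing import Tuple


def parse_header_line(line: bytes) -> Tuple[str, str]:
    """Network side in (bytes), application side out (str)."""
    # Strict ASCII on the wire: anything else raises UnicodeDecodeError.
    text = line.rstrip(b"\r\n").decode("ascii")
    name, _, value = text.partition(":")
    # Decode any RFC 2047 encoded-words into a real string.
    decoded = str(make_header(decode_header(value.strip())))
    return name.strip(), decoded


def format_header_line(name: str, value: str) -> bytes:
    """Application side in (str), network side out (bytes)."""
    if not value.isascii():
        # Assumption of this sketch: UTF-8 as the encoded-word charset.
        value = Header(value, "utf-8").encode()
    return f"{name}: {value}\r\n".encode("ascii")


wire = format_header_line("X-Name", "Löwis")
name, value = parse_header_line(wire)
```

The application only ever sees str values, and only bytes cross the
stream boundary.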

Regards,
Martin

