[Python-3000] should rfc822 accept text io or binary io?

Tue Aug 7 18:51:34 CEST 2007

On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On 8/6/07, Fred Drake <fdrake at acm.org> wrote:
> > On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote:
> > > I thought rfc822 was going away.  From the current module
> > > documentation:
> > > ...
> > > Shouldn't rfc822 be gone altogether in Python 3?
> >
> > Yes.  And the answers to Jeremy's questions about what sort of IO is
> > appropriate for the email package should be left to the email-sig as
> > well, I suspect.  It's good that they've come up.
>
> Hmmm.  Should we be using the email package to parse HTTP headers?
> RFC 2616 says that HTTP headers follow the "same generic format" as
> RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers
> are arbitrary 8-bit values.  You'd need to parse them differently.

I'm confused (and too lazy to read the RFCs). How can you have case
insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit
values? Assuming they mean it's an ASCII superset, does that mean that
HTTP doesn't have case insensitivity for bytes with values > 127?
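
As a small illustration of the "ASCII superset" reading (this shows how
bytes.lower() behaves in current Python 3, not what the RFCs require):
case folding on bytes touches only ASCII letters, so bytes above 127
compare case-sensitively, while the same code point held in a str is
folded.

```python
# bytes.lower() only maps the ASCII letters A-Z; other byte values pass through.
name = b"Content-Type"
folded = name.lower()                  # ASCII letters are lowered

high = bytes([0xC9])                   # 0xC9 is "É" if read as Latin-1
high_folded = high.lower()             # unchanged: not an ASCII letter

# The same code point in a str is case-folded.
text_folded = "\u00c9".lower()         # gives "\u00e9"
```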

> I also wonder if it makes sense for httplib to depend on email.  If it
> is possible to write generic code, maybe it belongs in a common
> library rather than in either email or httplib.
>
> I meant my original email to ask a more general question:  Does anyone
> have some suggestions about how to design libraries that could deal
> with bytes or strings?  If an HTTP header value contains 8-bit binary
> data, does the client application expect bytes or a string in some
> encoding?
>
> If you have a library that consumes file-like objects, how do you deal
> with bytes vs. strings?  Do you have two constructor options so that
> the client can specify what kind of output the file-like object
> produces?  Do you try to guess?  Do you just write code assuming
> strings and let it fail on a bad lower() call when it gets bytes?

In general I'm against writing polymorphic code that tries to work for
strings as well as bytes, except very small algorithms. For larger
amounts of code, you almost always run into the need for literals or
hashing or case conversion or other differences (e.g. \n vs. \r\n when
doing I/O).
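
A minimal sketch of the literal problem, using a hypothetical helper
header_name() that is not part of any library: the function looks
type-neutral, but the ":" literal ties it to str, so it fails as soon
as it is handed bytes.

```python
def header_name(line):
    # Hypothetical helper, written with str in mind: ":" is a str literal.
    return line.split(":", 1)[0].strip().lower()

from_text = header_name("Content-Type: text/html")

try:
    header_name(b"Content-Type: text/html")
    error = None
except TypeError as exc:
    # bytes.split() refuses a str separator.
    error = exc
```

Making this work for both types would mean choosing every literal based
on the argument type, which is the kind of overhead that grows quickly
in larger code.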

I think it's conceptually cleaner to pick a particular type for an API
and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm
files read/write bytes; text files (io.TextIOBase) read/write strings.
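
One way this could look for header parsing, as a sketch only: the
hypothetical function parse_header() below accepts bytes and nothing
else, and decodes explicitly before handing strings to the caller. The
choice of ASCII for the name and Latin-1 for the value is an assumption
made for the example, not something settled in this thread.

```python
def parse_header(line):
    # Hypothetical API: the wire format is bytes, so accept only bytes.
    if not isinstance(line, bytes):
        raise TypeError("parse_header() expects bytes")
    name, _, value = line.partition(b":")
    # Assumed encodings: ASCII for the name, Latin-1 for the value.
    return name.strip().decode("ascii").lower(), value.strip().decode("latin-1")

parsed = parse_header(b"Content-Type: text/html\r\n")

try:
    parse_header("Content-Type: text/html")
    rejected = False
except TypeError:
    rejected = True
```

The caller always knows what goes in and what comes out, and the
decoding step happens in exactly one place.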

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)