urllib, urllib2, httplib -- Begging for consolidation?

Tue May 7 11:26:54 EDT 2002

On Tue, 7 May 2002, A. Keyton Weissinger wrote:

> Am I the only one that thinks these need to be pulled together some? I saw a
> PEP (268?) where there are some rumblings about adding some things to it as
> well. Maybe a combo project?

Yes, part of the problem is that it's not obvious when you should use 
which (e.g. urllib vs. urllib2).

BUT, if some sort of consolidation were to happen (meaning, 
introducing incompatibilities or a whole new module), then we should use 
that as an opportunity to restructure/redesign that whole set of modules 
because, IMO, they've evolved past their original design. If we can come 
up with a good organization, the actual implementation could be handled by 
various members of the community.

The original premise of urllib, that it helps your app open any type of 
URL in roughly the same way, is pretty neat but now both urllib and 
urllib2 have lots of stuff tacked on that is pretty HTTP-specific. Also, 
I usually need to support only one protocol and I know in advance which 
that is (usually HTTP, sometimes FTP), but the httplib docs imply that 
httplib is more of an internal module.

So... if we were to change something, I'd like us to build a rich HTTP
library that supports the super easy use case (gimme the data at this URL,
optionally posting this data right here first) as well as more complicated
cases (add in these request headers before sending the request to the
origin). It would be in this module (or one closely tied to it) that we'd
capture knowledge about the HTTP protocol, such as parsing and building
HTTP 1.0 and 1.1 compliant request and response headers, handling cookies,
basic and digest authentication, '\n' vs. '\r\n' line endings, easy-to-use
HTTPS, etc. Supporting routines (like quote, urlencode, urlparse) can
either be imported and exposed through the HTTP module, or kept in a
module with better defined boundaries.
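
To make the "super easy" vs. "more complicated" split concrete, here is a
rough sketch. The names build_request and fetch are made up for
illustration, and it leans on the Python 3 module names (urllib.request,
urllib.parse), which is an assumption on my part and not something that
existed when this was posted. It only shows the shape of such an API, not
a proposal for the real thing.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(url, post_data=None, extra_headers=None):
    """Build (but do not send) an HTTP request for `url`.

    Hypothetical helper: the more complicated case, where the caller
    wants to add request headers before anything goes to the origin.
    """
    body = None
    if post_data is not None:
        # Form-encode the fields; a request that carries a body is a POST.
        body = urlencode(post_data).encode("ascii")
    return Request(url, data=body, headers=extra_headers or {})


def fetch(url, post_data=None, extra_headers=None):
    """Easy case: gimme the bytes at `url`, optionally posting first."""
    with urlopen(build_request(url, post_data, extra_headers)) as resp:
        return resp.read()
```

The easy case is then fetch(url), while anyone who needs to inspect or
tweak the request calls build_request() and hands the result to urlopen
themselves.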

We could take the same approach with other protocols, and include modules 
for FTP, plain files, etc. With all those in place we could still have the 
"open any type of URL" routine built on top, but it should work only for 
the simplest of use cases; if you need something more complex then you'd 
go use the corresponding protocol library yourself.
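
The layering described above might look roughly like this. Everything
here is a hypothetical sketch: open_file, open_http, OPENERS and open_url
are names invented for the example, and it uses the Python 3 spellings
(http.client, urllib.parse) as an assumption. The point is only that the
generic opener is a thin dispatch on the URL scheme, with the real work
living in per-protocol code.

```python
import http.client
from urllib.parse import urlsplit
from urllib.request import url2pathname


def open_file(url):
    """Plain files: map the URL path to a local path and read it."""
    path = url2pathname(urlsplit(url).path)
    with open(path, "rb") as f:
        return f.read()


def open_http(url):
    """HTTP: a bare GET with no extra headers, redirects, or auth."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        return conn.getresponse().read()
    finally:
        conn.close()


# One entry per protocol module; FTP etc. would be registered the same way.
OPENERS = {"file": open_file, "http": open_http}


def open_url(url):
    """Simplest use case only: return the bytes behind any supported URL."""
    scheme = urlsplit(url).scheme
    try:
        opener = OPENERS[scheme]
    except KeyError:
        raise ValueError("no protocol module for scheme %r" % scheme)
    return opener(url)
```

Anything fancier than "give me the bytes" (custom headers, posting data,
authentication) would deliberately not be reachable through open_url; you
would call the protocol-specific code directly.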

I'm not suggesting that we scrap the current protocol modules (they've been 
very, very useful); it's just that over time they've grown up and are due 
for some redesign/refactoring (the kind that will not be backwards 
compatible).

-Dave