urllib, urllib2, httplib -- Begging for consolidation?
brueckd at tbye.com
Tue May 7 11:26:54 EDT 2002
On Tue, 7 May 2002, A. Keyton Weissinger wrote:
> Am I the only one that thinks these need to be pulled together some? I saw a
> PEP (268?) where there are some rumblings about adding some things to it as
> well. Maybe a combo project?
Yes, part of the problem is that it's not obvious when you should use
which (e.g. urllib vs. urllib2).
BUT, if some sort of consolidation were to happen (meaning,
introducing incompatibilities or a whole new module), then we should use
that as an opportunity to restructure/redesign that whole set of modules
because, IMO, they've evolved past their original design. If we can come
up with a good organization, the actual implementation could be handled by
various members of the community.
The original premise of urllib, that it helps your app open any type of
URL in roughly the same way, is pretty neat but now both urllib and
urllib2 have lots of stuff tacked on that is pretty HTTP-specific. Also,
I usually need to support only one protocol and I know in advance which
that is (usually HTTP, sometimes FTP), but the httplib docs imply that
httplib is more of an internal module.
So... if we were to change something, I'd like us to build a rich HTTP
library that supports the super easy use case (gimme the data at this URL,
optionally posting this data right here first) as well as more complicated
cases (add in these request headers before sending the request to the
origin). It would be in this module (or one closely tied to it) that we'd
capture knowledge about the HTTP protocol, such as parsing and building
HTTP 1.0 and 1.1 compliant request and response headers, handling cookies,
basic and digest authentication, '\n' vs. '\r\n' line endings, easy-to-use
HTTPS, etc. Supporting routines (like quote, urlencode, urlparse) can
either be imported and exposed through the HTTP module, or kept in a
module with better defined boundaries.
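
To make the two use cases concrete, here is a minimal sketch of what such an
interface might look like. It is written against the current Python 3
standard library (where urllib2's functionality lives in urllib.request), not
against anything that exists in the modules discussed here. The names
build_request and fetch, and their parameters, are hypothetical and made up
purely for illustration.

```python
import urllib.parse
import urllib.request


def build_request(url, post_fields=None, extra_headers=None):
    """Hypothetical helper: build an HTTP request without sending it."""
    data = None
    if post_fields is not None:
        # Form-encode the fields; a request with a body is sent as a POST.
        data = urllib.parse.urlencode(post_fields).encode("ascii")
    return urllib.request.Request(url, data=data, headers=dict(extra_headers or {}))


def fetch(url, post_fields=None, extra_headers=None):
    """Hypothetical 'gimme the data at this URL' call; returns the body as bytes."""
    request = build_request(url, post_fields, extra_headers)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

The easy case would then be fetch(url), the "post this data first" case
fetch(url, post_fields={...}), and the more complicated case adds
extra_headers={...} on top. Cookies, authentication and the rest are left out
of this sketch entirely.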
We could take the same approach with other protocols, and include modules
for FTP, plain files, etc. With all those in place we could still have the
"open any type of URL" routine built on top, but it should work only for
the simplest of use cases; if you need something more complex then you'd
go use the corresponding protocol library yourself.
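
The layering described above could be sketched roughly like this: each
protocol gets its own handler, and the generic opener only looks at the URL
scheme and delegates. HANDLERS, open_any and the handler functions are
hypothetical names. The http and file handlers lean on urllib.request from the
current Python 3 standard library as a stand-in for the richer per-protocol
modules, which this sketch assumes rather than implements.

```python
from urllib.parse import urlsplit
from urllib.request import url2pathname, urlopen


def open_http(url):
    # Stand-in for a rich HTTP library; here it only does a plain GET.
    with urlopen(url) as response:
        return response.read()


def open_file(url):
    # Stand-in for a "plain files" module: map the URL path to a local path.
    with open(url2pathname(urlsplit(url).path), "rb") as f:
        return f.read()


# Hypothetical registry: URL scheme -> function returning the body as bytes.
HANDLERS = {"http": open_http, "https": open_http, "file": open_file}


def open_any(url):
    """Simplest-case opener: pick a handler by scheme and delegate to it."""
    scheme = urlsplit(url).scheme
    try:
        handler = HANDLERS[scheme]
    except KeyError:
        raise ValueError("no protocol module registered for %r" % scheme)
    return handler(url)
```

Anything beyond "give me the bytes" (custom headers, passive FTP, and so on)
deliberately has no place in open_any; for that you would call the protocol
module directly.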
I'm not suggesting that we scrap the current protocol modules (they've been
very, very useful); it's just that over time they've grown up and are due
for some redesign/refactoring (the kind that will not be backwards
compatible).
-Dave