Fundamental problem with urllib...

Steve Holden sholden at holdenweb.com
Tue Apr 23 18:33:11 EDT 2002


"Jeremy Hylton" <jeremy at alum.mit.edu> wrote ...
> "A.M. Kuchling" <akuchlin at ute.mems-exchange.org> wrote ...
> > In article <yNUw8.74422$T%5.18813 at atlpnn01.usenetserver.com>,
> > Steve Holden wrote:
> > > Since urllib knows nothing of cookies, you will need to integrate some
> > > sort of a cookie jar into the library, with a new API for the clients
> > > to retrieve and store the cookies.
> >
> > This is worthwhile, but I don't think it belongs in urllib.  It
> > belongs in a module or package of its own that provides general
> > Web-browser features such as cookies, remembering authentication
> > usernames and passwords, and a cache.  This package could then be used
> > for implementing HTML-scraping scripts, spiders, or a Web browser.
>
> urllib2 provides a more flexible framework for implementing
> URL-loading programs, like a spider.  I think it would be helpful to
> have the features you mention integrated into urllib2.
>
Given that urllib2 is much newer than urllib, it does seem to make sense to
augment the more recently developed code.

> I'm not sure what the difference between an HTTP client, like urllib
> or urllib2, and a Web-browser is.  Other than urllib's monolithic
> design, why wouldn't you want these sorts of features in the module?
>
I imagine Andrew thought code breakage would be a problem, but I wouldn't
presume to channel him ;-).

Personally, I imagined passing a dictionary as an optional cookie jar
argument, keyed by (domain, path) tuples. The library code would update it
as dictated by its interactions with the servers it talks to.
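
Just to make that concrete, here is a minimal sketch of the sort of thing I
had in mind. The function names are purely illustrative, not a proposed API,
and real cookie handling would of course need proper domain/path matching
and expiry rules:

    def store_cookie(jar, domain, path, name, value):
        # The library would do something like this whenever it sees a
        # Set-Cookie header in a response.
        jar.setdefault((domain, path), {})[name] = value

    def cookies_for(jar, domain, path):
        # Gather every cookie whose (domain, path) key matches the
        # request being made.
        matched = {}
        for (d, p), cookies in jar.items():
            if domain.endswith(d) and path.startswith(p):
                matched.update(cookies)
        return matched

    jar = {}    # the caller-supplied cookie jar
    store_cookie(jar, "example.com", "/", "session", "abc123")
    print(cookies_for(jar, "www.example.com", "/index.html"))
    # {'session': 'abc123'}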

Of course there's a problem if multiple connections need to update the
cookie jar concurrently, so this might well turn out to be too simplistic
anyway.
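
One possible (again, purely hypothetical) mitigation would be to hide the
dictionary behind a lock, at which point a small class starts to look more
natural than a bare dict:

    import threading

    class CookieJar:
        # Thread-safe wrapper around the (domain, path) dictionary;
        # a sketch only, not a proposal for the actual interface.
        def __init__(self):
            self._lock = threading.Lock()
            self._cookies = {}    # (domain, path) -> {name: value}

        def store(self, domain, path, name, value):
            self._lock.acquire()
            try:
                self._cookies.setdefault((domain, path), {})[name] = value
            finally:
                self._lock.release()

        def lookup(self, domain, path):
            self._lock.acquire()
            try:
                matched = {}
                for (d, p), cookies in self._cookies.items():
                    if domain.endswith(d) and path.startswith(p):
                        matched.update(cookies)
                return matched
            finally:
                self._lock.release()

Whether something like that belongs in urllib2 itself or in a separate
cookie module is, of course, exactly the question above.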

regards
 Steve
