[Web-SIG] So what's missing?

Sat Oct 25 20:12:09 EDT 2003

On Sat, 25 Oct 2003, Ian Bicking wrote:
[...]
> In general, I just don't feel like there need to be quite so many
> handlers in urllib2.  One featureful HTTP implementation would be
> easier to work with (and, I think, easier to extend).

Well, that was a large part of the purpose of urllib2 -- to let you choose
what 'clever' stuff it does.  If you don't want something, you just don't
use that handler.  More importantly, if you want to do something slightly
differently, you supply your own handler.
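
A minimal sketch of that pick-and-choose design, written against
urllib.request (the Python 3 successor to urllib2).  The "demo" scheme,
the X-demo header and both handler classes are made up for illustration;
nothing here touches the network.

```python
import io
import urllib.request
import urllib.response
from email.message import Message


class TagRequest(urllib.request.BaseHandler):
    """Hypothetical pre-processor: stamps every demo: request with a header."""

    def demo_request(self, req):
        req.add_header("X-demo", "yes")
        return req


class DemoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up demo: scheme (no network involved)."""

    def demo_open(self, req):
        # Report whether some earlier handler tagged the request.
        value = req.get_header("X-demo", "absent")
        body = ("X-demo: " + value).encode("ascii")
        return urllib.response.addinfourl(
            io.BytesIO(body), Message(), req.full_url, 200
        )


def fetch(*handlers):
    """Build an opener from exactly the handlers given, then fetch one URL."""
    opener = urllib.request.OpenerDirector()
    for handler in handlers:
        opener.add_handler(handler)
    return opener.open("demo://example/").read()


print(fetch(DemoHandler()))                # TagRequest left out
print(fetch(TagRequest(), DemoHandler()))  # TagRequest included
```

Leaving TagRequest out of the opener is all it takes to switch that
behaviour off, and a replacement class with the same method name would
slot in the same way.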

If you shift stuff from an auth handler into the HTTP{S,}Handler, anybody
out there who's written their own auth handler will have their auth code
suddenly stop being invoked by urllib2.  Whatever special authorization
they were doing (maybe just reading from a database, maybe fixing a bug,
real or imagined, in urllib2) will stop happening, and their code will
probably break.

Anyway, it may or may not be the perfect system, but I'm not convinced it
needs changing.  Can you give a specific example of where having lots of
handlers becomes oppressive?

[...about inconvenience of having to provide realm and URI for auth...]
> Yes, a wildcard could definitely be good.  This is particularly
> important with scripts, i.e., one-off programs where you just want to
> grab something from a URL.
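
For the realm half of that inconvenience, a hedged sketch using
HTTPPasswordMgrWithDefaultRealm from present-day urllib.request: passing
None as the realm makes the credentials apply to any realm under the
given URI.  The URI itself still has to be supplied, so this is not the
full wildcard discussed here.  The host name and credentials are
placeholders.

```python
import urllib.request

# Placeholder credentials and host, for illustration only.
mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "http://example.com/", "alice", "s3cret")

# Any realm name falls back to the None entry for URIs under the prefix.
print(mgr.find_user_password("Some Realm", "http://example.com/private/page"))

# A different host is not covered by that entry.
print(mgr.find_user_password("Other", "http://elsewhere.example/"))

# The manager plugs into the usual auth handler; no request is made here.
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))
```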

OK.  Do we have a document where we're recording these proposals?  Is
there a wiki?

[...]
> > OK.  Does this URL class proposal fit with that path module PEP, do you
> > think?  Somebody mentioned that PEP (it was a PEP, wasn't it...?)
> > before, but I've forgotten everything about it :-)
>
> No, there's no PEP, for this or for a filesystem path object.  These
> were the links from the other email:
>
> http://www.jorendorff.com/articles/python/path/
>
> http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1057651032.22842.python-list%40python.org

Thanks.  Again, is there somewhere to record this URL class idea and the
fact that this path module is related?

[...]
> > That doesn't answer my question.  To repeat: What requirements does
> > caching impose that *urllib2* doesn't meet?  And why do we need a new
> > UserAgent class when we already have urllib2 and its handlers?
>
> All the normal HTTP caching, like If-Modified-Since and ETags.  If you
> handle this, you have to store the retrieved results, handle the
> metadata for those results, and provide control (where to put the
> cache, when and how to expire it, which items are in the cache, how to
> flush it, maybe a memory cache, etc.).  That could be done in a
> handler, but it feels like a separate object to me (an object which
> might still go in urllib2).

So, merely because you think "it feels like a separate object", you're
proposing to create a whole new layer of complexity for users to learn?
Why should people have to learn a new API just to get caching?  If
somebody had implemented HTTP caching and found the handler mechanism
lacking, or had a specific argument that showed it to be so, a new layer
*might* be justified.  Otherwise, I think it's a bad idea.
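
To make the handler option concrete, here is a rough sketch of ETag
revalidation done entirely inside a handler, again using urllib.request.
ETagCacheHandler and FakeOrigin are hypothetical names; the cache is a
plain in-memory dict with no expiry or size control, and FakeOrigin
stands in for the network so the example runs offline.  It shows only
that the hooks exist, not that they suffice for a complete cache.

```python
import io
import urllib.request
import urllib.response
from email.message import Message


class ETagCacheHandler(urllib.request.BaseHandler):
    """Hypothetical in-memory cache keyed by URL; not part of the stdlib."""

    def __init__(self):
        self.store = {}  # url -> (etag, body bytes)

    def http_request(self, req):
        cached = self.store.get(req.full_url)
        if cached:
            # Ask the server to skip the body if our copy is current.
            req.add_header("If-none-match", cached[0])
        return req

    def http_response(self, req, resp):
        url = req.full_url
        if resp.code == 304 and url in self.store:
            # Our copy is still good: answer from the cache.
            return self._make(url, *self.store[url])
        etag = resp.headers.get("ETag")
        if resp.code == 200 and etag:
            body = resp.read()
            self.store[url] = (etag, body)
            return self._make(url, etag, body)
        return resp

    @staticmethod
    def _make(url, etag, body):
        headers = Message()
        headers["ETag"] = etag
        out = urllib.response.addinfourl(io.BytesIO(body), headers, url, 200)
        out.msg = "OK"
        return out


class FakeOrigin(urllib.request.BaseHandler):
    """Stand-in for the network: serves one resource with a fixed ETag."""

    def __init__(self):
        self.full_bodies_sent = 0

    def http_open(self, req):
        headers = Message()
        headers["ETag"] = '"v1"'
        url = req.full_url
        if req.get_header("If-none-match") == '"v1"':
            return urllib.response.addinfourl(io.BytesIO(b""), headers, url, 304)
        self.full_bodies_sent += 1
        return urllib.response.addinfourl(io.BytesIO(b"hello"), headers, url, 200)


origin = FakeOrigin()
opener = urllib.request.OpenerDirector()
opener.add_handler(ETagCacheHandler())
opener.add_handler(origin)

first = opener.open("http://example.invalid/doc").read()
second = opener.open("http://example.invalid/doc").read()
print(first, second, origin.full_bodies_sent)
```

The second fetch is answered from the dict after a 304, so the fake
origin sends the full body only once.  Storage location, expiry and
flushing, the control points listed in the quoted message, are left out
of this sketch.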

> But looking back on what Bill was asking for, I think he was thinking
> more along the lines of connection caching, like CacheFTPHandler, and
> that would probably go in a handler.

Yep.

John