[Web-SIG] So what's missing?

Ian Bicking ianb at colorstudy.com
Sat Oct 25 21:00:39 EDT 2003


On Saturday, October 25, 2003, at 07:12 PM, John J Lee wrote:
> On Sat, 25 Oct 2003, Ian Bicking wrote:
> [...]
>> In general, I just don't feel like there needs to be quite so many
>> handlers in urllib2.  One featureful HTTP implementation would be
>> easier to work with (and, I think, easier to extend).
>
> Well, that was a large part of the purpose of urllib2 -- to let you
> choose what 'clever' stuff it does.  If you don't want something, you
> just don't use that handler.  More importantly, if you want to do
> something slightly differently, you supply your own handler.
>
> If you shift stuff from an auth handler into the HTTP{S,}Handler,
> anybody out there who's written their own auth handler will have their
> auth code suddenly stop being invoked by urllib2.  Whatever special
> authorization they were doing (maybe just reading from a database,
> maybe fixing a bug, real or imagined, in urllib2) will stop happening,
> and their code will probably break.

a) There aren't many different ways to deal with a 401 response.  Is 
there something that's not covered by basic and digest authentication?
b) Accessing a database should happen in the password manager, not the 
handler.  The handler handles the protocol, and the database is not 
tied to the protocol.  I'm not proposing that the password manager go 
away (though it would be nice if it were hidden for simple usage).
c) This doesn't have to affect backward compatibility anyway.  We can 
leave HTTPBasicAuthHandler in there (deprecated), but also fold its 
functionality into HTTPHandler -- a rough sketch of this is below.  
HTTPBasicAuthHandler doesn't require that HTTPHandler *not* handle 
authentication.
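
To make (c) concrete, here's a rough sketch of what folding basic auth 
into the HTTP handler could look like.  The class name and details are 
invented for illustration: it only covers basic auth, it skips the 
retry-loop guard the real auth handlers have, and it reuses the 
existing password manager rather than replacing it.

    import base64, urllib2

    class AuthAwareHTTPHandler(urllib2.HTTPHandler):
        # Hypothetical sketch: an HTTPHandler that also answers 401s
        # itself, consulting the same password manager the current
        # auth handlers use.  HTTPBasicAuthHandler could stay around,
        # deprecated, without conflicting with this.
        def __init__(self, password_mgr=None):
            urllib2.HTTPHandler.__init__(self)
            self.passwd = (password_mgr or
                           urllib2.HTTPPasswordMgrWithDefaultRealm())

        def http_error_401(self, req, fp, code, msg, headers):
            # Only basic auth is sketched; digest would hang off the
            # same hook by looking at the WWW-Authenticate header.
            user, pw = self.passwd.find_user_password(
                None, req.get_full_url())
            if user is None:
                return None   # let another handler deal with the 401
            creds = base64.encodestring('%s:%s' % (user, pw)).strip()
            req.add_header('Authorization', 'Basic %s' % creds)
            return self.parent.open(req)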

> Anyway, it may or may not be the perfect system, but I'm not convinced
> it needs changing.  Can you give a specific example of where having
> lots of handlers becomes oppressive?

The documentation is certainly a problem (e.g., the 
HTTPBasicAuthHandler page), though it could be organized differently 
without changing the code.  It's definitely ravioli code 
(http://c2.com/cgi/wiki?RavioliCode), with all that entails -- IMHO 
it's hard to document ravioli code well.  (It's not so important how 
things are structured internally, but currently urllib2 also exposes 
that complex class structure.)
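
For example, this is the wiring a user already has to see just to 
fetch one password-protected page with today's urllib2 (this is the 
current API, not a proposal; the URL and credentials are made up):

    import urllib2

    # Three classes (password manager, auth handler, and the opener
    # that build_opener returns) just to say "use this username and
    # password when the server asks for them":
    mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, 'http://example.com/', 'user', 'secret')
    opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(mgr))
    page = opener.open('http://example.com/private/page').read()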

Also urlopen is not really extensible.  You can't tell urlopen to use 
authentication information (and it doesn't obey browser URL 
conventions, like http://user:password@domain/).  And we want to add 
structured POST data to that method (but also allow non-structured 
data), and cookies, and it might be nice to set the user-agent, and 
maybe other things that I haven't thought of.  If urlopen doesn't 
support these extra features then programmers have to learn a new API 
as their program becomes more complex.  Yet none of these features 
would be all that difficult to add via urlopen or perhaps other simple 
functions (instead of via classes).  I don't think there's any need 
for classes in the external API -- fetching URLs is about doing things, 
not representing things, and functions are easier to understand for 
doing things than classes are.
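
To be concrete, the sort of thing I'm picturing is a keyword-driven 
convenience function along these lines.  Every keyword name here is 
invented for illustration -- none of this exists in urllib2 today -- 
and internally it just builds the same handlers, so nothing is lost:

    import urllib, urllib2

    def urlopen(url, data=None, username=None, password=None,
                user_agent=None):
        # Hypothetical convenience wrapper; the signature is made up.
        # Cookies and the rest would slot in the same way.
        handlers = []
        if username is not None:
            mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
            mgr.add_password(None, url, username, password)
            handlers.append(urllib2.HTTPBasicAuthHandler(mgr))
        opener = urllib2.build_opener(*handlers)
        if user_agent is not None:
            opener.addheaders = [('User-agent', user_agent)]
        if isinstance(data, dict):
            data = urllib.urlencode(data)   # structured POST data
        return opener.open(url, data)

A caller could then write urlopen('http://example.com/private/', 
username='user', password='secret') and never touch a handler class.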

> [...about inconvenience of having to provide realm and URI for auth...]
>> Yes, a wildcard could definitely be good.  This is particularly
>> important with scripts, i.e., one-off programs where you just want to
>> grab something from a URL.
>
> OK.  Do we have a document where we're recording these proposals?  Is
> there a wiki?

No, we don't have anything.  Should we use the main Python Wiki?  
Something else?  Opinions?

[...]
>>> That doesn't answer my question.  To repeat: What requirements does
>>> caching impose that *urllib2* doesn't meet?  And why do we need a new
>>> UserAgent class when we already have urllib2 and its handlers?
>>
>> All the normal HTTP caching, like If-Modified-Since and E-Tags.  If
>> you handle this, you have to store the retrieved results, handle the
>> metadata for those results, and provide control (where to put the
>> cache, when and how to expire it, what items are in the cache, flush
>> the cache, maybe a memory cache, etc).  That could be done in a
>> handler, but it feels like a separate object to me (an object which
>> might still go in urllib2).
>
> So, merely because you think "it feels like a new object", you're
> proposing to create a whole new layer of complexity for users to learn?
> Why should people have to learn a new API just to get caching?  If
> somebody had implemented HTTP caching and found the handler mechanism
> lacking, or had a specific argument that showed it to be so, a new
> layer *might* be justified.  Otherwise, I think it's a bad idea.

I think fetching and caching are two separate things.  The caching 
requires a context.  The fetching doesn't.  I think fetching things 
should be simplified, with an API that's not very object-oriented.  
Since a cache is persistent it has to have a persistent representation, 
so it needs to be some sort of object.

I also don't see how caching would fit very well into the handler 
structure.  Maybe there'd be an HTTPCachingHandler, and you'd 
instantiate it with your caching policy (where it stores files, how 
many files, etc.)?  Also an HTTPBasicAuthCachingHandler, an 
HTTPDigestAuthCachingHandler, an HTTPSCachingHandler, and so on?  This 
caching is orthogonal -- not just to things like authentication, but 
even to HTTP (to some degree).  The handler structure doesn't allow 
orthogonal features, except through mixins -- but don't get me started 
on mixins...

Using a separate class, not related to Handlers, isn't more complex.  
Either way we have to provide the same features and the same options, 
and document all of those.  No matter which way you cut it, it's new 
stuff, it's another layer.  Implementing it in a new class is just 
calling it what it is.
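
As a strawman, the kind of separate object I have in mind looks 
something like this.  It's entirely hypothetical, it only does 
If-Modified-Since (E-Tag support would need the stored metadata 
mentioned above), and it deliberately keeps the storage policy out of 
the handler layer by wrapping whatever opener you hand it:

    import os, time, urllib2

    class HTTPCache:
        # Hypothetical sketch of a cache object that owns the storage
        # policy and wraps *any* opener, rather than being one more
        # Handler subclass per protocol/auth combination.
        def __init__(self, cache_dir, opener=None):
            self.cache_dir = cache_dir
            self.opener = opener or urllib2.build_opener()

        def _path(self, url):
            import md5                       # Python 2.x stdlib
            return os.path.join(self.cache_dir,
                                md5.new(url).hexdigest())

        def fetch(self, url):
            path = self._path(url)
            request = urllib2.Request(url)
            if os.path.exists(path):
                stamp = time.gmtime(os.path.getmtime(path))
                request.add_header(
                    'If-Modified-Since',
                    time.strftime('%a, %d %b %Y %H:%M:%S GMT', stamp))
            try:
                response = self.opener.open(request)
            except urllib2.HTTPError, e:
                if e.code == 304:   # Not Modified: use the cached copy
                    return open(path, 'rb').read()
                raise
            body = response.read()
            open(path, 'wb').write(body)
            return body

Because it wraps an opener instead of subclassing a handler, it 
composes with authentication, proxies, and HTTPS instead of needing a 
Caching variant of each handler.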

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org



