[Web-SIG] So what's missing?

Sat Oct 25 17:41:36 EDT 2003

On Saturday, October 25, 2003, at 03:54 PM, John J Lee wrote:
> On Sat, 25 Oct 2003, Ian Bicking wrote:
>
>> On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote:
> [...]
>>> It's a minor issue, but it seems nicer to me to have authentication
>>> separate if it can easily be separate -- that fits in with the general
>>> philosophy of urllib2 that you pick 'n mix the features you want.  What
>>> are the trivial reasons for it breaking on non-HTTP auth?
>>
>> There's a HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and
>> though the two concepts are orthogonal they are still tied into each
>> other.  Another option would be to take HTTPS out of the class
>> hierarchy, and make SSL a feature of HTTPHandler (and maybe the other
>
> Well, that would break code.  And adding an HTTPSBasicAuthHandler is only
> five lines or so (even less if you want a class that handles both HTTP and
> HTTPS).

All the handlers start getting in the way.  If we added authentication
support to HTTPHandler, the other classes could still be left in
there.  Authentication is part of HTTP, after all -- and the
distinction between basic and digest auth doesn't seem necessary
(they're implemented differently, but you shouldn't need to know which
one you're going to need).  It seems like HTTPHandler could do what
HTTPBasicAuthHandler (and HTTPDigestAuthHandler) do if it is given a
password manager.  It could even create a password manager itself if
it was given a username and password, or not, but then the password
manager should accept a username and password in __init__ so that you
don't have to make multiple calls to set that up.
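
As a rough illustration of the "one call" setup described above, here is a
minimal sketch written against urllib.request, the Python 3 successor of
urllib2.  The helper name make_auth_opener and the example URL and
credentials are assumptions made for illustration; they are not part of
urllib2 or of any proposal in this thread.  The sketch shares one password
manager between the basic and digest handlers, so the caller does not need
to know which scheme the server will ask for.

```python
import urllib.request


def make_auth_opener(username, password, uri):
    """Hypothetical helper: build an opener that can answer either a
    basic or a digest challenge for `uri` with one username/password."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # Realm None is this manager's default-realm entry, so the
    # credentials apply whatever realm the server announces.
    manager.add_password(None, uri, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(manager),
        urllib.request.HTTPDigestAuthHandler(manager),
    )
    return opener, manager


# Placeholder values; no network request is made here.
opener, manager = make_auth_opener("x", "y", "http://example.com/")
```

Calling opener.open(url) would then retry with credentials when the server
answers 401.  This still leaves the handlers as separate classes; it only
hides them behind one function.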

In general, I just don't feel like there need to be quite so many
handlers in urllib2.  One featureful HTTP implementation would be
easier to work with (and, I think, easier to extend).

> [...]
>> The AuthHandlers are a little annoying too, you can't just give them a
>> username/password.  You have to give them some manager object that can
>> be queried for a password for a username/realm/URL.  This is a nice
>> option to have, but in most cases you don't need that kind of
>> generality, and it makes it a lot harder to understand what you need to
>> do.  username=x, password=y are very easy to understand.
>
> That's just a documentation issue, I think -- and possibly adding some
> convenience method.  I wrote some docs for this, and I keep asking for
> people who seem to be actually using these features to check this
> documentation bug, but nobody has yet:
>
> http://www.python.org/sf/798244
>
>
> You don't have to provide a password manager object in fact: just let the
> HTTPBasicAuthHandler create one for you, and use the add_password method
> (which admittedly does require realm and uri as well as username /
> password -- perhaps None should act as a wildcard there?).

Yes, a wildcard could definitely be good.  This is particularly  
important with scripts, i.e., one-off programs where you just want to  
grab something from a URL.
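
One possible shape for such a wildcard, sketched with urllib.request (the
Python 3 successor of urllib2): the standard HTTPPasswordMgrWithDefaultRealm
already treats realm None as "any realm", and the hypothetical subclass
below extends that so uri None means "any URL".  The class name
WildcardPasswordMgr, its __init__ signature, and the catch-all behaviour
are assumptions for illustration, not an existing API.

```python
import urllib.request


class WildcardPasswordMgr(urllib.request.HTTPPasswordMgrWithDefaultRealm):
    """Hypothetical manager in which uri=None matches every URL."""

    def __init__(self, username=None, password=None):
        super().__init__()
        self._catch_all = None
        if username is not None:
            # Credentials given up front become the catch-all entry.
            self.add_password(None, None, username, password)

    def add_password(self, realm, uri, user, passwd):
        if uri is None:
            # The realm is ignored for the catch-all entry.
            self._catch_all = (user, passwd)
        else:
            super().add_password(realm, uri, user, passwd)

    def find_user_password(self, realm, authuri):
        # Specific entries win; the catch-all is only a fallback.
        user, passwd = super().find_user_password(realm, authuri)
        if user is None and self._catch_all is not None:
            return self._catch_all
        return user, passwd


manager = WildcardPasswordMgr("x", "y")
handler = urllib.request.HTTPBasicAuthHandler(manager)
```

A catch-all like this offers the same credentials to any server that asks,
so it is only reasonable for the one-off scripts mentioned above.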

>>>> Cookie handling also fits into this, but from the opposite direction
>>>> from a URL object, since we are creating something of a user agent.
>>>> You'd almost want to do:
>>>>
>>>> ua = UserAgent()
>>>> url = web.URL('http://whatever.com')
>>>> content = ua.get(url)
>>>>
>>>> Or something like that.  I think an explicit agent is called for,
>>>> separate from the URLs that it may retrieve.  But only when you start
>>>> considering cookies and caching.
>>> [...]
>>>
>>> Are you suggesting replacing urllib2, building on top of it, or
>>> extending it?  urllib2's handlers already get a lot of the
>>> 'user-agent' job done.  What requirements does caching impose that
>>> urllib2 doesn't meet?  There's already a CacheFTPHandler.
>>
>> I think a URL class would probably be built on top of urllib2, but
>> would also need some more features.  And obviously urllib2 can't go
>> anywhere, so we might as well use it.
>
> OK.  Does this URL class proposal fit with that path module PEP, do you
> think?  Somebody mentioned that PEP (it was a PEP, wasn't it...?) before,
> but I've forgotten everything about it :-)

No, there's no PEP, for this or for a filesystem path object.  These  
were the links from the other email:

http://www.jorendorff.com/articles/python/path/

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1057651032.22842.python-list%40python.org

>> The caching in CacheFTPHandler is connection caching, not result
>
> OK.
>
>
>> caching.  HTTP has a wide array of ways to indicate caching, check for
>> updates, etc.  Enough that it becomes kind of complicated, which is why
>> I don't think that fits well into the idea of a URL object (which
>> should be quite simple, at least from the outside).
>
> That doesn't answer my question.  To repeat: What requirements does
> caching impose that *urllib2* doesn't meet?  And why do we need a new
> UserAgent class when we already have urllib2 and its handlers?

All the normal HTTP caching, like If-Modified-Since and ETags.  If you
handle this, you have to store the retrieved results, handle the
metadata for those results, and provide control (where to put the  
cache, when and how to expire it, what items are in the cache, flush  
the cache, maybe a memory cache, etc).  That could be done in a  
handler, but it feels like a separate object to me (an object which  
might still go in urllib2).
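
To make the bookkeeping concrete, here is a minimal sketch of such a
separate cache object, written against urllib.request (the Python 3
successor of urllib2).  The class name SimpleHTTPCache and all of its
methods are hypothetical.  It keeps results in memory only and revalidates
with If-None-Match and If-Modified-Since; expiry policy, Cache-Control
handling, and on-disk storage are deliberately left out.

```python
import urllib.error
import urllib.request


class SimpleHTTPCache:
    """Hypothetical in-memory result cache keyed by URL."""

    def __init__(self):
        self._entries = {}  # url -> (validators dict, body bytes)

    def conditional_headers(self, url):
        """Return request headers that revalidate a cached copy."""
        entry = self._entries.get(url)
        if entry is None:
            return {}
        validators, _body = entry
        headers = {}
        if "ETag" in validators:
            headers["If-None-Match"] = validators["ETag"]
        if "Last-Modified" in validators:
            headers["If-Modified-Since"] = validators["Last-Modified"]
        return headers

    def store(self, url, response_headers, body):
        """Remember the body together with its validator metadata."""
        validators = {}
        for name in ("ETag", "Last-Modified"):
            value = response_headers.get(name)
            if value is not None:
                validators[name] = value
        self._entries[url] = (validators, body)

    def flush(self):
        """Drop every cached result."""
        self._entries.clear()

    def fetch(self, url, opener=None):
        """Fetch url, reusing the cached body on 304 Not Modified."""
        if opener is None:
            opener = urllib.request.build_opener()
        request = urllib.request.Request(
            url, headers=self.conditional_headers(url))
        try:
            with opener.open(request) as response:
                body = response.read()
                self.store(url, response.headers, body)
                return body
        except urllib.error.HTTPError as err:
            # The default handlers raise HTTPError for a 304 response.
            if err.code == 304 and url in self._entries:
                return self._entries[url][1]
            raise
```

Because the cache only needs an object with an open() method, it can sit
beside whatever opener and handlers are already in use rather than being a
handler itself.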

But looking back on what Bill was asking for, I think he was thinking  
more along the lines of connection caching, like CacheFTPHandler, and  
that would probably go in a handler.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org