How to use urllib2.BaseHandler class

Eric @ Zomething eric at zomething.com
Mon Jul 12 00:06:58 EDT 2004


John J. Lee advised on urllib2:

> > Hi all,
> > 
> > I'm trying to build a web page crawler to help us build our websites,
> > which are driven by static pages after they are called the first time.
> > Anyway, I can use urllib2.urlopen() no problem, but I'd like to have
> > more control over the process. In particular I'd like to get back the
> > HTTP status code from the request, even if it's a 200. It looks like I
> > can do that by deriving my own class from HTTPHandler, but I'm not
> > sure how to go about it. Can anyone direct me to some useful example
> > code for this kind of thing?
> 
> In 2.3, urllib2 only ever *returns* a response if the code is 200.  In
> other cases, HTTPError exceptions are *raised*.  HTTPError instances
> satisfy the normal response interface, so you can catch them and use
> them just as you would the return value of urlopen().  As you've
> noticed, they also have .code and .msg attributes (unlike normal
> response objects, in 2.3 -- since it's always 200, they weren't really
> necessary!).
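A minimal sketch of what John describes: catch HTTPError and treat it as a response. It is written for Python 3, where urllib2 was split into urllib.request and urllib.error. In 2.3 the same idea would use urllib2.urlopen and urllib2.HTTPError. Note that modern urllib.request returns a response for any 2xx code, not only 200. The local test server, its /ok path, and the helper name fetch_with_status are hypothetical and exist only so the example runs without the network. ProxyHandler({}) keeps any proxy settings in the environment out of the demo.

```python
# Python 3 sketch: urllib2 became urllib.request + urllib.error.
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /ok returns 200, any other path returns 404."""

    def do_GET(self):
        code = 200 if self.path == "/ok" else 404
        body = b"hello" if code == 200 else b"missing"
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


# An opener with no proxies, so requests to 127.0.0.1 go straight to the server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_with_status(url):
    """Return (status_code, body) whether the server reports success or an error."""
    try:
        response = opener.open(url, timeout=5)
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: it has getcode() and read().
        response = err
    try:
        return response.getcode(), response.read()
    finally:
        response.close()


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

ok = fetch_with_status(base + "/ok")           # (200, b'hello')
missing = fetch_with_status(base + "/nope")    # (404, b'missing')
server.shutdown()
server.server_close()
print(ok, missing)
```

The caller gets a status code in both cases and never has to special-case the exception path.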


Novice question:

I am way behind the times and have code using urllib (not urllib2). Recently I've been getting null objects back from urllib.urlopen(); it seems I need to work more closely with HTTP to understand and correctly handle these events.

Any advice? Is this doable with urllib, is urllib2 better suited to it, or do I need to work with httplib?
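For comparison, here is a rough sketch of the lowest-level option mentioned above. In Python 3, httplib is called http.client. It hands back the status line and headers exactly as the server sent them and does nothing automatically: redirects are not followed and errors are not raised. The local server that always answers 301, and the helper name get_status, are hypothetical scaffolding so the example runs on its own.

```python
# Python 3 sketch: httplib is called http.client in Python 3.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: every GET gets a 301 pointing at /elsewhere."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "/elsewhere")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def get_status(host, port, path):
    """Issue one GET and return (status, reason, headers) as received."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the (empty) body before closing
        return resp.status, resp.reason, dict(resp.getheaders())
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, reason, headers = get_status("127.0.0.1", server.server_address[1], "/page")
server.shutdown()
server.server_close()

print(status, reason, headers["Location"])  # 301 Moved Permanently /elsewhere
```

The trade-off is that everything urllib and urllib2 normally do for you (redirects, proxies, error handling) becomes your job.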

Thx



Eric Pederson
http://www.songzilla.blogspot.com
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
e-mail me at:
do at something.com
except, increment the "d" and "o" by one letter
and spell something with a "z"
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::



More information about the Python-list mailing list