[Tutor] Visiting a URL
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Sun Nov 3 20:30:02 2002
On Sat, 2 Nov 2002, John Abbe wrote:
> At 12:29 AM -0800 on 2002-11-01, Danny Yoo typed:
> >On Thu, 31 Oct 2002, John Abbe wrote:
> > > I've got a newbie question -- i'm looking at altering PikiePikie to
> >> notify weblogs.com when my weblog updates. I could get all involved in
> >> XML-RPC, but it's doable through a plain URL. How do i visit a URL in
> >> Python?
> >
> >Hi John,
> >
> >Do you mean retrieving the contents of the URL resource? If so, there's a
> >nice module called 'urllib' that allows us to open URLs as if they were
> >files.
> >
> > http://python.org/doc/lib/module-urllib.html
>
> Very cool. Thanks! Even that may be a little overkill. All i need to do
> is visit the URL; i don't need the response.
By "response", I'll assume you mean that you don't need to look at the
body of the page that comes back; at most, we may want to look at the
headers of the response.
For that, we can use the 'httplib' http-client module:
http://www.python.org/doc/lib/module-httplib.html
For example:
###
>>> import httplib
>>> connection = httplib.HTTPConnection('python.org')
>>> connection.request('GET', 'index.html')
>>> response = connection.getresponse()
>>> response.status
400
>>> response.reason
'Bad Request'
>>>
>>> ## Oops, forgot to put a '/' in front!
>>> ## (actually, most web browsers will do this
>>> ## correction for us!)
>>>
>>> connection.request('GET', '/index.html')
>>> response = connection.getresponse()
>>> response.status
200
>>> response.reason
'OK'
>>> response.getheader('last-modified')
'Tue, 29 Oct 2002 22:51:06 GMT'
###
So using httplib is one way of visiting that URL without reading the body
of the page. (The 'GET' still asks the server for the whole page; we just
never read it.)
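If the body is never wanted at all, a 'HEAD' request asks the server for
the status line and headers only. Here's a minimal sketch of that idea.
The helper name head() and the 10-second timeout are made up for
illustration, and the import dance is only there because newer Pythons
rename httplib to http.client:

```python
try:
    import httplib                   # Python 2 name, as used above
except ImportError:
    import http.client as httplib    # the same module under its Python 3 name


def head(host, path='/'):
    """Send a HEAD request; return (status, reason, headers).

    The headers come back as a dict keyed by lowercased header name.
    This is an illustrative helper, not part of httplib itself.
    """
    connection = httplib.HTTPConnection(host, timeout=10)
    try:
        connection.request('HEAD', path)
        response = connection.getresponse()
        headers = dict((name.lower(), value)
                       for name, value in response.getheaders())
        return response.status, response.reason, headers
    finally:
        # Always release the socket, even if the request failed.
        connection.close()
```

Calling head('python.org', '/index.html') should give back the same kind
of status, reason, and 'last-modified' information as the session above,
with no page body sent over the wire.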
We can wrap this all up into a tidy function:
###
>>> def getLastModified(url):
...     scheme, location, path, params, query, fragment = \
...         urlparse.urlparse(url)
...     connection = httplib.HTTPConnection(location)
...     ## An empty path would be a 'Bad Request', so fall back to '/'.
...     connection.request('GET', path or '/')
...     return connection.getresponse().getheader('last-modified')
...
>>> import urlparse
>>> getLastModified('http://python.org/')
'Tue, 29 Oct 2002 22:51:06 GMT'
###
The function above is a bit sloppy: it only handles standard http
connections, and it isn't doing much error checking at that. But it's
something we can build on.
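As one way of building on it, here's a rough sketch that also accepts
https URLs, keeps the query string when it builds the request, and
returns None when the connection fails. The splitRequest() helper, the
10-second timeout, and the choice of None for errors are all my own
assumptions for the example, not anything httplib requires:

```python
import socket

try:
    import httplib                            # Python 2 names
    from urlparse import urlsplit
except ImportError:
    import http.client as httplib             # Python 3 names
    from urllib.parse import urlsplit


def splitRequest(url):
    """Return (scheme, location, target) for an http or https URL.

    target is the path plus any query string, defaulting to '/'.
    (Hypothetical helper, named only for this example.)
    """
    parts = urlsplit(url)
    if parts.scheme not in ('http', 'https'):
        raise ValueError('unsupported scheme: %r' % parts.scheme)
    target = parts.path or '/'
    if parts.query:
        target = target + '?' + parts.query
    return parts.scheme, parts.netloc, target


def getLastModified(url):
    """Return the Last-Modified header of url, or None on failure."""
    scheme, location, target = splitRequest(url)
    if scheme == 'https':
        connection = httplib.HTTPSConnection(location, timeout=10)
    else:
        connection = httplib.HTTPConnection(location, timeout=10)
    try:
        connection.request('GET', target)
        return connection.getresponse().getheader('last-modified')
    except (httplib.HTTPException, socket.error):
        # Covers protocol errors, refused connections, and timeouts.
        return None
    finally:
        connection.close()
```

Keeping the query string matters for URLs whose whole point is in their
parameters. The original function only passes the path along, so anything
after the '?' would be dropped from the request.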
Hope this helps!