blocking forever with urllib
Alex Martelli
aleax at aleax.it
Fri Aug 31 05:50:50 EDT 2001
"Oleg Broytmann" <phd at phd.pp.ru> wrote in message
news:mailman.999244847.23762.python-list at python.org...
> On Thu, 30 Aug 2001, Michael P. Soulier wrote:
> > I'm writing a web crawler as an exercise, using urllib and htmllib to
> > recursively crawl through the pages. Whenever urllib.urlopen() throws an
> > IOError exception the url gets flagged as a broken link.
> >
> > Unfortunately, urllib.urlopen() is blocking for some time on one URL.
> > When I do an nslookup on it, it times out within a few seconds, since
> > it's a URL from our internal intranet at work and is not accessible from
> > the internet. However, urllib.urlopen() takes forever to return.
> >
> > Is there a way to specify a timeout for this library? I can't find a
> > way in the documentation.
>
> There is none. This is really a TCP/IP problem. There are several
> possible solutions: signals (unfortunately, signals are synchronous in
> Python), multiprocessing (forking or threading), or non-blocking sockets
> (select/poll, asyncore).
Probably the best solution in Python is using
http://www.timo-tasi.org/python/timeoutsocket.py
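For the record, the idiom is roughly as below. (I'm using the stdlib
socket.setdefaulttimeout equivalent that later Python versions grew, rather
than timeoutsocket itself; the 5-second value and the check_link helper are
just illustrations, not part of Michael's crawler.)

```python
import socket

# Set a global timeout (in seconds) for all new socket operations, so
# urllib's blocking calls raise a timeout error instead of hanging.
# The 5-second value here is an arbitrary choice for illustration.
socket.setdefaulttimeout(5.0)

from urllib.request import urlopen  # urllib.urlopen in the 2.x of the thread

def check_link(url):
    """Return True if url is reachable, False on any error or timeout.

    socket.timeout is a subclass of OSError (the modern IOError), so the
    crawler's existing "flag broken link on IOError" logic keeps working.
    """
    try:
        urlopen(url)
        return True
    except OSError:
        return False
```

With timeoutsocket.py the setup call differs but the effect is the same:
the hung DNS lookup or connect returns with an exception after the chosen
interval instead of blocking forever.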
Alex