blocking forever with urllib

Alex Martelli aleax at aleax.it
Fri Aug 31 05:50:50 EDT 2001


"Oleg Broytmann" <phd at phd.pp.ru> wrote in message
news:mailman.999244847.23762.python-list at python.org...
> On Thu, 30 Aug 2001, Michael P. Soulier wrote:
> >     I'm writing a web crawler as an exercise, using urllib and htmllib
> > to recursively crawl through the pages. Whenever urllib.urlopen() throws
> > an IOError exception the url gets flagged as a broken link.
> >
> >     Unfortunately, urllib.urlopen() is blocking for some time on one URL.
> > When I do an nslookup on it, it times out within a few seconds, since
> > it's a URL from our internal intranet at work and is not accessible from
> > the internet. However, urllib.urlopen() takes forever to return.
> >
> >     Is there a way to specify a timeout for this library? I can't find
> > a way in the documentation.
>
>    There is not. This is really a TCP/IP problem. There are several
> solutions: signals (unfortunately, signal handlers run synchronously in
> Python), multiprocessing (forking or threading), and non-blocking sockets
> (select/poll, asyncore).
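
Of those approaches, the signal-based one can be wrapped directly around
urllib.urlopen().  Here is a minimal sketch (my own, not from the original
posts; the names urlopen_with_timeout and UrlTimeout are made up for the
example): it is Unix-only, works only in the main thread, and a resolver
call stuck deep inside the C library may not always be interrupted promptly.

    import signal
    import urllib

    class UrlTimeout(Exception):
        pass

    def _alarm_handler(signum, frame):
        # Turn the SIGALRM delivery into a Python exception.
        raise UrlTimeout("urlopen timed out")

    def urlopen_with_timeout(url, timeout=10):
        # Arrange for SIGALRM to fire after 'timeout' seconds; the handler
        # raises UrlTimeout, interrupting the blocking call in urlopen().
        old_handler = signal.signal(signal.SIGALRM, _alarm_handler)
        signal.alarm(timeout)
        try:
            return urllib.urlopen(url)
        finally:
            signal.alarm(0)                          # cancel any pending alarm
            signal.signal(signal.SIGALRM, old_handler)

    try:
        page = urlopen_with_timeout("http://example.com/", 5).read()
    except (IOError, UrlTimeout):
        print "broken or unreachable link"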

Probably the best solution in Python, though, is to use
http://www.timo-tasi.org/python/timeoutsocket.py
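
As far as I remember its interface, you import timeoutsocket once (it swaps
in a socket class that honours a timeout) and set a default timeout before
letting urllib open anything.  The names setDefaultSocketTimeout and Timeout
below are from memory, so check them against the file itself:

    import timeoutsocket
    timeoutsocket.setDefaultSocketTimeout(20)   # seconds, for new sockets

    import urllib
    try:
        data = urllib.urlopen("http://intranet.example.com/").read()
    except timeoutsocket.Timeout:
        print "gave up after 20 seconds"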


Alex





