A problem while using urllib

Johnny Lee johnnyandfiona at hotmail.com
Tue Oct 11 23:51:18 EDT 2005


Alex Martelli wrote:
> Johnny Lee <johnnyandfiona at hotmail.com> wrote:
>    ...
> >    try:
> >       webPage = urllib2.urlopen(url)
> >    except urllib2.URLError:
>    ...
> >    webPage.close()
> >    return True
> > ----------------------------------------------------
> >
> >    But every time, once 70 to 75 URLs have been tested this way, the
> > program crashes and every remaining URL raises urllib2.URLError until
> > the program exits. I tried many ways to work around it: using urllib,
> > setting a sleep(1) in the filter (I thought the sheer number of URLs
> > was crashing the program). None of them worked. BTW, if I set the URL
> > where the program crashed as the base URL, the program still crashes
> > at the 70th-75th URL. How can I solve this problem? Thanks for your
> > help.
>
> Sure looks like a resource leak somewhere (probably leaving a file open
> until your program hits some wall of maximum simultaneously open files),
> but I can't reproduce it here (MacOSX, tried both Python 2.3.5 and
> 2.4.1).  What version of Python are you using, and on what platform?
> Maybe a simple Python upgrade might fix your problem...
>
>
> Alex

Thanks for the info you provided. I'm using Python 2.4.1 under Cygwin
on WinXP. If you want to reproduce the problem, I can send you the
source.
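
For reference, here is a simplified sketch of the filter (the function
name is made up and the snipped parts are left out); I've wrapped
close() in try/finally so the handle is released even if reading fails,
in case the leak Alex suspects comes from responses left open:

----------------------------------------------------
import urllib2

def checkUrl(url):
    # Simplified stand-in for the real filter: fetch the page and
    # report whether it could be opened at all.
    try:
        webPage = urllib2.urlopen(url)
    except urllib2.URLError:
        return False
    try:
        webPage.read()
    finally:
        # Close the handle even if read() fails, so file descriptors
        # don't pile up across many URLs.
        webPage.close()
    return True
----------------------------------------------------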

This morning I found that the problem is caused by urllib2: when I use
urllib instead of urllib2, it doesn't crash any more. But the trouble
is that I want to catch the HTTP 404 error, and that is handled by
FancyURLopener inside urllib.urlopen(), so I can't catch it.
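
Maybe subclassing FancyURLopener so that HTTP errors raise IOError (the
way the plain URLopener does) would work? A rough, untested sketch,
with made-up names:

----------------------------------------------------
import urllib

class RaisingOpener(urllib.FancyURLopener):
    # Undo FancyURLopener's friendly error handling: delegate to the
    # base URLopener, which raises IOError('http error', code, msg,
    # headers) instead of returning an error page.
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        return urllib.URLopener.http_error_default(self, url, fp,
                                                   errcode, errmsg,
                                                   headers)

def checkUrl404(url):
    try:
        webPage = RaisingOpener().open(url)
    except IOError, e:
        if (len(e.args) == 4 and e.args[0] == 'http error'
                and e.args[1] == 404):
            return False          # the page is missing
        raise                     # some other failure
    webPage.read()
    webPage.close()
    return True
----------------------------------------------------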

Regards,
Johnny



