A problem while using urllib
Steve Holden
steve at holdenweb.com
Wed Oct 12 04:25:05 EDT 2005
Johnny Lee wrote:
> Steve Holden wrote:
>
>>Johnny Lee wrote:
>>
>>>Alex Martelli wrote:
>>>
>>>
>>>>Johnny Lee <johnnyandfiona at hotmail.com> wrote:
>>>> ...
>>>>
>>>>
>>>>> try:
>>>>>     webPage = urllib2.urlopen(url)
>>>>> except urllib2.URLError:
>>>>
>>>> ...
>>>>
>>>>
>>>>> webPage.close()
>>>>> return True
>>>>>----------------------------------------------------
>>>>>
>>>>> But every time, once the program has tested around 70 to 75
>>>>>urls this way, it crashes, and every url left raises
>>>>>urllib2.URLError until the program exits. I tried many ways to work
>>>>>around it: using urllib instead, adding a sleep(1) in the filter (I
>>>>>thought the sheer number of urls was crashing the program). But none
>>>>>of them works. BTW, if I set the url at which the program crashed as
>>>>>the base url, it still crashes at around the 70th-75th url. How can I
>>>>>solve this problem? Thanks for your help.
>>>>
>>>>Sure looks like a resource leak somewhere (probably leaving a file open
>>>>until your program hits some wall of maximum simultaneously open files),
>>>>but I can't reproduce it here (MacOSX, tried both Python 2.3.5 and
>>>>2.4.1). What version of Python are you using, and on what platform?
>>>>Maybe a simple Python upgrade might fix your problem...
>>>>
>>>>
>>>>Alex
>>>
>>>
>>>Thanks for the info you provided. I'm using 2.4.1 on Cygwin under
>>>WinXP. If you want to reproduce the problem, I can send you the source.
>>>
>>>This morning I found that this is caused by urllib2. When I use urllib
>>>instead of urllib2, it doesn't crash any more. But the problem is that
>>>I want to catch the HTTP 404 error, and that is handled internally by
>>>FancyURLopener in urllib.open(), so I can't catch it there.
>>>
>>
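For the 404 question: urllib2 raises HTTPError (a URLError subclass that carries the status code) for 404s, so you can catch it explicitly rather than relying on FancyURLopener's silent handling. Here's a rough sketch of what I mean -- check_url is a hypothetical stand-in for the filter function in your script, and the import fallback is only there so the snippet also runs on newer Pythons (on 2.4 you'd write `except urllib2.HTTPError, e:` instead of `as e`):

```python
# Sketch: catch a 404 explicitly with urllib2 instead of letting
# FancyURLopener swallow it. check_url is a hypothetical helper.
try:
    from urllib2 import urlopen, HTTPError, URLError      # Python 2
except ImportError:
    from urllib.request import urlopen                    # Python 3 renames
    from urllib.error import HTTPError, URLError

def check_url(url):
    try:
        page = urlopen(url)
    except HTTPError as e:
        # HTTPError subclasses URLError and carries the status code,
        # so it must be caught first; for a missing page e.code is 404.
        print("HTTP error %d for %s" % (e.code, url))
        return False
    except URLError as e:
        # Could not reach the server at all (DNS failure, refused
        # connection, or the errno 120 we're seeing on Cygwin).
        print("URL error for %s: %s" % (url, e.reason))
        return False
    try:
        page.read()
    finally:
        page.close()    # always release the socket, even if read() fails
    return True
```

The try/finally around the read also guarantees the handle is closed on every path, which matters if the crash really is a descriptor leak.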
>>I'm using exactly that configuration, so if you let me have that source
>>I could take a look at it for you.
>>
[...]
>
> I've sent the source, thanks for your help.
>
[...]
Preliminary result, in case this rings bells with people who use urllib2
quite a lot. I modified the error case to report the actual message
returned with the exception, and I'm seeing things like:
http://www.holdenweb.com/./Python/webframeworks.html
Message: <urlopen error (120, 'Operation already in progress')>
Start process
http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
Error: IOError while parsing
http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
Message: <urlopen error (120, 'Operation already in progress')>
.
.
.
So at least we now know what the error is, and it looks like some sort
of resource limit (though why only on Cygwin beats me) ... any ideas,
before I start some serious debugging?
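For what it's worth, a quick way to confirm the resource-leak theory is to watch the process's open file descriptors while the URL loop runs. This is only a diagnostic sketch, and it assumes a POSIX-ish platform: it reads /proc/self/fd where that exists (Linux, and Cygwin where /proc is mounted), with a slower fstat() probe as a fallback:

```python
# Diagnostic sketch: count this process's open file descriptors,
# to check whether each urlopen() call leaks one.
import os

def open_fd_count():
    try:
        # /proc/self/fd has one entry per open descriptor.
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        # Fallback: probe descriptors up to the (capped) NOFILE limit.
        import resource
        limit = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
        if limit == resource.RLIM_INFINITY or limit > 4096:
            limit = 4096   # cap the probe to keep it cheap
        count = 0
        for fd in range(limit):
            try:
                os.fstat(fd)
                count += 1
            except OSError:
                pass
        return count
```

Calling open_fd_count() before and after each batch of urlopen() calls should show a steadily climbing number if sockets are being leaked; if it stays flat, the 'Operation already in progress' errors are coming from somewhere else.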
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/