urllib.urlretrieve never returns???

John Nagle nagle at animats.com
Tue Mar 20 16:18:24 EDT 2012


On 3/17/2012 9:34 AM, Chris Angelico wrote:
> 2012/3/18 Laszlo Nagy<gandalf at shopzeus.com>:
>> In the latter case, "log.txt" only contains "#1" and nothing else. If I look
>> at pythonw.exe from task manager, then it shows +1 thread every time I
>> click the button, and "#1" is appended to the file.

    Does it fail to retrieve on all URLs, or only on some of them?

    Running a web crawler, I've seen some pathological cases.
There are a very few sites that emit data very, very slowly,
but never time out because they keep making progress.  There are
also some sites where negotiating an SSL connection reaches the
point where the host end is supposed to finish the handshake,
but it never does.

    The odds are against this being the problem. I see problems
like that in maybe 1 in 100,000 URLs.
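
    If it does turn out to be one of those cases, here is a rough
sketch of a guard against both of them. It is an illustration only,
not what my crawler runs. It assumes Python 3's urllib.request
rather than the Python 2 urllib.urlretrieve from this thread, and
the names fetch, read_with_deadline and DeadlineExceeded, as well
as the timeout values, are made up for the example. The per-socket
timeout should catch a stalled connect or handshake, and the overall
deadline catches a host that keeps trickling data.

```python
import time
import urllib.request


class DeadlineExceeded(Exception):
    """Raised when a download runs past its overall time budget."""


def read_with_deadline(stream, max_seconds, chunk_size=8192,
                       clock=time.monotonic):
    """Read a file-like object to EOF, giving up after max_seconds total.

    A per-read socket timeout never fires on a host that keeps sending
    a few bytes at a time, so the elapsed time is checked after every
    chunk as well.
    """
    start = clock()
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # EOF: the transfer finished within the budget.
            return b"".join(chunks)
        chunks.append(chunk)
        if clock() - start > max_seconds:
            raise DeadlineExceeded(
                "gave up after %.1f seconds" % max_seconds)


def fetch(url, idle_timeout=10.0, max_seconds=60.0):
    """Hypothetical helper: fetch url with an idle timeout and a deadline.

    timeout= bounds each blocking socket operation (connect, handshake,
    a single read), not the transfer as a whole. Both default values
    are assumptions chosen for the example.
    """
    with urllib.request.urlopen(url, timeout=idle_timeout) as response:
        return read_with_deadline(response, max_seconds)
```

    Calling fetch() from the button handler should then either return
the body or raise, so the worker thread can log the failure instead
of sitting there forever.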

				John Nagle


