[python] How to detect a remote webpage is accessible? (in HTTP)

John Nagle nagle at animats.com
Fri Jan 18 13:03:23 EST 2008


ShenLei wrote:
> Howdy, all,
>      I want to use Python to check whether a website is accessible.
> Currently I use urllib to fetch the remote page and see whether the
> request fails. The problem is that the page may be very large, so
> this takes too long. There is certainly no need to download the
> entire page. Could you give me a good, fast solution?
>     Thank you.
> --
> ShenLei

    If "urlopen" returns without raising an error, you've already received the
HTTP headers. Just open the URL, then call "info()" on the returned file-like
object to get the header information. Don't read the content at all.
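
    A minimal sketch of that approach, written for Python 3, where "urlopen"
lives in "urllib.request" (an assumption; the original question used the older
"urllib" module). The function name "is_reachable" and its return shape are
made up for illustration.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=10.0):
    """Return (ok, status, headers) without reading the response body."""
    try:
        # urlopen returns once the status line and headers have arrived;
        # leaving the "with" block closes the connection without read().
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, resp.status, resp.info()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return False, err.code, err.headers
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return False, None, None
```

    Whether an error status should count as "accessible" depends on what you
are checking for; here it is reported as not OK but the status code is kept.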

    Setting the socket timeout will shorten the wait when the requested
domain won't respond at all.  But if the remote host accepts the connection
and then sends nothing, the socket timeout is ineffective and you can end up
waiting a long time. This is rare, but it happens.
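
    One way to guard against the stalled-server case is to put an overall
wall-clock deadline around the request in addition to the socket timeout.
The sketch below does that with a daemon thread in Python 3; the name
"status_within" and both default values are hypothetical, and this is only
one possible approach, not something "urllib" provides itself.

```python
import threading
import urllib.request


def status_within(url, socket_timeout=5.0, deadline=10.0):
    """Return the HTTP status, or None if no usable answer arrives in time."""
    result = []

    def fetch():
        try:
            # socket_timeout limits each blocking socket operation.
            with urllib.request.urlopen(url, timeout=socket_timeout) as resp:
                result.append(resp.status)
        except Exception:
            # Any failure (bad URL, refused connection, error status,
            # timeout) is treated as "not accessible" in this sketch.
            pass

    # A daemon thread will not keep the program alive if the request hangs.
    worker = threading.Thread(target=fetch, daemon=True)
    worker.start()
    worker.join(deadline)  # overall wall-clock limit for the whole check
    return result[0] if result else None
```

    The caller gets an answer no later than "deadline" seconds after the call,
even if the request itself is still stuck in the background thread.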

					John Nagle


