[Tutor] Is a link broken?

Steven D'Aprano steve at pearwood.info
Sat Jan 12 13:16:49 CET 2013


On 12/01/13 13:47, Ed Owens wrote:

> I've written an example that I hope illustrates my problem:
>
> #!/usr/bin/env python
>
> import urllib2
>
> sites = ('http://www.catb.org', 'http://ons-sa.org', 'www.notasite.org')
> for site in sites:
>     try:
>         page = urllib2.urlopen(site)
>         print page.geturl(), "didn't return error on open"
>         print 'Reported server is', page.info()['Server']
>     except:
>         print site, 'generated an error on open'

Incorrect. Your "except" clause is too general, and so the error message
is misleading. The correct error message should be:

     print site, """something went wrong in either opening the url,
         getting the url, fetching info about the page, printing the
         results, or something completely unrelated to any of those
         things, or the user typed Ctrl-C to interrupt processing,
         or something that I haven't thought of...
         """


Which of course is so general that it is useless.

Lesson 1: never, ever use a bare "except" clause. (For experts only:
almost never use a bare "except" clause.) Always specify what sort
of exceptions you wish to catch.

Lesson 2: always keep the amount of code inside a "try" clause to the
minimum needed.
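
Lessons 1 and 2 might look something like the following. Treat it as a
sketch rather than the one right way to do it. It is written for Python 3,
where urllib2 was split into urllib.request and urllib.error; under
Python 2 the same names live in urllib2. The function name check_site and
its "opener" parameter are made up for illustration (the parameter only
exists so the function can be tried out without a network connection).

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_site(site, opener=urlopen):
    """Return a one-line report about trying to open `site`.

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urlopen.
    """
    # Only the one call we expect to fail goes inside the try block.
    try:
        page = opener(site)
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return "%s: HTTP error %d" % (site, err.code)
    except URLError as err:
        # No usable answer at all: DNS failure, refused connection, etc.
        return "%s: could not connect (%s)" % (site, err.reason)
    # Anything else (ValueError for a malformed URL, KeyboardInterrupt,
    # bugs in this code) is deliberately left to propagate.
    return "%s: opened OK, server is %s" % (
        page.geturl(), page.info().get("Server", "unknown"))
```

HTTPError has to be listed before URLError because it is a subclass of
URLError. A URL with no scheme, such as 'www.notasite.org', raises
ValueError before any connection is attempted, and here that shows up as
a full traceback instead of being reported as a dead site.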

Lesson 3: whenever possible, *don't* catch exceptions at all. It is
infinitely better to get a full exception, with lots of useful
debugging information, than to catch the exception, throw away that
useful debugging information, and replace it with a lame and useless
error message like "generated an error on open".

What sort of error?

What generated that error?

What error code was it?

Is it a permanent error (e.g. error 404, page not found) or a
temporary error?

Is the error at the HTTP level, or the lower networking level, or
a bug in your Python code?


These are all vital questions that can be answered by inspecting
the exception and stack trace that Python gives you. The error
message "generated an error on open" is not only *useless*, but it
is also *wrong*.


I recommend that you start by not catching any exception at all. Just
let the exception (if any) print as normal, and see what you can learn
from that.



> Site 1 is alive, the other two dead.

Incorrect.

Both site 1 and site 2 work in my browser. Try it and see for yourself.


> Yet this code only returns an error on site three. Notice that I
>checked for a redirection (I think) of the site if it opened, and that
>didn't help with site two.

There is no redirection with site 2.


> Is there an unambiguous way to determine if a link has died -- knowing
>nothing about the link in advance?

No.

Define "link has died". That could mean:

- the individual page is gone;

- the entire web server is not responding;

- the web server is responding, but slowly, and requests time-out;

- the web server does respond, but only to say "sorry, too busy to
   talk now, try again later";

- the web server refuses to respond because you haven't logged in;

- you have a low-level network error, but the site is still okay;

- your network connection is down;

etc. There are literally dozens upon dozens of different types of
errors here.
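
To make that concrete, here is a rough sketch of sorting an exception
raised by urlopen into a few of the cases above. It is written for
Python 3 (urllib.error in place of urllib2). The function name
describe_failure and the grouping of status codes are my own assumptions,
not anything the library defines, and you may well want to draw the lines
differently.

```python
import socket
from urllib.error import HTTPError, URLError

# Hypothetical grouping of status codes; a real checker would need to
# decide for itself which codes count as "dead".
PAGE_GONE = {404, 410}
NEEDS_LOGIN = {401, 403}
TRY_LATER = {429, 503}


def describe_failure(err):
    """Map an exception from urlopen to one of the cases listed above."""
    # HTTPError must be tested first: it is a subclass of URLError.
    if isinstance(err, HTTPError):
        if err.code in PAGE_GONE:
            return "page gone"
        if err.code in NEEDS_LOGIN:
            return "login required"
        if err.code in TRY_LATER:
            return "server busy, try later"
        return "other HTTP error %d" % err.code
    # A timeout may arrive bare or wrapped as the reason of a URLError.
    if isinstance(err, socket.timeout):
        return "timed out"
    if isinstance(err, URLError):
        if isinstance(err.reason, socket.timeout):
            return "timed out"
        return "network-level failure: %s" % (err.reason,)
    return "not a network error; probably a bug"
```

Even this cannot tell "the server is down" apart from "your own network
connection is down": both arrive as a URLError. That is part of why the
answer to the question is "No".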


-- 
Steven

