[ python-Bugs-1612729 ] webchecker/urllib chokes on 404 pages

SourceForge.net noreply at sourceforge.net
Sun Dec 10 20:35:39 CET 2006


Bugs item #1612729, was opened at 2006-12-10 20:35
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1612729&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Demos and Tools
Group: Python 2.5
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Fredrik Lundh (effbot)
Assigned to: Nobody/Anonymous (nobody)
Summary: webchecker/urllib chokes on 404 pages

Initial Comment:
platform: standard Python 2.5 on Windows XP.

webchecker chokes on response code 404, which is a bit unfortunate...

the error occurs deep down in urllib, but a plain urllib request to the same page doesn't result in the same error, so it's probably related to how webchecker is using the library.

here's an example:

C:\Python25\Tools\webchecker> python webchecker.py http://www.python.org/foo

webchecker version 50851

Round 1 (1 total, 1 to do, 0 done, 0 bad)

No need to save checkpoint
Traceback (most recent call last):
  File "webchecker.py", line 892, in <module>
    main()
  File "webchecker.py", line 222, in main
    c.run()
  File "webchecker.py", line 349, in run
    self.dopage(url)
  File "webchecker.py", line 404, in dopage
    page = self.getpage(url_pair)
  File "webchecker.py", line 509, in getpage
    text, nurl = self.readhtml(url_pair)
  File "webchecker.py", line 523, in readhtml
    f, url = self.openhtml(url_pair)
  File "webchecker.py", line 531, in openhtml
    f = self.openpage(url_pair)
  File "webchecker.py", line 543, in openpage
    return self.urlopener.open(url)
  File "c:\python25\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "c:\python25\lib\urllib.py", line 334, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "c:\python25\lib\urllib.py", line 351, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "c:\python25\lib\urllib.py", line 357, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
TypeError: EnvironmentError expected at most 3 arguments, got 4

running the same test under Python 2.4 works fine:

C:\python24\Tools\webchecker>python webchecker.py http://www.python.org/foo
webchecker version 36560

Round 1 (1 total, 1 to do, 0 done, 0 bad)

Error ('http error', 404, 'Not Found')
 HREF  http://www.python.org/foo
  from <root>

Final Report (1 total, 0 to do, 1 done, 1 bad)

Error Report:

Error in <root>
  HREF http://www.python.org/foo
    msg ('http error', 404, 'Not Found')

Saving checkpoint to @webchecker.pickle ...
Done.
Use ``webchecker.py -R'' to restart.
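
The 2.5 traceback shows the failure happening while the exception object is being constructed: urllib raises IOError with four values, and the constructor rejects the fourth. Here is a minimal sketch of that mechanism, not the actual urllib or interpreter code. StrictEnvironmentError, http_error_default (simplified signature) and describe are hypothetical stand-ins, and the three-argument limit is assumed only from the TypeError message in the traceback.

```python
class StrictEnvironmentError(Exception):
    """Hypothetical stand-in: rejects more than 3 arguments, like the
    EnvironmentError in the 2.5 traceback."""

    def __init__(self, *args):
        if len(args) > 3:
            raise TypeError(
                "EnvironmentError expected at most 3 arguments, got %d"
                % len(args))
        Exception.__init__(self, *args)


def http_error_default(errcode, errmsg, headers, exc_class):
    """Simplified stand-in for the raise at urllib.py line 357:
    four values are passed to the exception constructor."""
    raise exc_class('http error', errcode, errmsg, headers)


def describe(exc_class):
    """Simulate a 404 and report which exception actually escaped."""
    try:
        http_error_default(404, 'Not Found', {}, exc_class)
    except Exception as exc:
        return "%s: %s" % (type(exc).__name__, exc)


# Strict constructor: the intended error is replaced by a TypeError.
strict_result = describe(StrictEnvironmentError)
# Lenient constructor: the four-value error propagates as intended.
lenient_result = describe(Exception)

print(strict_result)
print(lenient_result)
```

With the strict class, the caller sees a TypeError instead of the intended 'http error', which would explain why webchecker's error handling no longer catches it and the run aborts with a traceback.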
