WWW/urllib.urlretrieve problems

Oleg Broytmann phd at emerald.netskate.ru
Wed Jul 14 10:03:28 EDT 1999


Hello!

   I ran a URL checker based on urllib, and it reported some errors. I
investigated what was going on, and now I need some advice.


   The first problem URL is http://www.expert.ru/ - it just times out.
When I pointed Netscape at the address, the page appeared fine, but lynx and
telnet timed out too!
   I don't understand it. What does Netscape Communicator do that the others
don't?
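   Not knowing why the server hangs, the best I can think of for the checker
itself is to bound the time a retrieval may take. A rough sketch of what I
have in mind (it assumes a Unix platform, since it relies on signal.alarm,
and the 30-second limit is arbitrary):

    import signal, urllib

    class Timeout(Exception):
        pass

    def handler(signum, frame):
        raise Timeout

    signal.signal(signal.SIGALRM, handler)
    signal.alarm(30)                  # give the server 30 seconds at most
    try:
        fname, headers = urllib.urlretrieve('http://www.expert.ru/')
    except Timeout:
        print 'request timed out'
    signal.alarm(0)                   # cancel the pending alarm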


   The second problem is with http://w3.one.net/~alward/. Netscape and lynx
showed the page, but urllib.urlretrieve() and telnet got error 403 -
Forbidden:

---------- Session ----------
phd at emerald 204 >>> t w3.one.net 80
Trying 206.112.192.125...
Connected to w3.one.net.
Escape character is '^]'.
GEt /~alward/ HTTP/1.0
Host: w3.one.net

HTTP/1.1 403 Forbidden
Date: Wed, 14 Jul 1999 13:48:03 GMT
Server: Apache/OneNet-W3 (LoadBal/2.1-s1) mod_perl/1.18 PHP/3.0.6
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>403 Forbidden</TITLE>
</HEAD><BODY>
<H1>Forbidden</H1>
You don't have permission to access /~alward/
on this server.<P>
<P>Additionally, a 403 Forbidden
error was encountered while trying to use an ErrorDocument to handle the
request.
</BODY></HTML>
Connection closed by foreign host.
---------- /Session ----------

   It seems telnet and urllib need to send some additional HTTP headers, but
which headers?
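   My guess - and it is only a guess, I have not verified it against that
server - is that it rejects requests whose User-Agent it does not like
(browsers send one, telnet sends none, and urllib sends its own default).
If that is the case, something along these lines should work; the class
name and the User-Agent string are made up, any browser-like string would
do for the experiment:

    import urllib

    class CheckerOpener(urllib.FancyURLopener):
        # pretend to be a browser instead of sending urllib's default
        # "Python-urllib/x.y" User-Agent
        version = 'Mozilla/4.0 (compatible; urlchecker)'

    opener = CheckerOpener()
    fname, headers = opener.retrieve('http://w3.one.net/~alward/')
    print headers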


   The third problem is with http://www.tucows.com/. Netscape and lynx showed
the page, but urllib raised an exception:

---------- Session ----------
('http error', -1, '<html>\012', None)
Traceback (innermost last):
  File "./test.py", line 12, in ?
    fname, headers = urllib.urlretrieve(url)
  File "/usr/local/lib/python1.5/urllib.py", line 66, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook)
  File "/usr/local/lib/python1.5/urllib.py", line 184, in retrieve
    fp = self.open(url)
  File "/usr/local/lib/python1.5/urllib.py", line 157, in open
    return getattr(self, name)(url)
  File "/usr/local/lib/python1.5/urllib.py", line 272, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/local/lib/python1.5/urllib.py", line 289, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/usr/local/lib/python1.5/urllib.py", line 295, in
http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', -1, '<html>\012', None)
---------- /Session ----------

   A telnet session showed that the server did not return any HTTP headers -
it just sent HTML. Should urllib test for this, and if so, how should it
behave? I am not sure...
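   For now, the best I can do in the checker itself is to catch the IOError
and report the URL as broken. A sketch (as far as I can tell, the -1 code
comes from httplib's getreply() failing to parse the status line, which
matches the traceback above):

    import urllib

    url = 'http://www.tucows.com/'
    try:
        fname, headers = urllib.urlretrieve(url)
    except IOError, details:
        # urllib raises IOError('http error', errcode, errmsg, headers);
        # errcode == -1 means the server sent no parseable status line
        # (an old HTTP/0.9-style reply: body only, no headers)
        print 'cannot retrieve %s: %s' % (url, str(details))

Whether urllib itself should fall back to treating such a reply as plain
HTTP/0.9 data is exactly the question I would like to discuss.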

   Thanks in advance to anyone who is willing to discuss this.

Oleg.
---- 
    Oleg Broytmann        Netskate/Inter.Net.Ru        phd at emerald.netskate.ru
           Programmers don't die, they just GOSUB without RETURN.




