WWW/urllib.urlretrieve problems
Oleg Broytmann
phd at emerald.netskate.ru
Wed Jul 14 10:03:28 EDT 1999
Hello!
I ran a URL checker based on urllib, and it reported some errors. I
investigated what was going on, and I need some advice.
The first problem URL is http://www.expert.ru/ - it just times out.
When I pointed Netscape at the address, the page appeared fine. Lynx and
telnet timed out too!
I don't understand it. What does Netscape Communicator do that is so special?
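For what it's worth, one can at least bound the wait instead of hanging forever. This is a sketch in today's Python, where urllib.request.urlopen accepts a timeout argument (the 1.5-era urllib had no such knob); the URL and the 10-second value are illustrative:

```python
import socket
import urllib.request


def fetch_with_timeout(url, seconds=10):
    """Fetch a URL, but give up after `seconds` instead of blocking
    forever on a server that never answers.  Returns (status, body)
    on success, or (None, the exception) on failure/timeout."""
    try:
        with urllib.request.urlopen(url, timeout=seconds) as resp:
            return resp.status, resp.read()
    except (socket.timeout, OSError) as exc:
        # urllib.error.URLError subclasses OSError, so this also
        # catches refused connections and DNS failures.
        return None, exc
```

This doesn't explain why Netscape succeeds where lynx and telnet hang, but it keeps a link checker from stalling on such hosts.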
The second problem is with http://w3.one.net/~alward/. Netscape and lynx
showed the page; urllib.urlretrieve() and telnet got error 403 -
Forbidden:
---------- Session ----------
phd at emerald 204 >>> t w3.one.net 80
Trying 206.112.192.125...
Connected to w3.one.net.
Escape character is '^]'.
GEt /~alward/ HTTP/1.0
Host: w3.one.net
HTTP/1.1 403 Forbidden
Date: Wed, 14 Jul 1999 13:48:03 GMT
Server: Apache/OneNet-W3 (LoadBal/2.1-s1) mod_perl/1.18 PHP/3.0.6
Connection: close
Content-Type: text/html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>403 Forbidden</TITLE>
</HEAD><BODY>
<H1>Forbidden</H1>
You don't have permission to access /~alward/
on this server.<P>
<P>Additionally, a 403 Forbidden
error was encountered while trying to use an ErrorDocument to handle the
request.
</BODY></HTML>
Connection closed by foreign host.
---------- /Session ----------
It seems telnet and urllib need to send some additional HTTP headers, but
which ones?
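One guess: some servers answer 403 Forbidden to clients that send no User-Agent header at all, which would explain why the browsers succeed while bare telnet and urllib fail. A sketch of adding the header with today's urllib.request (the agent string and function name are my own, purely illustrative):

```python
import urllib.request


def build_request(url, agent="Mozilla/4.0 (compatible; link-checker)"):
    """Build a Request that carries a User-Agent header, since some
    servers reject requests that lack one.  The agent string is
    illustrative, not anything the server is known to require."""
    return urllib.request.Request(url, headers={"User-Agent": agent})


# Usage:
# resp = urllib.request.urlopen(build_request("http://w3.one.net/~alward/"))
```

Whether this particular server checks User-Agent or something else entirely, I can't say without experimenting against it.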
The third problem is with http://www.tucows.com/. Netscape and lynx showed
the page; urllib raised an exception:
---------- Session ----------
('http error', -1, '<html>\012', None)
Traceback (innermost last):
  File "./test.py", line 12, in ?
    fname, headers = urllib.urlretrieve(url)
  File "/usr/local/lib/python1.5/urllib.py", line 66, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook)
  File "/usr/local/lib/python1.5/urllib.py", line 184, in retrieve
    fp = self.open(url)
  File "/usr/local/lib/python1.5/urllib.py", line 157, in open
    return getattr(self, name)(url)
  File "/usr/local/lib/python1.5/urllib.py", line 272, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/local/lib/python1.5/urllib.py", line 289, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/usr/local/lib/python1.5/urllib.py", line 295, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', -1, '<html>\012', None)
---------- /Session ----------
A telnet session showed the server didn't return any HTTP headers - it just
sent HTML. Should urllib test for this, and how should it behave? I am not
sure...
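That errcode of -1 with '<html>\012' as the "message" suggests urllib tried to parse the first line of the body as a status line. A server that sends the body with no status line is effectively giving an HTTP/0.9-style simple response. A small sketch of how a caller could detect this case for itself before deciding how to treat the data (the function name is mine):

```python
def looks_like_status_line(line):
    """Return True if `line` looks like an HTTP status line,
    e.g. "HTTP/1.1 403 Forbidden".  A server that skips the
    status line is sending an HTTP/0.9-style response: body
    only, no headers."""
    parts = line.split(None, 2)          # version, code, [reason]
    return (len(parts) >= 2
            and parts[0].startswith("HTTP/")
            and parts[1].isdigit())
```

A checker could then catch the IOError from urlretrieve and, when the errcode is -1 but data arrived, count the page as reachable rather than broken. Whether urllib itself should fall back like that is exactly the question.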
Thanks in advance to anyone who is willing to discuss.
Oleg.
----
Oleg Broytmann Netskate/Inter.Net.Ru phd at emerald.netskate.ru
Programmers don't die, they just GOSUB without RETURN.