exception in urllib2
asit
lipun4u at gmail.com
Sun Feb 1 05:34:39 EST 2009
I have been developing a link scanner. The objective is to
recursively scan a particular web site.
While doing this, my script encountered http://images.google.co.in/imghp?hl=en&tab=wi
and passed it to the scan function, whose body looks like this:
def scan(site):
    log=open(logfile,'a')
    log.write(site + "\n")
    site = "http://" + site.lower()
    try:
        site_data = urllib.urlopen(site)
        parser = MyParser()
        parser.parse(site_data.read())
    except(IOError),msg:
        print "Error in connecting site ", site
        print msg
    links = parser.get_hyperlinks()
    for l in links:
        log.write(l + "\n")
But it throws a weird exception like this...
Traceback (most recent call last):
  File "I:\Python26\linkscan1.py", line 104, in <module>
    main()
  File "I:\Python26\linkscan1.py", line 95, in main
    scan(lk)
  File "I:\Python26\linkscan1.py", line 65, in scan
    site_data = urllib.urlopen(site)
  File "I:\Python26\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "I:\Python26\lib\urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "I:\Python26\lib\urllib.py", line 327, in open_http
    h = httplib.HTTP(host)
  File "I:\Python26\lib\httplib.py", line 984, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "I:\Python26\lib\httplib.py", line 656, in __init__
    self._set_hostport(host, port)
  File "I:\Python26\lib\httplib.py", line 668, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: ''
How can I handle this?
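
One likely cause, offered as a sketch rather than a confirmed diagnosis: the link quoted above already starts with "http://", so the line site = "http://" + site.lower() appears to build "http://http://images.google.co.in/...", which leaves httplib with the host "http:" and an empty port. httplib.InvalidURL is also not a subclass of IOError, so the except clause in scan does not catch it. The helpers normalize_url and is_valid_host below are hypothetical names that are not part of the original script, and the fallback imports are only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from httplib import HTTPConnection, InvalidURL      # Python 2, as in the post
except ImportError:
    from http.client import HTTPConnection, InvalidURL  # Python 3


def normalize_url(site):
    """Hypothetical helper: prepend "http://" only when no scheme is present."""
    site = site.strip()
    # Links taken from a page often carry their scheme already; adding a
    # second "http://" turns the host part into "http:" with an empty port.
    if not site.lower().startswith(("http://", "https://")):
        site = "http://" + site
    return site


def is_valid_host(hostport):
    """Hypothetical helper: True if httplib accepts "host[:port]".

    Constructing the connection object only parses the string; it does
    not open a socket, so no network traffic is involved.
    """
    try:
        HTTPConnection(hostport)
    except InvalidURL:
        return False
    return True
```

In scan, calling normalize_url(site) instead of prepending "http://" unconditionally, and catching InvalidURL alongside IOError (returning before parser is used), would be one way to handle the failure.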