exception in urllib2

asit lipun4u at gmail.com
Sun Feb 1 05:34:39 EST 2009


I have been developing a link scanner. The objective is to
recursively scan a particular web site.

During a scan, my script encountered http://images.google.co.in/imghp?hl=en&tab=wi
and passed it to the scan function, whose body looks like this:

def scan(site):
    log=open(logfile,'a')
    log.write(site + "\n")
    site = "http://" + site.lower()
    try:
        site_data = urllib.urlopen(site)
        parser = MyParser()
        parser.parse(site_data.read())
    except(IOError),msg:
        print "Error in connecting site ", site
        print msg
    links = parser.get_hyperlinks()
    for l in links:
        log.write(l + "\n")

But it throws a weird exception like this...

Traceback (most recent call last):
  File "I:\Python26\linkscan1.py", line 104, in <module>
    main()
  File "I:\Python26\linkscan1.py", line 95, in main
    scan(lk)
  File "I:\Python26\linkscan1.py", line 65, in scan
    site_data = urllib.urlopen(site)
  File "I:\Python26\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "I:\Python26\lib\urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "I:\Python26\lib\urllib.py", line 327, in open_http
    h = httplib.HTTP(host)
  File "I:\Python26\lib\httplib.py", line 984, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "I:\Python26\lib\httplib.py", line 656, in __init__
    self._set_hostport(host, port)
  File "I:\Python26\lib\httplib.py", line 668, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: ''

How can I handle this?
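
One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the link already starts with "http://", so the line `site = "http://" + site.lower()` produces "http://http://images...", and the host part of that URL is "http:" with nothing after the colon, which would match the "nonnumeric port: ''" message. The helper name `normalize` below is hypothetical and not part of linkscan1.py; it only adds the prefix when no scheme is present.

```python
# Sketch only: normalize() is a hypothetical helper, not part of linkscan1.py.
try:
    from urllib.parse import urlsplit   # Python 3
except ImportError:
    from urlparse import urlsplit       # Python 2.6, as in the traceback

def normalize(site):
    """Return site with "http://" prefixed only when no scheme is present."""
    site = site.strip()
    if site.lower().startswith(("http://", "https://")):
        return site
    return "http://" + site

link = "http://images.google.co.in/imghp?hl=en&tab=wi"

# What scan() builds today: the scheme is doubled, so the host part is "http:".
doubled = "http://" + link.lower()
print(urlsplit(doubled).netloc)          # http:

# With the guard, the link is left alone and the host is parsed as expected.
print(urlsplit(normalize(link)).netloc)  # images.google.co.in
```

Separately, in Python 2 `httplib.InvalidURL` derives from `httplib.HTTPException`, not from `IOError`, so the `except(IOError)` clause in scan() does not catch it. Catching `httplib.InvalidURL` as well would let the scanner log the bad link and continue. Note also that `parser.get_hyperlinks()` runs after the try block, so it would fail with an undefined `parser` whenever the open raises.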


