urlopen() error

John Machin sjmachin at lexicon.net
Fri Sep 15 10:42:53 EDT 2006


Paul McNett wrote:
> Tempo wrote:
> > Hello. I am getting an error and it has gotten me stuck. I think the
> > best thing I can do is post my code and the error message and thank
> > everybody in advance for any help that you give this issue. Thank you.
> >
> > #############
> > Here's the code:
> > #############
> >
> > import urllib2
> > import re
> > import xlrd
> > from BeautifulSoup import BeautifulSoup
> >
> > book = xlrd.open_workbook("ige_virtualMoney.xls")
> > sh = book.sheet_by_index(0)
> > rx = 1
> > for rx in range(sh.nrows):

The above 2 lines should probably be:
   for rx in range(1, sh.nrows):
otherwise a column heading in row 0 is likely to be treated as
data.
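A minimal sketch of that header-skipping loop, written in Python 3. The list column0 and the helper name data_values are hypothetical stand-ins; in the real script the values would come from sh.cell_value(rx, 0).

```python
# Hypothetical stand-in for column 0 of the worksheet; row 0 holds a heading.
column0 = ["URL", "http://example.com/a", "http://example.com/b"]

def data_values(column, header_rows=1):
    """Yield the cells below the heading row(s), like range(1, sh.nrows)."""
    for rx in range(header_rows, len(column)):
        yield column[rx]

for u in data_values(column0):
    print(u)
```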
Now read on ;-)

> >     u = sh.cell_value(rx, 0)
> >     page = urllib2.urlopen(u)
> >     soup = BeautifulSoup(page)
> >     p = soup.findAll('span', "sale")
> >     p = str(p)
> >     p2 = re.findall('\$\d+\.\d\d', p)
> >     for price in p2:
> >         print price
> >
> > ######################
> > Here are the error messages:
> > ######################
> >
> > Traceback (most recent call last):
> >   File "E:\Python24\scraper.py", line 16, in -toplevel-
> >     page = urllib2.urlopen(u)
> >   File "E:\Python24\lib\urllib2.py", line 130, in urlopen
> >     return _opener.open(url, data)
> >   File "E:\Python24\lib\urllib2.py", line 350, in open
> >     protocol = req.get_type()
> >   File "E:\Python24\lib\urllib2.py", line 233, in get_type
> >     raise ValueError, "unknown url type: %s" % self.__original
> > ValueError: unknown url type: List
>
> You were expecting u to be a url string like "http://google.com", but it
> looks like it is actually a list. I'm not familiar with package xlrd but
> cell_value() must be returning a list and not a cell value. Presumably,
> the list contains the cell value probably in element 0. Put in a print
> statement before your call to urlopen() like:
>
> print u

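Sage advice. print repr(u) is in general even better advice: the quotes
show you are looking at a string, and stray whitespace or control
characters become visible.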
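A small illustration in Python 3 syntax (print is a function there) of what repr() reveals that a plain print hides. The values are made up for the example.

```python
# Made-up cell values: "42.0" and 42.0 print identically, and the padded URL
# differs from the first one only by invisible whitespace.
values = ["http://example.com", " http://example.com\n", "42.0", 42.0]

for v in values:
    print(v)        # plain print: no quotes, whitespace invisible
    print(repr(v))  # repr: quotes mark strings, "\n" is shown escaped
```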

>
> You'll likely discover your error.
>

Just for the record:

1. The xlrd package's Book.Sheet.cell_value() does *not* return lists.
As its docs say, it returns scalars of the following types: unicode,
int, float, str.

2. The error has nothing to do with Python lists; it's all about
malformed URLs. "unknown url type" means the URL's type (scheme) is not
one of http, ftp, file, data, gopher, ...

|>>> x = urllib2.urlopen('List')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "C:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: List

|>>> x = urllib2.urlopen('GOTCHA')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "C:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: GOTCHA
|>>>
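For readers on Python 3, where urllib2 was split into urllib.request and urllib.parse, here is a minimal sketch that reproduces the same ValueError without touching the network (constructing urllib.request.Request parses the URL immediately, before anything is fetched) and adds a pre-check so bad cells can be skipped before urlopen() is called. The helper names looks_like_url and url_type_error, and the set of allowed schemes, are assumptions for illustration, not part of the original script; the exact message text may also differ slightly from 2.4's.

```python
import urllib.request
from urllib.parse import urlsplit

# Assumption: the only schemes this scraper should ever fetch.
ALLOWED_SCHEMES = {"http", "https", "ftp"}

def looks_like_url(value):
    """Hypothetical pre-check: True if value is a string with an allowed scheme."""
    if not isinstance(value, str):
        return False  # numeric cells come back as floats, not URLs
    return urlsplit(value.strip()).scheme in ALLOWED_SCHEMES

def url_type_error(value):
    """Return the ValueError text urllib raises for value, or None if it parses.

    Constructing a Request only parses the URL; nothing is fetched.
    """
    try:
        urllib.request.Request(value)
    except ValueError as exc:
        return str(exc)
    return None

for candidate in ["List", "GOTCHA", "http://google.com"]:
    print(repr(candidate), looks_like_url(candidate), url_type_error(candidate))
```

In the scraping loop, calling looks_like_url(u) before urlopen(u), and printing repr(u) for rows that fail, keeps one bad cell from aborting the whole run.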

HTH,
John



