[Tutor] reading web page with BeautifulSoup

Dave Angel d at davea.name
Thu Dec 13 03:03:53 CET 2012


On 12/12/2012 08:47 PM, Ed Owens wrote:
> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
>     return _opener.open(url, data, timeout)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 386, in open
>     protocol = req.get_type()
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 248, in get_type
>     raise ValueError, "unknown url type: %s" % self.__original
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
> >>>
>
> Can anyone see what I'm doing wrong here?  I have bs4 and urllib2
> imported, and get the above error when trying to read that page.  I
> can copy the url from the error message into my browser and get the page.

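As the error says, the URL type is unknown: urlopen needs the scheme (the
"type" of the URL, such as http://) at the front of the string.  A browser
quietly assumes http:// when you leave it off, which is why pasting the
same text there works.  Prepend the scheme and it should work fine: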

page = urlopen('http://w1.weather.gov/obhistory/KDCA.html')
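If the URLs come from user input or a file, a small guard can add a missing
scheme before the string reaches urlopen.  This is a minimal sketch,
assuming Python 3 (where urllib2's urlopen now lives in urllib.request) and
assuming plain http is an acceptable default; ensure_scheme is a
hypothetical helper name, not part of urllib:

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already has a scheme, else prepend one.

    Hypothetical helper: urlopen() rejects scheme-less strings such as
    'w1.weather.gov/obhistory/KDCA.html' with "unknown url type".
    Note: 'host:port' strings with no scheme are ambiguous to urlsplit
    and are not handled specially here.
    """
    # A bare 'host/path' string has no ':' so urlsplit reports an empty scheme.
    if urlsplit(url).scheme:
        return url
    # Strip leading slashes so '//host/path' doesn't become 'http:////host/path'.
    return "%s://%s" % (default_scheme, url.lstrip("/"))
```

With that in place the call becomes
page = urlopen(ensure_scheme('w1.weather.gov/obhistory/KDCA.html')),
with urlopen imported from urllib.request on Python 3.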



-- 

DaveA


