BeautifulSoup error

Serge Orlov Serge.Orlov at gmail.com
Fri Jun 16 01:39:21 EDT 2006


William Xu wrote:
> Hi, all,
>
> This piece of code used to work well. I guess the error started
> occurring after some upgrade.
>
> >>> import urllib
> >>> from BeautifulSoup import BeautifulSoup
> >>> url = 'http://www.google.com'
> >>> port = urllib.urlopen(url).read()
> >>> soup = BeautifulSoup()
> >>> soup.feed(port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
>     self.rawdata = self.rawdata + data
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565: ordinal not in range(128)
> >>>
>
> Any ideas to solve this?

According to the documentation
<http://www.crummy.com/software/BeautifulSoup/documentation.html>,
chapter "Beautiful Soup Gives You Unicode, Dammit", Beautiful Soup fully
supports Unicode, so this is probably a bug.
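
Until the bug is fixed, one possible workaround is to decode the page
yourself before feeding it. The traceback looks like a byte string with
a non-ASCII byte (0xb8) being concatenated to unicode data, which makes
Python fall back to the ASCII codec. Below is a minimal sketch, not
Beautiful Soup's own method: to_unicode is a hypothetical helper, and
the list of encodings is my assumption about what the page might use.
It is written to run on current Python too (the b"" literals need 2.6
or later).

```python
def to_unicode(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw page bytes, trying each encoding in order.

    Hypothetical helper, not part of Beautiful Soup. The default
    encodings are an assumption; iso-8859-1 accepts any byte, so it
    acts as a last-resort fallback.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the given encodings could decode the data")


# The failure from the traceback in miniature: byte 0xb8 cannot go
# through the ASCII codec.
try:
    b"\xb8".decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly avoids the implicit ASCII step.
text = to_unicode(b"\xb8")

# With the code from the original post this would be (untested here):
#   soup.feed(to_unicode(port))
```

If you know the real encoding of the page (for example from the HTTP
Content-Type header), passing just that one is safer than guessing.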

> version info:
>
> Python 2.3.5 (#2, Mar  7 2006, 12:43:17)
> [GCC 4.0.3 20060212 (prerelease) (Debian 4.0.2-9)] on linux2
>
> python-beautifulsoup: 3.0.1-1

Upgrading python-beautifulsoup is a good idea, since there have been two
bug-fix releases since 3.0.1.



