[Tutor] Problems with encoding in BeautifulSoup
Eduardo Vieira
eduardo.susan at gmail.com
Tue Aug 18 01:00:04 CEST 2009
Hello, I have this sample script from Beautiful Soup, but I keep
getting an error because of encoding. I have googled for solutions, but
I don't seem to understand them. This is even dealt with in Beautiful
Soup's docs, but I am not able to understand or apply the solution successfully.
from BeautifulSoup import BeautifulSoup
import urllib2
page = urllib2.urlopen('http://www.yellowpages.ca/search/si/1/Signs/QC')
# if I change the url to http://www.yellowpages.ca/search/si/3/Signs/ON,
# it works because there are no French words...
soup = BeautifulSoup(page)
companies = soup('h2')
print soup.originalEncoding
print companies[:4]
However, if I do this, I don't get errors even when there are accents:
for title in companies:
    print title
Here is the error output:
utf-8
Traceback (most recent call last):
  File "C:\myscripts\encondingproblem.py", line 13, in <module>
    print companies[:4]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 373: ordinal not in range(128)
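The same kind of error can be reproduced without fetching the page. This is only a minimal sketch: the title string is made up (it is not taken from the actual page), and try_encode is a hypothetical helper written for illustration. It shows that text containing the character u'\xe9' cannot be encoded with the 'ascii' codec, while 'utf-8' handles it. Whether this is exactly what happens inside print companies[:4] is an assumption.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text containing u'\xe9' (e with acute accent);
# it is NOT taken from the real page.
title = u'Enseignes Qu\xe9bec'


def try_encode(text, codec):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)


# The 'ascii' codec only covers code points below 128, so this fails.
ascii_result = try_encode(title, 'ascii')

# UTF-8 can represent the accented character, so this succeeds.
utf8_result = try_encode(title, 'utf-8')

print(ascii_result)
print(repr(utf8_result))
```

If this guess is right, explicitly encoding the text to UTF-8 before printing might avoid the error, but I have not confirmed that against the real page.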
===
Thanks in advance.
Eduardo
www.expresssignproducts.com