[Tutor] Problems with encoding in BeautifulSoup

Eduardo Vieira eduardo.susan at gmail.com
Tue Aug 18 01:00:04 CEST 2009


Hello, I have this sample script using Beautiful Soup, but I keep
getting an error because of encoding. I have googled for solutions, but
I don't seem to understand them. This is even covered in Beautiful Soup's
documentation, but I am not able to understand or apply the solution successfully.

from BeautifulSoup import BeautifulSoup
import urllib2
page = urllib2.urlopen('http://www.yellowpages.ca/search/si/1/Signs/QC')

# If I change the URL to http://www.yellowpages.ca/search/si/3/Signs/ON,
# it works, because there are no French words...

soup = BeautifulSoup(page)

companies = soup('h2')

print soup.originalEncoding

print companies[:4]

However, if I do this, I don't get errors even when there are accents:
for title in companies:
    print title
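A likely reason the two print statements behave differently: printing a list formats it with repr() on each element, while printing a single element uses str(). The sketch below simulates this in Python 3 with a hypothetical FakeTag class (not BeautifulSoup's Tag) whose __repr__ forces an ASCII conversion. That conversion is an assumption standing in for what the traceback suggests happens inside repr() under Python 2; the company names are made-up samples.

```python
class FakeTag:
    """Hypothetical stand-in for a parsed <h2> tag (not BeautifulSoup's class)."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # str() returns the accented text unchanged.
        return "<h2>%s</h2>" % self.text

    def __repr__(self):
        # Simulated implicit ASCII conversion (assumption): raises
        # UnicodeEncodeError when the text contains a character like u'\xe9'.
        return str(self).encode("ascii").decode("ascii")


# Hypothetical sample data standing in for the QC search results.
companies = [FakeTag("Enseignes Québec"), FakeTag("Signs Plus")]

# Formatting elements one at a time goes through __str__ and succeeds,
# like the `for title in companies: print title` loop.
titles = [str(title) for title in companies]

# Formatting the list calls repr() on every element and fails,
# like `print companies[:4]` in the traceback.
try:
    repr(companies[:4])
    list_error = None
except UnicodeEncodeError as exc:
    list_error = exc.reason

print(len(titles), "titles rendered")   # 2 titles rendered
print("list repr failed:", list_error)  # list repr failed: ordinal not in range(128)
```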

Here is the error output:
utf-8
Traceback (most recent call last):
  File "C:\myscripts\encondingproblem.py", line 13, in <module>
    print companies[:4]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 373: ordinal not in range(128)
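The error message itself can be reproduced without BeautifulSoup: u'\xe9' (é) is code point 233, outside the 0 to 127 range the ASCII codec covers, while UTF-8 (the encoding reported by originalEncoding) can represent it. A minimal Python 3 sketch of encoding explicitly before output, using a hypothetical sample string and a hypothetical helper encode_for_output:

```python
# Hypothetical sample text; the QC page contains French names with é (u'\xe9').
title = "Enseignes Québec"


def encode_for_output(text, encoding):
    """Return text encoded with the given codec, or None if it cannot be encoded."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # Same failure as the traceback: é (233) is outside ASCII's 0..127.
        print("%s codec failed at position %d: %s" % (exc.encoding, exc.start, exc.reason))
        return None


ascii_bytes = encode_for_output(title, "ascii")  # prints the failure, returns None
utf8_bytes = encode_for_output(title, "utf-8")   # succeeds: é becomes b'\xc3\xa9'
print(utf8_bytes)                                # b'Enseignes Qu\xc3\xa9bec'
```

The same idea would presumably carry over to the Python 2 script: encode the text explicitly to an encoding the console supports instead of relying on the implicit ASCII default.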

===
Thanks in advance.

Eduardo
www.expresssignproducts.com

