Trouble Encoding

John Roth newsgroups at jhrothjr.com
Tue Jun 7 17:32:58 EDT 2005


<fingermark at gmail.com> wrote in message 
news:1118135690.961381.207490 at o13g2000cwo.googlegroups.com...
> I'm using feedparser to parse the following:
>
> <div class="indent text">Adv: Termite Inspections! Jenny Moyer welcomes
> you to her HomeFinderResource.com TM A "MUST See &hellip;</div>
>
> I'm receiving the following error when I try to print the feedparser
> parsing of the above text:
>
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in
> position 86: ordinal not in range(256)
>
> Why is this happening and where does the problem lie?

Several different things are going on here. First, when you convert
a unicode string using str() or a similar function, Python is going to
use the default encoding to render it. The default encoding is usually
7-bit ASCII. print is a little different: when it writes to a terminal
it uses the terminal's encoding (sys.stdout.encoding), which is most
likely why you are seeing Latin-1 here rather than ASCII.
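
To see which encodings are in play on a given machine, something along
these lines may help. It is only a sketch: the helper name
report_encodings is made up for illustration, and the values it returns
depend entirely on your Python version, platform and locale.

```python
import sys


def report_encodings():
    """Collect the encodings Python may use when rendering text.

    report_encodings is a hypothetical helper, not part of any library.
    """
    return {
        # Encoding used for implicit unicode <-> byte string conversions.
        "default": sys.getdefaultencoding(),
        # Encoding print uses for the terminal; may be None when the
        # output is redirected or replaced by another object.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


info = report_encodings()
print("default encoding: %s" % info["default"])
print("stdout encoding:  %s" % info["stdout"])
```

If the second value is a Latin-1 variant, that would be consistent with
the error message quoted above.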

The quote in front of the word MUST is a "smart quote", that is, a
curly quote (U+201C), and it is not a valid character in either ASCII
or Latin-1. Encode to Windows-1252 explicitly, and it should render
properly. Alternatively, use UTF-8, as one of the other posters
suggested. Then it's up to whatever software you use to actually
put the ink on the paper to render it properly, but that's a different
issue.
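
A minimal sketch of the explicit encoding, assuming a short sample
string that stands in for the feedparser output (the variable names are
mine, not feedparser's):

```python
# Stand-in for the parsed text: U+201C is the curly opening quote and
# U+2026 is the ellipsis that &hellip; expands to.
text = u"A \u201cMUST See \u2026"

# Latin-1 has no slot for U+201C, so this raises the error from the post.
try:
    text.encode("latin-1")
    latin1_error = None
except UnicodeEncodeError as exc:
    latin1_error = exc

# Windows-1252 maps the curly quote to byte 0x93 and the ellipsis to 0x85.
cp1252_bytes = text.encode("cp1252")

# UTF-8 can represent every Unicode character.
utf8_bytes = text.encode("utf-8")

# If the output really must be Latin-1, "replace" substitutes "?" for
# anything that cannot be encoded instead of raising.
lossy_bytes = text.encode("latin-1", "replace")
```

Whichever byte string you choose, the terminal or file that receives it
has to be interpreted with the same encoding, or the quote will show up
as garbage.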

John Roth
>
> thanks
> 



