Python and decimal character entities over 128.
bsagert at gmail.com
Wed Jul 9 19:39:24 EDT 2008
Some web feeds use decimal character entities that seem to confuse
Python (or me). For example, the string "doesn't" may be coded as
"doesn&#8217;t", which should produce a right-leaning apostrophe.
Python hates decimal entities beyond 128, so it chokes unless you do
something like string.encode('utf-8'). Even then, what should have
been a right-leaning apostrophe ends up as "â€™". The following script
does just that. Look for the string "The Canuck iPhone: Apple
doesnâ€™t care" after running it.
# coding: UTF-8
import feedparser

s = ''
d = feedparser.parse('http://feeds.feedburner.com/Mathewingramcom/work')
title = d.feed.title
link = d.feed.link
# Collect the title and link of the first four entries.
for i in range(0, 4):
    title = d.entries[i].title
    link = d.entries[i].link
    s += title + '\n' + link + '\n'
# Write the collected text to disk as UTF-8 bytes (Python 2 file semantics).
f = open('c:/x/test.txt', 'w')
f.write(s.encode('utf-8'))
f.close()
This useless script is adapted from a "useful" script. Its only
purpose is to ask the Python community how I can deal with decimal
entities > 128. Thanks in advance, Bill
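
For reference, here is a minimal sketch of the two things going on, assuming Python 3 and only the standard library (the script above is Python 2). The helper name decode_decimal_entities and the sample string are made up for illustration and are not part of feedparser. The sketch turns a decimal entity into its character, then shows that "â€™" is most likely what the UTF-8 bytes of that character look like when a viewer reads the file as Windows-1252. That the file was opened in such a viewer is an assumption.

```python
import html
import re


def decode_decimal_entities(text):
    """Replace decimal references such as &#8217; with the character they name.

    Hypothetical helper for illustration; html.unescape() does the same job
    and also handles named and hexadecimal references.
    """
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)


# Sample title written with a decimal entity (illustrative input).
raw = "Apple doesn&#8217;t care"

# Both approaches give the right single quotation mark, U+2019.
decoded = decode_decimal_entities(raw)
also_decoded = html.unescape(raw)

# Encoding to UTF-8 turns U+2019 into the three bytes E2 80 99.
utf8_bytes = decoded.encode("utf-8")

# Assumption: the viewer reads those bytes as Windows-1252, which shows
# each of the three bytes as its own character.
misread = utf8_bytes.decode("cp1252")

# Reversing the wrong decoding recovers the original text.
repaired = misread.encode("cp1252").decode("utf-8")
```

If this reading is right, the bytes written by encode('utf-8') are already correct, and opening test.txt in an editor set to UTF-8 should show the apostrophe as intended.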