Python and decimal character entities over 128.

bsagert at gmail.com
Wed Jul 9 19:39:24 EDT 2008


Some web feeds use decimal character entities that seem to confuse
Python (or me). For example, the string "doesn't" may be coded as
"doesn&#8217;t", which should produce a right-leaning apostrophe.
Python hates decimal entities beyond 128, so it chokes unless you do
something like string.encode('utf-8'). Even then, what should have
been a right-leaning apostrophe ends up as "â€™". The following script
does just that. Look for the string "The Canuck iPhone: Apple
doesnâ€™t care" after running it.

# coding: UTF-8
import feedparser

s = ''
d = feedparser.parse('http://feeds.feedburner.com/Mathewingramcom/work')
# Collect the title and link of the first four entries.
for entry in d.entries[:4]:
    s += entry.title + '\n' + entry.link + '\n'

# Write the collected text to disk, encoded as UTF-8.
f = open('c:/x/test.txt', 'w')
f.write(s.encode('utf-8'))
f.close()

This useless script is adapted from a "useful" script. Its only
purpose is to ask the Python community how I can deal with decimal
entities > 128. Thanks in advance, Bill
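
A minimal sketch of what may be going on with the apostrophe, assuming
Python 3 and assuming the output file is being viewed in a program that
reads it as Windows-1252 (neither is stated above, and the variable
names are only illustrative). It decodes the entity, encodes the result
as UTF-8, and then reads the bytes back two different ways:

```python
import html

# Decode the decimal entity into the character it names (U+2019).
text = html.unescape("doesn&#8217;t")

# Encoding as UTF-8 turns that one character into three bytes.
utf8_bytes = text.encode("utf-8")

# Assumption: the viewer interprets the file as Windows-1252, so each
# of the three bytes is shown as a separate character.
misread = utf8_bytes.decode("cp1252")

# Reading the same bytes as UTF-8 gives back the original text.
restored = utf8_bytes.decode("utf-8")

print(repr(text), utf8_bytes, repr(misread), repr(restored))
```

If this is the cause, the file itself would be valid UTF-8, and opening
it in a viewer set to UTF-8 should show the apostrophe as intended.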




