convert html entities into real chars
Laszlo Nagy
gandalf at designaproduct.biz
Tue Apr 10 11:24:11 EDT 2007
> I would like to have a function that can convert '&gt;' into '>',
> '&amp;' into '&' etc. I could not find how to do it easily (I have a
> code snippet for the opposite).
Found it, sorry
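import re
import htmlentitydefs  # Python 2 module; renamed html.entities in Python 3

def convertentity(m):
    """Convert a HTML entity into normal string (ISO-8859-1)"""
    if m.group(1)=='#':
        # Numeric reference such as &#62;
        try:
            return chr(int(m.group(2)))
        except ValueError:
            # Not a decimal code below 256: leave the reference as-is
            return '&#%s;' % m.group(2)
    # Named reference such as &gt;
    try:
        return htmlentitydefs.entitydefs[m.group(2)]
    except KeyError:
        return '&%s;' % m.group(2)

def unquotehtml(s):
    """Convert a HTML quoted string into normal string (ISO-8859-1).
    Works with &#XX; and with &gt; etc."""
    # \w+ rather than .+? so a bare '&' cannot swallow a later entity
    return re.sub(r'&(#?)(\w+);',convertentity,s)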
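
For anyone reading this on Python 3 (an assumption on my part; the snippet above targets Python 2's htmlentitydefs), the standard library's html.unescape, available since Python 3.4, should cover the same ground. A minimal sketch, reusing the name unquotehtml only for illustration:

```python
import html

def unquotehtml(s):
    """Decode named and numeric HTML entities in s.

    Thin wrapper around the standard library's html.unescape
    (Python 3.4+). It returns a Unicode str, so it is not limited
    to ISO-8859-1.
    """
    return html.unescape(s)

if __name__ == "__main__":
    print(unquotehtml("1 &lt; 2 &amp;&amp; 3 &gt; 2"))
```

Unlike the regex version, this also handles hexadecimal references such as &#x41;.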