html escape sequences
Leif K-Brooks
eurleif at ecritters.biz
Fri Mar 18 06:46:20 EST 2005
Will McGugan wrote:
> I'd like to replace html escape sequences, like   and ' with
> single characters. Is there a dictionary defined somewhere I can use to
> replace these sequences?
How about this?
import re
from htmlentitydefs import name2codepoint
_entity_re = re.compile(r'&(?:(#)(\d+)|([^;]+));')
def _repl_func(match):
if match.group(1): # Numeric character reference
return unichr(int(match.group(2)))
else:
return unichr(name2codepoint[match.group(3)])
def handle_html_entities(string):
return _entity_re.sub(_repl_func, string)
More information about the Python-list
mailing list