convert html entities into real chars

Laszlo Nagy gandalf at designaproduct.biz
Tue Apr 10 11:24:11 EDT 2007


> I would like to have a function that can convert '&gt;' into '>',
> '&amp;' into '&' etc. I could not find how to do it easily (I have a
> code snippet for the opposite).
Found it, sorry:



import re
import htmlentitydefs  # Python 2 module (renamed html.entities in Python 3)

def convertentity(m):
    """Convert an HTML entity into a normal string (ISO-8859-1)."""
    if m.group(1)=='#':
        try:
            return chr(int(m.group(2)))
        except ValueError:
            return '&#%s;' % m.group(2)
    try:
        return htmlentitydefs.entitydefs[m.group(2)]
    except KeyError:
        return '&%s;' % m.group(2)

def unquotehtml(s):
    """Convert an HTML-quoted string into a normal string (ISO-8859-1).

    Works with &#XX; and with &nbsp; &gt; etc."""
    return re.sub(r'&(#?)(.+?);',convertentity,s)
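
For anyone reading this on Python 3, here is a rough port of the same idea. It is only a sketch, not part of the original snippet: the names convert_entity and unquote_html are made up for illustration, it uses html.entities.name2codepoint in place of htmlentitydefs.entitydefs, and it assumes you also want hexadecimal references such as &#x3e; decoded. Unknown or malformed entities are left unchanged, as in the code above.

```python
import re
from html.entities import name2codepoint


def convert_entity(m):
    """Replace one matched entity with its character, or leave it as-is."""
    is_numeric, body = m.group(1), m.group(2)
    try:
        if is_numeric:
            # Numeric reference: hexadecimal (&#x3e;) or decimal (&#62;).
            if body[:1] in ('x', 'X'):
                return chr(int(body[1:], 16))
            return chr(int(body))
        # Named reference such as &gt; or &nbsp;.
        return chr(name2codepoint[body])
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range number: keep the original text.
        return m.group(0)


def unquote_html(s):
    """Convert an HTML-quoted string into a normal (unicode) string."""
    return re.sub(r'&(#?)(\w+);', convert_entity, s)


print(unquote_html('1 &lt; 2 &amp;&amp; 3 &#62; 2'))
```

On Python 3.4 and later the standard library's html.unescape() does the same job, so the hand-written version is mostly useful when you need to control what happens to unknown entities.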



