Decoding numerical and name based HTML entities

Robert Brewer fumanchu at amor.org
Thu Mar 25 14:52:36 EST 2004


William Park wrote:
> Pieter Claerhout <Pieter.Claerhout at creo.com> wrote:
> > Hi all,
> > 
> > what would be the easiest way in Python to decode HTML entities to a
> > unicode string? I would need a function that supports both numerical
> > as well as name based HTML entities.
> > 
> > I already did some googling, but I only found a function that decoded
> > numerical ones, and this function didn't support unicode...
> 
> Dictionary "table" would be first thing I'd try, ie.
>     tohtml['&'] = '&amp;'
>     fromhtml['&amp;'] = '&'


That dictionary already exists ;)

>>> import htmlentitydefs
>>> htmlentitydefs.name2codepoint[u'amp']
38
>>> unichr(38)
u'&'
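
To cover both numerical and name based entities in one pass, the lookup
above can be wrapped in a regular-expression substitution. What follows is
only a minimal sketch, not a tested library routine: the names
decode_entities and _ENTITY_RE are made up for illustration, and it is
written for Python 3, where htmlentitydefs is called html.entities and
unichr is called chr. Unknown or out-of-range references are left as they
are.

```python
import re
from html.entities import name2codepoint  # Python 2: from htmlentitydefs import name2codepoint

# Matches &name;, &#123; and &#x7B; style references (hypothetical helper pattern).
_ENTITY_RE = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')


def decode_entities(text):
    """Replace numeric and named HTML entities in text with their characters."""
    def replace(match):
        ref = match.group(1)
        try:
            if ref[:2] in ('#x', '#X'):
                return chr(int(ref[2:], 16))    # hexadecimal, e.g. &#x26;
            if ref[0] == '#':
                return chr(int(ref[1:]))        # decimal, e.g. &#38;
            return chr(name2codepoint[ref])     # named, e.g. &amp;
        except (KeyError, ValueError, OverflowError):
            # Unknown name or code point out of range: keep the original text.
            return match.group(0)
    return _ENTITY_RE.sub(replace, text)
```

On Python 3.4 and later the standard library also ships html.unescape(),
which does the same job and additionally handles the HTML5 entity names.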


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org



