Decoding numerical and name based HTML entities
Robert Brewer
fumanchu at amor.org
Thu Mar 25 14:52:36 EST 2004
William Park wrote:
> Pieter Claerhout <Pieter.Claerhout at creo.com> wrote:
> > Hi all,
> >
> > what would be the easiest way in Python to decode HTML entities to a unicode
> > string? I would need a function that supports both numerical as well as name
> > based HTML entities.
> >
> > I already did some googling, but I only found a function that decoded
> > numerical ones, and this function didn't support unicode...
>
> Dictionary "table" would be first thing I'd try, ie.
> tohtml['&'] = '&amp;'
> fromhtml['&amp;'] = '&'
That dictionary already exists ;)
>>> import htmlentitydefs
>>> htmlentitydefs.name2codepoint[u'amp']
38
>>> unichr(38)
u'&'
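
Building on that table, a decoder that handles both numeric and named entities might look roughly like the sketch below. The function name decode_entities and the regular expression are illustrative assumptions, not part of the standard library. The import fallbacks are there only because Python 3 renamed the module to html.entities and dropped unichr.

```python
import re

try:
    # Python 2 module name, as used in this thread
    from htmlentitydefs import name2codepoint
except ImportError:
    # Python 3 moved the same table to html.entities
    from html.entities import name2codepoint

try:
    unichr
except NameError:
    # Python 3: chr() already returns a unicode character
    unichr = chr

# Matches &#38; (decimal), &#x26; (hex) and &amp; (named) references.
_entity_re = re.compile(r'&(#[xX]?[0-9a-fA-F]+|\w+);')


def decode_entities(text):
    """Hypothetical helper: replace HTML entities in text with unicode characters."""
    def replace(match):
        body = match.group(1)
        try:
            if body[:2] in ('#x', '#X'):
                return unichr(int(body[2:], 16))
            if body.startswith('#'):
                return unichr(int(body[1:]))
            return unichr(name2codepoint[body])
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range number: leave the text untouched.
            return match.group(0)
    return _entity_re.sub(replace, text)
```

Passing a unicode string such as u'&amp; &#38; &#x26;' should give back three ampersands separated by spaces; anything the table doesn't know is left as it was.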
Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org