exhaustive mapping from html entities to unicode ?

Steven Taschuk staschuk at telusplanet.net
Fri Mar 7 14:52:34 EST 2003


Quoth shagshag13:
> has anyone compiled an exhaustive mapping from HTML entities (as
> entity name or entity number) to Unicode?
> (before doing it myself...)
> 
> I'm looking for something that would contain:
> .... mapping['&euro;'] : u'\u20ac', mapping['&#8364'] : u'\u20ac' ....

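As de Jong pointed out, including the numeric entities is silly: a
numeric character reference such as &#8364; already spells out the
code point (8364 is 0x20AC), so it can be converted directly instead
of being looked up in a table.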
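A minimal sketch of that direct conversion, assuming Python 3 (where
chr() returns a Unicode string; Python 2 would use unichr()). The
function name decode_numeric_refs is made up for illustration. It
handles decimal (&#8364;) and hex (&#x20AC;) forms and, like the
'&#8364' example in the question, tolerates a missing semicolon:

```python
import re

# Decimal (&#8364;) or hexadecimal (&#x20AC;) character reference;
# the trailing semicolon is optional.
_NUMERIC_REF = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?')

def decode_numeric_refs(text):
    """Replace numeric character references with the characters they name."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(codepoint)  # raises ValueError above 0x10FFFF
    return _NUMERIC_REF.sub(replace, text)

print(decode_numeric_refs('price: &#8364;5, &#x20AC;6, &#8364'))
# prints "price: €5, €6, €"
```

No table is involved: the digits themselves are the code point.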

For a list of named entities and their Unicode values, see the
HTML 4.01 spec: <http://www.w3.org/TR/html4/sgml/entities.html>.
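If you'd rather not transcribe that list by hand, Python's standard
library already carries the HTML 4 named entities as
html.entities.name2codepoint (htmlentitydefs.name2codepoint in
Python 2). A minimal sketch, assuming Python 3, that builds a table
keyed the way the question asks:

```python
from html.entities import name2codepoint

# Key each entry by the full entity text, e.g. mapping['&euro;'] == '\u20ac'.
mapping = {'&%s;' % name: chr(codepoint)
           for name, codepoint in name2codepoint.items()}

print(repr(mapping['&euro;']))
# prints '€'
```

On Python 3.4 and later, html.unescape() decodes both named and
numeric references in one call, e.g. html.unescape('&euro; &#8364;')
gives '€ €'.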

-- 
Steven Taschuk                                                   w_w
staschuk at telusplanet.net                                      ,-= U
                                                               1 1