mapping HTML entities to Unicode (was: Re: Unicode name questions)

Skip Montanaro skip at pobox.com
Wed Apr 17 11:57:11 EDT 2002


    Martin> Skip Montanaro <skip at pobox.com> writes:
    >> "Lambda" has been spelled with a "b" as long as I can remember.  

    Martin> I think the Unicode charts use an ASCII transcription of the
    Martin> native pronunciation of the letters, if possible....

Thanks for the explanation, Martin.  I now have a dictionary mapping 391
specific entity strings to their plain string or Unicode string equivalents.
I know there are more.  For example, my dictionary doesn't describe any
numeric entities encoded in hex (easy enough to preprocess back to decimal,
but I've never encountered any).  I have a map_entities function that does
the obvious chr(int(ent[2:-1])) thing for decimal numeric entities that map
into the ASCII range.
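
For illustration, the lookup described above might be sketched roughly as
follows. This is not the code in entities.py: ENTITY_MAP is a tiny
hypothetical stand-in for the 391-entry dictionary, map_entity and the
regular expression are made up for the example, and it assumes Python 3,
where chr() covers all of Unicode rather than only the ASCII range. It also
handles hex numeric entities directly instead of preprocessing them back to
decimal.

```python
import re

# Hypothetical stand-in for the real dictionary, which has 391 entries.
ENTITY_MAP = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&lambda;": "\u03bb",
}

# Matches decimal (&#65;), hex (&#x41;) and named (&amp;) entities.
_ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def map_entity(ent):
    """Return the replacement for a single entity string such as '&#65;'."""
    if ent.startswith("&#"):
        body = ent[2:-1]  # strip the leading '&#' and the trailing ';'
        try:
            if body[:1] in ("x", "X"):
                return chr(int(body[1:], 16))  # hex numeric entity
            return chr(int(body))              # decimal numeric entity
        except (ValueError, OverflowError):
            return ent  # not a valid code point: leave the text alone
    # Named entity: fall back to the original text if it is unknown.
    return ENTITY_MAP.get(ent, ent)


def map_entities(text):
    """Replace every recognizable entity in text."""
    return _ENTITY_RE.sub(lambda m: map_entity(m.group(0)), text)
```

Unknown names and out-of-range numbers are passed through unchanged, so a
call such as map_entities("a &lt; b &bogus;") leaves "&bogus;" in place
rather than raising an exception.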

Is this of use to anyone besides me?  I tossed a copy of the dictionary into

    http://manatee.mojam.com/~skip/python/entities.py

I'm willing to maintain and improve it (and expose a function that uses the
dictionary) if others feel it would be of use.  If not, I'll just muddle
along with my own minor idiosyncrasies.  I'd also be happy to dump this in
favor of a preexisting solution.

Skip




