Bug in htmlentitydefs.py with Python 3.0?

Thu Dec 27 05:34:33 EST 2007

Martin v. Löwis wrote:

>> In trying to parse html files using ElementTree running under Python
>> 3.0a1, and using htmlentitydefs.py to add "character entities" to the
>> parser, I found that I needed to create a customized version of
>> htmlentitydefs.py to make things work properly.
> 
> Can you please state what precise problem you were seeing? The original
> code looks fine to me as it stands.

from what I can tell, his problem is that htmlentitydefs.entitydefs maps
to *either* character strings or HTML character references, depending on
the character value (a one-character string for code points in the
Latin-1 range, a "&#NNN;" reference for everything above it).  he needs
a dictionary that maps from entity names to characters for *all* names;
something like (untested):

     import htmlentitydefs

     entity_map = htmlentitydefs.entitydefs.copy()
     for name, entity in htmlentitydefs.entitydefs.items():
         if len(entity) != 1:
             # "&#8364;" -> 8364 -> character (chr in 3.0, unichr in 2.x)
             entity_map[name] = chr(int(entity[2:-1]))

(entitydefs is pretty unusable as it is, but it was added to Python 
before Python got Unicode strings, and changing it would break ancient 
code...)
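
an alternative sketch that avoids parsing the "&#NNN;" strings altogether: 
the same module also exposes name2codepoint, which maps each entity name 
to an integer code point.  this assumes Python 3 string semantics (chr 
returns a Unicode character) and that the module is importable as either 
html.entities (its name in later 3.x releases) or htmlentitydefs; the 
name entity_map is just carried over from the snippet above:

```python
# assumption: html.entities is the module name on current Python 3
# releases; fall back to the older htmlentitydefs name if it is missing
try:
    from html.entities import name2codepoint
except ImportError:
    from htmlentitydefs import name2codepoint

# map every entity name to a single Unicode character
# (Python 3 semantics: chr() accepts any code point, not just 0-255)
entity_map = {name: chr(code) for name, code in name2codepoint.items()}
```

with this, entity_map["eacute"] and entity_map["euro"] are both 
one-character strings, so a parser can treat all entities uniformly.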

</F>