Bug in htmlentitydefs.py with Python 3.0?

"Martin v. Löwis" martin at v.loewis.de
Wed Dec 26 19:53:00 EST 2007


> Without an additional parser, I was getting the following error
> message:
[...]
> xml.parsers.expat.ExpatError: undefined entity é: line 401, column 11

To understand that problem better, it would have been helpful to see
what line 401, column 11 of the input file actually says. AFAICT,
it must have been something like "&é;" which would be really puzzling
to have in an XML file (usually, people restrict themselves to ASCII
for entity names).
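The sketch below shows one way to reproduce this kind of error with a current Python 3, where htmlentitydefs is named html.entities and ElementTree parse errors are raised as ET.ParseError. The XHTML document is made up for illustration. Its DOCTYPE has an external identifier, and that is what makes expat pass the undeclared entity to ElementTree rather than reject it outright. ElementTree then looks the name up in parser.entity, which is empty by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical input. The DOCTYPE with a system identifier makes expat hand
# undeclared entity references to ElementTree instead of failing immediately;
# ElementTree then reports the entity it could not resolve.
DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html><body><p>caf&eacute;</p></body></html>"""

try:
    ET.fromstring(DOC)
    message = None
except ET.ParseError as exc:
    # Message has the form "undefined entity &name;: line N, column M".
    message = str(exc)

print(message)
```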

>             for entity in ent:
>                 if entity not in parser.entity:
>                     parser.entity[entity] = ent[entity]

This looks fine to me.
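The following is a self-contained version of the quoted loop, written as a sketch for Python 3. In that version the module is html.entities, and its entitydefs dictionary maps each entity name to the Unicode character itself. The input document is hypothetical. As above, the external DOCTYPE is assumed to be present so that expat passes undeclared entities through to parser.entity.

```python
import html.entities
import xml.etree.ElementTree as ET

# Hypothetical XHTML input using two HTML entities.
DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html><body><p>caf&eacute; &Alpha;</p></body></html>"""

def make_parser():
    parser = ET.XMLParser()
    ent = html.entities.entitydefs  # e.g. ent["Alpha"] == "\u0391"
    # Same loop as in the quoted message: add every HTML entity that the
    # parser does not already know about.
    for entity in ent:
        if entity not in parser.entity:
            parser.entity[entity] = ent[entity]
    return parser

root = ET.fromstring(DOC, parser=make_parser())
text = root.find("body/p").text
print(text)
```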

> The output was "wrong".  For example, one of the tests I used was to
> process a copy of the main dict of htmlentitydefs.py inside an html page.  A
> few of the characters came ok, but I got things like:
> 
> '&#913;':    0x0391, # greek capital letter alpha, U+0391

Why do you think this is wrong?

> When using my modified version, I got the following (which may not be
> transmitted properly by email...)
>     'Α':    0x0391, # greek capital letter alpha, U+0391
> 
> It does look like a Greek capital letter alpha here.

Sure, however, your first version ALSO has the Greek capital letter
alpha there; it is just spelled as a numeric character reference, &#913;
(which *is* a valid spelling for that letter in XML).
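This can be checked directly. In the small Python 3 sketch below, the same text is written once with the numeric character reference &#913; (decimal 913 is hex 0x0391) and once with the literal character. After parsing, both give the same string. When serializing to ASCII, ElementTree emits the reference form again for the literal character, which is one plausible way such output can arise.

```python
import xml.etree.ElementTree as ET

# The same key written two ways: as a numeric character reference and as
# the literal GREEK CAPITAL LETTER ALPHA.
as_reference = ET.fromstring("<p>'&#913;': 0x0391</p>")
as_literal = ET.fromstring("<p>'\u0391': 0x0391</p>")

print(as_reference.text == as_literal.text)  # True
print(hex(ord(as_reference.text[1])))        # 0x391

# Serializing to ASCII turns non-ASCII characters back into references.
print(ET.tostring(as_literal, encoding="us-ascii"))
```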

> I hope the above is of some help.

Thanks; I now think that htmlentitydefs is just as fine as it always
was - I don't see any problem here.

Regards,
Martin
