[XML-SIG] "Character reference too large" error with HtmlLib.Reader()

Lars Marius Garshol larsga@garshol.priv.no
31 Jul 2002 14:15:02 +0200


* Lars Marius Garshol
|
| This sounds like an obvious bug. I suggest you make the smallest
| document you can that reproduces the error, and then report this as
| a bug in the PyXML Sourceforge project (it seems to be in sgmlop,
| which I don't think is part of Python proper), attaching the file to
| it.

* Martin v. Loewis
| 
| It turns out that the bug is not that obvious. sgmlop cannot return
| a Unicode string, since, in SGML mode, it would have to know what the
| character set for character references is.

That I understand, but it shouldn't just say that the reference is too
big. So the error message, at least, has to be improved.

| Instead, this was a bug in xml.dom.reader.SgmlOp.HtmlParser, which
| failed to implement handle_charref (sgmlop only tries to interpret
| the character references itself if handle_charref is not
| implemented).

Sounds reasonable to me.
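
For illustration, here is a minimal sketch of the mechanism described
above: a parser subclass that implements handle_charref itself, so that
numeric references above 255 are resolved to Unicode by the handler
rather than by the tokenizer. This is not the SgmlOp.py fix. It assumes
the standard library's html.parser.HTMLParser as a stand-in for sgmlop
(with convert_charrefs=False so that the callback is actually invoked),
and the names CharrefCollector and extract_text are hypothetical.

```python
from html.parser import HTMLParser


class CharrefCollector(HTMLParser):
    """Collects text and resolves numeric character references itself."""

    def __init__(self):
        # convert_charrefs=False makes the parser call handle_charref()
        # instead of resolving the references on its own.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        # name is the text between "&#" and ";", e.g. "8217" or "x2019".
        if name[:1] in ("x", "X"):
            code = int(name[1:], 16)
        else:
            code = int(name)
        # chr() yields a Unicode character, so values above 255 are fine.
        self.parts.append(chr(code))

    def text(self):
        return "".join(self.parts)


def extract_text(markup):
    """Hypothetical helper: feed markup and return the collected text."""
    parser = CharrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

Calling extract_text("<p>It&#8217;s</p>") goes through handle_charref
with the name "8217", which is well outside the 8-bit range that
triggered the original error.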
 
| This will be fixed in PyXML 0.8; the fix is in SgmlOp.py 1.10.

Good. :)

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >