adding the XML to 2.0 to be a mistake?

Martin von Loewis loewis at informatik.hu-berlin.de
Sat Jan 20 17:51:36 EST 2001


rjroy at takingcontrol.com (Robert Roy) writes:

> >> Undeclared entities are a problem in SAX but can be handled cleanly
> >> using the unknown_entityref mechanism in xmllib.
> >
> >According to the XML specification, *all* entities must be declared. An
> >XML parser is required to check that.
> >
> 
> I may be misinterpreting the spec but if I declare standalone="no"
> with a non-validating parser, should it not ignore entities that it
> can't find?

Maybe it's a misunderstanding. If you were talking about entities
declared in an external subset (i.e. in a DTD), then they are not
undeclared.

So if the document has no DOCTYPE declaration and then uses truly
undeclared entity references, the document is not well-formed, and
even a non-validating parser must report that.

The section of the spec you cite only allows a parser to ignore
external entities if they can be assumed to have a definition in the
external DTD subset.

> If I interpret the spec properly then expat's behavior (at least as
> included with 2.0)  is questionable. It should not choke on an
> undeclared entity ref since it must assume that it is declared
> elsewhere.

And indeed, it doesn't. For me, it parses

<!DOCTYPE bla SYSTEM "hallo.dtd">
<bla>&test;</bla>

just fine, whereas it "chokes" on

<bla>&test;</bla>

just as it should.
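
The same comparison can be scripted. This is only a rough sketch,
assuming a current Python whose xml.parsers.expat is left at its
defaults (so the external DTD "hallo.dtd" is never fetched); the
helper name parses_cleanly is made up for illustration.

```python
import xml.parsers.expat


def parses_cleanly(document):
    """Return True if expat accepts the document, False if it raises."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# An external subset is announced, so &test; may be declared there.
with_doctype = '<!DOCTYPE bla SYSTEM "hallo.dtd">\n<bla>&test;</bla>'

# No DOCTYPE at all, so &test; cannot be declared anywhere.
without_doctype = '<bla>&test;</bla>'

if __name__ == "__main__":
    print("with DOCTYPE:   ", parses_cleanly(with_doctype))
    print("without DOCTYPE:", parses_cleanly(without_doctype))
```

If expat behaves as described above, the first document should be
accepted and the second rejected with an "undefined entity" error.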

> I want it all correct AND flexible <g/>. Now if we can just hook in
> here and override there...

It seems then that xml.parsers.expat is just for you - at least if you
want to process XML. If you want to process something like XML, then
you should stick with xmllib.

Regards,
Martin


