adding the XML to 2.0 to be a mistake?
Martin von Loewis
loewis at informatik.hu-berlin.de
Sat Jan 20 17:51:36 EST 2001
rjroy at takingcontrol.com (Robert Roy) writes:
> >> Undeclared entities are a problem in SAX but can be handled cleanly
> >> using the unknown_entityref mechanism in xmllib.
> >
> >According to the XML specification, *all* entities must be declared. An
> >XML parser is required to check that.
> >
>
> I may be misinterpreting the spec but if I declare standalone="no"
> with a non-validating parser, should it not ignore entities that it
> can't find?
Maybe it's a misunderstanding. If you were talking about entities
declared in an external subset (i.e. in a DTD), then they are not
undeclared.
However, if the document has no DOCTYPE declaration and then uses truly
undeclared entity references, the document is not well-formed, and even a
non-validating parser must report that.
The section of the spec you cite only allows a parser to ignore
external entities if they can be assumed to have a definition in the
external DTD subset.
> If I interpret the spec properly then expat's behavior (at least as
> included with 2.0) is questionable. It should not choke on an
> undeclared entity ref since it must assume that it is declared
> elsewhere.
And indeed, it doesn't. For me, it parses
<!DOCTYPE bla SYSTEM "hallo.dtd">
<bla>&test;</bla>
just fine, whereas it "chokes" on
<bla>&test;</bla>
just as it should.
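The two cases above can be reproduced with a minimal sketch like the following. It assumes a current Python whose xml.parsers.expat exposes SkippedEntityHandler; the helper name try_parse is made up for illustration, and hallo.dtd is never actually read, since expat does not load the external subset by default.

```python
import xml.parsers.expat


def try_parse(document):
    """Parse `document` with expat.

    Returns (parsed_ok, skipped), where `skipped` lists the names of
    entity references that expat chose to skip instead of rejecting.
    `try_parse` is a hypothetical helper, not part of any library.
    """
    skipped = []
    parser = xml.parsers.expat.ParserCreate()
    # Called for a reference to an entity whose declaration was not seen
    # but which may be declared in the (unread) external subset.
    parser.SkippedEntityHandler = (
        lambda name, is_parameter_entity: skipped.append(name)
    )
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        # Raised for well-formedness errors such as an undefined entity.
        return False, skipped
    return True, skipped


# DOCTYPE with an external subset: &test; might be declared there.
with_doctype = '<!DOCTYPE bla SYSTEM "hallo.dtd">\n<bla>&test;</bla>'
# No DOCTYPE at all: &test; cannot be declared anywhere.
without_doctype = '<bla>&test;</bla>'

print(try_parse(with_doctype))
print(try_parse(without_doctype))
```

The first call should report a successful parse with the entity skipped, and the second a failure, matching the behaviour described above.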
> I want it all correct AND flexible <g/>. Now if we can just hook in
> here and override there...
It seems then that xml.parsers.expat is just for you - at least if you
want to process XML. If you want to process something like XML, then
you should stick with xmllib.
Regards,
Martin