adding the XML to 2.0 to be a mistake?

Robert Roy rjroy at takingcontrol.com
Sat Jan 20 18:19:09 EST 2001


On 20 Jan 2001 23:51:36 +0100, Martin von Loewis
<loewis at informatik.hu-berlin.de> wrote:

>> I may be misinterpreting the spec but if I declare standalone="no"
>> with a non-validating parser, should it not ignore entities that it
>> can't find?
>
>Maybe it's a misunderstanding. If you were talking about entities
>declared in an external subset (i.e. in a DTD), then they are not
>undeclared.
>
>So if the document has no DOCTYPE declaration and then uses truly
>undeclared entity references, the document is ill-formed, and even a
>non-validating parser must report that. 
>
>The section of the spec you cite only allows a parser to ignore
>external entities if they can be assumed to have a definition in the
>external DTD subset.
>

Thanks, I had not made that distinction. Makes sense when you think
about it. 

>> If I interpret the spec properly then expat's behavior (at least as
>> included with 2.0)  is questionable. It should not choke on an
>> undeclared entity ref since it must assume that it is declared
>> elsewhere.
>
>And indeed, it doesn't. For me, it parses
>
><!DOCTYPE bla SYSTEM "hallo.dtd">
><bla>&test;</bla>
>
>just fine, whereas it "chokes" on
>
><bla>&test;</bla>
>
>just as it should.
>

Yes, if I add the DOCTYPE declaration, then it works too.
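
For anyone who wants to reproduce the comparison, here is a minimal
sketch. It assumes a current Python 3 with the standard
xml.parsers.expat module rather than the 2.0 build discussed in this
thread, and the try_parse helper is only an illustrative name. No
handler for external entities is installed, so hallo.dtd does not
need to exist.

```python
import xml.parsers.expat as expat

# The two documents from Martin's message.
WITH_DOCTYPE = b'<!DOCTYPE bla SYSTEM "hallo.dtd">\n<bla>&test;</bla>'
WITHOUT_DOCTYPE = b'<bla>&test;</bla>'


def try_parse(document):
    """Return None if expat accepts the document, else the error text.

    try_parse is a hypothetical helper for this illustration only.
    """
    parser = expat.ParserCreate()
    try:
        # Second argument True marks this as the final chunk of input.
        parser.Parse(document, True)
    except expat.ExpatError as err:
        return expat.ErrorString(err.code)
    return None


if __name__ == "__main__":
    for label, doc in (("with DOCTYPE", WITH_DOCTYPE),
                       ("without DOCTYPE", WITHOUT_DOCTYPE)):
        print(label, "->", try_parse(doc) or "parsed")
```

The first document should go through because the entity may be
declared in the external subset, while the second should be rejected
as an undefined entity.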

>> I want it all correct AND flexible <g/>. Now if we can just hook in
>> here and override there...
>
>It seems then that xml.parsers.expat is just for you - at least if you
>want to process XML. If you want to process something like XML, then
>you should stick with xmllib.
>

Well, this has been a fruitful thread for me. Between your replies and
Paul's, I now have a better understanding of the XML spec and, best of
all, I can now see a clear way to start making better use of pyexpat.


Thanks Martin, Paul


Bob


