[XML-SIG] Setting the DOCTYPE in a new XML DOM

Sylvain Thenault Sylvain.Thenault@logilab.fr
Wed, 16 Jan 2002 09:37:09 +0100 (CET)


On 15 Jan 2002, Douglas Bates wrote:
> Sylvain Thenault <Sylvain.Thenault@logilab.fr> writes:
> > On 11 Jan 2002, Douglas Bates wrote:
> > > I have been unable to determine how to set the SYSTEM in the doctype
> > > of a document read by the PyExpat reader.  I am rather new to this so
> > > it is possible that I am doing something foolish.  I have mostly been
> > > following demos and examples as I haven't been able to track down a
> > > lot of documentation.  A sample program is
> > > 
> > > #!/usr/bin/env python2.2
> > > 
> > > from xml.dom.ext.reader.PyExpat import Reader
> > > from xml.dom.ext import PrettyPrint
> > > 
> > > if __name__ == "__main__":
> > >     reader = Reader()
> > >     doc = reader.fromUri("/tmp/foo1.xml")
> > >     PrettyPrint(doc)
> > > 
> > > The file /tmp/foo1.xml begins
> > > 
> > > <?xml version="1.0"?>
> > > <!DOCTYPE booklist SYSTEM "file:////home/deepayan/python/book.dtd">
> > > <booklist>
> > >   <book>
> > > 
> > > but the output file begins
> > > 
> > > <?xml version='1.0' encoding='UTF-8'?>
> > > <!DOCTYPE booklist>
> > > <booklist>
> > >   <book>
> > > 
> > > Can anyone tell me what I do to maintain the SYSTEM designation?
> > 
> > this is a bug in pyexpat. It should work if you use xmlproc instead of
> > pyexpat to generate your dom tree.
> 
> Could you tell me how I would use xmlproc to create a reader or to
> somehow load XML from a URI?  Is there a Reader class that uses
> xmlproc?  I couldn't see one when I looked through the libraries.
  
Here is an example:

---------------------------------------------------------
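from xml.dom.ext import PrettyPrint
from xml.dom.ext.reader import Sax2
from xml.sax import make_parser

# request the xmlproc driver explicitly instead of pyexpat
parser = make_parser(['xml.sax.drivers2.drv_xmlproc'])
reader = Sax2.Reader(parser=parser)
# then the same calls as in your script
doc = reader.fromUri("/tmp/foo1.xml")
PrettyPrint(doc)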
---------------------------------------------------------

By default (without the "parser" argument), the Sax and Sax2 readers use
xmlproc when you ask for a validating parser. In all other cases they try
to use a faster parser such as pyexpat or sgmlop, which are Python wrappers
for C libraries.

I have tried this with your example and it works correctly (it doesn't lose
the doctype node :)
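
If you want to check the round trip without PyXML installed, here is a
minimal sketch. It assumes the standard library's xml.dom.minidom, not the
Sax2/xmlproc reader above, so it only illustrates what "keeping the doctype
node" looks like. The inline document stands in for /tmp/foo1.xml (everything
after <book> is invented), and roundtrip() and new_document() are
hypothetical helper names.

```python
# Sketch with the standard library's xml.dom.minidom; this is NOT the
# PyXML Sax2/xmlproc reader from the example above.
from xml.dom import minidom

# Inline stand-in for /tmp/foo1.xml; the content after <book> is made up.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE booklist SYSTEM "file:////home/deepayan/python/book.dtd">
<booklist>
  <book/>
</booklist>
"""


def roundtrip(text):
    """Parse text into a DOM; return (system id or None, serialized XML)."""
    doc = minidom.parseString(text)
    system_id = doc.doctype.systemId if doc.doctype is not None else None
    return system_id, doc.toxml()


def new_document(root_name, system_id):
    """Build an empty document whose DOCTYPE carries the given system id."""
    impl = minidom.getDOMImplementation()
    # createDocumentType(qualifiedName, publicId, systemId) is DOM Level 2
    doctype = impl.createDocumentType(root_name, None, system_id)
    return impl.createDocument(None, root_name, doctype)


system_id, output = roundtrip(SOURCE)
print(system_id)   # system identifier read from the parsed doctype node
print(output)      # serialized document, DOCTYPE line included

fresh = new_document("booklist", system_id)
print(fresh.toxml())
```

The second helper covers the case in the subject line: when the DOM is built
from scratch, the doctype has to be created first and handed to
createDocument(), since it cannot be attached to the document afterwards
through the standard DOM interface.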

cheers

-- 
Sylvain Thenault

  LOGILAB           http://www.logilab.org