[XML-SIG] Can I use (and How to use) a DOM Validating Reader ...

Uche Ogbuji Uche.Ogbuji at fourthought.com
Sun Oct 16 07:18:51 CEST 2005


On Mon, 2005-10-10 at 11:11 +0000, mcharest at sogetel.net wrote:
> Hi, 
> 
> * I am a beginner with XML processing, so please bear with me !
> 
> * I have looked over the Python/XML HOWTO and I am currently reading
> Python & XML (Jones & Drake) O'Reilly book.  Did not find what I am looking for.
> 
> OBJECTIVES:
> ---------------
> * I would like to parse the attached XML file using Python and a simple DOM API, however I would like the following additional features:
> a) Use a Validating Reader (I would like to use a DTD at run-time within my application)
> b) XML processor to ignore all the trailing line feeds (used to visually format the XML file).
> 
> SAMPLE XML FILE:
> ---------------------
> <?xml version="1.0" encoding="US-ASCII"?>
> <!DOCTYPE casefile SYSTEM "cases.dtd">
> <casefile name="data" revision="PA1" date="2005-10-01">
> <case date="2005-10-01">
> <problem> Find an apartment </problem>
> <solution> Use Google </solution>
> <outcome> successful </outcome>
> </case>
> </casefile>
> 
> QUESTIONS:
> --------------
> a) Does PyXML offer a Validating DOM Reader ?  (Or, is a Validating Reader
> only available for SAX?)

Well, you can build a DOM from the SAX events that the validating reader
emits.

> b) Would using a DOM Validating DOM Reader automatically eliminate the
> extra trailing line feeds in my DOM object ? If not, how do I get the DOM object
> to drop the extra line feeds ?

Depending on your DTD, those interstitial newlines might be ignorable
whitespace.  They are ignorable unless the content model of the element
that contains them allows #PCDATA.

If they are, then SAX would report them through ignorableWhitespace
events rather than characters events.  You could use this to tweak the
creation of the DOM.
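
A minimal sketch of that idea, using only the standard library's
xml.sax.handler.ContentHandler and xml.dom.minidom rather than the 4DOM
readers.  The class name DomBuilder is hypothetical, and the handler
assumes it is driven by a validating parser (for example the xmlproc-based
one in PyXML) that actually emits ignorableWhitespace events:

```python
from xml.dom.minidom import getDOMImplementation
from xml.sax.handler import ContentHandler


class DomBuilder(ContentHandler):
    """Hypothetical SAX handler that builds a minidom tree and
    skips whatever the parser reports as ignorable whitespace."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.document = None
        self._stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        if self.document is None:
            # First element seen: create the document with it as root.
            impl = getDOMImplementation()
            self.document = impl.createDocument(None, name, None)
            node = self.document.documentElement
        else:
            node = self.document.createElement(name)
            self._stack[-1].appendChild(node)
        for key in attrs.keys():
            node.setAttribute(key, attrs[key])
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Real character data: keep it as a text node.
        if self._stack:
            text = self.document.createTextNode(content)
            self._stack[-1].appendChild(text)

    def ignorableWhitespace(self, whitespace):
        # Whitespace inside element-only content, as reported by a
        # validating parser: deliberately not added to the DOM.
        pass
```

The handler would be attached with parser.setContentHandler(DomBuilder())
before calling parser.parse(...).  As far as I know, the expat-based parser
that xml.sax.make_parser() returns by default does not validate and
delivers all whitespace through characters(), so with that parser nothing
would be dropped.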

> c) Can I do the above without using the 4Suite XML package ?

I think you can by hacking the SAX2-based readers in 4DOM (which is part
of PyXML, not 4Suite).

Then again, 4Suite has a very fast SAX -> DOM walker.  It doesn't
validate, but it also has a very fast whitespace stripper that would
eliminate the interstitial whitespace.
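
For illustration only, here is a rough sketch of whitespace stripping done
after the fact on a standard-library minidom tree.  This is not 4Suite's
API or its implementation; the function name strip_whitespace_nodes is
hypothetical, and the sample omits the DOCTYPE line so that no cases.dtd is
needed:

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, since children are removed during the walk.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)
    return node


SAMPLE = """<casefile name="data" revision="PA1" date="2005-10-01">
<case date="2005-10-01">
<problem> Find an apartment </problem>
<solution> Use Google </solution>
<outcome> successful </outcome>
</case>
</casefile>"""

doc = strip_whitespace_nodes(parseString(SAMPLE))
case = doc.documentElement.firstChild
print([child.tagName for child in case.childNodes])
```

Unlike the DTD-driven approach, this blanket strip does not consult any
content model, so it would also remove whitespace-only text inside
elements that are declared to hold character data.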


-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/


