[XML-SIG] Using PyExpat.py

Lars Marius Garshol larsga@garshol.priv.no
20 Feb 2001 00:03:51 +0100


* Uche Ogbuji
|
| Most XML processing specifications mandate that the URI of the XML
| entity that contains an infoset node is used as the basis for
| further processing.  To me, this argues strongly for dropping local
| files rather than URIs if we must choose.  Some XML specs would be
| very difficult to implement properly if the low-level tools became
| file-system-only readers.

* Guido van Rossum
| 
| Can you give more details of how this is used? 

The simplest example is perhaps

  <!DOCTYPE doc [
    <!ENTITY chapter1 SYSTEM "chapter1.xml">
    <!ENTITY chapter2 SYSTEM "chapter2.xml">
    <!ENTITY chapter3 SYSTEM "chapter3.xml">
    <!-- ... -->
  ]>
  <doc>
  <title>The Meaning of Life</title>

  <part><title>Life, the Universe and Everything</title>
  &chapter1;
  &chapter2;
  &chapter3;
  <!-- ... -->
  </part>
  </doc>

This XML document is really a hub document for a book: it contains
metadata about the book (the title), the part structure, and
references to each chapter. The chapters themselves, however, reside
in separate files.

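The XML recommendation says clearly that the bit after 'SYSTEM' (the
system identifier) must be a URI reference, and that a relative one is
turned into an absolute URI by being resolved against the base URI of
the entity in which the declaration appears, which here is the
document itself.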
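
To make the resolution step concrete, here is a minimal sketch using
urljoin from the Python standard library. The example.org base URI and
the resolve_system_id helper are made up for illustration; they are
not part of any XML package.

```python
from urllib.parse import urljoin


def resolve_system_id(base_uri, system_id):
    """Resolve a SYSTEM identifier against the base URI of the
    entity that contains the declaration (hypothetical helper)."""
    return urljoin(base_uri, system_id)


# Assumed location of the hub document; not taken from the example above.
base = "http://example.org/books/life/book.xml"

for system_id in ("chapter1.xml",
                  "../shared/chapter2.xml",
                  "file:///tmp/chapter3.xml"):
    print(system_id, "->", resolve_system_id(base, system_id))
```

A relative identifier lands next to the hub document, while one that
is already absolute is left untouched.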

With the XML Base specification you can put attributes named
'xml:base' into your documents to locally change the base URI in a
part of the document. This then interacts with other XML
specifications that allow URI references to appear in the contents of
the document. The XML syntax for topic maps is one example of this.
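
As a rough illustration of how 'xml:base' changes things, the sketch
below walks a small tree with ElementTree and tracks the base URI in
effect for each element. The sample document, the 'href' attribute and
the effective_bases function are all assumptions invented for the
example, and it does not try to cover every corner of XML Base.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def effective_bases(element, base_uri):
    """Yield (element, base URI in effect) for element and descendants."""
    # An xml:base value may itself be relative to the inherited base.
    base_uri = urljoin(base_uri, element.get(XML_BASE, ""))
    yield element, base_uri
    for child in element:
        yield from effective_bases(child, base_uri)


# Hypothetical document with URI references in 'href' attributes.
doc = ET.fromstring(
    '<doc>'
    '<part xml:base="part1/"><ref href="chapter1.xml"/></part>'
    '<ref href="appendix.xml"/>'
    '</doc>'
)

resolved = [
    urljoin(base, element.get("href"))
    for element, base in effective_bases(doc, "http://example.org/book/hub.xml")
    if element.get("href") is not None
]
print(resolved)
```

The same relative reference resolves differently depending on where
in the document it appears, which is why the parser has to know the
base URI to begin with.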

This does not mean that we can't have a parseFile method, only that
the file name given must be converted into a URI before the XML system
starts using it.
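
One way that conversion could look, as a sketch only: the
filename_to_uri helper and the 'book.xml' file name are hypothetical,
and a real parseFile might well do it differently.

```python
from pathlib import Path
from urllib.parse import urljoin


def filename_to_uri(filename):
    """Turn a local file name into an absolute file: URI."""
    # resolve() makes the path absolute; as_uri() rejects relative paths.
    return Path(filename).resolve().as_uri()


# 'book.xml' is an assumed hub document in the current directory.
# It does not need to exist for the conversion to work.
base_uri = filename_to_uri("book.xml")

# Relative system identifiers then resolve exactly as they would
# against an http: base URI.
chapter_uri = urljoin(base_uri, "chapter1.xml")

print(base_uri)
print(chapter_uri)
```

Once the file name is a URI, the rest of the XML machinery no longer
needs to care whether the document came from disk or from the network.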

| I've got very limited XML experience, and so far it all falls in the
| category of "here's a file; give me a DOM tree for it" or "here's a
| DOM tree, write it to a file".  There are no URLs anywhere.
| Sometimes instead of a file it'll be text data read from or written
| to a database.  But no URLs.

That will probably be the most common use case in the near future, but
not everyone uses XML like that, and the entire family of standards
assumes that the basic framework is that of the web.

Quite a few XML applications work across the network and really rely
on being able to parse remote documents (RSS perhaps being the most
famous), and I think this will only become more common in the future.
And in any case it works just fine already. :-)
 
[larsga@pc36 dist]$ python xvcmd.py http://www.w3.org/TR/2000/REC-xml-20001006.xml 
xmlproc version 0.70

Parsing 'http://www.w3.org/TR/2000/REC-xml-20001006.xml'
W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:2:9: Attribute 'id' defined more than once
W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:3:9: Attribute 'role' defined more than once
W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:414:9: Attribute 'diff' defined more than once
E:http://www.w3.org/TR/2000/REC-xml-20001006.xml:2816:76: Actual value of attribute 'xmlns:xlink' does not match fixed value

Parse complete, 1 error(s) and 3 warning(s)


--Lars M.