[XML-SIG] namespaces and sax questions

Roman Suzi rnd@onego.ru
Fri, 7 Sep 2001 21:14:50 +0400 (MSD)


Hello!

I am trying to master XML, and I can't understand what a "qualified name"
is, as understood by the sax.* modules of standard Python 2.1.1.

Here are my program, the XML example, and the result:

--- run.py ---
import xml.sax, xml.sax.handler
from xml.sax.xmlreader import InputSource

class ContentHandler(xml.sax.handler.ContentHandler):

  def startElementNS(self, name, qname, attrs):
    print "name=", name, "qname=", qname
    print "names:", attrs.getNames(),
    print "qnames:", attrs.getQNames()
#  def endElementNS(self, name, qname):
#    print name, qname
  def startPrefixMapping(self, prefix, URI):
    print "START", prefix, URI
  def endPrefixMapping(self, prefix):
    print "END", prefix

input_source = InputSource()
input_source.setByteStream(open("W3CExample.xml", "r"))
xml_reader = xml.sax.make_parser()
xml_reader.setContentHandler(ContentHandler())
# the docs say this feature is ON by default, but it is not:
xml_reader.setFeature(xml.sax.handler.feature_namespaces, 1)
xml_reader.parse(input_source)
---

--- W3CExample.xml ---
<?xml version="1.0"?>
<!-- elements are in the HTML namespace, in this case by default -->
<html xmlns='http://www.w3.org/TR/REC-html40'>
  <head><title>Frobnostication</title></head>
  <body><p>Moved to
    <a href='http://frob.com'>here</a>.</p></body>
</html>
---

And the result:

---
START None http://www.w3.org/TR/REC-html40
name= (u'http://www.w3.org/TR/REC-html40', u'html') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'head') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'title') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'body') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'p') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'a') qname= None
names: [(None, u'href')] qnames: []
END None
---

I do not see any "html:title", "html:head", ... among the qnames, although
http://www.w3.org/TR/REC-xml-names defines a qname as follows:

 Qualified Name
     QName ::= (Prefix ':')? LocalPart
     Prefix ::= NCName
     LocalPart ::= NCName
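
To make that grammar concrete, here is a minimal sketch of splitting a
qualified name into its prefix and local part. It is only an illustration:
`split_qname` is a hypothetical helper (not part of xml.sax), it does not
check that either part is a valid NCName, and it is written for a current
Python 3 rather than 2.1.1.

```python
def split_qname(qname):
    """Split a QName into (prefix, local_part); prefix is None if absent."""
    # Hypothetical helper for illustration; no NCName validation is done.
    if ":" in qname:
        prefix, local_part = qname.split(":", 1)
        return prefix, local_part
    return None, qname


print(split_qname("html:title"))
print(split_qname("title"))
```

Under this reading, "html:title" has the prefix "html" and the local part
"title", while an unprefixed "title" has no prefix at all.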

Also, most of the features aren't supported by the default XML parser
(pyexpat), although the Python docs do not say so.
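
One way to check this directly is to ask the parser itself. The sketch
below tries to switch on each standard SAX feature on a fresh parser and
records whether the call is accepted. `FEATURES` and `can_enable` are names
made up for this example, the code targets a current Python 3 rather than
2.1.1, and the answers depend on the parser that make_parser() returns on a
given installation.

```python
import xml.sax
from xml.sax import handler
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException

# Standard SAX2 feature names exposed by xml.sax.handler.
FEATURES = {
    "namespaces": handler.feature_namespaces,
    "namespace_prefixes": handler.feature_namespace_prefixes,
    "string_interning": handler.feature_string_interning,
    "validation": handler.feature_validation,
    "external_ges": handler.feature_external_ges,
    "external_pes": handler.feature_external_pes,
}


def can_enable(feature_name):
    """Return True if a fresh default parser accepts turning the feature on."""
    # A new parser per probe, so one probe cannot influence another.
    parser = xml.sax.make_parser()
    try:
        parser.setFeature(feature_name, True)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        return False
    return True


report = {label: can_enable(name) for label, name in FEATURES.items()}
for label in sorted(report):
    print(label, "->", report[label])
```

A False entry means the parser refused the request, either because it does
not know the feature or because it cannot provide it.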

The same thing happens if I add "html:" to the tags explicitly.
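
That variant can be repeated without any files. The following sketch,
written for a current Python 3 rather than 2.1.1, parses a small prefixed
document from memory and collects what the handler receives.
`RecordingHandler`, `record` and `PREFIXED_DOC` are names invented for this
example, and the document is a cut-down version of the one above.

```python
import io
import xml.sax
from xml.sax import handler

# Cut-down example document with an explicit "html:" prefix on every tag.
PREFIXED_DOC = (
    b"<html:html xmlns:html='http://www.w3.org/TR/REC-html40'>"
    b"<html:head><html:title>Frobnostication</html:title></html:head>"
    b"</html:html>"
)


class RecordingHandler(handler.ContentHandler):
    """Collect namespace-related events instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("START", prefix, uri))

    def startElementNS(self, name, qname, attrs):
        # Store the (URI, local name) pair next to the qname as delivered.
        self.events.append(("ELEMENT", name, qname))

    def endPrefixMapping(self, prefix):
        self.events.append(("END", prefix))


def record(document):
    """Parse `document` (bytes) with namespace processing switched on."""
    parser = xml.sax.make_parser()
    parser.setFeature(handler.feature_namespaces, True)
    recorder = RecordingHandler()
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(document))
    return recorder.events


events = record(PREFIXED_DOC)
for event in events:
    print(event)
```

Each "ELEMENT" entry pairs the (URI, local name) tuple with the qname
argument exactly as the parser delivered it, so the two can be compared
side by side.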

What is the problem? How can these observations be explained?
Thanks!

Sincerely yours, Roman Suzi
-- 
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Friday, September 07, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Dreams are free, but you get soaked on the connect time." _/