[XML-SIG] SAX2.py

04 Oct 1999 13:49:47 +0200

* uche ogbuji
| 
| The module is attached.  

Uche, this is great! It duplicates what I have already done (and
already posted), but that doesn't matter. If we can thrash out the
issues on the list and arrive at one set of interfaces, that would be
ideal.

I've sent your proposal to the printer and will look at it tonight.
For comparison, here is mine:

--- Features

The list below is copied directly from David Megginson's latest
proposal. Note that all features are optional.

http://xml.org/sax/features/validation
  Validate (true) or don't validate (false).

http://xml.org/sax/features/external-general-entities
  Expand external general entities (true) or don't expand (false).

http://xml.org/sax/features/external-parameter-entities
  Expand external parameter entities including the external DTD subset
  (true) or don't expand (false).

http://xml.org/sax/features/namespaces
  Preprocess namespaces (true) or don't preprocess (false).  See also
  the http://xml.org/sax/properties/namespace-sep property.

http://xml.org/sax/features/normalize-text
  Ensure that all consecutive text is returned in a single callback to
  DocumentHandler.characters or DocumentHandler.ignorableWhitespace
  (true) or explicitly do not require it (false).

http://xml.org/sax/features/use-locator
  Provide a Locator using the DocumentHandler.setDocumentLocator
  callback (true), or explicitly do not provide one (false).
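Without normalize-text, a parser is free to split a single run of text across several characters() calls. As a hedged illustration, an application stuck with a parser that lacks this feature could coalesce the calls itself. TextCoalescer is a hypothetical name, the sketch assumes the SAX 1.0 characters(ch, start, length) signature, and it only covers characters(), not ignorableWhitespace():

```python
# Hypothetical helper: buffers consecutive characters() callbacks and
# forwards them as one call, approximating normalize-text = true.
class TextCoalescer:
    def __init__(self, target):
        self.target = target      # the application's own DocumentHandler
        self._buf = []

    def characters(self, ch, start, length):
        # SAX 1.0 style: a buffer plus a start offset and a length.
        self._buf.append(ch[start:start + length])

    def _flush(self):
        if self._buf:
            text = "".join(self._buf)
            self._buf = []
            self.target.characters(text, 0, len(text))

    # Any non-text event ends the current run of text.
    def startElement(self, name, attrs):
        self._flush()
        self.target.startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        self.target.endElement(name)

    def endDocument(self):
        self._flush()
        self.target.endDocument()
```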

--- LexicalHandler

This handler is supposed to be used by applications that need
information about lexical details in the document such as comments and
entity boundaries. Most applications won't need this, but the DOM will
find it useful. Support for this handler will be optional.

This handler has the handlerID http://xml.org/sax/handlers/lexical.

class LexicalHandler:

  def xmlDecl(self, version, encoding, standalone):
    """All three parameters are strings. If encoding or standalone is
    not specified in the XML declaration, its value will be None."""

  def startDTD(self, root, publicID, systemID):
    """This event is reported when the DOCTYPE declaration is
    encountered. root is the name of the root element type, while the last
    two parameters are the public and system identifiers of the external
    DTD subset."""

  def endDTD(self):
    "This event is reported after the DTD has been parsed."

  def startEntity(self, name):
    """Reports the beginning of a new entity. If the entity is the
    external DTD subset the name will be '[dtd]'."""

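  def endEntity(self, name):
    "Reports the end of the entity called name."

  def startCDATA(self):
    "Reports the start of a CDATA section."

  def endCDATA(self):
    "Reports the end of a CDATA section."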

--- Extended parser

class Parser2(Parser):

  def setFeature(self, featureID, state):
    """This turns on or off (depending on whether state is true or
    false) support for a particular feature (like namespaces,
    validation etc.). The parser can raise SAXNotSupportedException
    (or its subclass SAXUnrecognizedException) if it doesn't support
    the feature."""

  def setHandler(self, handlerID, handler):
    """This registers an event handler with the parser (LexicalHandler,
    NamespaceHandler or maybe some special parser-defined handler).
    The parser can raise SAXNotSupportedException (or its subclass
    SAXUnrecognizedException) if it doesn't support the handler."""

  def set(self, propertyID, value):
    """This sets the value of a parser property (such as the namespace
    separator string or something parser-defined). The parser can
    raise SAXNotSupportedException (or its subclass
    SAXUnrecognizedException) if it doesn't support the property."""

  def get(self, propertyID):
    """This returns the value of a property. The parser can raise
    SAXNotSupportedException (or its subclass SAXUnrecognizedException)
    if it doesn't support the property."""
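A minimal sketch of how a parser might implement these four calls with lookup tables. SketchParser2 and its tables are hypothetical, the exception classes are defined locally using the names from this proposal, and the utf-8 default for data-encoding is an assumption:

```python
# Exception names follow the proposal; defined here so the sketch runs
# without any real SAX library.
class SAXNotSupportedException(Exception):
    pass

class SAXUnrecognizedException(SAXNotSupportedException):
    pass

class SketchParser2:
    """Hypothetical parser showing only the Parser2 configuration calls."""

    def __init__(self):
        # featureID -> current state. Validation is recognised but
        # cannot be turned on, so only namespaces is settable.
        self._features = {
            "http://xml.org/sax/features/namespaces": False,
            "http://xml.org/sax/features/validation": False,
        }
        self._settable = {"http://xml.org/sax/features/namespaces"}
        # handlerID -> registered handler (None until set).
        self._handlers = {"http://xml.org/sax/handlers/lexical": None}
        # propertyID -> value; the default encoding is an assumption.
        self._properties = {
            "http://python.org/sax/properties/data-encoding": "utf-8",
        }

    def setFeature(self, featureID, state):
        if featureID not in self._features:
            raise SAXUnrecognizedException(featureID)
        state = bool(state)
        if featureID not in self._settable and state != self._features[featureID]:
            raise SAXNotSupportedException(featureID)
        self._features[featureID] = state

    def setHandler(self, handlerID, handler):
        if handlerID not in self._handlers:
            raise SAXUnrecognizedException(handlerID)
        self._handlers[handlerID] = handler

    def set(self, propertyID, value):
        if propertyID not in self._properties:
            raise SAXUnrecognizedException(propertyID)
        self._properties[propertyID] = value

    def get(self, propertyID):
        if propertyID not in self._properties:
            raise SAXUnrecognizedException(propertyID)
        return self._properties[propertyID]
```

Because SAXUnrecognizedException subclasses SAXNotSupportedException, a caller that only cares whether a request succeeded can catch the base class alone.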

--- Properties

The first three properties come from the Java SAX proposal, while the
last one was invented by yours truly.

http://xml.org/sax/properties/namespace-sep <String> (write-only)
  Set the separator to be used between the URI part of a name and the
  local part of a name when namespace processing is being performed
  (see the http://xml.org/sax/features/namespaces feature).  By
  default, the separator is a single space.  This property may not be
  set while a parse is in progress (throws a SAXNotSupportedException).
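A single space is a safe default because neither a URI nor an XML name can contain one. On the application side, recovering the two parts might look like the sketch below; split_name is a hypothetical helper, and it assumes the separator never occurs in the local part (true for a space or ':'), so the last occurrence marks the boundary:

```python
def split_name(name, sep=" "):
    """Split a namespace-processed name into (uri, localname).

    Hypothetical helper: names without a namespace come back with
    uri None. Assumes sep never occurs in the local part.
    """
    uri, found, local = name.rpartition(sep)
    if not found:
        return None, name
    return uri, local
```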

http://xml.org/sax/properties/dom-node <Node> (read-only)
  Get the DOM node currently being visited, if the SAX parser is
  iterating over a DOM tree.  If the parser recognises and supports
  this property but is not currently visiting a DOM node, it should
  return null (this is a good way to check for availability before the
  parse begins).

  This property doesn't make much sense for Python, but I see no point
  in leaving it out, either.

http://xml.org/sax/properties/xml-string <String> (read-only)
  Get the literal string of characters associated with the current
  event.  If the parser recognises and supports this property but is
  not currently parsing text, it should return null (this is a good
  way to check for availability before the parse begins).  I stole
  this idea from Expat.
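The availability check described for dom-node and xml-string could be wrapped in a small helper, sketched below under two assumptions: that "null" maps to None in Python, and that an unsupported property is signalled with the exceptions from the Parser2 description. property_available is a hypothetical name:

```python
# Exception names from the Parser2 description, defined locally.
class SAXNotSupportedException(Exception):
    pass

class SAXUnrecognizedException(SAXNotSupportedException):
    pass

def property_available(parser, propertyID):
    """Return True if parser recognises and supports propertyID.

    Before a parse begins, a supporting parser is expected to return
    None rather than raise, so any successful get() counts as support.
    """
    try:
        parser.get(propertyID)
    except SAXNotSupportedException:   # also catches SAXUnrecognizedException
        return False
    return True
```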

In addition, I think PySAX needs the following property:

http://python.org/sax/properties/data-encoding <String> (read/write)
  This property can be used to control which character encoding is
  used for data events that come from the parser. Throws
  SAXEncodingNotSupportedException if the encoding is not supported
  by the parser.
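One way a parser could honour data-encoding is to hold text internally as Unicode strings and encode it just before each characters() callback. This is only a sketch of that assumption: EncodingParserSketch and deliver_text are hypothetical, the utf-8 default is assumed, and the standard library's codecs.lookup is used to reject unknown encoding names:

```python
import codecs

class SAXEncodingNotSupportedException(Exception):
    pass

DATA_ENCODING = "http://python.org/sax/properties/data-encoding"

class EncodingParserSketch:
    def __init__(self, handler):
        self.handler = handler
        self._encoding = "utf-8"   # assumed default, not part of the proposal

    def set(self, propertyID, value):
        if propertyID != DATA_ENCODING:
            # A real parser would raise SAXUnrecognizedException here.
            raise KeyError(propertyID)
        try:
            # codecs.lookup raises LookupError for unknown encoding names.
            self._encoding = codecs.lookup(value).name
        except LookupError:
            raise SAXEncodingNotSupportedException(value)

    def get(self, propertyID):
        if propertyID != DATA_ENCODING:
            raise KeyError(propertyID)
        return self._encoding

    def deliver_text(self, text):
        # Encode the parser's internal str just before the callback.
        data = text.encode(self._encoding)
        self.handler.characters(data, 0, len(data))
```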

--- AttributeList2

This posting specifies an extended AttributeList interface that carries
the information needed by the DOM (and possibly other applications) and
for full XML 1.0 conformance. I'm not really sure whether we should
actually use all of this, so opinions are welcome.

class AttributeList2:

  def isSpecified(self, attr):
    """Returns true if the attribute was explicitly specified in the
    document and false otherwise. attr can be the attribute name or
    its index in the AttributeList."""

  def getEntityRefList(self, attr):
    """This returns the EntityRefList (see below) for an attribute,
    which can be specified by name or index."""
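A minimal in-memory sketch of AttributeList2, assuming the parser records for each attribute whether it came from the document or was defaulted from the DTD. AttributeList2Sketch and its constructor arguments are hypothetical; getLength, getName and getValue mirror the SAX 1.0 AttributeList methods:

```python
class AttributeList2Sketch:
    """Hypothetical AttributeList2. attrs is a list of
    (name, value, specified) tuples in document order; attributes
    defaulted from the DTD have specified=False."""

    def __init__(self, attrs, entity_refs=None):
        self._attrs = list(attrs)
        self._refs = entity_refs or {}   # attribute name -> EntityRefList

    def _index(self, attr):
        # attr may be an index or an attribute name, as in the interface.
        if isinstance(attr, int):
            return attr
        for i, (name, _, _) in enumerate(self._attrs):
            if name == attr:
                return i
        raise KeyError(attr)

    def getLength(self):
        return len(self._attrs)

    def getName(self, i):
        return self._attrs[i][0]

    def getValue(self, attr):
        return self._attrs[self._index(attr)][1]

    def isSpecified(self, attr):
        return self._attrs[self._index(attr)][2]

    def getEntityRefList(self, attr):
        # None when the value contained no entity references.
        return self._refs.get(self.getName(self._index(attr)))
```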

The class below is intended to be used for discovering entity reference
boundaries inside attribute values. This is needed because the XML 1.0
recommendation requires parsers to report unexpanded entity references,
even inside attribute values. Whether this is really something we want
is another matter.

class EntityRefList:

  def getLength(self):
    "Returns the number of entity references inside this attribute value."

  def getEntityName(self, ix):
    "Returns the name of entity reference number ix (zero-based index)."

  def getEntityRefStart(self, ix):
    """Returns the index of the first character inside the attribute
    value that stems from entity reference number ix."""

  def getEntityRefEnd(self, ix):
    """Returns the index of the last character inside the attribute
    value that stems from entity reference number ix."""

One redeeming feature of this interface is that it lives completely
outside the attribute value, and so can be ignored by those who are
not interested.
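To make the index conventions concrete, here is a hedged sketch: a list-backed EntityRefList plus a toy expander that records where each entity's replacement text lands in the expanded value. Both names are hypothetical, the expander ignores nested entities and character references, and start/end are zero-based and inclusive, as the docstrings above suggest. For a="a&ent;b" with ent defined as "XY", the value is "aXYb" and the reference covers indexes 1 to 2:

```python
class EntityRefListSketch:
    """Hypothetical EntityRefList. Each entry is (name, start, end),
    zero-based inclusive indexes into the expanded attribute value."""

    def __init__(self, refs):
        self._refs = list(refs)

    def getLength(self):
        return len(self._refs)

    def getEntityName(self, ix):
        return self._refs[ix][0]

    def getEntityRefStart(self, ix):
        return self._refs[ix][1]

    def getEntityRefEnd(self, ix):
        return self._refs[ix][2]


def expand_attribute(raw, entities):
    """Toy expander: replace &name; with entities[name] (no nesting,
    no character references) and record where each expansion landed."""
    out, refs, i, pos = [], [], 0, 0
    while i < len(raw):
        if raw[i] == "&":
            semi = raw.index(";", i)
            name = raw[i + 1:semi]
            text = entities[name]
            refs.append((name, pos, pos + len(text) - 1))
            out.append(text)
            pos += len(text)
            i = semi + 1
        else:
            out.append(raw[i])
            pos += 1
            i += 1
    return "".join(out), EntityRefListSketch(refs)
```

An application that doesn't care about entity boundaries simply uses the returned value and never looks at the list.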

--Lars M.