From akuchlin@cnri.reston.va.us  Thu Apr  1 01:36:07 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 31 Mar 1999 20:36:07 -0500
Subject: [XML-SIG] PyXML 0.5.1 prerelease 1
Message-ID: <199904010136.UAA14036@207-172-38-113.s113.tnt8.ann.va.dialup.rcn.com>

I've put up a pre-release of version 0.5.1 of the XML package.  Please
try it out and report any minor errors, glitches, or installation
nits.  After one or two iterations, I'll remove the "pre-release"
designation and announce it more widely.

It's available in .tgz and .zip format:
     http://www.python.org/sigs/xml-sig/files/xml-0.5.1pre1.tgz
     http://www.python.org/sigs/xml-sig/files/xml051pre1.zip

(Also available at the python.org mirrors, of course.)

I haven't written up a list of the changes yet, but will do that for
the next pre-release.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
For non-deterministic read "Inhabited by pixies."
    -- Anonymous


From gstein@lyra.org  Thu Apr  1 02:51:12 1999
From: gstein@lyra.org (Greg Stein)
Date: Wed, 31 Mar 1999 18:51:12 -0800
Subject: [XML-SIG] updated "quick parser" ... qp_xml.py
Message-ID: <3702DF20.5C10EB14@lyra.org>

This is a multi-part message in MIME format.

--------------6625E7BC3519C1A232100425
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hey there...

From that speed test thing that I posted a few days ago, I extracted an
actual module. At the same time, I also simplified some of the namespace
stuff and corrected several bugs w.r.t. default namespaces.

As I mentioned, this guy is about 12x faster than using the DOM to
parse XML (when both are using pyexpat).

It also handles namespaces and xml:lang properly.

Comments/patches are encouraged.

thx
-g

--
Greg Stein, http://www.lyra.org/

--------------6625E7BC3519C1A232100425
Content-Type: text/plain; charset=us-ascii; name="qp_xml.py"
Content-Disposition: inline; filename="qp_xml.py"
Content-Transfer-Encoding: 7bit

#
# qp_xml: Quick Parsing for XML
#

import string

try:
  import pyexpat
except ImportError:
  from xml.parsers import pyexpat

error = __name__ + '.error'


#
# The parsing class. Instantiate and pass a string/file to .parse()
#
class Parser:
  def __init__(self):
    self.reset()

  def reset(self):
    self.root = None
    self.cur_elem = None
    self.error = None

  def find_prefix(self, prefix):
    elem = self.cur_elem
    while elem:
      if elem.ns_scope.has_key(prefix):
        return elem.ns_scope[prefix]
      elem = elem.parent

    if prefix == '':
      return ''		# empty URL for "no namespace"

    return None

  def process_prefix(self, ob, use_default):
    idx = string.find(ob.name, ':')
    if idx == -1:
      if use_default:
        ob.ns = self.find_prefix('')
      else:
        ob.ns = ''	# no namespace
    elif string.lower(ob.name[:3]) == 'xml':
      ob.ns = ''	# name is reserved by XML. don't break out a NS.
    else:
      ob.ns = self.find_prefix(ob.name[:idx])
      ob.name = ob.name[idx+1:]

      if ob.ns is None:
        self.error = 'namespace prefix not found'
        return

  def start(self, name, attrs):
    if self.error:
      return

    elem = _element(name=name, lang=None, parent=None,
                    children=[], ns_scope={}, attrs=[],
                    first_cdata='', following_cdata='')

    if self.cur_elem:
      elem.parent = self.cur_elem
      elem.parent.children.append(elem)
      self.cur_elem = elem
    else:
      self.cur_elem = self.root = elem

    # scan for namespace declarations (and xml:lang while we're at it)
    for i in range(0, len(attrs), 2):
      name = attrs[i]
      value = attrs[i+1]

      if name == 'xmlns':
        elem.ns_scope[''] = value
      elif name[:6] == 'xmlns:':
        elem.ns_scope[name[6:]] = value
      elif name == 'xml:lang':
        elem.lang = value
      else:
        attr = _attribute(name=name, value=value)
        elem.attrs.append(attr)

    # inherit xml:lang from parent
    if elem.lang is None and elem.parent:
      elem.lang = elem.parent.lang

    # process prefix of the element name
    self.process_prefix(elem, 1)

    # process attributes' namespace prefixes
    for attr in elem.attrs:
      self.process_prefix(attr, 0)

  def end(self, name):
    if self.error:
      return

    parent = self.cur_elem.parent

    del self.cur_elem.ns_scope
    del self.cur_elem.parent

    self.cur_elem = parent

  def cdata(self, data):
    if self.error:
      return
    elem = self.cur_elem
    if elem.children:
      last = elem.children[-1]
      last.following_cdata = last.following_cdata + data
    else:
      elem.first_cdata = elem.first_cdata + data

  def parse(self, input):
    self.reset()

    p = pyexpat.ParserCreate()
    p.StartElementHandler = self.start
    p.EndElementHandler = self.end
    p.CharacterDataHandler = self.cdata

    try:
      if type(input) == type(''):
        rv = p.Parse(input, 1)
      else:
        while 1:
          s = input.read(_BLOCKSIZE)
          if not s:
            rv = p.Parse('', 1)
            break
          rv = p.Parse(s, 0)
          if rv == 0 or self.error:
            break

      if rv == 0:
        s = pyexpat.ErrorString(p.ErrorCode)
        raise error, 'expat parsing error: ' + s
      if self.error:
        raise error, self.error
    finally:
      if self.root is not None:  # nothing to clean if parsing failed early
        _clean_tree(self.root)

    return self.root


#
# handy function for dumping a tree that is returned by Parser
#
def dump(f, root):
  f.write('<?xml version="1.0"?>\n')
  namespaces = _collect_ns(root)
  _dump_recurse(f, root, namespaces, 1)
  f.write('\n')


#
# This function returns the element's CDATA. Note: this is not recursive --
# it only returns the CDATA immediately within the element, excluding the
# CDATA in child elements.
#
def textof(elem):
  s = elem.first_cdata
  for child in elem.children:
    s = s + child.following_cdata
  return s


#########################################################################
#
# private stuff for qp_xml
#

_BLOCKSIZE = 16384	# chunk size for parsing input

class _blank:
  def __init__(self, **kw):
    self.__dict__.update(kw)
class _element(_blank): pass
class _attribute(_blank): pass

def _clean_tree(elem):
  elem.parent = None
  del elem.parent
  map(_clean_tree, elem.children)


def _collect_recurse(elem, dict):
  dict[elem.ns] = None
  for attr in elem.attrs:
    dict[attr.ns] = None
  for child in elem.children:
    _collect_recurse(child, dict)

def _collect_ns(elem):
  "Collect all namespaces into a NAMESPACE -> PREFIX mapping."
  d = { '' : None }
  _collect_recurse(elem, d)
  del d['']	# make sure we don't pick up no-namespace entries
  keys = d.keys()
  for i in range(len(keys)):
    d[keys[i]] = i
  return d

def _dump_recurse(f, elem, namespaces, dump_ns=0):
  if elem.ns:
    f.write('<ns%d:%s' % (namespaces[elem.ns], elem.name))
  else:
    f.write('<' + elem.name)
  for attr in elem.attrs:
    if attr.ns:
      f.write(' ns%d:%s="%s"' % (namespaces[attr.ns], attr.name, attr.value))
    else:
      f.write(' %s="%s"' % (attr.name, attr.value))
  if dump_ns:
    for ns, id in namespaces.items():
      f.write(' xmlns:ns%d="%s"' % (id, ns))
  if elem.children or elem.first_cdata:
    f.write('>' + elem.first_cdata)
    for child in elem.children:
      _dump_recurse(f, child, namespaces)
      f.write(child.following_cdata)
    if elem.ns:
      f.write('</ns%d:%s>' % (namespaces[elem.ns], elem.name))
    else:
      f.write('</%s>' % elem.name)
  else:
    f.write('/>')

--------------6625E7BC3519C1A232100425--
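
For readers who cannot run the Python 1.5 module above, here is a rough
Python 3 sketch of the same data model. It is not a port of qp_xml: it
skips namespaces and xml:lang entirely, and the names `Element` and
`parse` are assumptions made for illustration. It only shows how
character data is split between `first_cdata` (text before the first
child) and `following_cdata` (text after an element, up to its next
sibling), and how `textof` stitches the pieces back together.

```python
from xml.parsers import expat


class Element:
    """Lightweight node, loosely modelled on the tree qp_xml builds."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.children = []
        self.first_cdata = ''       # text before the first child
        self.following_cdata = ''   # text after this element


def parse(text):
    """Build an Element tree from an XML string using expat callbacks."""
    stack = []   # open elements, innermost last
    roots = []   # receives the single document element

    def start(name, attrs):
        elem = Element(name, attrs)
        if stack:
            stack[-1].children.append(elem)
        else:
            roots.append(elem)
        stack.append(elem)

    def end(name):
        stack.pop()

    def cdata(data):
        elem = stack[-1]
        if elem.children:
            # text after a child belongs to that child's following_cdata
            elem.children[-1].following_cdata += data
        else:
            elem.first_cdata += data

    p = expat.ParserCreate()
    p.StartElementHandler = start
    p.EndElementHandler = end
    p.CharacterDataHandler = cdata
    p.Parse(text, True)
    return roots[0]


def textof(elem):
    """Text directly inside elem, excluding text inside child elements."""
    return elem.first_cdata + ''.join(c.following_cdata for c in elem.children)
```

Storing the trailing text on the preceding sibling is what lets the tree
get by without separate text nodes.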


From Jeff.Johnson@icn.siemens.com  Thu Apr  1 20:38:15 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Thu, 1 Apr 1999 15:38:15 -0500
Subject: [XML-SIG] HtmlBuilder - uses sgmllib, can it use sax/pyexpat?
Message-ID: <85256746.007108C1.00@li01.lm.ssc.siemens.com>


Now that I've delivered my Beta CD's to reproduction, I can take a breath and
try to optimize my conversion programs.  I was reading Greg Stein's quick XML
parser and started to see if I could use it.  That was when I realized that most
of what I do is read HTML files via xml.dom.html_builder.HtmlBuilder and it uses
sgmllib.  Assuming that pure-Python sgmllib is slower than pyexpat, which uses C
code, I wondered if there was a way to make HtmlBuilder use SAX and the default
pyexpat parser.  After taking a *very* quick look at the SAX and sgmllib parser
interfaces, it seems like a trivial matter to modify HtmlBuilder to use SAX.  Is
this true and would it be faster?  I know very little about these parsers so
forgive me if my suggestion is just plain stupid :)

To Greg: Most of my code uses DOM so I'm not sure if I could use your parser.
Would it be possible to add a DOM interface (or subset) to the objects it
creates?

To Andrew:
I've found a bug in the XML 0.5.1 package:  The xml/CREDITS file lists me (which
I was pleasantly surprised to see) and ONLY me.  I figure the guys that wrote
the library (you included) might also be included in the credits.  Thanks for
putting me in there though :)

Cheers,
Jeff


From gstein@lyra.org  Thu Apr  1 21:00:39 1999
From: gstein@lyra.org (Greg Stein)
Date: Thu, 01 Apr 1999 13:00:39 -0800
Subject: [XML-SIG] Re: HtmlBuilder - uses sgmllib, can it use sax/pyexpat?
References: <85256746.007108C1.00@li01.lm.ssc.siemens.com>
Message-ID: <3703DE77.1BB9DA6A@lyra.org>

Jeff.Johnson@icn.siemens.com wrote:
>...
> To Greg: Most of my code uses DOM so I'm not sure if I could use your parser.
> Would it be possible to add a DOM interface (or subset) to the objects it
> creates?

It would be possible, but it is important to note that DOM compatibility
was specifically excluded from its design principles. That's how come it
can go so much faster :-). Basically, it just presents an alternative
data representation for XML.

Actually, if it *just* exported the API, but no change was made to how
the structure is built, then it would probably work fine.

Note that Andrew is testing a similar technique for DOM building: skip
the API and smack the underlying data structure.

Ooh. And I just saw a way to make qp_xml a bit faster. The attribute
handling shouldn't create separate objects. I should create a mapping of
(ns, name) -> value. That will help during lookup, too.
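
A minimal sketch of what such a mapping could look like, in present-day
Python rather than 1.5. The function name `build_attr_map` and its
arguments are assumptions for illustration; the prefix rules simply
mirror process_prefix() as posted (unprefixed attributes get no
namespace, names starting with "xml" are left unsplit).

```python
def build_attr_map(raw_attrs, ns_scope):
    """Map (namespace URI, local name) -> value.

    raw_attrs: dict of qualified attribute name -> value
    ns_scope:  dict of prefix -> namespace URI currently in scope
    """
    table = {}
    for qname, value in raw_attrs.items():
        prefix, sep, local = qname.partition(':')
        if not sep:
            table[('', qname)] = value      # unprefixed: no namespace
        elif prefix.lower().startswith('xml'):
            table[('', qname)] = value      # reserved by XML, left unsplit
        elif prefix in ns_scope:
            table[(ns_scope[prefix], local)] = value
        else:
            raise ValueError('namespace prefix not found: ' + prefix)
    return table
```

Lookup then becomes a single dictionary access, e.g.
`table[('urn:x', 'href')]`, instead of a scan over attribute objects.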

Andrew: I was thinking this might be a nice alternative mechanism that
can go into the XML package. Where would it go? Maybe call it
xml.parsers.quick or something. Of course, it isn't as quick as plain
pyexpat :-), but then it also isn't a parser in the same sense as those.
Under util?

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From Jeff.Johnson@icn.siemens.com  Mon Apr  5 18:18:17 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Mon, 5 Apr 1999 13:18:17 -0400
Subject: [XML-SIG] dom.utils.FileReader & HtmlBuilder
Message-ID: <8525674A.005EB60E.00@li01.lm.ssc.siemens.com>


Could we have FileReader.readHtml() ignore mismatched end tags by default?  At
the moment, there is no way to ignore them at all using FileReader.  One of the
problems with FileReader is that there aren't a lot of ways to customize it
without subclassing it.  Since it is made to be extremely simple to use, I
figure it should fix up mismatched end tags by default.

Is the fixup for the parser not being freed still required?  Has that been
fixed?


    def readHtml(self,stream,ignore_mismatched_end_tags=1):
        from xml.dom import html_builder
        b = html_builder.HtmlBuilder(ignore_mismatched_end_tags)
        b.feed(stream.read())
        b.close()
        doc = b.document
        # There was some bug that prevents the builder from
        # freeing itself (maybe it has already been fixed?).
        # The next two lines break its references to the DOM
        # tree so that it can be freed.
        b.document = None
        b.current_element = None
        return doc


Thanks,
Jeff


From Jeff.Johnson@icn.siemens.com  Tue Apr  6 19:07:11 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Tue, 6 Apr 1999 14:07:11 -0400
Subject: [XML-SIG] raising exceptions in dom.core
Message-ID: <8525674B.00632EE5.00@li01.lm.ssc.siemens.com>


The following code shows two class based exceptions but in the first, the
message is passed along as an argument to 'raise' while in the second, the
message is given in the constructor of the exception.  Should this be changed to
use the constructor in both cases?


        if self.readonly:
            raise NoModificationAllowedException, "Read-only node "+repr(self)
        self._checkChild(newChild, self)

        if newChild._document != self._document:
            raise WrongDocumentException("newChild %s created from a "
                                         "different document" %
                                         (repr(newChild),) )


From Jeff.Johnson@icn.siemens.com  Wed Apr  7 18:22:27 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Wed, 7 Apr 1999 13:22:27 -0400
Subject: [XML-SIG] xml.dom.writer doesn't work anymore
Message-ID: <8525674C.005F1478.00@li01.lm.ssc.siemens.com>


The following line:

          self.file.write(re.sub('\n+', '\n', s))


was removed from:
class OutputStream:
     def write(self, s):
          #print 'write', `s`
          self.file.write(re.sub('\n+', '\n', s))
          if s and s[-1] == '\n':
               self.new_line = 1
          else:
               self.new_line = 0


I figure the intent was to get rid of the re.sub(), not the self.file.write()
itself :)

Currently, HtmlWriter and XmlWriter just create 0 byte files...

This is from the XML 0.5.1 zip file...

Cheers,
Jeff


From akuchlin@cnri.reston.va.us  Thu Apr  8 01:27:36 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 7 Apr 1999 20:27:36 -0400
Subject: [XML-SIG] xml.dom.writer doesn't work anymore
In-Reply-To: <8525674C.005F1478.00@li01.lm.ssc.siemens.com>
References: <8525674C.005F1478.00@li01.lm.ssc.siemens.com>
Message-ID: <199904080027.UAA00447@207-172-56-204.s204.tnt12.ann.va.dialup.rcn.com>

Jeff.Johnson@icn.siemens.com writes:
 > The following line:
 >           self.file.write(re.sub('\n+', '\n', s))
 > was removed from <xml.dom.writer>

This is what's called a brown-bag bug (because it makes the person who
made it want to wear a bag over their head).  Fixed in the CVS.

	Has anyone noted other problems with the pre-release of 0.5.1?
If not, I'll make new .tgz and .zip files with the above correction,
and call it 0.5.1 final.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
If it wasn't for the fact that a monster called the Head was plunging a metal
pipe up his nose preparatory to sucking his brains out, Michael Smith could
almost laugh.
    -- Opening sentence of ENIGMA #2: "The Truth"


From akuchlin@cnri.reston.va.us  Thu Apr  8 01:33:46 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 7 Apr 1999 20:33:46 -0400
Subject: [XML-SIG] raising exceptions in dom.core
In-Reply-To: <8525674B.00632EE5.00@li01.lm.ssc.siemens.com>
References: <8525674B.00632EE5.00@li01.lm.ssc.siemens.com>
Message-ID: <199904080033.UAA00457@207-172-56-204.s204.tnt12.ann.va.dialup.rcn.com>

Jeff.Johnson@icn.siemens.com writes:
 > The following code shows two class based exceptions but in the first, the
 > message is passed along as an argument to 'raise' while in the second, the
 > message is given in the constructor of the exception.  Should this
 > be changed to use the constructor in both cases?

	It doesn't really matter; "raise exception, argument" is
equivalent to "raise exception(argument)".  See GvR's essay on
exceptions at http://www.python.org/doc/essays/stdexceptions.html .
For consistency, the DOM code should probably pick one of the two
forms and stick with it; the exception(argument) form is probably the
one to choose.  Added to the TODO list.
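
A short sketch of the constructor form, for reference. The exception
name is borrowed from the dom.core snippet quoted above, but the classes
and the `check_same_document` helper are stand-ins written for this
example, not the package's code. Note that the comma form was later
dropped from the language (Python 3 rejects it), so the constructor form
is also the one that keeps working.

```python
class DOMException(Exception):
    """Stand-in base class for this sketch."""


class WrongDocumentException(DOMException):
    pass


def check_same_document(parent_doc, child_doc):
    """Raise if the child was created from a different document."""
    if child_doc is not parent_doc:
        # message passed to the constructor, not as a second 'raise' operand
        raise WrongDocumentException(
            "newChild created from a different document")
```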

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
    "I didn't know that there was a downstairs, here."
    "There's a downstairs in everybody. That's where we live."
    -- Lyta and the youngest of the Three, in SANDMAN #58: "The Kindly Ones:2"


From hgv@nsg0.network.com  Thu Apr  8 18:24:02 1999
From: hgv@nsg0.network.com (Harry Varnis)
Date: Thu, 08 Apr 1999 12:24:02 -0500
Subject: [XML-SIG] dtd error handling
Message-ID: <370CE632.2741CCB3@network.com>

Sorry if this isn't an appropriate forum for this, but here goes...

I can't seem to get my ErrorHandler to be used for dtd errors. I'm
using SAX + validating xmlproc. My ErrorHandler gets xml document
errors OK, but for dtd errors, the methods of xmlproc's default
Application get used.

I've tried to sort through the module code (xml-0.5) but I quickly got
tangled up :-) Can anyone help, please?

Thanks,
Harry Varnis

Here is a traceback and some code snippets:

Traceback (innermost last):
  File "/usr/local/apache/fastcgi-bin/serviceapp.py", line 236, in ?
    app.load(path)
  File "/usr/local/apache/fastcgi-bin/serviceapp.py", line 75, in load
    self.servicedata = servicedataparse.fromFile(f)
  File "/home/hgv/SSM/servicedataparse.py", line 103, in fromFile
    p.parseFile(file)
  File "/usr/lib/python1.5/site-packages/xml/sax/drivers/drv_xmlproc.py", line 29, in parseFile
    self.parser.read_from(file)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlval.py", line 120, in read_from
    self.parser.read_from(file)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 143, in read_from
    self.feed(buf)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 189, in feed
    self.do_parse()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 288, in do_parse
    self.parse_doctype()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 657, in parse_doctype
    self.app.handle_doctype(rootname,pub_id,sys_id)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlval.py", line 288, in handle_doctype
    p.parse_resource(sys_id)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 71, in parse_resource
    self.report_error(3000,sysID)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 374, in report_error
    self.err.fatal(msg)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlapp.py", line 134, in fatal
    sys.exit(1)
SystemExit: 1
            
class ServiceDataDocumentHandler(saxlib.HandlerBase):
    def __init__(self):
        saxlib.HandlerBase.__init__(self)
        self.serviceData = None

    def startElement(self, name, attrs):
        .
        .

    def endElement(self, name):
        .
        .

    def characters(self, ch, start, length):
        .
        .

    def error(self, exception):
        message = "Recoverable error: %s" % str(exception)
        .
        .

    def fatalError(self, exception):
        message = "Non-recoverable error: %s" % str(exception)
        .
        .
        raise exception

    def warning(self, exception):
        message = "Warning: %s" % str(exception)
        .
        .

def fromFile(file):
    p = saxexts.XMLValParserFactory.make_parser()
    h = ServiceDataDocumentHandler()
    p.setDocumentHandler(h)
    p.setErrorHandler(h)
    p.setDTDHandler(h)
    p.parseFile(file)
    p.close()
    return h.serviceData
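
The traceback ends in sys.exit(1), which raises SystemExit rather than
killing the process outright, so as a stopgap the caller can trap it.
The sketch below assumes nothing about xmlproc beyond what the traceback
shows: `parse_with_default_app` is a hypothetical stand-in for the
p.parseFile(file) call, and `ParseFailed` is an invented exception. It
does not explain why the ErrorHandler is bypassed for DTD errors; it
only keeps a long-running process alive.

```python
import sys


class ParseFailed(Exception):
    """Hypothetical error raised in place of an exit from inside the parser."""


def parse_with_default_app():
    # Stand-in for p.parseFile(file): the default fatal() handler exits.
    sys.exit(1)


def safe_parse(parse):
    """Run a parse callable, turning SystemExit into an ordinary exception."""
    try:
        return parse()
    except SystemExit as exc:
        raise ParseFailed('parser exited with status %r' % (exc.code,))
```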


From larsga@ifi.uio.no  Fri Apr  9 21:43:58 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 09 Apr 1999 22:43:58 +0200
Subject: [XML-SIG] SAX2: Parser properties
Message-ID: <wk4smpo775.fsf@ifi.uio.no>

The first three properties come from the JavaSAX proposal, while the
last one was invented by yours truly.


http://xml.org/sax/properties/namespace-sep <String> (write-only)
  Set the separator to be used between the URI part of a name and the
  local part of a name when namespace processing is being performed
  (see the http://xml.org/sax/features/namespaces feature).  By
  default, the separator is a single space.  This property may not be
  set while a parse is in progress (throws a SAXNotSupportedException).

http://xml.org/sax/properties/dom-node <Node> (read-only)
  Get the DOM node currently being visited, if the SAX parser is
  iterating over a DOM tree.  If the parser recognises and supports
  this property but is not currently visiting a DOM node, it should
  return null (this is a good way to check for availability before the
  parse begins).

  This property doesn't make much sense for Python, but I see no point
  in leaving it out, either.

http://xml.org/sax/properties/xml-string <String> (read-only)
  Get the literal string of characters associated with the current
  event.  If the parser recognises and supports this property but is
  not currently parsing text, it should return null (this is a good
  way to check for availability before the parse begins).  I stole
  this idea from Expat.
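
To make the namespace-sep property concrete: the expat binding in
today's Python standard library exposes the same idea directly.
`ParserCreate` accepts a `namespace_separator` argument and, when it is
set, reports each name as URI + separator + local part. The sketch below
uses xml.parsers.expat, not a SAX2 driver; the `element_names` helper
and the urn:example URI are made up for the example.

```python
from xml.parsers import expat


def element_names(text, sep=' '):
    """Return element names as expat reports them with namespaces enabled."""
    names = []
    p = expat.ParserCreate(namespace_separator=sep)
    p.StartElementHandler = lambda name, attrs: names.append(name)
    p.Parse(text, True)
    return names
```

An element that is in no namespace is reported with its bare local name,
so a handler can split on the separator to tell the two cases apart.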


In addition, I think PySAX needs the following property:

http://python.org/sax/properties/data-encoding <String> (read/write)
  This property can be used to control which character encoding is
  used for data events that come from the parser. In Java this is not
  an issue since all strings are Unicode, but in Python it is. Expat
  reports UTF-8, while xmlproc/xmllib just pass on whatever they're
  given.

  Do we need a special SAXEncodingNotSupportedException for this?
  Otherwise it may be impossible to tell whether the parser doesn't
  support this at all or whether it just doesn't support this
  particular encoding.
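
  A sketch of how the two failure cases could be told apart. Everything
  here is hypothetical: the exception classes are local stand-ins, and
  `Utf8Source` is a toy object standing in for a driver whose data
  events arrive as UTF-8, as Expat's do.

```python
import codecs


class SAXNotSupportedException(Exception):
    """Stand-in: the parser does not support the property at all."""


class SAXEncodingNotSupportedException(SAXNotSupportedException):
    """Hypothetical: the property is supported, but not this encoding."""


class Utf8Source:
    """Toy front end whose data events arrive as UTF-8 bytes."""

    def __init__(self):
        self.encoding = 'utf-8'

    def set_data_encoding(self, name):
        try:
            codecs.lookup(name)     # raises LookupError if unknown
        except LookupError:
            raise SAXEncodingNotSupportedException(name)
        self.encoding = name

    def convert(self, utf8_bytes):
        """Re-encode one data event from UTF-8 into the chosen encoding."""
        return utf8_bytes.decode('utf-8').encode(self.encoding)
```

  Making the new exception a subclass keeps existing handlers for
  SAXNotSupportedException working while letting callers that care
  catch the narrower case.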

--Lars M.


From larsga@ifi.uio.no  Fri Apr  9 21:44:50 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 09 Apr 1999 22:44:50 +0200
Subject: [XML-SIG] SAX2: Handler classes
Message-ID: <wk3e29o75p.fsf@ifi.uio.no>

This list is just copied from the Java proposal. Does anyone think we
should skip any of these or add any new ones?


http://xml.org/sax/handlers/lexical <LexicalHandler>
  Receive callbacks for comments, CDATA sections, and (possibly)
  entity references.

http://xml.org/sax/handlers/dtd-decl <DTDDeclHandler>
  Receive callbacks for element, attribute, and (possibly) parsed
  entity declarations.

http://xml.org/sax/handlers/namespace <NamespaceHandler>
  Receive callbacks for the start and end of the scope of each
  namespace declaration.
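
For a feel of what the namespace callbacks would deliver, the expat
binding in the present-day Python standard library has a comparable
pair of hooks, StartNamespaceDeclHandler and EndNamespaceDeclHandler,
which fire when namespace processing is enabled. This is only an
illustration with xml.parsers.expat, not the proposed NamespaceHandler
interface; `namespace_events` and the urn:example URI are made up.

```python
from xml.parsers import expat


def namespace_events(text):
    """Record the start and end of each namespace declaration's scope."""
    events = []
    # namespace_separator must be set for the declaration handlers to fire
    p = expat.ParserCreate(namespace_separator=' ')
    p.StartNamespaceDeclHandler = (
        lambda prefix, uri: events.append(('start', prefix, uri)))
    p.EndNamespaceDeclHandler = (
        lambda prefix: events.append(('end', prefix)))
    p.Parse(text, True)
    return events
```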

--Lars M.


From fdrake@acm.org  Fri Apr  9 22:01:53 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 9 Apr 1999 17:01:53 -0400 (EDT)
Subject: [XML-SIG] SAX2: Handler classes
In-Reply-To: <wk3e29o75p.fsf@ifi.uio.no>
References: <wk3e29o75p.fsf@ifi.uio.no>
Message-ID: <14094.27329.189328.339983@weyr.cnri.reston.va.us>

Lars Marius Garshol writes:
 > http://xml.org/sax/handlers/lexical <LexicalHandler>
 >   Receive callbacks for comments, CDATA sections, and (possibly)
 >   entity references.

  Undecided; there are times when I think it would be nice to have
these things, especially when trying to make minimal edits.

 > http://xml.org/sax/handlers/dtd-decl <DTDDeclHandler>
 >   Receive callbacks for element, attribute, and (possibly) parsed
 >   entity declarations.
 >
 > http://xml.org/sax/handlers/namespace <NamespaceHandler>
 >   Receive callbacks for the start and end of the scope of each
 >   namespace declaration.

  Yes to both.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From fdrake@acm.org  Fri Apr  9 22:03:15 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 9 Apr 1999 17:03:15 -0400 (EDT)
Subject: [XML-SIG] SAX2: Parser properties
In-Reply-To: <wk4smpo775.fsf@ifi.uio.no>
References: <wk4smpo775.fsf@ifi.uio.no>
Message-ID: <14094.27411.350840.911404@weyr.cnri.reston.va.us>

Lars Marius Garshol writes:
 > http://python.org/sax/properties/data-encoding <String> (read/write)
...
 >   Do we need a special SAXEncodingNotSupportedException for this?
 >   Otherwise it may be impossible to tell whether the parser doesn't

  Yes; this needs to be available for reporting to the user.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From dieter@handshake.de  Sat Apr 10 19:42:49 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Sat, 10 Apr 1999 20:42:49 +0200
Subject: [XML-SIG] [Ann] XSL-Pattern 0.03 released
Message-ID: <199904101842.UAA01329@lindm.dm>

I have released version 0.03 of my XSL-Pattern package.

It implements the pattern sublanguage of the XSL working
draft specification as of Dec 16, 1998.
The package provides pattern matching and selection on
HTML/XML/SGML document trees.


Changes:
 * several bugs fixed:
       - "test patterns with value" threw an exception
       - "<code>ancestor</code> in match pattern" threw an exception
       - <code><var>...</var>//<var>OtherNode</var></code> started
	  above rather than at
	  <var>OtherNode</var> when used as match pattern.
	  (was correct as select pattern).
 * pattern objects now have a <code>patternstring</code> attribute;
      it is the string the object has been built from.
 * allows for customized pattern factories.
    This is interesting if you want to use the parser infrastructure
    to build customized parsers. Such parsers build customized
    XSL pattern objects (by means of the factory).
    They can e.g. change the matching algorithm or work
    on a sequence of SAX events rather than DOM trees for selection.


More information and download:

  URL:http://www.handshake.de/~dieter/pyprojects/xslpattern.html


- Dieter


From paul@prescod.net  Sun Apr 11 05:19:56 1999
From: paul@prescod.net (Paul Prescod)
Date: Sat, 10 Apr 1999 23:19:56 -0500
Subject: [XML-SIG] DOM API
Message-ID: <371022EC.2E0A1F6@prescod.net>

Am I right that there is a semi-official, portably implemented SAX API for
Python but there is no such beast for the DOM?

* Is it reasonable to unify a subset of their interfaces?

* Could 4XSL be written to use that interface so that it would work with
both DOM implementations or do performance issues make that impossible?

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

By lumping computers and televisions together, as if they exerted a 
single malign influence, pessimists have tried to argue that the 
electronic revolution spells the end of the sort of literate culture 
that began with Gutenberg's press. On several counts, that now seems 
the reverse of the truth.

http://www.economist.com/editorial/freeforall/19-12-98/index_xm0015.html


From Jean-Michel.Bruel@univ-pau.fr  Wed Apr 14 17:13:23 1999
From: Jean-Michel.Bruel@univ-pau.fr (Jean-Michel BRUEL)
Date: Wed, 14 Apr 1999 18:13:23 +0200 (MET DST)
Subject: [XML-SIG] [CFP] <<UML>>'99
Message-ID: <199904141613.SAA26301@crisv4.univ-pau.fr>

[apologies if you receive multiple copies of this announcement]

=================================================================
     3rd Call for Papers                <<UML>>'99
=================================================================

 Second International Conference on the
      Unified Modeling Language

 October 28-30, 1999, Fort Collins, Colorado, USA
 (just before OOPSLA)
=================================================================
 http://www.cs.colostate.edu/UML99
=================================================================

Important dates (deadlines are hard!):
   Deadline for abstract                05 May 1999
   Deadline for submission              15 May 1999
   Notification to authors              15 July 1999
   Final version of accepted papers     25 August 1999

Submissions:
   Submit your 10-15 page manuscript electronically in Postscript
   or pdf using the Springer LNCS style. Details are available at
   the conference web page. The <<UML>>'99 proceedings will be
   published by Springer-Verlag in the LNCS series.

Further Information:
   Robert B. France             E-mail: france@cs.colostate.edu
   Computer Science Department  Tel:    970-491-6356
   Colorado State University    Fax:    970-491-2466
   Fort Collins, CO 80523, USA

   Bernhard Rumpe               E-mail: rumpe@in.tum.de
   Institut fuer Informatik     Tel:    0049-89-289-28129
   T. Universitaet Muenchen     Fax:    0049-89-289-28183
   80290 Muenchen, Germany

Sponsored by IEEE Computer Society Technical Committee on Complexity in Computing
In Cooperation with ACM SIGSOFT
With the Support of OMG


From akuchlin@cnri.reston.va.us  Wed Apr 14 17:42:44 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 14 Apr 1999 12:42:44 -0400 (EDT)
Subject: [XML-SIG] PyXML 0.5.1final available, and T-shirts
Message-ID: <199904141642.MAA10932@amarok.cnri.reston.va.us>

I've put up the final release of PyXML 0.5.1:

	http://www.python.org/sigs/xml-sig/files/xml-0.5.1.tgz
	http://www.python.org/sigs/xml-sig/files/xml051.zip

I won't start posting announcements until tomorrow; today is a busy
day.

On an unrelated note, I'd like to get a cool T-shirt design that links
Python and XML.  This is sparked by the T-shirt I got from filling out
IBM's XML survey some months ago, which says "<tag>, you're it!".  So,
does anyone have a suggestion for a Python/XML design?

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Whatever women do they must do twice as well as men to be thought half as
good... luckily, it's not difficult.
    -- Charlotte Whitton


From sean@digitome.com  Thu Apr 15 10:53:34 1999
From: sean@digitome.com (Sean Mc Grath)
Date: Thu, 15 Apr 1999 10:53:34 +0100
Subject: [XML-SIG] PyXML 0.5.1final available, and T-shirts
In-Reply-To: <199904141642.MAA10932@amarok.cnri.reston.va.us>
Message-ID: <3.0.6.32.19990415105334.0098ea10@gpo.iol.ie>

[Andrew Kuchling]
>On an unrelated note, I'd like to get a cool T-shirt design that links
>Python and XML.  This is sparked by the T-shirt I got from filling out
>IBM's XML survey some months ago, which says "<tag>, you're it!".  So,
>does anyone have a suggestion for a Python/XML design?
>

How about:-

	"Algorithms + Data Structures = Programs" (Niklaus Wirth)
	"Python + XML = Programs" (Andrew Kuchling)

Or how about:-

	"Python gives XML something to do"
(This is a reworking of Jon Bosak's famous remark that XML gives Java
something to do)

	
<Sean uri="http://www.digitome.com/sean.htm"/>


From akuchlin@cnri.reston.va.us  Thu Apr 15 16:50:29 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu, 15 Apr 1999 11:50:29 -0400 (EDT)
Subject: [XML-SIG] T-shirts
In-Reply-To: <3.0.6.32.19990415105334.0098ea10@gpo.iol.ie>
References: <199904141642.MAA10932@amarok.cnri.reston.va.us>
 <3.0.6.32.19990415105334.0098ea10@gpo.iol.ie>
Message-ID: <14102.1720.394944.571290@amarok.cnri.reston.va.us>

Sean Mc Grath writes:
>	"Algorithms + Data Structures = Programs" (Niklaus Wirth)
>	"Python + XML = Programs" (Andrew Kuchling)

	Problem: I never said that.  :) 

	That does spark a thought, though; poking through the
Python-quotes file, "PYTHON = (P)rogrammers (Y)earning (T)o
(H)omestead (O)ur (N)oosphere." from one of your old .sigs is pretty
good, though not XML-specific.  The two XML-related quotes from Paul
Prescod aren't really suitable and they're too long for T-shirts,
anyway.

	Here's an idea derived from the XML/SGML use of the word
"element".  (I vaguely recall someone proposing this at IPC7; anyone
remember who?)  The design looks like a corner of the periodic table
of the elements.  The columns, instead of being titled "Group III", "Group
VIII", etc. are labeled with various DTDs and standards; the elements
in each column are then various element names from that DTD.  In the
middle is a large 2x2 square with a big red "Py" in it; below it we
might put regular-sized boxes with "Jv", "Pl", "Tcl", in them.  
Something like:

XML	HTML			MathML	...
---	----	----
		+-----------+
Dtd	Em	|Py	    |	Cn
                |	    |
Wfc	H1	+           +	Fn 
                |	    |
Pi	Cite	+-----------+	Eq
	
...		Jv	Pl

		Tcl			

We can argue about *which* DTDs and element names later...  If the
design is 8 inches wide, then each column is 1.6 inches wide, which is
hopefully large enough to make it readable.  That's important; for
example, the IBM shirt isn't really readable because the design is
only about 10 cm across; those FSF shirts that include the whole
preamble to the GPL on the back suffer from the same illegibility.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Well, there are these two people here, Sir. The man says he drank wine with
you somewhere called Babylon, and the lady... she's making little frogs.
    -- The receptionist, in SANDMAN #43: "Brief Lives:3"


From uche.ogbuji@fourthought.com  Sat Apr 17 15:25:42 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sat, 17 Apr 1999 08:25:42 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Sat, 10 Apr 1999 23:19:56 CDT."
 <371022EC.2E0A1F6@prescod.net>
Message-ID: <199904171425.IAA03919@malatesta.local>

> Am I right that there is a semi-offical, portably implemented SAX API for
> Python but
> there is no such beast for the DOM? 

SAX is mostly portably implemented because of LMG's work on the drivers.

> * Is it reasonable to unify a subset of their interfaces?

This does make sense.  The first question would be philosophical: should such a 
unified interface stick closely to the W3C's IDL, or should it be more 
faithful to Python (i.e. returning PyLists instead of NodeList objects)?  This 
is the main difference between the two Python DOM implementations.  We could 
build an adapter accordingly (most of the work is already done with 
DOM.Ext.NodeList2PyList, etc), but I'd like to hear people's opinions first.

> * Could 4XSL be written to use that interface so that it would work with
> both DOM implementations or do performance issues make that impossible?

If such an interface was agreed upon, it would make sense to write 4DOM 
accordingly.  I've already had to port LMG's xll module to 4DOM, and even 
though the differences are subtle, porting can still be a bit of a chore.  A 
standard interface would help.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From larsga@ifi.uio.no  Sat Apr 17 16:41:34 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1999 17:41:34 +0200
Subject: [XML-SIG] SAX2: General issues
In-Reply-To: <199903280047.RAA09559@malatesta.local>
References: <199903280047.RAA09559@malatesta.local>
Message-ID: <wk4smf45ld.fsf@ifi.uio.no>

* Lars Marius Garshol
|
| The last question is, which package should we place the new stuff in?
| xml.sax2? xml.sax?

* uche ogbuji
| 
| Well, I know that on xml-dev, there's a lot of talk about not
| stomping all over SAX 1.0, but IMO, once the drivers are ported,
| there are not likely to be a lot of people depending on SAX 1.0, and
| even for those who don't want to break things by changing, they can
| always just stick to the older XML packages.

I agree with this, and I also think that those who use a SAX 1.0
interface also can use SAX 2 with no modifications at all.  At least
we should try to make it so. (Except for the fixes, but those are
pretty marginal.)
 
| In other words, I think we should use 
| 
| xml.sax
| 
| even for SAX2.

Agreed.

--Lars M.


From larsga@ifi.uio.no  Sat Apr 17 16:44:51 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1999 17:44:51 +0200
Subject: [XML-SIG] SAX2: Handler classes
In-Reply-To: <14094.27329.189328.339983@weyr.cnri.reston.va.us>
References: <wk3e29o75p.fsf@ifi.uio.no> <14094.27329.189328.339983@weyr.cnri.reston.va.us>
Message-ID: <wk3e1z45fw.fsf@ifi.uio.no>

* Lars Marius Garshol
|
|  http://xml.org/sax/handlers/lexical <LexicalHandler>
|    Receive callbacks for comments, CDATA sections, and (possibly)
|    entity references.

* Fred L. Drake
| 
| Undecided; there are times when I think it would be nice to have
| these things, especially when trying to make minimal edits.

Personally, I think we should have this, partly since it's needed for
full DOM support. Also, some applications will need this. And in any
case support for it will be optional.

--Lars M.


From larsga@ifi.uio.no  Sat Apr 17 16:54:13 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1999 17:54:13 +0200
Subject: [XML-SIG] SAX2: Parser properties
In-Reply-To: <14094.27411.350840.911404@weyr.cnri.reston.va.us>
References: <wk4smpo775.fsf@ifi.uio.no> <14094.27411.350840.911404@weyr.cnri.reston.va.us>
Message-ID: <wk1zhj450a.fsf@ifi.uio.no>

* Lars Marius Garshol
|
|  http://python.org/sax/properties/data-encoding <String> (read/write)
|    Do we need a special SAXEncodingNotSupportedException for this?
|    Otherwise it may be impossible to tell whether the parser doesn't

* Fred L. Drake
| 
|   Yes; this needs to be available for reporting to the user.

I agree. I've added this to my draft now.

--Lars M.


From larsga@ifi.uio.no  Sat Apr 17 17:05:20 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1999 18:05:20 +0200
Subject: [XML-SIG] SAX2: LexicalHandler
Message-ID: <wkzp472pxb.fsf@ifi.uio.no>

This handler is supposed to be used by applications that need
information about lexical details in the document such as comments and
entity boundaries. Most applications won't need this, but the DOM will
find it useful. Support for this handler will be optional.

This handler has the handlerID http://xml.org/sax/handlers/lexical.

class LexicalHandler:

  def xmlDecl(self, version, encoding, standalone):
    """All three parameters are strings. If encoding and standalone are
    not specified on the XML declaration, their values will be None."""

  def startDTD(self, root, publicID, systemID):
    """This event is reported when the DOCTYPE declaration is
    encountered. root is the name of the root element type, while the two last
    parameters are the public and system identifiers of the external
    DTD subset."""

  def endDTD(self):
    "This event is reported after the DTD has been parsed."

  def startEntity(self, name):
    """Reports the beginning of a new entity. If the entity is the
    external DTD subset the name will be '[dtd]'."""

  def endEntity(self, name):
    pass

  def startCDATA(self):
    pass

  def endCDATA(self):
    pass
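
As a rough illustration of how a driver could feed this interface, here
is a minimal sketch that forwards expat callbacks to the method names in
the draft. It assumes the xml.parsers.expat module from today's Python
standard library rather than the 1999 pyexpat; RecordingLexicalHandler
and parse_lexical are hypothetical names, and startEntity/endEntity are
left out because this sketch does not map them onto any expat callback.

```python
import xml.parsers.expat


class RecordingLexicalHandler:
    """Collects lexical events under the method names from the draft."""

    def __init__(self):
        self.events = []

    def xmlDecl(self, version, encoding, standalone):
        self.events.append(("xmlDecl", version, encoding, standalone))

    def startDTD(self, root, publicID, systemID):
        self.events.append(("startDTD", root, publicID, systemID))

    def endDTD(self):
        self.events.append(("endDTD",))

    def startCDATA(self):
        self.events.append(("startCDATA",))

    def endCDATA(self):
        self.events.append(("endCDATA",))


def parse_lexical(data, handler):
    """Hypothetical glue: forward expat callbacks to the draft's methods."""
    parser = xml.parsers.expat.ParserCreate()

    def xml_decl(version, encoding, standalone):
        # expat reports standalone as 1, 0, or -1 (omitted); the draft
        # wants strings, with None for anything not specified.
        handler.xmlDecl(version, encoding, {1: "yes", 0: "no"}.get(standalone))

    def start_doctype(name, system_id, public_id, has_internal_subset):
        # expat passes the system identifier before the public one.
        handler.startDTD(name, public_id, system_id)

    parser.XmlDeclHandler = xml_decl
    parser.StartDoctypeDeclHandler = start_doctype
    parser.EndDoctypeDeclHandler = handler.endDTD
    parser.StartCdataSectionHandler = handler.startCDATA
    parser.EndCdataSectionHandler = handler.endCDATA
    parser.Parse(data, True)


handler = RecordingLexicalHandler()
parse_lexical(
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<!DOCTYPE doc [<!ELEMENT doc (#PCDATA)>]>"
    b"<doc><![CDATA[x < y]]></doc>",
    handler,
)
for event in handler.events:
    print(event)
```

An application that does not care about lexical details simply never
registers such a handler, which is what keeps support optional.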


From larsga@ifi.uio.no  Sat Apr 17 17:06:12 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1999 18:06:12 +0200
Subject: [XML-SIG] SAX2: Attribute extensions
Message-ID: <wkyajr2pvv.fsf@ifi.uio.no>


This posting specifies two interfaces for information needed by the
DOM (and possibly also others) and also for full XML 1.0 conformance.
I'm not really sure whether we should actually use all of this, so
opinions are welcome.

class AttributeList2:

  def isSpecified(self,attr):
    """Returns true if the attribute was explicitly specified in the
    document and false otherwise. attr can be the attribute name or
    its index in the AttributeList."""

  def getEntityRefList(self,attr):
    """This returns the EntityRefList (see below) for an attribute,
    which can be specified by name or index."""


The class below is intended to be used for discovering entity reference
boundaries inside attribute values. This is needed because the XML 1.0
recommendation requires parsers to report unexpanded entity references, 
also inside attribute values. Whether this is really
something we want is another matter.

class EntityRefList:

  def getLength(self):
    "Returns the number of entity references inside this attribute value."

  def getEntityName(self, ix):
    "Returns the name of entity reference number ix (zero-based index)."

  def getEntityRefStart(self, ix):
    """Returns the index of the first character inside the attribute
    value that stems from entity reference number ix."""

  def getEntityRefEnd(self, ix):
    "Returns the index of the last character in entity reference ix."


One redeeming feature of this interface is that it lives entirely
outside the attribute value, and so can be ignored entirely by those
who are not interested.
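
To make the indexing concrete, here is a minimal sketch of one possible
in-memory implementation of EntityRefList. SimpleEntityRefList and the
sample attribute value are hypothetical; the only thing taken from the
draft is the four method names and the convention that the end index
points at the last character of the reference.

```python
class SimpleEntityRefList:
    """Hypothetical list-backed implementation of the draft interface."""

    def __init__(self, refs):
        # refs: iterable of (entity_name, start_index, end_index) tuples,
        # where end_index is inclusive, as in getEntityRefEnd above.
        self._refs = list(refs)

    def getLength(self):
        return len(self._refs)

    def getEntityName(self, ix):
        return self._refs[ix][0]

    def getEntityRefStart(self, ix):
        return self._refs[ix][1]

    def getEntityRefEnd(self, ix):
        return self._refs[ix][2]


# Suppose the document had owner="Copyright 1999 &co;" and the entity
# co expands to "ACME"; the parser hands over the expanded value.
value = "Copyright 1999 ACME"
refs = SimpleEntityRefList([("co", 15, 18)])

for ix in range(refs.getLength()):
    start = refs.getEntityRefStart(ix)
    end = refs.getEntityRefEnd(ix)
    # The end index is inclusive, hence the + 1 when slicing.
    print(refs.getEntityName(ix), repr(value[start:end + 1]))
```

Code that only wants the attribute value keeps using the plain string
and never touches the list.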


From stuart.hungerford@webone.com.au  Sun Apr 18 14:00:44 1999
From: stuart.hungerford@webone.com.au (Stuart Hungerford)
Date: Sun, 18 Apr 1999 23:00:44 +1000
Subject: [XML-SIG] Literate XML?
Message-ID: <000701be899b$79288f30$0301a8c0@restless.com>

Hi all,

Even though (as I keep repeating to myself) "a markup
language is not a programming language", I believe there
are issues that affect XML content creators as much as 
programmers.

Over time, a large body of folklore, rules, heuristics has
been developed for making programs "readable". This
covers issues like choice of names, layout, indenting,
comment styles and content etc. etc.

Can anyone tell me if there is a similar body of experience
for markup languages--particularly XML? 

I understand that a lot of XML may be automatically 
generated and processed, but for the rest of the time,
does anyone have any experiences on making XML
text readable?

<!-- I could do comments and indenting like this -->
<world-view>
    There is no spoon...
</world-view>

<!--
    Or like this (I realize white space is significant here)
-->
<world-view>It's the smell...</world-view>

<!--
    To say nothing of attributes, entities and laying out a DTD
    Any advice?
-->


From stuart.hungerford@webone.com.au  Sun Apr 18 14:05:20 1999
From: stuart.hungerford@webone.com.au (Stuart Hungerford)
Date: Sun, 18 Apr 1999 23:05:20 +1000
Subject: [XML-SIG] Looking for namespace examples...
Message-ID: <000d01be899c$1d810210$0301a8c0@restless.com>

Two messages in one day!  It must be a full moon
or something.

This one is a bit more prosaic: I'm looking for some
realistic examples of DTD's and XML documents that
make use of namespaces.

My understanding is that a validating parser will not 
treat namespace prefixes as "special" in any way in 
a DTD. I've seen short examples where the xmlns:foo 
attribute is defined as a FIXED attribute (in David
Megginson's "19 questions" document), and now
I'm a bit confused.

Can anyone point me to some good learning
examples?

Thanks,

Stu


From jday@picard.csihq.com  Sun Apr 18 15:17:15 1999
From: jday@picard.csihq.com (John Day)
Date: Sun, 18 Apr 1999 10:17:15 -0400
Subject: [XML-SIG] Literate XML?
In-Reply-To: <000701be899b$79288f30$0301a8c0@restless.com>
Message-ID: <3.0.6.32.19990418101715.01340d90@mail.csihq.com>

A similar issue could be made for RTF (or any other text-based,
"human readable" encoding including, I guess, assembly language
mnemonics for computer programming). Nobody (except me maybe) writes
with RTF tags. (I needed to learn it in order to write a parser for it).

I wrote a pretty-printer for the RTF so I could read it better, but RTF is
sensitive to  inserted newlines etc and the output was ruined except
for viewing. 

I think most people who use XML (like the millions of people who use RTF) 
will never see the XML tags in their raw format. Some nice
friendly, bi-directional authoring tool (word processor) will allow
us to "see what we got" and make changes to it. Tags will become little
icons that you drag out of various DTD objects which we have OPENed or
we can do a NEW DTD and set various properties. It all gets converted
to more or less usable (note I didn't say flawless) XML or whatever.
The underlying semantics will be preserved and made understandable.

It will all boil down to how much trust we can place in such tools. Most
of us didn't trust compilers a decade or so ago and wrote all of our
'critical' code in assembler. How many of us still code in assembler?
We have learned to trust the compilers. Though they  probably are not 
absolutely flawless, on average they're better than most of us.

But to answer your question, write XML just like you would write your
favorite HLL code: balanced indents, lots of white space and breaks to catch
the eye. You might want to write a pretty printer, so you can read anybody's
code without having to rewrite it yourself. (Maybe someone has already written
a Python pretty printer).

-jday

At 11:00 PM 4/18/99 +1000, you wrote:
>Hi all,
>
>Even though (as I keep repeating to myself) "a markup
>language is not a programming language", I believe there
>are issues that affect XML content creators as much as 
>programmers.
>
>Over time, a large body of folklore, rules, heuristics has
>been developed for making programs "readable". This
>covers issues like choice of names, layout, indenting,
>comment styles and content etc. etc.
>
>Can anyone tell me if there is a similar body of experience
>for markup languages--particularly XML? 
>
>I understand that a lot of XML may be automatically 
>generated and processed, but for the rest of the time,
>does anyone have any experiences on making XML
>text readable?
>
><!-- I could do comments and indenting like this -->
><world-view>
>    There is no spoon...
></world-view>
>
><!--
>    Or like this (I realize white space is significant here)
>-->
><world-view>It's the smell...</world-view>
>
><!--
>    To say nothing of attributes, entities and laying out a DTD
>    Any advice?
>-->


From kevin_ng@xoommail.com  Mon Apr 19 07:45:10 1999
From: kevin_ng@xoommail.com (Kevin Ng)
Date: Sun, 18 Apr 1999 23:45:10 -0700
Subject: [XML-SIG] bug report(+fix) : Python/XML release 0.5.1
Message-ID: <199904190645.XAA09897@www2.xoommail.com>

In one of the supplied demos, xml-0.5.1/demo/quotes/qtfmt.py, the line

p=saxexts.XMLParserFactory.make_parser("pyexpat")

raises an exception because saxexts.py cannot import the required module.
I fixed it by changing the above line to:

p=saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")

and the demo works ok.

Rgds
Kevin


I use Linux at home.



From gstein@lyra.org  Mon Apr 19 07:51:43 1999
From: gstein@lyra.org (Greg Stein)
Date: Sun, 18 Apr 1999 23:51:43 -0700
Subject: [XML-SIG] DOM API
References: <199904171425.IAA03919@malatesta.local>
Message-ID: <371AD27F.7E0334A0@lyra.org>

uche.ogbuji@fourthought.com wrote:
>...
> > * Is it reasonable to unify a subset of their interfaces?
> 
> This does make sense.  The first question would be philosophical: should such a
> unified interface stick closely to the W3C's IDL, or should it be more
> faithful to Python (i.e. returning PyLists instead of NodeList objects)?  This
> is the main difference between the two Python DOM implementations.  We could
> build an adapter accordingly (most of the work is already done with
> DOM.Ext.NodeList2PyList, etc), but I'd like to hear people's opinions first.

Speaking of DOM implementations, I had posted a couple weeks ago about
including my qp_xml.py module in the XML distribution. It effectively
presents another DOM for Python users to consume XML input (it does NOT
handle output, tho).

Didn't hear back on that, though... does anybody have any feelings one
way or another about including the module? I think it is quite nice for
lightweight XML parsing. I haven't ever found a need for the W3C DOM
(since I simply need a Python representation of the input, and all
output is via "print"), so I'm presuming others will find this useful.

Thoughts?

thx
-g

--
Greg Stein, http://www.lyra.org/


From larsga@ifi.uio.no  Mon Apr 19 08:40:54 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 19 Apr 1999 09:40:54 +0200
Subject: [XML-SIG] DOM API
In-Reply-To: <371AD27F.7E0334A0@lyra.org>
References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org>
Message-ID: <wkso9xxdkp.fsf@ifi.uio.no>

* Greg Stein
| 
| Didn't hear back on that, though... does anybody have any feelings
| one way or another about including the module? 

I think it makes sense to have something a bit more lightweight and
easier to use than the DOM. However, why not build it on top of SAX
instead of pyexpat? No reason to restrict ourselves to just one
parser, is there?

--Lars M.


From gstein@lyra.org  Mon Apr 19 08:43:19 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 19 Apr 1999 00:43:19 -0700
Subject: [XML-SIG] Looking for namespace examples...
References: <000d01be899c$1d810210$0301a8c0@restless.com>
Message-ID: <371ADE97.71B6ECD5@lyra.org>

Stuart Hungerford wrote:
>...
> This one is a bit more prosaic: I'm looking for some
> realistic examples of DTD's and XML documents that
> make use of namespaces.

Not very realistic, but it provides numerous examples: the XML
Namespaces specification.

   http://www.w3.org/TR/REC-xml-names/

A realistic application of namespaces can be seen in the WebDAV
specification:

   ftp://ftp.isi.edu/in-notes/rfc2518.txt

> My understanding is that a validating parser will not
> treat namespace prefixes as "special" in any way in
> a DTD.

If a validating parser does not understand namespaces, then it will not
be able to validate an XML document that uses them. For example, it sees
"<foo:bar/>" and "<bar/>" as different elements, and no fudging of the
"foo" prefix will fix that. The only workaround is to use a default
namespace for the WHOLE document so that the DTD refers to <bar/> and
the document uses <bar/> (and the element in the doc falls into the
appropriate namespace via an xmlns="..." attribute). Note that this
implies only one namespace per document.
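
The difference between a namespace-aware parser and one that treats
prefixes as plain characters can be sketched as follows. This assumes
the xml.parsers.expat module in today's Python standard library;
element_names and the urn:example namespace are made up for the
illustration.

```python
import xml.parsers.expat


def element_names(data, namespace_aware):
    """Return the element names reported by expat for a document."""
    # With a separator, expat resolves prefixes and reports "URI name";
    # with None it reports the raw qualified name from the document.
    separator = " " if namespace_aware else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=separator)
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


prefixed = '<foo:bar xmlns:foo="urn:example"/>'
defaulted = '<bar xmlns="urn:example"/>'

# Without namespace processing the two documents look different.
print(element_names(prefixed, False), element_names(defaulted, False))

# With namespace processing both resolve to the same (URI, name) pair.
print(element_names(prefixed, True), element_names(defaulted, True))
```

A DTD only ever sees the first view, which is why the prefix has to
match character for character during validation.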

> I've seen short examples where the xmlns:foo
> attribute is defined as a FIXED attribute (in David
> Megginson's "19 questions" document), and now
> I'm a bit confused.

I'm not familiar with DTD terminology, so I don't know what FIXED is
attempting to state.

> Can anyone point me to some good learning
> examples?

Hopefully the two docs above will provide ample information. I posted a
module to this list a couple weeks ago which will properly and quickly
parse XML documents with namespaces (I don't believe the xml package has
a parser/DOM capable of doing so, although it appears Python's xmllib.py
can (albeit slowly)). My module is available at:

  http://www.lyra.org/greg/python/qp_xml.py

Note that it is based on top of pyexpat, so you'll need that module,
too.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From larsga@ifi.uio.no  Mon Apr 19 09:13:04 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 19 Apr 1999 10:13:04 +0200
Subject: [XML-SIG] Looking for namespace examples...
In-Reply-To: <371ADE97.71B6ECD5@lyra.org>
References: <000d01be899c$1d810210$0301a8c0@restless.com> <371ADE97.71B6ECD5@lyra.org>
Message-ID: <wkr9phxc33.fsf@ifi.uio.no>

* Greg Stein
| 
| [ns & validation] The only workaround is to use a default namespace
| for the WHOLE document so that the DTD refers to <bar/> and the
| document uses <bar/> (and the element in the doc falls into the
| appropriate namespace via an xmlns="..." attribute). Note that this
| implies only one namespace per document.

You can also use FIXED attribute declarations and, by implication,
always use the same prefix for the same namespace. This essentially
leaves you with the first Namespace WD, although in a different syntax.
 
* Stuart Hungerford
|
| I've seen short examples where the xmlns:foo attribute is defined as
| a FIXED attribute (in David Megginson's "19 questions" document),
| and now I'm a bit confused.
 
* Greg Stein
|
| I'm not familiar with DTD terminology, so I don't know what FIXED is
| attempting to state.

That the element will always have the attribute with the specified
value, whether the user bothered to explicitly add it in the document
or not.
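
A small sketch of that behaviour, assuming the xml.parsers.expat module
in today's Python standard library and a made-up urn:example namespace:
the document below never writes the xmlns:foo attribute on the element,
yet the parser reports it because the internal DTD subset declares it
as FIXED.

```python
import xml.parsers.expat


def root_attributes(data):
    """Return (name, attributes) for the first element in the document."""
    parser = xml.parsers.expat.ParserCreate()
    seen = []
    parser.StartElementHandler = (
        lambda name, attrs: seen.append((name, dict(attrs)))
    )
    parser.Parse(data, True)
    return seen[0]


doc = (
    "<!DOCTYPE doc [\n"
    '  <!ATTLIST doc xmlns:foo CDATA #FIXED "urn:example">\n'
    "]>\n"
    "<doc/>"
)

# The attribute comes from the ATTLIST declaration, not from the tag.
print(root_attributes(doc))
```

Because the declaration names the attribute xmlns:foo literally, every
document validated against that DTD ends up using the prefix foo for
that namespace.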
 
| Hopefully the two docs above will provide ample information. I posted a
| module to this list a couple weeks ago which will properly and quickly
| parse XML documents with namespaces (I don't believe the xml package has
| a parser/DOM capable of doing so, although it appears Python's xmllib.py
| can (albeit slowly)). 

Both xmllib and xmlproc can handle namespaces. In xmlproc this
requires you to use an extra module, which comes with the parser.

--Lars M.


From gstein@lyra.org  Mon Apr 19 09:14:39 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 19 Apr 1999 01:14:39 -0700
Subject: [XML-SIG] DOM API
References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <wkso9xxdkp.fsf@ifi.uio.no>
Message-ID: <371AE5EF.13BA1B8C@lyra.org>

Lars Marius Garshol wrote:
> 
> * Greg Stein
> |
> | Didn't hear back on that, though... does anybody have any feelings
> | one way or another about including the module?
> 
> I think it makes sense to have something a bit more lightweight and
> easier to use than the DOM. However, why not build it on top of SAX
> instead of pyexpat? No reason to restrict ourselves to just one
> parser, is there?

No particular reason, although it will be somewhat slower if based on
SAX. I see in drv_pyexpat.py that the startElement handler does a good
bit of work before getting to the "real" start handler. It would be nice
to skip that :-)  (honestly, though, I don't know what kind of overhead
it creates).

It might be nice to switch it to SAX and bench the pure pyexpat version
against the SAX version.

I do agree that SAX-based would be the Right Thing, but I'm also willing
to trade that for speed since people can always use the DOM if they need
to use a different, underlying parser (such as xmlproc).

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From larsga@ifi.uio.no  Mon Apr 19 09:32:49 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 19 Apr 1999 10:32:49 +0200
Subject: [XML-SIG] DOM API
In-Reply-To: <371AE5EF.13BA1B8C@lyra.org>
References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <wkso9xxdkp.fsf@ifi.uio.no> <371AE5EF.13BA1B8C@lyra.org>
Message-ID: <wkpv51xb66.fsf@ifi.uio.no>

* Lars Marius Garshol
|
| I think it makes sense to have something a bit more lightweight and
| easier to use than the DOM. However, why not build it on top of SAX
| instead of pyexpat? No reason to restrict ourselves to just one
| parser, is there?

* Greg Stein
| 
| No particular reason, although it will be somewhat slower if based
| on SAX. 

It will, so maybe we should consider making two builders?

| I see in drv_pyexpat.py that the startElement handler does a good
| bit of work before getting to the "real" start handler. It would be
| nice to skip that :-) (honestly, though, I don't know what kind of
| overhead it creates).

If you have a lot of attributes I guess it will be slow, but I think
applications using your qp_xml will essentially have to redo that work
(and quite possibly in a less efficient manner), since they can't just
do a simple lookup to get the attribute values.

So your qp_xml would be nicer if it had a hash of attributes instead
of a list, and applications based on it would very likely be faster.

Also, I think it might make sense to modify pyexpat to create a hash
in the PyAPI wrapping instead of a list as it does now. That would
most likely be both the fastest and the nicest solution.
 
| It might be nice to switch it to SAX and bench the pure pyexpat
| version against the SAX version.

Feel free. I don't have the time, I'm afraid.
 
| I do agree that SAX-based would be the Right Thing, but I'm also
| willing to trade that for speed since people can always use the DOM
| if they need to use a different, underlying parser (such as
| xmlproc).

Or sgmlop, or htmllib, or sgmllib. Or, when I get round to it, SP or
Java parsers under JPython. Maybe also RXP.

--Lars M.


From fredrik@pythonware.com  Mon Apr 19 09:58:46 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Mon, 19 Apr 1999 10:58:46 +0200
Subject: [XML-SIG] DOM API
References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <wkso9xxdkp.fsf@ifi.uio.no> <371AE5EF.13BA1B8C@lyra.org>
Message-ID: <01f101be8a42$d665e830$f29b12c2@pythonware.com>

Greg wrote:
> No particular reason, although it will be somewhat slower if based on
> SAX. I see in drv_pyexpat.py that the startElement handler does a good
> bit of work before getting to the "real" start handler. It would be nice
> to skip that :-)  (honestly, though, I don't know what kind of overhead
> it creates).
> 
> It might be nice to switch it to SAX and bench the pure pyexpat version
> against the SAX version.
> 
> I do agree that SAX-based would be the Right Thing, but I'm also willing
> to trade that for speed since people can always use the DOM if they need
> to use a different, underlying parser (such as xmlproc).

one could imagine that once we've settled on an API,
there could be different implementations of the tree
builder...

perhaps the "qp API" could be turned into a "standard
python light-weight dom-like interface"?  and to get that
process started, maybe you could post an interface
summary?

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com


From gstein@lyra.org  Mon Apr 19 09:47:39 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 19 Apr 1999 01:47:39 -0700
Subject: [XML-SIG] DOM API
References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <wkso9xxdkp.fsf@ifi.uio.no> <371AE5EF.13BA1B8C@lyra.org> <wkpv51xb66.fsf@ifi.uio.no>
Message-ID: <371AEDAB.25CE6C7C@lyra.org>

Lars Marius Garshol wrote:
>...
> * Greg Stein
> |
> | No particular reason, although it will be somewhat slower if based
> | on SAX.
> 
> It will, so maybe we should consider making two builders?

I have no motivation to do so :-), but will certainly accept the changes
from somebody who is.

> | I see in drv_pyexpat.py that the startElement handler does a good
> | bit of work before getting to the "real" start handler. It would be
> | nice to skip that :-) (honestly, though, I don't know what kind of
> | overhead it creates).
> 
> If you have a lot of attributes I guess it will be slow, but I think
> applications using your qp_xml will essentially have to redo that work
> (and quite possibly in a less efficient manner), since they can't just
> do a simple lookup to get the attribute values.
> 
> So your qp_xml would be nicer if it had a hash of attributes instead
> of a list, and applications based on it would very likely be faster.

Yes, I thought of this one, but looking at the code, I see that I
haven't actually done that yet. heh. I'm out for about two weeks, but
will change this when I return.

I intend to do { (URI, name) : value }.
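
For illustration only, the change from a list of attribute objects to
that mapping might look like the sketch below. Attr is a hypothetical
stand-in for the current attribute objects (with ns, name, and value
fields), and attrs_to_mapping is not part of qp_xml.

```python
class Attr:
    """Hypothetical stand-in for the current per-attribute objects."""

    def __init__(self, ns, name, value):
        self.ns = ns
        self.name = name
        self.value = value


def attrs_to_mapping(attr_list):
    """Turn a list of attribute objects into {(URI, name): value}."""
    return {(a.ns, a.name): a.value for a in attr_list}


attrs = attrs_to_mapping([
    Attr("", "href", "/index.html"),   # "" means no namespace
    Attr("DAV:", "depth", "1"),
])

# One dictionary lookup replaces a scan over the attribute list.
print(attrs[("DAV:", "depth")])
print(attrs.get(("", "title"), "(not set)"))
```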

> Also, I think it might make sense to modify pyexpat to create a hash
> in the PyAPI wrapping instead of a list as it does now. That would
> most likely be both the fastest and the nicest solution.

Yup. Should ask Jack about his intentions here. Keep it close to Expat,
or provide a little more Python-ish version. There is also the
backwards-compat issue :-)

> | It might be nice to switch it to SAX and bench the pure pyexpat
> | version against the SAX version.
> 
> Feel free. I don't have the time, I'm afraid.

Not me. As I said... I'm not motivated to do so :-).

I believe that multi-parser support is handled by DOM. If you want quick
and light-weight, then use qp_xml and pyexpat.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Mon Apr 19 10:28:29 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 19 Apr 1999 02:28:29 -0700
Subject: [XML-SIG] qp_xml API (was: DOM API)
References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <wkso9xxdkp.fsf@ifi.uio.no> <371AE5EF.13BA1B8C@lyra.org> <01f101be8a42$d665e830$f29b12c2@pythonware.com>
Message-ID: <371AF73D.52254043@lyra.org>

Fredrik Lundh wrote:
> one could imagine that once we've settled on an API,
> there could be different implementations of the tree
> builder...

Seems reasonable.

> perhaps the "qp API" could be turned into a "standard
> python light-weight dom-like interface"?  and to get that
> process started, maybe you could post an interface
> summary?

All right. Below is the summary. This is also the first opportunity for
public review, so I will welcome any suggestions for change.

qp_xml.error: a string for exceptions. [ed. this "should" become a
class]

qp_xml.Parser: the parser class. Typical use is: instantiate and call
the parse() method. The class is not thread-safe, but one-per-thread is
fine.

Parser.parse(input): input may be a string or an object supporting the
"read" method (e.g. a file or httplib.HTTPResponse (from my new httplib
module)). The input must represent a complete XML document. It will be
fully parsed and a lightweight representation will be returned. This
method may be called any number of times (for multiple documents). The
returned object is an instance of qp_xml._element.

_element.name: element ("tag") name

_element.ns: a Python string. The namespace URI this element's name
belongs to, or the empty string for "no namespace".

_element.lang: the xml:lang value that applies to this element's
attributes and content. It is inherited from the parent, pulled from
this element's attributes, or is None if no xml:lang is in scope.

_element.children: a Python list of the child elements, in order

_element.attrs: ### currently a list of objects representing attributes,
each object containing ns, name, value attributes. this will change to a
mapping of { (URI, name) : value }. ###

_element.first_cdata: a Python string which contains the element's
contents that are between the start tag and the first child element (if
present, otherwise the contents between the start/end tags). This will
be the empty string in both cases: <foo/> and <foo></foo>.

_element.following_cdata: a Python string containing the PARENT
element's content which follows this element's end tag (up to the next
child element of the parent, or the parent's end tag).

qp_xml.dump(f, element): uses f.write() to dump the element as XML.
Namespaces and xml:lang values will be inserted. Automatic selection of
namespace prefixes will be used as appropriate.

qp_xml.textof(element): return this element's contents
(non-recursively).


The *_cdata fields are reasonably "interesting" ... Here is a sample of
a few elements and how the cdata fields are filled in:

<elem1>
  elem1.first_cdata contents
  <elem2>
    elem2.first_cdata contents
  </elem2>
  elem2.following_cdata contents
  <elem3/>
  elem3.following_cdata contents
</elem1>

The textof(elem1) function will return elem1.first_cdata +
elem2.following_cdata + elem3.following_cdata.

The *_cdata fields preserve whitespace.
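
The bookkeeping behind the two cdata fields can be sketched in a few
lines. This is not the qp_xml source: it assumes the xml.parsers.expat
module in today's Python standard library, the Element class is a
hypothetical stand-in for qp_xml._element, and namespaces, attributes,
and xml:lang are left out to keep the example short.

```python
import xml.parsers.expat


class Element:
    """Hypothetical stand-in for qp_xml._element: fields only."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.first_cdata = ""
        self.following_cdata = ""


def parse(data):
    """Build a tree of Element objects from a complete XML document."""
    stack = []   # currently open elements, innermost last
    roots = []   # receives the document element

    def start(name, attrs):
        elem = Element(name)
        if stack:
            stack[-1].children.append(elem)
        else:
            roots.append(elem)
        stack.append(elem)

    def end(name):
        stack.pop()

    def chars(text):
        parent = stack[-1]
        if parent.children:
            # Text after a closed child is stored on that child.
            parent.children[-1].following_cdata += text
        else:
            # No child seen yet, so this is the parent's leading text.
            parent.first_cdata += text

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return roots[0]


def textof(elem):
    """Return the element's own text, non-recursively."""
    return elem.first_cdata + "".join(
        child.following_cdata for child in elem.children
    )


root = parse("<elem1>A<elem2>B</elem2>C<elem3/>D</elem1>")
elem2, elem3 = root.children
print(repr(textof(root)))
```

Storing trailing text on the preceding child means no separate text
node type is needed, so the tree holds only one kind of object.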


Commentary:

Note that clients only need to import qp_xml, instantiate
qp_xml.Parser(), and call parse() (which returns an object). They only
deal with one object type in the return value (qp_xml._element), and
they directly access the fields in it. The object defines no methods.

Most clients will use .name, .attrs, and .children. qp_xml.textof(elem)
will return the element's text contents. Certain clients may use .ns to
test if the element is in the namespace they are looking for; a few
clients will use .lang to interpret attribute values and element
contents.


Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From paul@prescod.net  Mon Apr 19 18:17:19 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 19 Apr 1999 12:17:19 -0500
Subject: [XML-SIG] DOM API
References: <199904171425.IAA03919@malatesta.local>
Message-ID: <371B651E.A1771FEB@prescod.net>

uche.ogbuji@fourthought.com wrote:
> 
> This does make sense.  The first question would be philosophical: should such a
> unified interface stick closely to the W3C's IDL, or should it be more
> faithful to Python (i.e. returning PyLists instead of NodeList objects)?  This
> is the main difference between the two Python DOM implementations.  We could
> build an adapter accordingly (most of the work is already done with
> DOM.Ext.NodeList2PyList, etc), but I'd like to hear people's opinions first.

Do we really have to choose? If, for example, a NodeList object can act as
a Python sequence then don't we have the best of both worlds? I mean if
you really need a PyList then you can use "map" to generate one.

I would like to think that Python is sufficiently flexible that most of
these choices could be made in a DOM compatible AND Python compatible way.

The downside of doing both is that a Java or C++ implementation of a "raw"
DOM accessed over CORBA or COM would not be compatible -- but we could
write Python wrappers that would make them so.
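
A minimal sketch of the "both worlds" idea, assuming nothing about the
existing PyDOM or 4DOM classes: the NodeList below is hypothetical and
simply exposes the W3C-style item()/length next to the Python sequence
protocol over the same list.

```python
class NodeList:
    """Hypothetical NodeList that offers both the DOM-style and the
    Python sequence interface over the same underlying list."""

    def __init__(self, nodes=()):
        self._nodes = list(nodes)

    # DOM-style access, following the W3C IDL names
    def item(self, index):
        if 0 <= index < len(self._nodes):
            return self._nodes[index]
        return None

    @property
    def length(self):
        return len(self._nodes)

    # Python sequence protocol
    def __len__(self):
        return len(self._nodes)

    def __getitem__(self, index):
        return self._nodes[index]


nodes = NodeList(["title", "para", "para"])
print(nodes.item(0), nodes.length)   # DOM style
print(nodes[0], len(nodes))          # Python style
print(list(nodes))                   # a plain list when one is needed
```

A raw DOM reached over CORBA or COM would only have item() and length,
so it would need a thin wrapper along these lines.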

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor 
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News


From Tim Lavoie <tim.lavoie@beyondtv.net>  Mon Apr 19 18:46:24 1999
From: Tim Lavoie <tim.lavoie@beyondtv.net> (Tim Lavoie)
Date: Mon, 19 Apr 1999 12:46:24 -0500
Subject: [XML-SIG] XBEL questions
Message-ID: <19990419124624.A15472@beyondtv.net>

I've just started tinkering with the XBEL package and its sample
scripts, converting from Netscape (4.5) to XBEL format. The script
needed tags converted to upper-case to recognize what Communicator had 
written, no big deal. What did puzzle me was the output; the DTD lists 
tags in lower case, but the bookmark.py script generated everything in 
upper case. Since XML is case-sensitive, isn't this wrong?

The other thing I noticed is that the output gags the xmlwf test
program which accompanies James Clark's expat parser. The offending
line contains a URL with multiple CGI parameters, with the error
message pointing to the second "=" character. This character follows
the second parameter name, which as in all HTML is preceded by a "&"
character. Could the problem be that tag contents need to be encoded
first? The tag looks like:

   <URL>http://foo.domain/cgi/some.cgi?Appl=param1&Section=param2</URL>
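
A likely explanation is the bare "&": in XML character data it has to
be written as "&amp;", otherwise the parser expects an entity reference
and trips over the "=" that follows. A small sketch using Python's
current standard library (xml.sax.saxutils and xml.dom.minidom, not the
XBEL scripts themselves):

```python
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape

url = "http://foo.domain/cgi/some.cgi?Appl=param1&Section=param2"

# escape() replaces &, < and > with the corresponding entity references.
element = "<URL>%s</URL>" % escape(url)
print(element)

# The escaped form is well-formed, and a parser hands back the original URL.
doc = parseString(element)
print(doc.documentElement.firstChild.data == url)
```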

	Cheers,
	Tim


From larsga@ifi.uio.no  Mon Apr 19 20:42:05 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Mon, 19 Apr 1999 21:42:05 +0200 (MET DST)
Subject: [XML-SIG] xmlproc: Version 0.61 released!
Message-ID: <199904191942.VAA08945@ifi.uio.no>

Changes since version 0.60:

  - the parser is now even faster, especially when validating. The
  parser should now be several times faster for very large DTDs.
  - various minor bug fixes, plus an embarrassing one in xvcmd.py
  - some API extensions:
    - catalog.CatalogParser now accepts an error language parameter
    - catalog.xmlproc_catalog now accepts an optional error handler
      parameter
    - added a utils module with a ready-made error handler that prints
      error messages to a file-like object
    - added a new method get_valid_elements to xmldtd.ElementType, so
      that it's now possible to find out which elements are allowed in
      a given state (or point) in the content model of an element


This version is mainly released to fix the bug in xvcmd.py, which was
too glaring to be overlooked.

Experiments with DTD caching have been performed and it has turned out
to be feasible, but surprisingly subtle. The speed benefits also seem
to be disappointingly small. If anyone really wants this feature, let
me know and I'll implement it.

--Lars M.


From paul@prescod.net  Mon Apr 19 19:51:46 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 19 Apr 1999 13:51:46 -0500
Subject: [XML-SIG] DOM API
References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <wkso9xxdkp.fsf@ifi.uio.no> <371AE5EF.13BA1B8C@lyra.org> <01f101be8a42$d665e830$f29b12c2@pythonware.com>
Message-ID: <371B7B42.64E93DE4@prescod.net>

Fredrik Lundh wrote:
> 
> perhaps the "qp API" could be turned into a "standard
> python light-weight dom-like interface"?  and to get that
> process started, maybe you could post an interface
> summary?

I'm going to propose instead a light-weight DOM subset. I would rather not
require PyXML users to memorize two different APIs depending on whether
they doing light-weight work or heavy-weight work. Apart from my decision
to suggest a DOM subset, I have made my subset a little more functional in
some places and a little less in others. My bias is to expose *more* of
the underlying XML structure (processing instructions, attributes) and
relegate handling for lang and namespace to the more complex APIs (or
extensions to this API).

--

error (like qp_xml.error)
Parser (like qp_xml.Parser)

Parser.parse(input) (like qp_xml.parse but returns a document object)

Node.ChildNodes (a sequence of nodes property)
Node.NodeType (an integer a la DOM property)

Document.DocumentElement (an element node property)

Element.Attributes (a map of names to attribute objects property)
Element.GetAttribute (returns an attribute's value)
Element.TagName 
Element.PreviousSibling 
Element.NextSibling

CharacterData.Data (a PyString property)

Attribute.Name
Attribute.Value

ProcessingInstruction.Target (string property)
ProcessingInstruction.Data (string property)

--

Note that I use the words "sequence" and "map" in their Python sense
above. Either a PyList or a NodeList object could be a sequence.
Either a PyDict or a NamedNodeMap object could be a map.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor 
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News


From Jeffrey Chang <jefftc@leland.Stanford.EDU>  Tue Apr 20 00:50:55 1999
From: Jeffrey Chang <jefftc@leland.Stanford.EDU> (Jeffrey Chang)
Date: Mon, 19 Apr 1999 16:50:55 -0700 (PDT)
Subject: [XML-SIG] ElementType.content_model interpretation of '*'
Message-ID: <Pine.GSO.3.96.990419145524.1803A-100000@saga3.Stanford.EDU>

I am using xmlproc.dtdparser.DTDParser and xmlproc.xmldtd.CompleteDTD to
parse and store the contents of a DTD file (xmlproc v0.60).  I have a
question about the interpretation of the contents within a DTD
element. 

I load a DTD definition into a variable 'd'.  The definition contains an
element:
<!ELEMENT test (a,b*)>

Then, when I look at the content model of test:
>>> d.elems['test'].content_model
{                            # I've reformatted this for readability
'start': 1L, 
     1L: [(6L, 'a')],
     4L: [(4L, 'b')], 
     6L: [(4L, 'b')], 
'final': 4L 
}

According to this content model, 'test' must contain 1 'a' and at least 1
'b' before reaching the final state.  I believe the 'b' should be
optional, and would have expected a content model more like:
'start': 1L, 
     1L: [(4L, 'a')],
     4L: [(4L, 'b')],
'final': 4L 
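
To make the difference concrete, here is a small walker over the two
tables. It is only a sketch of how I read the structure (one 'start'
state, one 'final' state, each state mapping to a list of
(next_state, element_name) pairs); it is not xmlproc's own validator,
and accepts() is a hypothetical helper. Plain ints replace the long
integers.

```python
def accepts(model, names):
    """Return True if the sequence of child element names drives the
    content model from its start state to its final state."""
    state = model["start"]
    for name in names:
        for next_state, label in model.get(state, []):
            if label == name:
                state = next_state
                break
        else:
            return False  # no transition for this element name
    return state == model["final"]


# Model reported above for (a,b*), with plain ints instead of longs.
reported = {"start": 1, 1: [(6, "a")], 4: [(4, "b")], 6: [(4, "b")], "final": 4}
# Model expected above for (a,b*).
expected = {"start": 1, 1: [(4, "a")], 4: [(4, "b")], "final": 4}

print(accepts(reported, ["a"]))       # False
print(accepts(expected, ["a"]))       # True
print(accepts(reported, ["a", "b"]))  # True
```

Under that reading, the reported table rejects a lone 'a', which
(a,b*) should allow.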


I also tested this with the following element:
<!ELEMENT test (a,b+)>

In this case, I get a content model that looks reasonable:
{
'start': 1L,
     1L: [(2L, 'a')], 
     2L: [(4L, 'b')], 
     4L: [(4L, 'b')], 
'final': 4L 
}


Please let me know if my interpretation of the XML specs, or the
content_model data structure is incorrect.

BTW, Lars, thanks very much for xmlproc!  It is a much-needed tool. 

Jeff


From gstein@lyra.org  Tue Apr 20 08:51:35 1999
From: gstein@lyra.org (Greg Stein)
Date: Tue, 20 Apr 1999 00:51:35 -0700 (PDT)
Subject: [XML-SIG] DOM API
In-Reply-To: <371B7B42.64E93DE4@prescod.net>
Message-ID: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org>

On Mon, 19 Apr 1999, Paul Prescod wrote:
> I'm going to propose instead a light-weight DOM subset. I would rather not
> require PyXML users to memorize two different APIs depending on whether
> they doing light-weight work or heavy-weight work. Apart from my decision
> to suggest a DOM subset, I have made my subset a little more functional in
> some places and a little less in others. My bias is to expose *more* of
> the underlying XML structure (processing instructions, attributes) and
> relegate handling for lang and namespace to the more complex APIs (or
> extensions to this API).

euh... I can definitely state that in the applications that I've been
working with, that PIs are bogus, but namespaces are absolutely required.
(that's how my code came to be!)

A general comment about your "subset" -- it is still heavyweight! Details
below...

> Parser.parse(input) (like qp_xml.parse but returns a document object)

How is a "document" different in your mind, than an element that happens
to be the root of a tree? I don't understand from your post. IMO, if you
want simple, then just give the user a tree... that's all the dumb XML is
anyhow.

> Node.ChildNodes (a sequence of nodes property)
> Node.NodeType (an integer a la DOM property)

NodeType is bogus. It should be absolutely obvious from the context what a
Node is. If you have so many objects in your system that you need NodeType
to distinguish them, then you are certainly not a light-weight solution.

> Document.DocumentElement (an element node property)

If Document has no other properties, then it is totally bogus. Just return
the root Element. Why the hell return an object with a single property
that refers to another object? Just return that object!

> Element.Attributes (a map of names to attribute objects property)
> Element.GetAttribute (returns an attribute's value)

If you want light-weight, then GetAttribute is bogus given that the same
concept is easily handled via the .Attributes value. Why introduce a
method to simply do Element.Attributes.get(foo) ??

> Element.TagName 
> Element.PreviousSibling 
> Element.NextSibling

These Sibling things mean one of two things:

1) you have introduced loops in your data structure
2) you have introduced the requirement for the proxy crap that the current
DOM is dealing with (the Node vs _nodeData thing).

(1) is mildly unacceptable in a light-weight solution (you don't want
people to do a quick parse of data, and then require them to follow it up
with .close()). (2) throws the whole notion of "light" out the window. You
no longer have a simple, direct model of the parsed XML data.

> CharacterData.Data (a PyString property)

How do you get one of these objects? As soon as you say that an
Element.ChildNodes can return one of these, then you have complicated the
model. To keep things simple, .ChildNodes should return objects of the
*same* type. Otherwise, all the clients are going to need to test the
contents. Clients will also have a hard time finding the right data.

Case in point: I wrote a first draft davlib.py against the DOM. Damn it
was a serious bitch to simply extract the CDATA contents of an element!
Moreover, it was also a total bitch to simply say "give me the child
elements". Of course, that didn't work since the DOM insisted on returning
a list of a mix of CDATA and elements.
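
For reference, this is roughly the filtering a client ends up writing
against a DOM. The sketch uses xml.dom.minidom from today's Python
standard library rather than the PyDOM discussed here, and
child_elements() and direct_text() are hypothetical helper names.

```python
from xml.dom.minidom import parseString


def child_elements(node):
    """Return only the element children, skipping text and other nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def direct_text(node):
    """Concatenate the text and CDATA children of node (non-recursive)."""
    wanted = (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    return "".join(c.data for c in node.childNodes if c.nodeType in wanted)


doc = parseString("<prop>one<href>x</href>two<status/>three</prop>")
root = doc.documentElement

print([e.tagName for e in child_elements(root)])  # ['href', 'status']
print(direct_text(root))                          # onetwothree
```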

The whole notion of mixing "node types" in a list is completely bogus if
you want direct simplicity in a model. It is one of my biggest problems
with the DOM thing. Some yahoos over in the XML DOM world want all this
nifty OO crap, yet they have built something that is hardly usable in a
practical application. Ergo, we have all kinds of filters and walking
solutions just to deal with mapping the complicated DOM structure into
something that is even marginally useful.

IMO, the XML DOM model is a neat theoretical expression of OO modelling of
an XML document. For all practical purposes, it is nearly useless. (again:
IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
Screw that -- I use "print". I can't imagine generating XML using the DOM.
Complicated and processing intensive.

Sorry to go off here, but the DOM really bugs me. I think it is actually a
net-negative for the XML community to deal with the beast. I would love to
be educated on the positive benefits for expressing an XML document thru
the DOM model.

> Attribute.Name
> Attribute.Value

Use a mapping. Toss the intermediate object. If you just have name and
value, then you don't need separate objects. Present the attributes as a
mapping.

> ProcessingInstruction.Target (string property)
> ProcessingInstruction.Data (string property)

I have yet to see a specification related to XML that depends on PIs.
Until that happens, then I don't see how these are relevant.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From fredrik@pythonware.com  Tue Apr 20 10:40:32 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 20 Apr 1999 11:40:32 +0200
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org>
Message-ID: <008b01be8b11$e19fe550$f29b12c2@pythonware.com>

Greg wrote:
> On Mon, 19 Apr 1999, Paul Prescod wrote:
> > I'm going to propose instead a light-weight DOM subset. I would rather not
> > require PyXML users to memorize two different APIs depending on whether
> > they doing light-weight work or heavy-weight work.

the downside with Paul's line of reasoning is that it makes it
impossible to come up with something that is light-weight
also from the CPU's perspective...  not good.

> euh... I can definitely state that in the applications that I've been
> working with, that PIs are bogus, but namespaces are absolutely required.
> (that's how my code came to be!)

as far as I can tell, *all* upcoming XML standards use namespaces.
for a layman like me, they're pretty much part of the standard, so
having them in the core API is a good thing...

...

> Case in point: I wrote a first draft davlib.py against the DOM. Damn it
> was a serious bitch to simply extract the CDATA contents of an element!
> Moreover, it was also a total bitch to simply say "give me the child
> elements". Of course, that didn't work since the DOM insisted on returning
> a list of a mix of CDATA and elements.
> 
> The whole notion of mixing "node types" in a list is completely bogus if
> you want direct simplicity in a model.

well, our internal coreXML system returns a list consisting of Element
and plain old strings (for CDATA).  the Element class has helpers
to deal with elements that contain only strings, and elements that
contain only child elements.  most code uses these helpers, and
automatically flags "bad" XML documents.

I'm not yet convinced that your solution is easier to use -- but I might
change my mind...  just give me some time to think about it.

> It is one of my biggest problems with the DOM thing. Some yahoos
> over in the XML DOM world want all this nifty OO crap, yet they
> have built something that is hardly usable in a practical application.

> IMO, the XML DOM model is a neat theoretical expression of OO
> modelling of an XML document. For all practical purposes, it is
> nearly useless.

Am I the only one who thinks this year's W3C April Fools' joke
was really scary...

> I mean hey: does anybody actually use the DOM to *generate* XML?
> Screw that -- I use "print". I can't imagine generating XML using the DOM.
> Complicated and processing intensive.

...

as an aside, here's an excerpt from Garnet, using our light-weight
XML builder...  root is a parent element, package is an "archive
handler" that takes care of "external entities" (if XML had been
designed by real programmers, it would have supported binary
data from the start ;-)

    def dump(self, root, package=None):
        stack = root.addelement("stack")
        if self.pcs:
            stack.addelement("pcs", self.pcs.tag)
        for i in self.stack:
            item = stack.addelement("item")
            title = i.gettitle()
            if title:
                item.addelement("title", title)
            extent = string.join(map(str, i.getextent()))
            item.addelement("extent", extent)
            i.dump(item, package)

doing this with print statements is quite a bit more error
prone.  this model is also interface-driven -- there's nothing
in here that deals directly with the file format.
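
a rough sketch of what such a builder could look like.  this is not
the Garnet code: the Element class and its tostring() method are
hypothetical, and only the addelement(tag, text) call shape is taken
from the excerpt above.

```python
from xml.sax.saxutils import escape


class Element:
    """Hypothetical builder node; addelement() is modelled on the calls in
    the excerpt above, not on the real Garnet code."""

    def __init__(self, tag, text=None):
        self.tag = tag
        self.text = text
        self.children = []

    def addelement(self, tag, text=None):
        child = Element(tag, text)
        self.children.append(child)
        return child

    def tostring(self):
        # Escaping and tag balancing happen in one place, not at every call site.
        body = escape(self.text) if self.text else ""
        body += "".join(c.tostring() for c in self.children)
        return "<%s>%s</%s>" % (self.tag, body, self.tag)


root = Element("doc")
item = root.addelement("stack").addelement("item")
item.addelement("title", "Q&A")
item.addelement("extent", "0 0 10 10")
print(root.tostring())
```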

...

I want something really light-weight, and highly pythonish, and I
don't care the slightest about TLA compatibility.  the "qp" API is
pretty close to what I want, but I think I can make it even simpler.
more on that later.

Cheers /F


From "Fred L. Drake, Jr." <fdrake@acm.org>  Tue Apr 20 13:58:10 1999
From: "Fred L. Drake, Jr." <fdrake@acm.org> (Fred L. Drake)
Date: Tue, 20 Apr 1999 08:58:10 -0400 (EDT)
Subject: [XML-SIG] DOM API
In-Reply-To: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org>
References: <371B7B42.64E93DE4@prescod.net>
 <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org>
Message-ID: <14108.31202.340223.456144@weyr.cnri.reston.va.us>

Greg Stein writes:
 > an XML document. For all practical purposes, it is nearly useless. (again:
 > IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
 > Screw that -- I use "print". I can't imagine generating XML using the DOM.

  Perhaps I missed some context.  I use the DOM to edit structured
data; the input is essentially LaTeX, and the output is SGML/XML.  I
perform fairly large, structured edits before writing the data back
out.
  I agree the DOM would be painful for generating a small amount of
XML from a source structured very differently from the output, but my
application leads me to believe that though the DOM is fairly tedious, 
it gives me the ability to control the output in ways that support my
nit-picky approach to documents.
  Don't get me wrong: I'm sure a substantially better API could be
designed, especially if it wasn't intended to translate cleanly (or at 
all) to languages other than Python.  But I don't have time or
interest in that; the DOM works well enough in its absence.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From Jeff.Johnson@icn.siemens.com  Tue Apr 20 16:00:41 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Tue, 20 Apr 1999 11:00:41 -0400
Subject: [XML-SIG] How can I search for a string of text
Message-ID: <85256759.0051CA54.00@li01.lm.ssc.siemens.com>


Hello everyone,

I need to remove a string from my HTML files but I don't know the best way to
find it.  There are usually line feeds in the HTML between the string so the
string does not appear as one DOM text node.  Does anyone know the best way to
find contiguous text that spans multiple DOM nodes?

While I'm at it, is there a good way to remove blank lines from ?ML files?  As I
read and rewrite my XML files, I find that extra line feeds accumulate.  I've
tried a few different approaches but have never been fully satisfied with them.

Thanks much,
Jeff


From paul@prescod.net  Tue Apr 20 15:47:36 1999
From: paul@prescod.net (Paul Prescod)
Date: Tue, 20 Apr 1999 09:47:36 -0500
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org>
Message-ID: <371C9388.D0690375@prescod.net>

Greg Stein wrote:
> A general comment about your "subset" -- it is still heavyweight!

It wasn't clear what I was optimizing for: performance or simplicity. They
aren't always the same thing.

> euh... I can definitely state that in the applications that I've been
> working with, that PIs are bogus, but namespaces are absolutely required.
> (that's how my code came to be!)

> I have yet to see a specification related to XML that depends on PIs.
> Until that happens, then I don't see how these are relevant.

http://www.w3.org/TR/REC-xml
http://www.w3.org/TR/xml-stylesheet
http://www.w3.org/TR/NOTE-dcd
http://www.w3.org/TR/NOTE-ddml

Well let's put it this way: XML 1.0 uses PIs. So does the stylesheet
binding extension (for CSS and XSL). 

I don't doubt that namespaces are important but they can easily be viewed
as an extension of (or layer on top of) the minimal API.

> How is a "document" different in your mind, than an element that happens
> to be the root of a tree? I don't understand from your post. IMO, if you
> want simple, then just give the user a tree... that's all the dumb XML is
> anyhow.

Consider the "canonical Web-enabled XML document":

<?xml version="1.0"?>
<?xml-stylesheet blah blah blah?>
<!DOCTYPE MYDOC SYSTEM "http://...">
<MYDOC/>

There are four objects there. If we want it to be a tree we need a wrapper
object that contains them. You could argue that in the lightweight API the
version and doctype information could disappear but surely we want to
allow people to figure out what stylesheets are attached to their
documents!
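
As an illustration of why the wrapper object matters, here is how a
present-day DOM (xml.dom.minidom from the Python standard library, not
the miniDOM proposed here) exposes the top-level nodes. The stylesheet
attributes and the DTD name are made-up placeholders.

```python
from xml.dom.minidom import parseString

source = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    '<!DOCTYPE MYDOC SYSTEM "mydoc.dtd">'
    '<MYDOC/>'
)
doc = parseString(source)

# The Document node holds everything at the top level, not just the root.
for node in doc.childNodes:
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        print("PI:", node.target, node.data)
    elif node.nodeType == node.ELEMENT_NODE:
        print("root:", node.tagName)

# Collect the data of every stylesheet PI attached to the document.
stylesheets = [
    n.data for n in doc.childNodes
    if n.nodeType == n.PROCESSING_INSTRUCTION_NODE and n.target == "xml-stylesheet"
]
```

If parse() returned the root element alone, the stylesheet PI would
have nowhere to live.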

> NodeType is bogus. It should be absolutely obvious from the context what a
> Node is. If you have so many objects in your system that you need NodeType
> to distinguish them, then you are certainly not a light-weight solution.

XML is a dynamically typed language, like Python. If I have a mix of
elements, characters and processing instructions then I need some way of
differentiating them. I don't feel like it is the place of an API to
decide that XML is a strongly typed language and silently throw away
important information from the document.

> > Document.DocumentElement (an element node property)
> 
> If Document has no other properties, then it is totally bogus. Just return
> the root Element. Why the hell return an object with a single property
> that refers to another object? Just return that object!

Document should also have ChildNodes.

> If you want light-weight, then GetAttribute is bogus given that the same
> concept is easily handled via the .Attributes value. Why introduce a
> method to simply do Element.Attributes.get(foo) ??

GetAttribute is simpler, more direct and maybe more efficient in some
cases. It works with simple strings and not attribute objects.

> > Element.TagName
> > Element.PreviousSibling
> > Element.NextSibling
> 
> These Sibling things mean one of two things:
> 
> 1) you have introduced loops in your data structure
> 2) you have introduced the requirement for the proxy crap that the current
> DOM is dealing with (the Node vs _nodeData thing).
> 
> (1) is mildly unacceptable in a light-weight solution (you don't want
> people to do a quick parse of data, and then require them to follow it up
> with .close()). 

I don't see this as a big deal.

This is an efficiency versus simplicity issue. These functions are
extremely convenient in a lot of situations.

> Case in point: I wrote a first draft davlib.py against the DOM. Damn it
> was a serious bitch to simply extract the CDATA contents of an element!

XML is a dynamically typed language. "I've implemented Java and now I'm
trying to implement Python and I notice that you guys throw these
PyObject things around and they make my life harder. I'm going to dump
them from my implementation." 

> Moreover, it was also a total bitch to simply say "give me the child
> elements". Of course, that didn't work since the DOM insisted on returning
> a list of a mix of CDATA and elements.

It told you what was in your document.

If you want to include helper functions to do this stuff then I say fine:
but if you want to throw away the real structure of the document then I
don't think that that is appropriate.

> IMO, the XML DOM model is a neat theoretical expression of OO modelling of
> an XML document. For all practical purposes, it is nearly useless. (again:
> IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
> Screw that -- I use "print". I can't imagine generating XML using the DOM.
> Complicated and processing intensive.

I'm not sure what your point is here. I wouldn't use the DOM *or* qp_xml
to generate XML in most cases. As you point out "print" or "file.write" is
sufficient in most applications. This has nothing to do with the DOM and
everything to do with the fact that writing to a file is inherently a
streaming operation so a tree usually gets in the way.

> Sorry to go off here, but the DOM really bugs me. I think it is actually a
> net-negative for the XML community to deal with the beast. I would love to
> be educated on the positive benefits for expressing an XML document thru
> the DOM model.

I think that the DOM is broken for a completely different set of reasons
than you do. But the DOM is also hugely popular and more widely
implemented than many comparable APIs in other domains. I'm told that
Microsoft's DOM implementation is referenced in dozens of their products
and throughout many upcoming technologies. Despite its flaws, the DOM is
an unqualified success and some people like it more than XML itself. They
are building DOM interfaces to non-XML data!

> Use a mapping. Toss the intermediate object. If you just have name and
> value, then you don't need separate objects. Present the attributes as a
> mapping.

In this case I am hamstrung by DOM compatibility. This is a small price to
pay as long as we keep the simpler GetAttribute methods. The only reason
to get the attribute objects is when you want to iterate over all
attributes which is probably relatively rare.

-- 

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor 
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News


From paul@prescod.net  Tue Apr 20 17:11:15 1999
From: paul@prescod.net (Paul Prescod)
Date: Tue, 20 Apr 1999 11:11:15 -0500
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org> <008b01be8b11$e19fe550$f29b12c2@pythonware.com>
Message-ID: <371CA723.FAEBF6AA@prescod.net>

Fredrik Lundh wrote:
> 
> the downside with Paul's line of reasoning is that it makes it
> impossible to come up with something that is light-weight
> also from the CPU's perspective...  not good.

That isn't true. I tend to think that usability is more important than
performance but if we decide to optimize for performance then we can make
a DOM-compatible API that is as fast as "qp". I mean the only thing that
is harder to implement in the miniDOM is siblings -- where I chose
convenience over efficiency. We can make the opposite choice.

In fact, I think that the namespace and language support in qp already
makes it relatively "heavyweight".

> I want something really light-weight, and highly pythonish, and I
> don't care the slightest about TLA compatibility.

It isn't a question of TLA compatibility. It's about using the data models
used everywhere else in the world. Python conforms to posix conventions
for file and socket operations, C conventions for string interpolation,
Perl conventions for regular expressions, Unix conventions for globbing
and so forth. If I wanted idiosyncratic invented-just-for-us interfaces I
would go and use Perl. 

To me, this is the central issue: Guido's genius lies in the
fact that he usually chooses to adapt something before re-inventing it. This
makes learning Python easy. "Oh yeah, I recognize that from the other
languages I use." Well, SAX and DOM are what the other languages use.

Anyhow, "qp" is hardly more "Pythonic" than the lightweight DOM API.
The following_cdata stuff is not like any API I've ever seen in Python or
elsewhere. The call for "pythonic-ness" is mostly a strawman. The DOM
works better in Python than in almost any other language: NodeLists are
lists, NamedNodeMaps are maps, object types are instance classes, lists
can be heterogeneous, etc.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor 
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News


From jday@picard.csihq.com  Tue Apr 20 17:53:44 1999
From: jday@picard.csihq.com (John Day)
Date: Tue, 20 Apr 1999 12:53:44 -0400
Subject: [XML-SIG] How can I search for a string of text
Message-ID: <4.2.0.32.19990420125203.00a45bf0@mail.csihq.com>

Use a SAX interface to access the characters of the text file. Sounds like you
might know the enclosing tag names too (<BODY> ... </BODY>), so you might
be able to narrow the search somewhat. In any case, SAX will present the
characters as a stream for filtering.
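
A minimal sketch of that approach with the xml.sax module in the
current Python standard library. It assumes the input is well-formed
XML (real-world HTML often is not), and TextCollector is a hypothetical
handler name.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collect all character data so it can be searched as one string."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


source = b"<BODY><P>remove\nthis <B>string</B> please</P></BODY>"
handler = TextCollector()
xml.sax.parseString(source, handler)

# Collapse line feeds and runs of spaces before searching.
text = " ".join("".join(handler.chunks).split())
print("remove this string" in text)
```

This only finds the text; removing it still means mapping the match
back to the elements that contributed each chunk.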
-jday
At 11:00 AM 4/20/99 -0400, you wrote:

>Hello everyone,

>I need to remove a string from my HTML files but I don't know the best way to
>find it. There are usually line feeds in the HTML between the string so the
>string does not appear as one DOM text node. Does anyone know the best way to
>find contiguous text that spans multiple DOM nodes?

>While I'm at it, is there a good way to remove blank lines from ?ML files?
>As I read and rewrite my XML files, I find that extra line feeds accumulate.
>I've tried a few different approaches but have never been fully satisfied
>with them.

>Thanks much,
>Jeff


_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://www.python.org/mailman/listinfo/xml-sig


From akuchlin@cnri.reston.va.us  Thu Apr 22 00:01:46 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 21 Apr 1999 19:01:46 -0400 (EDT)
Subject: [XML-SIG] How can I search for a string of text
In-Reply-To: <85256759.0051CA54.00@li01.lm.ssc.siemens.com>
References: <85256759.0051CA54.00@li01.lm.ssc.siemens.com>
Message-ID: <14110.22394.584206.529707@amarok.cnri.reston.va.us>

Jeff.Johnson@icn.siemens.com writes:
>I need to remove a string from my HTML files but I don't know the best way to
>find it.  There are usually line feeds in the HTML between the string so the
>string does not appear as one DOM text node.  Does anyone know the best way to
>find contiguous text that spans multiple DOM nodes?

	The normalize() method on an Element node consolidates the
subtree so there are no adjacent Text nodes, merging Text nodes that
are next to each other into a single node.  So you could do
document.rootElement.normalize(), and then rely on the string being
contained within one node.  That won't catch tricky cases -- do you
need to find it if an entity expands to the string, or to part of the
string?  if the string had a PI in the middle of it, would it still
count as a match? -- but it'll certainly help with the simple case.
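
	A small sketch of the simple case, written against
xml.dom.minidom from the current Python standard library (where the
root is spelled documentElement); the element name and text are made
up for the example:

```python
from xml.dom.minidom import Document

doc = Document()
para = doc.createElement("p")
doc.appendChild(para)

# Build the text in two pieces, as a parser might after a line feed.
para.appendChild(doc.createTextNode("remove "))
para.appendChild(doc.createTextNode("this string"))
print(len(para.childNodes))  # 2

doc.documentElement.normalize()
print(len(para.childNodes))  # 1
print("remove this string" in para.firstChild.data)  # True
```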

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
It is not that I wanted to know a great deal, in order to acquire what is now
called expertise, and which enables one to become an expert-tease to people
who don't know as much as you do about the tiny corner you have made your own.
    -- Robertson Davies, _The Rebel Angels_


From mike.olson@fourthought.com  Thu Apr 22 00:36:45 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Wed, 21 Apr 1999 18:36:45 -0500
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org> <008b01be8b11$e19fe550$f29b12c2@pythonware.com> <371CA723.FAEBF6AA@prescod.net>
Message-ID: <371E610D.BA7861AE@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------msE1C4377628C886A3F4622D9F
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


First, for those of you who think PyDOM is heavyweight, you obviously haven't
tried 4DOM :)

But, then that is its purpose.  We built it to follow the W3C spec to the
letter so it is big and can be not very user friendly at times.  However, it
is completely usable over an ORB or COM with clients in Java, C++, COBOL
whatever.  It fits our needs nicely.  And as Paul mentions, with so many other
people starting to use it I imagine the CORBA functionality will be useful to
others as well.

I see PyDOM as its light weight brother.  It gets all of its speed increase
from not having to wrap every list in a NodeList, and not having to wrap every
dict in a NamedNodeMap.  Last I looked it also does not create some of the
"not so important" nodes such as Attrs.

Do we really need a third interface?

qp sounds like a great optimization if you don't care about the original
document structure or PIs (you could never get 4XSL to use this interface).

In some cases you may care about the document structure but not care about
namespaces, or attributes, or elements...  I think if we try to make it any
more lightweight we will only satisfy half of us, because we will need to start
dropping "non-important" parts of the original document.  If you really need
more speed, then grab SAX or expat and go, or post the super-modified classes
as libraries, not standard APIs.


Back to the subject at hand.  I think that PyDOM and 4DOM could have a
standard interface for applications built on top, e.g. 4XSL.  I think we would
have to come both ways a little though.  For example, I think it would be very
hard to get rid of the idea of an Attr class from 4DOM, while I think we could
very easily extend the NodeList class to support native Python list
manipulation.  This would only be the case if the application is never
intended for use over an ORB (you cannot call __getitem__ over an ORB very
easily).

If we do this though, I don't see that a lot will be gained.  You can swap
DOM implementations in and out whenever you like, but you would not be able to
use the 4DOM implementation and expect 4XSL to function over an ORB.  If
applications that use the DOM use 4DOM's new "quick" API, then the speed will
be about equivalent to that of PyDOM (probably still a bit slower, but not by
much).  Your choice would be down to: do I like Andy better, or Mike...:)

Later


Paul Prescod wrote:

> Fredrik Lundh wrote:
> >
> > the downside with Paul's line of reasoning is that it makes it
> > impossible to come up with something that is light-weight
> > also from the CPU's perspective...  not good.
>
> That isn't true. I tend to think that usability is more important than
> performance but if we decide to optimize for performance then we can make
> a DOM-compatible API that is as fast as "qp". I mean the only thing that
> is harder to implement in the miniDOM is siblings -- where I chose
> convenience over efficiency. We can make the opposite choice.
>
> In fact, I think that the namespace and language support in qp already
> makes it relatively "heavyweight".
>
> > I want something really light-weight, and highly pythonish, and I
> > don't care the slightest about TLA compatibility.
>
> It isn't a question of TLA compatibility. It's about using the data models
> used everywhere else in the world. Python conforms to posix conventions
> for file and socket operations, C conventions for string interpolation,
> Perl conventions for regular expressions, Unix conventions for globbing
> and so forth. If I wanted idiosyncratic invented-just-for-us interfaces I
> would go and use Perl.
>
> To me, this is the central issue: to me, Guido's genius lies in the
> fact that he usually chooses to adapt something before re-inventing it. This
> makes learning Python easy. "Oh yeah, I recognize that from the other
> languages I use." Well, SAX and DOM are what the other languages use.
>
> Anyhow, "qp" is hardly more "Pythonic" than the lightweight DOM API.
> The following_cdata stuff is not like any API I've ever seen in Python or
> elsewhere. The call for "pythonic-ness" is mostly a strawman. The DOM
> works better in Python than in almost any other language: NodeLists are
> lists, NamedNodeMaps are maps, object types are instance classes, lists
> can be heterogeneous, etc.
>
> --
>  Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
>  http://itrc.uwaterloo.ca/~papresco
>
> "The Excursion [Sport Utility Vehicle] is so large that it will come
> equipped with adjustable pedals to fit smaller drivers and sensor
> devices that warn the driver when he or she is about to back into a
> Toyota or some other object." -- Dallas Morning News

--
Mike Olson
Member Consultant
FourThought LLC
http://www.fourthought.com http://opentechnology.org


---

"No program is interesting in itself to a programmer. It's only interesting as
long as there are new challenges and new ideas coming up." --- Linus Torvalds


--------------msE1C4377628C886A3F4622D9F--


From akuchlin@cnri.reston.va.us  Thu Apr 22 03:28:50 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 21 Apr 1999 22:28:50 -0400
Subject: [XML-SIG] XBEL questions
In-Reply-To: <19990419124624.A15472@beyondtv.net>
References: <19990419124624.A15472@beyondtv.net>
Message-ID: <199904220228.WAA13517@207-172-49-2.s2.tnt14.ann.va.dialup.rcn.com>

Tim Lavoie writes:
 > I've just started tinkering with the XBEL package and its sample
 > scripts, converting from Netscape (4.5) to XBEL format. The script
 > needed tags converted to upper-case to recognize what Communicator had 
 > written, no big deal. What did puzzle me was the output; the DTD lists 
 > tags in lower case, but the bookmark.py script generated everything in 
 > upper case. Since XML is case-sensitive, isn't this wrong?

	How were you running it?  bookmark.py doesn't have a block of
code that runs if __name__ == '__main__', so you can't run bookmark.py
directly; you must have been running ns_parse.py, and that seems to
produce lower-case output as it should.  I also don't understand the
requirement for uppercase input, because in ns_parse.py the
startElement() and endElement() both convert the element name to
lowercase.  Can you provide the exact command you ran, and perhaps a
sample bookmark file as well? (Privately to me is fine.)

 > message pointing to the second "=" character. This character follows
 > the second parameter name, which as in all HTML is preceded by a "&"
 > character. Could the problem be that tag contents need to be encoded
 > first? The tag looks like:

	Good catch; one of the dump_xbel() methods should have read
escape(href) instead of just href.  I've fixed this in the CVS tree;
thanks!

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Been there, Remiel. Done that, wore the tee-shirt, ate the burger, bought the
original cast album, choreographed the legions of the damned and orchestrated
the screaming...
    -- Lucifer, in SANDMAN #60: "The Kindly Ones:4"


From uche.ogbuji@fourthought.com  Thu Apr 22 06:08:50 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 21 Apr 1999 23:08:50 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Mon, 19 Apr 1999 12:17:19 CDT."
 <371B651E.A1771FEB@prescod.net>
Message-ID: <199904220508.XAA03662@malatesta.local>

> > This does make sense.  The first question would be philisohical: should such a
> > unified interface stick closely to the W3C's IDL, or should it be more
> > faithful to Python (i.e. returning PyLists instead of NodeList objects).  This
> > is the main difference between the two Python DOM implementation.  We could
> > build an adapter accordingly (most of the work is already don with
> > DOM.Ext.NodeList2PyList, etc), but I'd like to hear people's opinions first.
> 
> Do we really have to choose? If, for example, a NodeList object can act as
> a Python sequence then don't we have the best of both worlds? I mean if
> you really need a PyList then you can use "map" to generate one.

Dieter Maurer already pointed out to me that my memory was fuzzy, and that 
PyDOM already provides combined (NodeList and PyList) interfaces.

The main problem with 4DOM's overloading NodeList with PyList behavior, 
besides our desire to remain close to the spec except in clearly-marked 
exceptions, is the fact that you can't invoke methods of the form
"__method__" across an ORB.  In fact, strictly speaking, you can't encode
them into IDL.

I know that this brings up yet again the question of why we insist on 
ORB-enabling 4DOM, but it has to do with much of the work we have been doing 
with 4DOM: some for clients, and some hopefully to become separate open 
products soon.  Interfacing to object-database adapters, for instance, is a 
lot easier if one can directly take advantage of the ODMG-OMG bindings.

We have considered a lightweight 4DOM that isn't so ORB-fanatic, but we don't 
really have the time for this, and besides, PyDOM fills that niche quite well.

I know, I know, the problem remains that PyDOM and 4DOM are different enough 
to complicate portable Python DOM applications.  I hope this conversation can 
lead us to a way around that.

> I would like to think that Python is sufficiently flexible that most of
> these choices could be made in a DOM compatible AND Python compatible way.

Python is, of course, but not CORBA, alas.

> The downside of doing both is that a Java or C++ implementation of a "raw"
> DOM accessed over CORBA or COM would not be compatible -- but we could
> write Python wrappers that would make them so.

Well, we do provide the NodeListToPylist and PyListToNodeList, and we do often 
use these wrappers in our own apps.  It appears you're asking for more, though.

BTW, we're preparing a pretty neat demo of interchanging DOM between Java and 
Python over an ORB with 4DOM on the Python side.  We've already found that you 
can do powerful things with this arrangement.  Now if only Fnorb would play 
more nicely with other ORBs via IIOP, or if ILU would be less insistent on 
explicit server-side reference of objects.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From paul@prescod.net  Thu Apr 22 05:54:54 1999
From: paul@prescod.net (Paul Prescod)
Date: Wed, 21 Apr 1999 23:54:54 -0500
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org> <008b01be8b11$e19fe550$f29b12c2@pythonware.com> <371CA723.FAEBF6AA@prescod.net> <371E610D.BA7861AE@fourthought.com>
Message-ID: <371EAB9E.5B70AEE@prescod.net>

Mike Olson wrote:
> 
> I see PyDOM as its light weight brother.  It gets all of its speed increase
> from not having to wrap every list in a NodeList, and not having to wrap every
> dict in a NamedNodeMap.  Last I looked it also does not create some of the
> "not so important" nodes such as Attrs.

If those are created lazily then there is no runtime cost if you don't use
them.
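
To make "created lazily" concrete, here is a minimal sketch.  The class and
attribute names (Element, Attr, _raw_attrs, _attr_nodes) are hypothetical and
are not taken from PyDOM or 4DOM; the only point is that the element keeps the
parser's plain dictionary and builds an Attr object the first time someone
asks for one.

```python
class Attr:
    """Minimal stand-in for a DOM Attr node."""
    def __init__(self, name, value):
        self.name = name
        self.value = value


class Element:
    def __init__(self, tag, attrs):
        self.tagName = tag
        self._raw_attrs = dict(attrs)  # plain dict, as handed over by a parser
        self._attr_nodes = {}          # Attr objects, built only on demand

    def getAttribute(self, name):
        # Fast path: no Attr object is ever created here.
        return self._raw_attrs.get(name, "")

    def getAttributeNode(self, name):
        if name not in self._raw_attrs:
            return None
        node = self._attr_nodes.get(name)
        if node is None:
            # First request for this attribute: create and cache the node.
            node = self._attr_nodes[name] = Attr(name, self._raw_attrs[name])
        return node


elem = Element("a", {"href": "http://www.python.org/"})
print(elem.getAttribute("href"))   # no Attr node exists yet
print(len(elem._attr_nodes))
node = elem.getAttributeNode("href")
print(len(elem._attr_nodes))
```

A real implementation would also have to keep the dictionary and the cached
nodes in sync when either one is modified, which this sketch ignores.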

> In some cases you may care about the document structure but not care about
> namespaces, or attributes, or elements...  I think if we try to make it any
> more lightweight we will only satisfy half of us, because we will need to start
> dropping "non-important" parts of the original document.

Yes, this is what worries me. XML is XML. If you want to have a parser
flag to turn off parts of XML then that's cool but a standard API should
not throw away document content without a flag.

> If you really need
> more speed, then grab SAX or expat and go.

That's what I was thinking earlier today: if these applications need speed
so much then why are they using a tree API at all? My rule of thumb is:
filter==fast, tree==convenient. That's why I instinctively put in sibling
pointers in my "minidom".
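
A small sketch of that rule of thumb, using the SAX and minidom modules from
today's Python standard library rather than the 1999 packages discussed in
this thread.  The sample document and the ItemCounter class are invented for
the example.  Both halves count the "item" elements; the SAX handler never
builds a tree, while the DOM version builds the whole tree and then queries
it.

```python
import xml.sax
from xml.dom import minidom

XML = b"<list><item>a</item><item>b</item><other/></list>"


class ItemCounter(xml.sax.ContentHandler):
    """Filter style: react to events, keep only the state you need."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


counter = ItemCounter()
xml.sax.parseString(XML, counter)
sax_count = counter.count

# Tree style: build everything first, then ask convenient questions.
tree_count = len(minidom.parseString(XML).getElementsByTagName("item"))

print(sax_count, tree_count)
```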

> Back to the subject at hand.  I think that PyDOM and 4DOM could have a
> standard interface for applications built on top, e.g. 4XSL.  I think we would
> have to come both ways a little though.  Ex. I think it would be very hard to
> get rid of the idea of an Attr class from 4DOM, while I think we could very
> easily extend the NodeList class to support native Python list manipulation.

If you make 4DOM more Pythonish and PyDOM gets closer to DOM conformance
then it seems to me that everybody wins unless some outright
incompatibilities are found.

> This would only be the case if the application is never intended for use over
> an ORB (cannot call __getitem__ over an ORB very easily).
> 
> If we do this though, I don't see that a lot will be gained.  You can swap in
> and out DOM implementations whenever you like, but you would not be able to
> use the 4DOM implementation and expect 4XSL to function over an ORB.  

Sorry, I don't get that.

> If
> applications that use the DOM use 4DOM's new "quick" API then the speed will be
> about equivalent to that of PyDOM

4DOM has a quick API? Or are you saying that the Python-ish extensions
would *be* the quick API? Does 4DOM still require the installation of an
ORB?

> Your choice would be down to, do I like Andy better, or Mike...:)

This part I understand. I like Mike. We're talking about Michael Jordan,
right?

Anyhow, the main thing I prefer about PyDOM is not performance but the
fact that my customers don't have to install an ORB to use it.

---

I don't quite follow the last paragraph but I'll take a stab:

 * we could make an API that used Python-ish features without being
incompatible with the DOM. 

 * 4DOM could add the Python-ish features and PyDOM could fix any outright
incompatibilities (e.g. attrs)

 * maybe we could add a few convenience functions to make life easier
(e.g. getText, getChildElements)

 * programs that used some DOM features would be incompatible with PyDOM
because it would have optimized some of them away.

 * programs that used the Python-ish features would not work over an ORB.

Actually, I don't buy that last point (maybe it's a straw person). We can
easily make a bridge that adds the Pythonish features to objects on the
other side of an ORB (4DOM, Java, whatever). Of course when you are using
4DOM in the same process space you wouldn't use the bridge.

In summary, I think that unifying the APIs of these libraries is the right
thing to do and will give real benefits.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor 
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News


From Greg Stein <gstein@lyra.org>  Thu Apr 22 06:52:23 1999
From: Greg Stein <gstein@lyra.org> (Greg Stein)
Date: Wed, 21 Apr 1999 22:52:23 -0700 (PDT)
Subject: [XML-SIG] qp API
In-Reply-To: <371EAB9E.5B70AEE@prescod.net>
Message-ID: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>

All right.... it seems apparent that something like the qp-api that I
proposed (in response to Fredrik) isn't going to really satisfy a number
of people for a "lightweight" API. It seems that a tendency exists to push
towards the DOM facilities.

What is the approach from here? Can we really examine the qp-api interface
with the intent of a lightweight system?

Actually: that is a good point.... what is "lightweight" ? I define that
as something that is fast, has a small set of objects, and has a small
interface (few objects/methods).
 
A question was asked: do we need Yet Another Interface? I believe that we
do. IMO, the qp interface is very well tuned towards apps being able to
interpret what is really going on when an XML doc arrives (yes, within
certain constraints). IMO, the DOM is great for translations of input XML
to output XML. But something like qp is handy for grabbing input and
dealing with it (I was never able to really do that well with the DOM).
 
Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From uche.ogbuji@fourthought.com  Thu Apr 22 07:35:32 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 22 Apr 1999 00:35:32 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Wed, 21 Apr 1999 23:54:54 CDT."
 <371EAB9E.5B70AEE@prescod.net>
Message-ID: <199904220635.AAA03840@malatesta.local>

> 4DOM has a quick API? Or are you saying that the Python-ish extensions
> would *be* the quick API? Does 4DOM still require the installation of an
> ORB?

No.  We removed that restriction several versions ago.  You can just use the 
"make orbless" configuration to run without an ORB.

> Anyhow, the main thing I prefer about PyDOM is not performance but the
> fact that my customers don't have to install an ORB to use it.

Well, by that criterion, now you have a choice.

I have to second your summary to Fred.  I'm not as hung up with performance.  
We've used 4DOM for some heavy lifting (not to mention Mike's diversion 
writing a graphics-heavy Web-based solitaire game in Fnorb: we need to find 
him more to do).  We tend to run into bottlenecks elsewhere before the DOM.

>  * we could make an API that used Python-ish features without being
> incompatible with the DOM. 

The only problem for 4DOM is those double-underscore methods.  Maybe there's a 
way to wrap this that escapes me.

>  * 4DOM could add the Python-ish features and PyDOM could fix any outright
> incompatibilities (e.g. attrs)
>
>  * maybe we could add a few convenience functions to make life easier
> (e.g. getText, getChildElements)

I have no problem with this.  We already add many such convenient methods to 
our 4DOM Ext package: GetElementsByTagName, GetElementsById, Strip, etc.  
These mostly use the DOM level 2 NodeIterator stuff, BTW.

>  * programs that used some DOM features would be incompatible with PyDOM
> because it would have optimized some of them away.

As long as it's documented, I don't see this as a problem.

>  * programs that used the Python-ish features would not work over an ORB.
>
> Actually, I don't buy that last point (maybe its a straw person). We can
> easily make a bridge that adds the Pythonish features to objects on the
> other side of an ORB (4DOM, Java, whatever). Of course when you are using
> 4DOM in the same process space you wouldn't use the bridge.

Yes, but some Pythonish features would be quite a bear to get by an IDL 
compiler, and it's nice being able to (theoretically) just plug into any 
Java/C++/etc. module across the ORB without first adding a tricky "Pythonic" 
adapter.  This adapter would also have to be re-written for each remote 
implementation.

> In summary, I think that unifying the APIs of these libraries is the right
> thing to do and will give real benefits.

Well, we're certainly interested, and I think that if we can sort out the 
double-underscore thing, we're most of the way there.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From stuart.hungerford@webone.com.au  Thu Apr 22 13:32:56 1999
From: stuart.hungerford@webone.com.au (Stuart Hungerford)
Date: Thu, 22 Apr 1999 22:32:56 +1000
Subject: [XML-SIG] New XSL draft -- any Python plans?
Message-ID: <000c01be8cbc$4056c040$0301a8c0@restless.com>

Hi all,

There's a new draft of the XSL proposal available (19990421) and
it includes some interesting features for calling "functions" in other
notations--a wonderful use for Python if I've understood the 
proposal right.

Can anyone working on Python XSL tools (e.g. 4XSL) tell us your
plans on supporting the new draft?

Stu


From uche.ogbuji@fourthought.com  Thu Apr 22 14:14:19 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 22 Apr 1999 07:14:19 -0600
Subject: [XML-SIG] ANN: 4XSL 0.6.1
Message-ID: <199904221314.HAA04564@malatesta.local>

4XSL 0.6.0 was, as we found out, not very usable outside our purposes.  We've 
been more careful testing and packaging 4XSL 0.6.1:

* We've added command line options to specify the style-sheets, ignore 
PI-specified style-sheets, and validate the XML file
* Now debugging info is only printed if you set a special environment variable
* All XSL templates except for xsl:counter and its dependents have been 
implemented.
* We've applied the usual and numerous bug-fixes

Thanks to all who gave feedback, and were patient with 4XSL 0.6.0.  We're 
announcing this version only to the python-xml list for now until we complete 
xsl:counter and update against the latest XSL(T) draft.  However, please feel 
free to distribute, discuss and use it as you wish, according to the license 
(which is unchanged from 0.6.0 and much like Python).

===============================================================================

4XSL is an XSL processor written in Python, using 4DOM.

This is really an alpha-level release, although we have used it successfully 
to render our Web site (www.FourThought.com), which quite thoroughly exercises 
the features.

You can download the 4XSL archive from

ftp:///starship.python.net/pub/crew/uche/4XSL/4XSL-0.6.1.tar.gz

See the README in the archive to get started.  Feedback welcome (to 
4Web@fourthought.com).  All templates except for xsl:counter and dependents 
are supported, and the full set of patterns.

Thanks for all the interest from this group.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Thu Apr 22 14:22:24 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 22 Apr 1999 07:22:24 -0600
Subject: [XML-SIG] Re: ANN: 4XSL 0.6.1
Message-ID: <199904221322.HAA04593@malatesta.local>

Please note that the 4XSL 0.6.1 package comes bundled with, and requires, 4DOM 
0.7.2.  You do not need to install a CORBA environment to install 4DOM 
versions after 0.7.0.  4DOM 0.7.2 is not yet widely distributed, but it has 
been tested internally, against 4XSL and other DOM applications.  Many 
optimizations have been applied since 4DOM 0.7.0.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From Jeff.Johnson@icn.siemens.com  Thu Apr 22 16:10:37 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Thu, 22 Apr 1999 11:10:37 -0400
Subject: [XML-SIG] How can I search for a string of text
Message-ID: <8525675B.00536154.00@li01.lm.ssc.siemens.com>


normalize... haven't tried that one before.  Sounds like it might do the trick.
The string in question should not have any entities or elements mixed into it so
I don't need to worry about that.  I'll give it a shot.

Thanks :)


Jeff.Johnson@icn.siemens.com writes:
>I need to remove a string from my HTML files but I don't know the best way to
>find it.  There are usually line feeds in the HTML between the string so the
>string does not appear as one DOM text node.  Does anyone know the best way to
>find contiguous text that spans multiple DOM nodes?


>"Andrew M. Kuchling" <akuchlin@cnri.reston.va.us> writes:
>    The normalize() method on an Element node consolidates the
>subtree so there are no adjacent Text nodes, merging Text nodes that
>are next to each other into a single node.  So you could do
>document.rootElement.normalize(), and then rely on the string being
>contained within one node.  That won't catch tricky cases -- do you
>need to find it if an entity expands to the string, or to part of the
>string?  if the string had a PI in the middle of it, would it still
>count as a match? -- but it'll certainly help with the simple case.
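
A minimal sketch of the suggestion above, written against xml.dom.minidom
from the current Python standard library (which spells the root as
documentElement, where the 1999 PyDOM used rootElement).  The sample markup,
the target string and the strip_text() helper are all made up for the
example, and it only covers the simple case where no entities, PIs or
elements interrupt the string.

```python
from xml.dom import minidom

doc = minidom.parseString("<p>keep </p>")
p = doc.documentElement

# Simulate a parser that delivered the text in several adjacent Text nodes.
for piece in ("remove ", "this ", "string"):
    p.appendChild(doc.createTextNode(piece))
before = len(p.childNodes)

# Merge adjacent Text nodes so the target string sits inside one node.
p.normalize()
after = len(p.childNodes)


def strip_text(node, target):
    """Delete every occurrence of target found inside a single Text node."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            child.data = child.data.replace(target, "")
        else:
            strip_text(child, target)


strip_text(p, "remove this string")
print(before, after, repr(p.firstChild.data))
```

If the string can be broken by line feeds, as in the HTML case described,
the whitespace would need to be collapsed (or matched with a regular
expression) before comparing; that step is left out here.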


From paul@prescod.net  Thu Apr 22 16:58:13 1999
From: paul@prescod.net (Paul Prescod)
Date: Thu, 22 Apr 1999 10:58:13 -0500
Subject: [XML-SIG] DOM API
References: <199904220635.AAA03840@malatesta.local>
Message-ID: <371F4715.2971DF2D@prescod.net>

uche.ogbuji@fourthought.com wrote:
> 
> No.  We removed that restriction several versions ago.  You can just 
> use the "make orbless" configuration to run without an ORB.

Neat. And presumably after I do a "make orbless" I can ship the resulting
package so that it doesn't have to be re-made on the client side. Maybe
you guys should make that the default so that it works like other
Python-written DOMs that do not have to be configured.

> Yes, but some Pythonish features would be quite a bear to get by an IDL
> compiler, and it's nice being able to (theoretically) just plug into any
> Java/C++/etc. module across the ORB without first adding a tricky "Pythonic"
> adapter.  This adapter would also have to be re-written for each remote
> implementation.

This is the central issue. Why would it have to be rewritten for each
remote implementation? Presumably we expect all CORBA/DOM compliant
implementations to supply the same interface. Can't we wrap that interface
so that the same wrappers work for all of them?

I mean I know that some things like document creation and parsing are
non-standard but we should be able to uniformly wrap the rest.
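
A sketch of what such a uniform wrapper might look like, under the
assumption that every remote NodeList exposes the DOM's item(index) method
and a length attribute.  SequenceView and FakeRemoteNodeList are hypothetical
names; the latter merely stands in for an ORB proxy, and a real CORBA
language mapping may spell attribute access differently.

```python
class SequenceView:
    """Adds Python sequence behaviour on top of item()/length."""
    def __init__(self, nodelist):
        self._nodelist = nodelist

    def __len__(self):
        return self._nodelist.length

    def __getitem__(self, index):
        if isinstance(index, slice):
            rng = range(*index.indices(len(self)))
            return [self._nodelist.item(i) for i in rng]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._nodelist.item(index)


class FakeRemoteNodeList:
    """Stand-in for a proxy object that only speaks the IDL interface."""
    def __init__(self, items):
        self._items = list(items)

    @property
    def length(self):
        return len(self._items)

    def item(self, index):
        if 0 <= index < len(self._items):
            return self._items[index]
        return None


view = SequenceView(FakeRemoteNodeList(["a", "b", "c"]))
print(len(view), view[0], view[-1], view[0:2], list(view))
```

The double-underscore methods live entirely on the client side, so nothing
beyond item() and length has to cross the ORB or appear in the IDL.  Note
that each index still costs one remote call, so this buys convenience, not
speed.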

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor 
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News


From paul@prescod.net  Thu Apr 22 17:04:03 1999
From: paul@prescod.net (Paul Prescod)
Date: Thu, 22 Apr 1999 11:04:03 -0500
Subject: [XML-SIG] ANN: 4XSL 0.6.1
References: <199904221314.HAA04564@malatesta.local>
Message-ID: <371F4873.D68D7EB2@prescod.net>

uche.ogbuji@fourthought.com wrote:
> 
> We're
> announcing this version only to the python-xml list for now until 
> we complete xsl:counter and update against the latest XSL(T) draft.  

Which is it? Counters were removed from XSLT. :) 

For those who are interested:

E. Changes from Previous Public Working Draft

       The following is a summary of changes since the previous public
working draft.

       Select patterns, string expressions and boolean expressions have
been combined and generalized into an expression language with multiple
data types (see [6 Expressions and Patterns]).

       xsl:strip-space and xsl:preserve-space have an elements attribute
which specifies a list of element types, rather than a element attribute
specifying a single element type.

       The id() function has been split into id() and idref().

       xsl:id has been replaced by the xsl:key element (see [6.4.1
Declaring Keys]), and associated key() and keyref() functions.

       The doc() and docref() have been added to support multiple source
documents.

       Namespace wildcards (ns:*) have been added.

       ancestor() and ancestor-or-self() have been replaced by a more
general facility for addressing different axes.

       Positional qualifiers (first-of-type(), first-of-any(),
last-of-type(), last-of-any()) have been replaced by the position() and
last() functions and numeric expressions inside [].

       Counters have been removed. An expr attribute has been added to
xsl:number which in conjunction with the position() allows numbering of
sorted node lists.

       Multiple adjacent uses of [] are allowed.

       Macros and templates have been unified by allowing templates to be
named and have parameters.

       xsl:constant have been replaced by xsl:variable which allows
variables to be typed and local.

       The default for priority on xsl:template has changed (see [7.4
Conflict Resolution for Template Rules]).

       An extension mechanism has been added (see [6.4.2 Declaring
Extension Functions]).

       The namespace URIs have been changed.

       xsl:copy-of has been added (see [9.5 Copying]).

       A error recovery mechanism to allow forwards-compatibility has been
added (see [3 Forwards-compatible Processing]).

       A namespace attribute has been added to xsl:element and
xsl:attribute.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor 
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News


From dieter@handshake.de  Thu Apr 22 20:58:22 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Thu, 22 Apr 1999 19:58:22 +0000 (/etc/localtime)
Subject: [XML-SIG] New XSL draft -- any Python plans?
In-Reply-To: <000c01be8cbc$4056c040$0301a8c0@restless.com>
References: <000c01be8cbc$4056c040$0301a8c0@restless.com>
Message-ID: <14111.32515.571734.849028@lindm.dm>

Stuart Hungerford writes:
 > Can anyone working on Python XSL tools (e.g. 4XSL) tell us your
 > plans on supporting the new draft?
I plan to support the new draft in XSL-Pattern 0.4.
However, it will only be available in some weeks.

- Dieter


From dieter@handshake.de  Thu Apr 22 21:56:17 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Thu, 22 Apr 1999 20:56:17 +0000 (/etc/localtime)
Subject: [XML-SIG] DOM API
In-Reply-To: <199904220508.XAA03662@malatesta.local>
References: <371B651E.A1771FEB@prescod.net>
 <199904220508.XAA03662@malatesta.local>
Message-ID: <14111.35252.250202.333651@lindm.dm>

uche.ogbuji@fourthought.com writes:
 > The main problem with 4DOM's overloading NodeList with PyList behavior,
 > besides our desire to remain close to the spec except in clearly-marked
 > exceptions, is the fact that you can't invoke methods of the form
 > "__method__" across an ORB.  In fact, strictly speaking, you can't encode
 > them into IDL.
ILU has a nice extension called "custom surrogates".
It allows the raw CORBA objects to be wrapped by custom objects
providing additional functionality, e.g. Python list or
dictionary emulation.

I do not know how widespread this (or a similar) feature is in
other ORBs. It is very useful, though.
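
The wrapping idea can be sketched in plain Python. This does not use ILU's
actual API: RemoteNodeList stands in for a raw ORB object that only offers
IDL-style calls, and NodeListSurrogate is a hypothetical local wrapper that
adds list emulation on top of it.

```python
class RemoteNodeList:
    """Stand-in for a raw ORB object exposing only IDL-style methods."""
    def __init__(self, nodes):
        self._nodes = list(nodes)

    def get_length(self):
        return len(self._nodes)

    def item(self, index):
        return self._nodes[index]


class NodeListSurrogate:
    """Local wrapper adding Python list emulation to the remote object."""
    def __init__(self, remote):
        self._remote = remote

    def __len__(self):
        return self._remote.get_length()

    def __getitem__(self, index):
        # Raising IndexError past the end is what lets for-loops terminate.
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._remote.item(index)


nodes = NodeListSurrogate(RemoteNodeList(["a", "b", "c"]))
print(len(nodes), nodes[0], list(nodes))
```

The special methods live only on the local wrapper, so nothing of the form
"__method__" has to cross the ORB.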
 
- Dieter


From paul@prescod.net  Thu Apr 22 22:37:05 1999
From: paul@prescod.net (Paul Prescod)
Date: Thu, 22 Apr 1999 16:37:05 -0500
Subject: [XML-SIG] ANN: Minidom 0.6
Message-ID: <371F9681.D0C314AC@prescod.net>

This is a multi-part message in MIME format.
--------------5BA374B008AE171CF0613077
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Attached is a miniature, lightweight subset of the DOM with a few
extensions for namespace handling. (I guess an extended subset is a
contradiction in terms but you get the idea!)

I propose that

 * this become part of the xml package

 *  we consider the DOM-creation functions and namespaces extensions for
adoption in a standard Python DOM API

 * DOM-haters try this out and clearly describe where it falls down in
their applications

 * we try to figure out the right set of convenience functions to make the
DOM more palatable for everybody (if possible).

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco
--------------5BA374B008AE171CF0613077
Content-Type: text/plain; charset=us-ascii;
 name="minidom.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="minidom.py"

"""
minidom.py -- a lightweight DOM implementation based on SAX.

Version 0.6

Usage:
======

dom = DOMFromString( string )
dom = DOMFromURL( URL, SAXbuilder=None )
dom = DOMFromFile( file, SAXbuilder=None )

Actually, the three constructor methods work with PyDOM as well as minidom.
Use xml.dom.sax_builder.SaxBuilder() for PyDOM.

Classes:
=======
The main classes are Document, Element and Text

Document:
	childNodes: heterogenous Python list
	documentElement: root element

Element:
	# main properties
	tagName: element type name (with colon, if it has one)
	childNodes: heterogenous Python list

	# attribute getting methods
	getAttribute( "foo" ): string value of foo attribute
	getAttribute( "foo", "someURI" ): string value of foo attribute in
		namespace named by URI
	# namespaces stuff:
	prefix: type name prefix
	localName: type name following colon
	uri: uri associated with prefix

	#advanced attribute stuff
	attributes: returns attribute mapping object

Text:
	data: get the text data

Todo:
=====
 * convenience methods for getting elements and text.
 * more testing
 * bring some of the writer and linearizer code into conformance with this
   interface
"""
from xml.sax import saxexts
from xml.sax.saxlib import HandlerBase
import string
from StringIO import StringIO
import dom.core

class Node:
	inGetAttr=None
	def __getattr__( self, key ):
		if self.inGetAttr:
			raise AttributeError, key
		elif key[0:4]=="get_":
			return (lambda self=self, key=key: 
				getattr( self, key[4:] ))
		else:
			raise AttributeError, key
		#	self.inGetAttr=1
		#	func = getattr( self, "get_"+key )
		#	del self.inGetAttr
		#	return func()


class Document( Node ):
	nodeType=dom.core.DOCUMENT_NODE
	def __init__( self ):
		self.childNodes=[]
		self.documentElement=None

# Index constants for the tuples used below.  A single leading underscore
# is used because names starting with two underscores are mangled inside
# class bodies, which would hide these globals from AttributeList.
_URI=0
_VALUE=1

_PREFIX=0
_LOCAL=1

def _qname2String( key ):
	# key is a (prefix, local name) tuple
	if key[_PREFIX]:
		return string.join( key, ":" )
	else:
		return key[_LOCAL]

def _getVal( val ):
	# val is a (uri, value) tuple
	return val[_VALUE]

class Attribute(Node):
	def __init__( self, name, value ):
		self.name=name
		self.value=value

class AttributeList: 
	def __init__( self, attrs ):
		# maps (prefix, local name) to (uri, value)
		self.__attrs=attrs

	def items( self ):
		names = map( _qname2String, self.__attrs.keys() )
		values = map( _getVal, self.__attrs.values() )
		return map( None, names, map( Attribute, names, values ) )
	
	def keys( self ):
		return map( _qname2String, self.__attrs.keys() )

	def values( self ):
		return map( _getVal, self.__attrs.values() )

	def __getitem__( self, attname ):
		# compare against the types of literals so that the types
		# module does not have to be imported
		if type( attname )==type( "" ): 
			# qualified name such as "foo" or "bar:foo"
			parts = string.split( attname, ":")
			if len(parts)==1:
				tup = self.__attrs[(None,parts[0])]
			else:
				tup = self.__attrs[tuple(parts)]
			return tup[_VALUE]
		elif type(attname)==type( () ) and len( attname ) == 2:
			# (local name, URI) pair
			local,uri=attname
			for key,val in self.__attrs.items():
				if val[_URI]==uri and key[_LOCAL]==local:
					return val[_VALUE]
			raise KeyError, attname
		else:
			raise TypeError, attname

class Element( Node ):
	nodeType=dom.core.ELEMENT_NODE
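	def __init__( self, tagName ):
		self.tagName = tagName
		self.childNodes=[]
		# replaced with an AttributeList by SaxBuilder.startElement
		self.attributes=None

	def getAttribute( self, attname, uri=None ):
		# delegate to the AttributeList, which accepts either a
		# qualified name or a (local name, URI) pair
		if uri:
			return self.attributes[(attname,uri)]
		else:
			return self.attributes[attname]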


class Comment( Node ):
	nodeType=dom.core.COMMENT_NODE
	def __init__(self, data ):
		self.data=data

class ProcessingInstruction( Node ):
	nodeType=dom.core.PROCESSING_INSTRUCTION_NODE
	def __init__(self, target, data ):
		self.target = target
		self.data = data

class Text( Node ):
	nodeType=dom.core.TEXT_NODE
	def __init__(self, data ):
		self.data = data

class Error( Node ):
	def __init__(self, *args ):
		self.message = string.join( map( repr, args ) ) 

	def __repr__( self ):
		return self.message

class SaxBuilder( HandlerBase ):
	def __init__(self ):
		HandlerBase.__init__(self)
		self.cur_node = self.document = Document()
		self.cur_node.namespace={"xml": 
					"http://www.w3.org/XML/1998/namespace",
					None:None, "xmlns":None}
		self.cur_node.parent=None

	def addChild( self, node ):
		self.cur_node.childNodes.append( node )

	def nssplit( self, qname ):
		if string.find( qname, ":" )!=-1:
			prefix,local = string.split( qname, ":" )
		else:
			prefix,local = None,qname
		
		node = self.cur_node
		while node:
			if node.namespace.has_key(prefix):
				uri = node.namespace[prefix]
				return (prefix,local,uri)
			node=node.parent

		raise Error, "Namespace def not found for "+prefix

	def handleAttrs( self, attrs ):
		outattrs = {}
		handleLater = []

		for (attrname,value) in attrs.items():
			if attrname[0:6]=="xmlns:":
				prefix,local=string.split( attrname, ":" )
				outattrs[(prefix,local)]=(None,value)
				self.cur_node.namespace[local]=value
			elif attrname=="xmlns":
				prefix,local=(None,"xmlns")
				outattrs[(prefix,local)]=(None,value)
				self.cur_node.namespace[None]=value
			else: 
				handleLater.append( (attrname, value ) )

		for (attrname,value) in handleLater:
			(prefix,local,uri)=self.nssplit( attrname )
			outattrs[(prefix, local)]=(uri,value)

		return outattrs

	def startElement( self, tagname , attrs={} ):

		node = Element( tagname )
		self.addChild( node )

		node.parent = self.cur_node
		self.cur_node = node
		self.cur_node.namespace = {None:None,"xmlns":None}
		node.attributes = AttributeList( self.handleAttrs( attrs ) )

		node.tagname = tagname
		(node.prefix, node.localName, node.uri)= self.nssplit( tagname )


	def endElement( self, name, attrs={} ):
		del self.cur_node.namespace

		node = self.cur_node
		self.cur_node = node.parent
		del node.parent

	def comment( self, s):
		self.addChild( Comment( s  ) )

	def processingInstruction( self, target, data ):
		node = ProcessingInstruction( target, data )
		self.addChild( node )

	def characters( self, chars, start, length ): 
		node = Text( chars[start:start+length] )
		self.addChild( node )

	def endDocument( self ):
		assert( not self.cur_node.parent )
		del self.cur_node.parent
		for node in self.cur_node.childNodes:
			if node.nodeType==dom.core.ELEMENT_NODE:
				self.document.documentElement = node
		if not self.document.documentElement:
			raise Error, "No document element"

		del self.cur_node.namespace

# public constructors
def DOMFromString( string ):
	return DOMFromFile( StringIO( string ) )

def DOMFromURL( URL, builder=None ):
	builder = builder or SaxBuilder()
	p=saxexts.make_parser()
	p.setDocumentHandler( builder  )
	p.parse( URL )
	return builder.document

def DOMFromFile( file, builder=None ):
	builder = builder or SaxBuilder()
	p=saxexts.make_parser()
	p.setDocumentHandler( builder  )
	p.parseFile( file )
	return builder.document

if __name__=="__main__":
	import sys, os
	file = os.path.join( os.path.dirname( sys.argv[0] ), "test/quotes.xml" )
	docs=[]
	docs.append( DOMFromURL( file  ) )
	docs.append( DOMFromFile( open( file ) ) )
	docs.append( DOMFromString( open( file ).read()  ) )

	from xml.dom.writer import XmlWriter 
	import xml.dom.sax_builder

	# test against PyDOM
	docs.append( DOMFromURL( file,  xml.dom.sax_builder.SaxBuilder() ) )

	outputs=[]

	for doc in docs:
		outputs.append( StringIO() )
		XmlWriter(outputs[-1]).walk( doc )

	for output in outputs[1:]:
		assert output.getvalue() == outputs[0].getvalue()
	print output.getvalue()

# I don't like modules that export their imported modules
for key,value in locals().items():
	if `type( value )` =="<type 'module'>":
		del locals()[key]
del key, value


--------------5BA374B008AE171CF0613077--


From paul@prescod.net  Fri Apr 23 17:55:08 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 23 Apr 1999 11:55:08 -0500
Subject: [XML-SIG] qp API
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
Message-ID: <3720A5EC.D79A464@prescod.net>

Greg Stein wrote:
> 
> Actually: that is a good point.... what is "lightweight" ? I define that
> as something that is fast, has a small set of objects, and has a small
> interface (few objects/methods).
> 
> A question was asked: do we need Yet Another Interface? I believe that we
> do. IMO, the qp interface is very well tuned towards apps being able to
> interpret what is really going on when an XML doc arrives (yes, within
> certain constraints). IMO, the DOM is great for translations of input XML
> to output XML. But something like qp is handy for grabbing input and
> dealing with it (I was never able to really do that well with the DOM).

I hear three different issues:

 * performance
 * size of interface
 * walking-around convenience

I think that a lightweight DOM implementation can go a long way toward
meeting these requirements.

Performance: 

If we take out parent and sibling pointers, I see no reason that a DOM
implementation should be more than a few percent slower than qp_xml. In
the minidom implementation I am working on, 60% of the code and probably a
big chunk of the runtime is dedicated to the stupid^H^H^H^H^H^H
inconvenient namespace processing. If we're both doing namespace
processing we will both incur that overhead. Even with namespaces, the
whole thing is less than 300 lines of code!

Size of interface: 

Minidom has 3 builder methods (building from strings, files and filenames)
and 6 runtime classes -- only one of which is even mildly complex (again,
because of namespace handling). If you are handling simple documents
without PIs and comments, then you only need to deal with three classes:
document, element and text. In other words the interface that most people
will use is rather small.

Convenience:

We can add convenience functions that allow people with different
interests to find the information that they need in an XML document.
Convenience functions add to the interface but they don't really affect
performance much. So far I have identified a need to be able to iterate
over the elements, find a child element by its type name and deeply
concatenate the text of a node. What else?

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco


From paul@prescod.net  Fri Apr 23 18:06:22 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 23 Apr 1999 12:06:22 -0500
Subject: [XML-SIG] PySAX more pythonish
Message-ID: <3720A88E.C041B08B@prescod.net>

I would like the attributes parameter to startElement to be defaulted in
all SAX implementations.

I would also like a new method called "text" similar to the one
implemented by xml.dom.sax_builder. "text" just takes a text string
instead of a string and offsets. That's a little more pythonish for both
the caller and callback.

The default DocumentHandler would re-route "characters" to "text". Someone
who needed the (potentially) more efficient behavior of "characters" could
override the implementation and re-route text to characters instead.
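
A minimal sketch of the proposed re-routing, written as modern Python. The
class below is a hypothetical stand-in, not the existing PySAX
DocumentHandler, and Collector is an invented subclass for demonstration.

```python
class DocumentHandler:
    """Hypothetical default handler: characters() is re-routed to text()."""
    def characters(self, chars, start, length):
        # Slice once here so that subclasses only ever see a plain string.
        self.text(chars[start:start + length])

    def text(self, data):
        pass


class Collector(DocumentHandler):
    """Overrides only text(); never deals with offsets."""
    def __init__(self):
        self.chunks = []

    def text(self, data):
        self.chunks.append(data)


handler = Collector()
handler.characters("<p>hello</p>", 3, 5)   # the call a parser driver makes
print(handler.chunks)
```

A handler that wants to avoid the extra slice would override characters()
instead and leave text() alone.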

What do you think?
-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

Company spokeswoman Lana Simon stressed that Interactive 
Yoda is not a Furby. Well, not exactly. 

"This is an interactive toy that utilizes Furby technology," 
Simon said. "It will react to its surroundings and will talk." 
  - http://www.wired.com/news/news/culture/story/19222.html


From fdrake@acm.org  Fri Apr 23 20:18:34 1999
From: fdrake@acm.org (Fred L. Drake)
Date: Fri, 23 Apr 1999 15:18:34 -0400 (EDT)
Subject: [XML-SIG] PySAX more pythonish
In-Reply-To: <3720A88E.C041B08B@prescod.net>
References: <3720A88E.C041B08B@prescod.net>
Message-ID: <14112.51082.402231.654127@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > I would like the attributes parameter to startElement to be defaulted in
 > all SAX implementations.
...
 > I would also like a new method called "text" similar to the one
 > implemented by xml.dom.sax_builder. "text" just takes a text string

  Sounds good to me!


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From fdrake@acm.org  Fri Apr 23 20:21:34 1999
From: fdrake@acm.org (Fred L. Drake)
Date: Fri, 23 Apr 1999 15:21:34 -0400 (EDT)
Subject: [XML-SIG] qp API
In-Reply-To: <3720A5EC.D79A464@prescod.net>
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720A5EC.D79A464@prescod.net>
Message-ID: <14112.51262.675379.668123@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > If we take out parent and sibling pointers, I see no reason that a DOM
 > implementation should be more than a few percent slower than qp_xml. In

  I'd like to see the parent pointer kept, but I'm also fine with an
explicit destroy() or close() method instead of those damnable
proxies.
  I haven't actually needed sibling pointers, so I'm not sure I care
about them.  They can be computed easily enough if someone wants the
data on an "occasional" basis.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From akuchlin@cnri.reston.va.us  Fri Apr 23 21:01:02 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 23 Apr 1999 16:01:02 -0400 (EDT)
Subject: [XML-SIG] qp API
In-Reply-To: <14112.51262.675379.668123@weyr.cnri.reston.va.us>
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720A5EC.D79A464@prescod.net>
 <14112.51262.675379.668123@weyr.cnri.reston.va.us>
Message-ID: <14112.52252.324863.576300@amarok.cnri.reston.va.us>

Fred L. Drake writes:
>  I'd like to see the parent pointer kept, but I'm also fine with an
>explicit destroy() or close() method instead of those damnable
>proxies.

	What problems do the proxies present?  It would be possible to
remove them and use an explicit destroy() if they present technical
problems of their own.
		
>  I haven't actually needed sibling pointers, so I'm not sure I care
>about them.  They can be computed easily enough if someone wants the
>data on an "occasional" basis.

	If you have parent and child pointers, you don't need sibling
pointers since you just go up to the parent & retrieve its children. 

	I haven't really formed an opinion about the Minidom module.
On the one hand, I don't like adding an interface that resembles
another interface; too many similar choices can be confusing.  (But if
PyDOM is upward-compatible with Minidom, that may not be a problem.)
On the other hand, PyDOM *is* quite heavyweight, and I can understand
the desire for something similar.  Can people please give their
opinions about this?  

	(I do like the convenience functions like DOMFromString;
something similar should definitely be added, perhaps to dom.utils.)
	
-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
I don't believe in an afterlife, so I don't have to spend my whole life
fearing hell, or fearing heaven even more. For whatever the tortures of hell,
I think the boredom of heaven would be even worse.
    -- Isaac Asimov 1920-1992 RIP


From fdrake@acm.org  Fri Apr 23 21:18:48 1999
From: fdrake@acm.org (Fred L. Drake)
Date: Fri, 23 Apr 1999 16:18:48 -0400 (EDT)
Subject: [XML-SIG] qp API
In-Reply-To: <14112.52252.324863.576300@amarok.cnri.reston.va.us>
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720A5EC.D79A464@prescod.net>
 <14112.51262.675379.668123@weyr.cnri.reston.va.us>
 <14112.52252.324863.576300@amarok.cnri.reston.va.us>
Message-ID: <14112.54696.546630.72873@weyr.cnri.reston.va.us>

I wrote:
 >  I'd like to see the parent pointer kept, but I'm also fine with an
 >explicit destroy() or close() method instead of those damnable
 >proxies.

Andrew M. Kuchling writes:
 > 	What problems do the proxies present?  It would be possible to
 > remove them and use an explicit destroy() if they present technical

  They require a lot of object creation, and slow things down a lot
for tree walking and generally ensuring you have sufficiently
"current" references.

 > 	If you have parent and child pointers, you don't need sibling
 > pointers since you just go up to the parent & retrieve its children. 

  That's what I meant about them being easily computable.

 > PyDOM is upward-compatible with Minidom, that may not be a problem.)
 > On the other hand, PyDOM *is* quite heavyweight, and I can understand
 > the desire for something similar.  Can people please give their
 > opinions about this?  

  I think sufficient compatibility can be kept.  While what I've been
doing isn't performance critical, it can be a real nuisance.  I'd
like it to be fast for the same reasons I want a compiler to be fast:
sometimes I'm actually waiting in blocking mode.  ;-(
  I may have a more interesting need for performance in the future,
but I'm not sure yet.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From paul@prescod.net  Fri Apr 23 21:53:06 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 23 Apr 1999 15:53:06 -0500
Subject: [XML-SIG] qp API
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720A5EC.D79A464@prescod.net> <14112.51262.675379.668123@weyr.cnri.reston.va.us>
Message-ID: <3720DDB2.2483DD7F@prescod.net>

"Fred L. Drake" wrote:
> 
> Paul Prescod writes:
>  > If we take out parent and sibling pointers, I see no reason that a DOM
>  > implementation should be more than a few percent slower than qp_xml. In
> 
>   I'd like to see the parent pointer kept, but I'm also fine with an
> explicit destroy() or close() method instead of those damnable
> proxies.

The problem with close() is that it is O(N) with the size of your
document, isn't it? I'm on the fence about parent pointers...maybe they
should be a construction option. They would be off by default.

>   I haven't actually needed sibling pointers, so I'm not sure I care
> about them.  They can be computed easily enough if someone wants the
> data on an "occasional" basis.

True.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco


From fdrake@acm.org  Fri Apr 23 22:50:29 1999
From: fdrake@acm.org (Fred L. Drake)
Date: Fri, 23 Apr 1999 17:50:29 -0400 (EDT)
Subject: [XML-SIG] qp API
In-Reply-To: <3720DDB2.2483DD7F@prescod.net>
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720A5EC.D79A464@prescod.net>
 <14112.51262.675379.668123@weyr.cnri.reston.va.us>
 <3720DDB2.2483DD7F@prescod.net>
Message-ID: <14112.60197.369126.212959@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > The problem with close() is that it is O(N) with the size of your
 > document, isn't it? I'm on the fence about parent pointers...maybe they
 > should be a construction option. They would be off by default.

  O(N) is right, but the constant is small enough to make up for it
with any measure of real work going on while the tree is live.  Having 
it be optional would be quite sufficient for me.
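
For concreteness, an explicit close() might look like the sketch below.
TreeNode and its attributes are hypothetical names, not taken from PyDOM or
minidom. The walk touches every node once, which is the O(N) cost under
discussion, and it matters because an interpreter that relies on reference
counting alone cannot free a parent/child cycle by itself.

```python
class TreeNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # back-pointer: creates a cycle
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def close(self):
        """Clear parent links at and below this node; returns nodes visited."""
        visited = 0
        stack = [self]
        while stack:
            node = stack.pop()
            node.parent = None
            stack.extend(node.children)
            visited += 1
        return visited


root = TreeNode("root")
a = TreeNode("a", root)
b = TreeNode("b", root)
c = TreeNode("c", a)
print(root.close())
```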


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From paul@prescod.net  Fri Apr 23 22:04:25 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 23 Apr 1999 16:04:25 -0500
Subject: [XML-SIG] qp API
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720A5EC.D79A464@prescod.net>
 <14112.51262.675379.668123@weyr.cnri.reston.va.us> <14112.52252.324863.576300@amarok.cnri.reston.va.us>
Message-ID: <3720E059.28CD59BC@prescod.net>

"Andrew M. Kuchling" wrote:
> 
>         If you have parent and child pointers, you don't need sibling
> pointers since you just go up to the parent & retrieve its children.

Well, yes and no. If you have 10,000 nodes how do you get the next and
previous node easily? (easily is the key word here)
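
To make the cost concrete, here is a sketch of computing a sibling from
parent and child pointers alone. SimpleNode and next_sibling are
hypothetical names; the point is only that the lookup is a linear scan of
the parent's child list.

```python
class SimpleNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def next_sibling(node):
    """Linear scan of the parent's children; None if there is no successor."""
    if node.parent is None:
        return None
    siblings = node.parent.children
    for i, candidate in enumerate(siblings):
        if candidate is node:
            return siblings[i + 1] if i + 1 < len(siblings) else None
    return None


root = SimpleNode("root")
kids = [SimpleNode("child%d" % i, root) for i in range(10000)]
print(next_sibling(kids[4999]).name)
```

With 10,000 children, each call scans thousands of entries on average, so
walking all siblings this way is quadratic in the number of children.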

>         I haven't really formed an opinion about the Minidom module.
> On the one hand, I don't like adding an interface that resembles
> another interface; too many similar choices can be confusing.  (But if
> PyDOM is upward-compatible with Minidom, that may not be a problem.)

I certainly intend for minidom to be a subset of PyDOM and 4DOM. Any
extensions I made should be interpreted as suggestions for extensions to
PyDOM and 4DOM.

>         (I do like the convenience functions like DOMFromString;
> something similar should definitely be added, perhaps to dom.utils.)

Why not in "dom" itself? I don't see them as utilities but as the
fundamental, commonly used entry points to DOM functionality.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco


From paul@prescod.net  Fri Apr 23 22:16:05 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 23 Apr 1999 16:16:05 -0500
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720A5EC.D79A464@prescod.net>
 <14112.51262.675379.668123@weyr.cnri.reston.va.us> <14112.52252.324863.576300@amarok.cnri.reston.va.us>
Message-ID: <3720E315.A4532E83@prescod.net>

"Andrew M. Kuchling" wrote:
> 
>         (I do like the convenience functions like DOMFromString;
> something similar should definitely be added, perhaps to dom.utils.)

As I said in my other messages, I want minidom to be a subset of PyDOM
and 4DOM and hopefully the start of a common API. In that vein, minidom
makes some decisions and extensions that we should discuss:

dom = DOMFromString( string, SAXbuilder=None )
dom = DOMFromURL( URL, SAXbuilder=None )
dom = DOMFromFile( file, SAXbuilder=None )

The default SAXBuilder would probably be the PyDOM or minidom builder.

Minidom uses mixed lower-first for property names. For compatibility with
PyDOM, properties can be requested through get_ methods. My question is:
do we really need get_ methods? They don't seem very Pythonish to me. Or
maybe we can use them as implementation mechanism (_get_) but not expose
them to the client.
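
The get_ compatibility layer can be sketched in modern Python as follows.
It restates the approach of the Node class in the minidom attachment rather
than reproducing it, and the class names here are only illustrative.

```python
class BaseNode:
    def __getattr__(self, key):
        # Only called when normal lookup fails, so real attributes such as
        # tagName are returned directly and never pass through here.
        if key.startswith("get_"):
            name = key[4:]
            return lambda: getattr(self, name)
        raise AttributeError(key)


class SimpleElement(BaseNode):
    def __init__(self, tagName):
        self.tagName = tagName


e = SimpleElement("quote")
print(e.tagName, e.get_tagName())
```

The property is the primary interface; the get_ form is synthesized on
demand and costs nothing for clients that never use it.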

I prefer the class-specific properties to the weird generic ones: tagName
to nodeName, value to nodeValue and so forth. Obviously PyDOM and 4DOM
would implement both but I don't see any reason to support that redundancy
in minidom.

I made some namespace extensions because we can't wait forever to do
namespace support.

getAttribute( "foo", "http://www.blah.bar" )

Looks up the obvious attribute.

element.localName gets the second half of the element type name.

element.uri gets the URI associated with the prefix.

element.prefix gets the element's prefix. I don't think that the
namespaces view that prefixes are irrelevant should obviate the XML 1.0
view that they are NOT. Even if we accept the namespaces view of the world
entirely, prefixes are chosen to be mnemonic so they shouldn't be
discarded by software.

element.attributes returns an attribute mapping object that I think
behaves exactly like PyDOM's except for namespace support:

x.attributes["foo", "http://www.blah.bar"]

This also works, however:

x.attributes["bar:foo"] (just as in PyDOM)

Namespace attributes ARE maintained as attributes. keys(), items() and
values() should be the same as PyDOM.
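
For comparison, the same kinds of lookup can be tried against the
xml.dom.minidom module in today's Python standard library. Its API differs
from the 0.6 module described here: the URI is exposed as namespaceURI
rather than uri, and the namespace-aware lookup is getAttributeNS( uri,
localName ) rather than a second argument to getAttribute. The sample
document is invented.

```python
from xml.dom.minidom import parseString

doc = parseString(
    '<b:quote xmlns:b="http://www.blah.bar" b:foo="1" plain="2"/>'
)
elem = doc.documentElement

# Prefix, local name and URI are all kept on the element.
print(elem.prefix, elem.localName, elem.namespaceURI)

# Lookup by (URI, local name) and by qualified name.
print(elem.getAttributeNS("http://www.blah.bar", "foo"))
print(elem.getAttribute("b:foo"), elem.getAttribute("plain"))
```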

I should unify my Error class with PyDOM's.

I am considering the following enhancements:

element.elements: returns a list of element children.

element.getText: returns a deep list of data from the text nodes.
Do your own string.join to choose an appropriate join character.

element.getChild("FOO") returns the first child (not descendant) element
with specified element type name.

element.getChild("FOO", "http://...") does the obvious thing.

element.getChild( "#PCDATA" ) gets a list of child text nodes.
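
A rough sketch of how such helpers could behave, written as free functions
over the standard library's xml.dom.minidom rather than as methods of the
module under discussion. The function names are hypothetical, and get_text
returns a flat list here instead of the nested one described above.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def elements(node):
    """Element children only, skipping text, comments and PIs."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def get_child(node, name, uri=None):
    """First child (not descendant) element with the given type name."""
    for child in elements(node):
        if uri is None:
            if child.tagName == name:
                return child
        elif (child.localName, child.namespaceURI) == (name, uri):
            return child
    return None


def get_text(node):
    """Data of all descendant text nodes, in document order."""
    chunks = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            chunks.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            chunks.extend(get_text(child))
    return chunks


doc = parseString(
    '<quotes><quote id="1">To be <em>or not</em> to be</quote>'
    '<quote id="2"/></quotes>'
)
root = doc.documentElement
first = get_child(root, "quote")
print(len(elements(root)), "".join(get_text(first)))
```

Leaving the join to the caller keeps the choice of separator open.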

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco


From ke@gnu.franken.de  Sat Apr 24 06:26:12 1999
From: ke@gnu.franken.de (Karl Eichwalder)
Date: 24 Apr 1999 07:26:12 +0200
Subject: [XML-SIG] xml-0.5.1: LICENCE (xmlarch)
Message-ID: <yegzp3yr3m3.fsf@luna.gnu.franken.de>

There's room for interpretation.  The LICENCE file says:

    xmlarch:
    --------------------------------------------------------------------
    Copyright (C) 1998 by Geir O. Grønmo, grove@infotek.no
    
    Free for commercial and non-commercial use.
    --------------------------------------------------------------------

But arch/xmlarch.py says (as it stands, this implies, that it's "unfree"
under certain circumstances):

    Copyright (C) 1998 by Geir O. Grønmo, grove@infotek.no
    
    It is free for non-commercial use, if you modify it please let me
    know.

aTdHvAaNnKcSe for clarification!

-- 
Karl Eichwalder


From gstein@lyra.org  Sat Apr 24 11:21:02 1999
From: gstein@lyra.org (Greg Stein)
Date: Sat, 24 Apr 1999 03:21:02 -0700
Subject: [XML-SIG] DOM Considered Harmful :-)
Message-ID: <37219B0E.3123A201@lyra.org>

All right... I've been slow to respond and only minimal because I was
out this past week (but still had minimal access). I'm leaving in about
three hours to Mexico... I'll have zero access for a week. Of course,
this means that I have the privilege of posting something highly
controversial with the hope that an argument will continue for the next
seven days and I can rejoin it at that time :-)

Okay... seriously, though, I'd like to state my opposition to a DOM, a
subset, or a DOM-like API for a "lightweight" XML parsing solution.

Here are my assumptions/requirements/etc:

1) lightweight means:
   a) fast as possible
   b) conceptually simple for the user
   c) narrow interface (somewhat related to (b))
2) 1b, 1c imply simple doc, so a non-DOM interface is not a hurdle
3) this API is only for consuming XML
4) it is fine to "fall back" to the DOM if the lightweight API doesn't
meet a client's needs
   a) corollary: the ability to swap in alternative parsers is not
required
   b) corollary: ORB compat is not required
   c) corollary: stylistic compatibility (with other language's XML
libraries) is not required


A couple items have been discussed on the list which I'd like to call
out and respond to:

1) the DOM concept of node types

IMO, this is one of the most broken things about the DOM. The child
nodes end up being some random mixture of various element types. Any
client trying to deal with this must *test* each node before they use it
to see if they're looking at the right thing. This is very troublesome.
As a real-world example, when I coded davlib against the DOM and I
needed the first (only) child element of my <propfind> element, there
was no easy and evident approach to this. I knew that child element
would be a <prop>, but what happened was that Text nodes were mixed in.
"oh, well do a findByTagName" or whatever. That wouldn't help on the
next case, where I needed each of the child elements for the <prop>.
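
The situation can be reproduced with the xml.dom.minidom module from the
current Python standard library, which is not the DOM implementation used
for davlib at the time. The request body is an invented sample without
namespaces.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString("""<propfind>
  <prop>
    <getcontentlength/>
    <getlastmodified/>
  </prop>
</propfind>""")

propfind = doc.documentElement

# Whitespace between tags shows up as Text nodes among the children.
print([child.nodeName for child in propfind.childNodes])

# So the "first (only) child element" has to be found by testing node types.
prop = [c for c in propfind.childNodes if c.nodeType == Node.ELEMENT_NODE][0]
wanted = [c.tagName for c in prop.childNodes
          if c.nodeType == Node.ELEMENT_NODE]
print(prop.tagName, wanted)
```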

Also, look at that answer: "use findByTagName" ... that is simply a
mechanism to get around the fact that the DOM has introduced a
hard-to-use structure. Paul recently followed up to his original
proposal with another proposal to add new methods to his element
objects. Specifically, the getChild() method -- again, this was
introduced *solely* due to the fact that the DOM has a heterogeneous
list of children. The client must apply various filters and other
processing to get useful information. The system must apply tests "is
this the right node type?" here and there.

In one of Paul's original responses to my post, he listed "convenience"
as part of the definition of "lightweight". It sure is, but his response
to making a DOM subset convenient was to introduce helper functions.

I think this is quite broken. As a comparison, the qp_xml module returns
an element that has *only* elements for children. There is no filtering
or other things to get past. The list items are *known*. The text is
stored outside of that list so that you don't have to manually separate
the two all the time. Essentially, qp_xml is easy/convenient
*inherently* rather than patched-up via convenience functions.

In summary, I maintain that any DOM-style system is not inherently
simple or easy to use because of its heterogeneous node lists. I further
believe that something like qp_xml is much nicer all around because its
simplicity/ease/etc originates right from the bottom, rather than being
hidden behind a second layer of API.

Disclaimer: qp_xml does have a convenience function (the textof()
function, which could/should be a method instead). The existence of the
function is based solely on the underlying representation of text
contents, where that design was chosen to be able to retain the document
structure (insofar as elems/text are retained).


2) the close() method and parent/sibling relationships

Adding parent/sibling relationships introduces loops unless you use
proxies or introduce a close() method (if there is another way, then I'd
like to learn it). Proxies are out for efficiency reasons -- objects get
constructed every time you simply want to peek into the data structure.
While the complexity is (mostly) hidden from the client, it is still
there. You don't end up with simple data structures... instead, you get
a lot of "mechanism" in there to deal with intercepting accesses so that
you can create a proxy to bundle up the necessary data.

A close() type method introduces other problems. If you aren't careful,
then it is easy to leak the entire parse tree. What happens if you pass
a subset of the tree to another subsystem? You will have one of two
problems: 1) the client avoids calling close() so the subsystem can use
parent references (this leaks the whole tree); or 2) the client calls
close() so the subsystem only retains its subtree, but now its
(expected) parent/sibling relationships no longer work. It has a set of
objects that don't fully respond to their published API.
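
The two failure modes can be sketched with a toy tree. Everything here
(the Node class, its close() method, the element names) is hypothetical
and is not taken from any DOM implementation. Note also that modern
CPython has a cycle collector, so the sketch demonstrates reachability
through parent links rather than a true refcount leak.

```python
import gc
import weakref

class Node:
    """Toy tree node with a parent pointer (this creates reference cycles)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def close(self):
        """Break every parent link at and below this node."""
        for child in self.children:
            child.close()
        self.parent = None

def build():
    root = Node("multistatus")
    response = Node("response", root)
    Node("href", response)
    return root

# Problem 1: hand a subtree to someone else without calling close().
root = build()
subtree = root.children[0]
root_ref = weakref.ref(root)
del root
gc.collect()
leaked = root_ref() is not None   # the parent link keeps the whole tree alive

# Problem 2: call close() before handing the subtree over.
root = build()
subtree2 = root.children[0]
root_ref2 = weakref.ref(root)
root.close()
del root
gc.collect()
freed = root_ref2() is None              # the rest of the tree is released...
parent_broken = subtree2.parent is None  # ...but .parent no longer works
```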

Other alternatives: ways to detach the subtree or specifying that the
elements have two defined states (with and without parent/sibling
relationships). Gee... now we're getting into complex APIs for the
client to deal with.

I'm tremendously in favor of the model returned by qp_xml. You get a set
of simple objects that have no methods. They are really just attribute
retainers. Inside these, you have a *Python* list of children, and a
*Python* mapping of attributes. Nothing fancy. Simple and easy.
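
As an illustration of that model, here is a minimal sketch of a
builder on top of expat. It is written in the spirit of qp_xml and is
not its actual code: the Elem class and parse() are assumptions, the
first_cdata / following_cdata / textof names follow the ones mentioned
in this thread, and namespace and xml:lang handling are left out.

```python
import xml.parsers.expat

class Elem:
    """A plain attribute holder: no methods, just data."""
    pass

def parse(text):
    """Build a tree of Elem objects from an XML string."""
    root = None
    stack = []

    def start(name, attrs):
        nonlocal root
        e = Elem()
        e.name = name
        e.attrs = dict(attrs)      # plain Python mapping
        e.children = []            # plain Python list, elements only
        e.first_cdata = ""         # text before the first child element
        e.following_cdata = ""     # text after this element's end tag
        if stack:
            stack[-1].children.append(e)
        else:
            root = e
        stack.append(e)

    def end(name):
        stack.pop()

    def cdata(data):
        if not stack:
            return
        elem = stack[-1]
        if elem.children:
            elem.children[-1].following_cdata += data
        else:
            elem.first_cdata += data

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start
    p.EndElementHandler = end
    p.CharacterDataHandler = cdata
    p.Parse(text, True)
    return root

def textof(elem):
    """Text directly inside elem (non-recursive), in document order."""
    return elem.first_cdata + "".join(c.following_cdata for c in elem.children)

root = parse('<p id="x">one<b>bold</b>two<i/>three</p>')
```

The children list holds only elements, so a client can index or loop
over it without testing node types, while the text can still be put
back in its original position.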

Note: personally, I believe that the client can operate quite fine
without parent or sibling pointers. If a function needs an element's
parent, then whoever passed the element should pass the parent, too.
From a conceptual level, I am also a bit shaky on an element knowing
anything about its parents or siblings. It would seem that anything
dealing with a particular element should do so in a context-free manner.

Note 2: if you really need parents/siblings (i.e. it is difficult to
structure your app to avoid them), then you can always fall back to the
DOM.


Okay... now a couple other issues:

* processing instructions.  (thanx Paul for the links)

I looked at the three specs that Paul linked (didn't need the XML spec..
I knew what they were! :-). Two of them, the DDML and DCD specs, use PIs
only as a means of checking the conformance of a document. The document
can be parsed and handled with or without the PIs.

The third: style sheets. Ick. The PI contains actual data, rather than
conformance issues. I note that a Rationale has been appended to the
spec. I bet that was added because the PI is used for more than document
processing (i.e. it alters semantics).

A minimal approach to PIs might be to include *only* the PIs that occur
in the prolog into a list. Since the xml-stylesheet PI can only occur in
the prolog, this approach would pick them up. (not that I like it though
:-)
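
A rough sketch of that minimal approach, assuming expat as the parser.
The prolog_pis() function is hypothetical; it collects only the PIs
that arrive before the root element's start tag and ignores the rest.

```python
import xml.parsers.expat

def prolog_pis(text):
    """Return (target, data) pairs for PIs that occur before the root element."""
    pis = []
    seen_root = False

    def start(name, attrs):
        nonlocal seen_root
        seen_root = True

    def pi(target, data):
        if not seen_root:
            pis.append((target, data))

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start
    p.ProcessingInstructionHandler = pi
    p.Parse(text, True)
    return pis

found = prolog_pis(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="s.css" type="text/css"?>'
    '<doc><?ignored inside?></doc>'
)
```

The XML declaration is not reported as a PI by expat, so only the
xml-stylesheet entry ends up in the list.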

* note to Paul: the code you posted is broken :-). You apply the default
namespace to attributes that have no prefix. The XML Namespaces spec
states that no prefix on an attribute means "no namespace". You also
fail to distinguish between "no namespace" in the original state of
beginning to parse, and when somebody resets it using xmlns="". In
addition, you reset the default namespace to "no namespace" inside each
startElement.
[ and a Q: why do you have the "xmlns" prefix defined in startElement? ]
[ design comment: I don't think you want to retain prefixes... if
clients believe they can use a prefix that you provide, then problems
will develop *very* quickly. If the client isn't careful, they could end
up with conflicting prefixes. trust me on this one... mod_dav has a
*bitch* of a time dealing with namespace prefixes. I highly recommend
that you drop them; similarly, I believe you should filter the xmlns*
attributes. ]
[ design comment: you should probably index your attributes by (uri,
local) rather than prefix. the client does not know the prefixes ahead
of time, so they will be unable to fetch the attributes. ]
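
The namespace rules above can be sketched using expat's own namespace
processing. This is an illustration under assumptions, not anybody's
posted code: split_name() and root_info() are hypothetical helpers, and
the separator choice relies on a space never appearing in a namespace
URI or local name.

```python
import xml.parsers.expat

SEP = " "  # assumed never to occur inside a namespace URI or a local name

def split_name(name):
    """Turn expat's 'uri local' form into (uri, local); uri is None if absent."""
    if SEP in name:
        uri, local = name.split(SEP, 1)
        return (uri, local)
    return (None, name)

def root_info(text):
    """Return the root element's name and attributes keyed by (uri, local)."""
    result = {}

    def start(name, attrs):
        if not result:  # record the root element only
            result["name"] = split_name(name)
            result["attrs"] = {split_name(k): v for k, v in attrs.items()}

    p = xml.parsers.expat.ParserCreate(namespace_separator=SEP)
    p.StartElementHandler = start
    p.Parse(text, True)
    return result

info = root_info('<a xmlns="urn:x" xmlns:p="urn:y" id="1" p:k="2"/>')
```

The default namespace applies to the element but not to the unprefixed
id attribute, the xmlns* declarations are filtered out of the attribute
mapping, and no prefix survives for the client to depend on.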

* comments on loss of information (also on "why not use SAX?")

The tree form is very useful. Without it, then an application would need
to implement a state machine to effectively process parsed XML. Seeing a
<prop> element means nothing in itself. When you post-process the tree
and step down thru the tree, the parent <propstat> will place you into
the proper state.

For programmers/clients, the tree model is also very handy. It exists
*outside* of the parsing event. Clients may not be able to structure
their responses to the input to be part of the parsing event stream.

Regarding loss of info: for many applications, the client only needs to
know the contents. The finer details of the document structure are
pointless. These applications are typically using XML as a data transfer
mechanism, rather than a layout mechanism. DAV and XML-RPC are two
examples. PIs and comments are not useful.


I'll send some individual replies to the other emails. This email,
however, is my overall summary and argument against DOM-like APIs.

I maintain that an API such as that provided by qp_xml is very useful
for a particular class of applications. Further, I maintain that it
would be a Good Thing for qp_xml (or whatever name and with
whatever API/code tweaks) to be included in the XML distribution.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Sat Apr 24 11:41:26 1999
From: gstein@lyra.org (Greg Stein)
Date: Sat, 24 Apr 1999 03:41:26 -0700
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org> <371C9388.D0690375@prescod.net>
Message-ID: <37219FD6.73061D0@lyra.org>

Paul Prescod wrote:
> ...
> http://www.w3.org/TR/REC-xml
> http://www.w3.org/TR/xml-stylesheet
> http://www.w3.org/TR/NOTE-dcd
> http://www.w3.org/TR/NOTE-ddml
> 
> Well let's put it this way: XML 1.0 uses PIs.

XML 1.0 *defines* PIs. That is very different.

> So does the stylesheet
> binding extension (for CSS and XSL).

This is what I was looking for: the *use* of a PI.

Per my other email (treatise? :-), I think that I've discovered we are
operating within two classes of applications:

* data-oriented use of XML
* layout-oriented use of XML

For the former, I have not seen a case where a PI is necessary. For the
latter: yes, you need a PI for stylesheets. Too bad... you get to use
the DOM :-)

> I don't doubt that namespaces are important but they can easily be viewed
> as an extension of (or layer on top of) the minimal API.

Nope. Namespaces are critical, as Fredrik has pointed out. My endeavors
to use namespaces within the DOM style of programming have also led me to
believe that it isn't a simple extension or layer on top of a minimal
API. Why? Well... if you attempt to post-process the namespace
information, then where do you store it? The client that is doing the
post-processing only receives *proxy* objects. It cannot drop the
information there since those objects are *not* persistent. Instead, the
client has to reach into the internals of the DOM to set (and get!) the
namespace info. Bleck!

> There are four objects there. If we want it to be a tree we need a wrapper
> object that contains them. You could argue that in the lightweight API the
> version and doctype information could disappear but surely we want to
> allow people to figure out what stylesheets are attached to their
> documents!

I maintain that the stylesheets are not applicable to certain classes of
XML processing. So yes, they get punted too.

A simple API of elements and text is more than suitable.

> > NodeType is bogus. It should be absolutely obvious from the context what a
> > Node is. If you have so many objects in your system that you need NodeType
> > to distinguish them, then you are certainly not a light-weight solution.
> 
> XML is a dynamically typed language, like XML. If I have a mix of
> elements, characters and processing instructions then I need some way of
> differentiating them. I don't feel like it is the place of an API to
> decide that XML is a strongly typed language and silently throw away
> important information from the document.

Hello? It *is* the place of the API to define semantics. That is what
APIs do.

I can understand if you don't like this particular semantic, but I feel
your argument is deeply flawed.

> > > Document.DocumentElement (an element node property)
> >
> > If Document has no other properties, then it is totally bogus. Just return
> > the root Element. Why the hell return an object with a single property
> > that refers to another object? Just return that object!
> 
> Document should also have ChildNodes.

Your spec didn't show it. Okay... so it has ChildNodes. How do you get
the root element? Oops. You have to scan for the thing. Painful!

> > If you want light-weight, then GetAttribute is bogus given that the same
> > concept is easily handled via the .Attributes value. Why introduce a
> > method to simply do Element.Attributes.get(foo) ??
> 
> GetAttribute is simpler, more direct and maybe more efficient in some
> cases. It works with simple strings and not attribute objects.

It will *never* be more efficient. Accessing a Python attribute and
doing a map-fetch will always be faster than a method call. Plain and
simple.

(caveat: as I mentioned in prior posts, qp_xml should be using a mapping
rather than a list of objects... dunno what I was thinking)
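
For anyone who wants to measure this rather than take my word for it,
here is a small timing sketch. The Element class below is a stand-in
invented for the comparison, not a real DOM class, and the numbers will
vary by machine and Python version, so none are quoted.

```python
import timeit

class Element:
    """Hypothetical element exposing both access styles."""
    def __init__(self, attrs):
        self.attrs = attrs            # plain mapping, fetched directly

    def getAttribute(self, name):     # DOM-style accessor wrapping the mapping
        return self.attrs.get(name)

e = Element({"href": "/index.html"})

# Both styles return the same value...
via_mapping = e.attrs["href"]
via_method = e.getAttribute("href")

# ...so the only question is what each access costs.
t_mapping = timeit.timeit(lambda: e.attrs["href"], number=100000)
t_method = timeit.timeit(lambda: e.getAttribute("href"), number=100000)
print("mapping: %.4fs  method: %.4fs" % (t_mapping, t_method))
```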

> > > Element.TagName
> > > Element.PreviousSibling
> > > Element.NextSibing
> >
> > These Sibling things mean one of two things:
> >
> > 1) you have introduced loops in your data structure
> > 2) you have introduced the requirement for the proxy crap that the current
> > DOM is dealing with (the Node vs _nodeData thing).
> >
> > (1) is mildly unacceptable in a light-weight solution (you don't want
> > people to do a quick parse of data, and then require them to follow it up
> > with .close()).
> 
> I don't see this as a big deal.
> 
> This is an efficiency versus simplicity issue. These functions are
> extremely convenient in a lot of situations.

The origin of qp_xml was for efficiency first, simplicity second. I
maintain that qp_xml provides both.

I will agree to disagree that parents and siblings are useful. (IMO,
they are not, and only serve to complicate the system).

> > Case in point: I wrote a first draft davlib.py against the DOM. Damn it
> > was a serious bitch to simply extract the CDATA contents of an element!
> 
> XML is a dynamically typed language. "I've implemented Java and now I'm
> trying to implement Python and I notice that you guys throw these
> PyObject things around and they make my life harder. I'm going to dump
> them from my implementation."

Again, back to this "dynamically typed language". That is your point of
view, rather than a statement of fact. I won't attempt to characterize
how you derived that point of view (from the DOM maybe?), but it is NOT
the view that I hold.

XML is a means of representing structured data. That structure takes the
form of elements (with attributes) and contained text. I do not see how
XML is a programming language, or that it is dynamically typed. It is
simply a representation in my mind.

And I'll ignore the quote which just seems to be silliness or
flamebait...

> > Moreover, it was also a total bitch to simply say "give me the child
> > elements". Of course, that didn't work since the DOM insisted on returning
> > a list of a mix of CDATA and elements.
> 
> It told you what was in your document.

I also get that from qp_xml with a lot less hassle, so that says to me
that the DOM is introducing needless complexity/hassle for the client.

> If you want to include helper functions to do this stuff then I say fine:
> but if you want to throw away the real structure of the document then I
> don't think that that is appropriate.

Helper functions are simply a mechanism to patch the inherent complexity
introduced by the DOM. It does not need to be so complicated. Python has
excellent mechanisms to hold structured data; qp_xml uses them to
provide excellent benefit (relative to the DOM).

The only "structure" that I toss are PIs and comments. I do not view
those as "structure". The contents (elements, attributes, text) are
retained and can be reconstructed from the structure that qp_xml
returns.

> > IMO, the XML DOM model is a neat theoretical expression of OO modelling of
> > an XML document. For all practical purposes, it is nearly useless. (again:
> > IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
> > Screw that -- I use "print". I can't imagine generating XML using the DOM.
> > Complicated and processing intensive.
> 
> I'm not sure what your point is here. I wouldn't use the DOM *or* qp_xml
> to generate XML in most cases. As you point out "print" or "file.write" is
> sufficient in most applications. This has nothing to do with the DOM and
> everything to do with the fact that writing to a file is inherently a
> streaming operation so a tree usually gets in the way.

Most of the DOM's interface is for *building* a DOM structure. It is
conceivable that those APIs only exist as a way to respond to parsing
events, but I believe their existence is due to the fact that people
want to build a DOM and then generate the resulting XML. Otherwise, we
could have had two levels of the DOM interface: read-only (with private
construction mechanisms), and read-write (as exemplified by the current
DOM).

I believe that the notion of build/generate via the DOM is bogus. It
seems you agree :-), and that print or file.write is more appropriate.
Fredrik has some utility objects to do it. All fine. The DOM just blows
:-)

> > Sorry to go off here, but the DOM really bugs me. I think it is actually a
> > net-negative for the XML community to deal with the beast. I would love to
> > be educated on the positive benefits for expressing an XML document thru
> > the DOM model.
> 
> I think that the DOM is broken for a completely different set of reasons
> than you do. But the DOM is also hugely popular and more widely
> implemented than many comparable APIs in other domains. I'm told that

I could care less about compatibility. I'm trying to write an
application here. Geez... using your viewpoint: if I wanted
compatibility, then maybe I should use Java or C since everybody else
uses that.

> Microsoft's DOM implementation is referenced in dozens of their products
> and throughout many upcoming technologies. Despite its flaws, the DOM is
> an unqualified success and some people like it more than XML itself. They
> are building DOM interfaces to non-XML data!

Goody for them. That doesn't help me write my application.

> > Use a mapping. Toss the intermediate object. If you just have name and
> > value, then you don't need separate objects. Present the attributes as a
> > mapping.
> 
> In this case I am hamstrung by DOM compatibility. This is a small price to
> pay as long as we keep the simpler GetAttribute methods. The only reason
> to get the attribute objects is when you want to iterate over all
> attributes which is probably relatively rare.

This is why I say "toss the DOM". Help your client programmers, rather
than be subservient to the masses' distorted view of XML programming :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Sat Apr 24 11:52:12 1999
From: gstein@lyra.org (Greg Stein)
Date: Sat, 24 Apr 1999 03:52:12 -0700
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org> <008b01be8b11$e19fe550$f29b12c2@pythonware.com> <371CA723.FAEBF6AA@prescod.net>
Message-ID: <3721A25C.6DD5DA3D@lyra.org>

Paul Prescod wrote:
> 
> Fredrik Lundh wrote:
> >
> > the downside with Paul's line of reasoning is that it makes it
> > impossible to come up with something that is light-weight
> > also from the CPU's perspective...  not good.
> 
> That isn't true. I tend to think that usability is more important than
> performance but if we decide to optimize for performance then we can make
> a DOM-compatible API that is as fast as "qp". I mean the only thing that
> is harder to implement in the miniDOM is siblings -- where I chose
> convenience over efficiency. We can make the opposite choice.

I maintain that qp_xml is both highly performant and highly usable.

Per my other emails, I do not believe that the DOM is highly usable. I
also tend to believe that being slaved to the DOM API will always hamper
your performance when you *access* the data structure. Sure... you might
be able to build it nearly as fast (nearly! you may have more objects to
create), but you are constraining access to be through methods rather
than Python data structures.

> In fact, I think that the namespace and language support in qp already
> makes it relatively "heavyweight".

Those are necessary to retain all information from the input XML. Toss
those and you *really* toss out information.

IMO, they do not introduce any "heaviness". They are two attributes that
you can totally ignore. If your document is unconcerned with namespaces,
then ignore the .ns attribute. If you don't care about language-specific
handling in your app, then ignore the .lang attribute. *Nothing* forces
you to use those attributes, so that means they do *not* impinge upon
your client. The only thing their presence does is to add some
descriptive text in the API specification and introduce some overhead in
the parsing process.

> > I want something really light-weight, and highly pythonish, and I
> > don't care the slightest about TLA compatibility.

Go Fredrik! :-)  My kinda guy :-)

> It isn't a question of TLA compatibility. It's about using the data models
> used everywhere else in the world. Python conforms to posix conventions

Hello!?!?! Fredrik just said he DOES NOT CARE.

Why are you stating that he SHOULD? He gets to program according to
whatever guidelines *he* wants.

> To me, this is the central issue: to me, Guido's genius lies in the
> fact that he usually chooses to adapt something before re-inventing it. This
> makes learning Python easy. "Oh yeah, I recognize that from the other
> languages I use." Well, SAX and DOM are what the other languages use.

As I've said before, "Goody for them." You cannot be a slave to a single
API when it does not fulfill your needs. If I may speak for Fredrik, the
two of us want a Pythonish and *fast* way to parse XML, and we don't
care what other languages do because our application is in *PYTHON*. The
DOM does not satisfy our requirements.

> Anyhow, "qp" is hardly more "Pythonic" than in the lightweight DOM API.
> The following_cdata stuff is not like any API I've ever seen in Python or
> elsewhere. The call for "pythonic-ness" is mostly a strawman. The DOM
> works better in Python than in almost any other language: Nodelists are
> lists, NamedNodeLists are maps, object types are instance classes, lists
> can be heterogenous, etc.

The qp stuff uses native Python lists, mappings, and strings. The DOM
uses NodeLists, NamedNodeLists, and TextElements.

following_cdata is a design choice to model the underlying XML. My
implementation of it as a string attribute on an object is very
Pythonish. You just happen to disagree with my design choice. I don't
feel that the choice is not-Python, but is actually an interesting and
unique way to model the XML.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Sat Apr 24 11:55:42 1999
From: gstein@lyra.org (Greg Stein)
Date: Sat, 24 Apr 1999 03:55:42 -0700
Subject: [XML-SIG] ANN: Minidom 0.6
References: <371F9681.D0C314AC@prescod.net>
Message-ID: <3721A32E.6A7C1AD@lyra.org>

I posted a general commentary along with a few items in my big email
note. I'll just wrap up with a few extra items here:

Paul Prescod wrote:
> 
> Attached is a miniature, lightweight subset of the DOM with a few
> extensions for namespace handling. (I guess an extended subset is a
> contradiction in terms but you get the idea!)
> 
> I propose that
> 
>  * this become part of the xml package

This would be fine, but I do not believe it is okay to include minidom
to the exclusion of qp_xml (or a similar model).

>  *  we consider the DOM-creation functions and namespaces extensions for
> adoption in a standard Python DOM API

Agreed, although the model you propose may need some work, per my other
email.

>  * DOM-haters try this out and clearly describe where it falls down in
> their applications

The DOM model itself is hard to work with. This follows the same
pattern.

>  * we try to figure out the right set of convenience functions to make the
> DOM more palatable for everybody (if possible).

The convenience functions are simply a mechanism to avoid the inherent
complexity. The convenience functions will also reduce the speed
benefits that we are trying to achieve. If you don't reduce complexity
or increase speed, then why go this route?

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Sat Apr 24 11:59:04 1999
From: gstein@lyra.org (Greg Stein)
Date: Sat, 24 Apr 1999 03:59:04 -0700
Subject: [XML-SIG] qp API
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org> <3720A5EC.D79A464@prescod.net>
Message-ID: <3721A3F8.6B4A37F2@lyra.org>

Paul Prescod wrote:
>...
> If we take out parent and sibling pointers, I see no reason that a DOM
> implementation should be more than a few percent slower than qp_xml. In

Building it will be about the same. Accessing it will be slower and
harder.

> Size of interface:
> 
> Minidom has 3 builder methods (building from strings, files and filenames)
> and 6 runtime classes -- only one of which is even mildly complex (again,
> because of namespace handling) If you are handling simple documents
> without PIs and comments  then you only need to deal with three classes:
> document, element and text. In other words the interface that most people
> will use is rather small.

Clients will still need to learn and understand the interface (to then
discover they only need a subset). The presence of the other classes
adds complexity to the situation.

> convenience:
> 
> We can add convenience functions that allow people with different

No additional comments here :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Sat Apr 24 12:01:05 1999
From: gstein@lyra.org (Greg Stein)
Date: Sat, 24 Apr 1999 04:01:05 -0700
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720A5EC.D79A464@prescod.net>
 <14112.51262.675379.668123@weyr.cnri.reston.va.us> <14112.52252.324863.576300@amarok.cnri.reston.va.us> <3720E315.A4532E83@prescod.net>
Message-ID: <3721A471.19D88620@lyra.org>

Paul Prescod wrote:
>...
> element.prefix gets the element's prefix. I don't think that the
> namespaces view that prefixes are irrelevant should obviate the XML 1.0
> view that they are NOT. Even if we accept the namespaces view of the world
> entirely, prefixes are chosen to be mnemonic so they shouldn't be
> discarded by software.

I discussed this in my other email, but wanted to emphasize the point:

  Retaining the prefix is a *very* bad idea.

I will elaborate if necessary when I return from Mexico, but I will
earnestly ask that any further progress on miniDOM should first remove
this from the API.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Sat Apr 24 12:06:13 1999
From: gstein@lyra.org (Greg Stein)
Date: Sat, 24 Apr 1999 04:06:13 -0700
Subject: [XML-SIG] punchy and belligerent? :-)
Message-ID: <3721A5A5.38DA68E7@lyra.org>

My apologies to all, and especially Paul, for any feelings of
defensiveness or insult that my recent series of posts may have
engendered. That certainly is not my intent. My issue is with the DOM
API itself, and with my views on what a clean/simple/lean API should
look like. Needless to say, I have strong convictions here :-)
(that isn't to say that I'm against changes to qp_xml, but simply that I
want to avoid certain characteristics of the DOM... in particular, I'd
like to ask Fredrik to post his suggestions/changes/alternative module
code)

Paul: please don't take any of my comments personally. They are all
based against the DOM. You just happen to be the person posting
commentary on the DOM, so you (unfortunately) have borne the brunt of my
posts.

Let's all hope that if I go and absorb many liters of tequila over the
next week that I'll return without my DOM-bashing crusade :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From paul@prescod.net  Sat Apr 24 18:52:54 1999
From: paul@prescod.net (Paul Prescod)
Date: Sat, 24 Apr 1999 12:52:54 -0500
Subject: [XML-SIG] DOM API
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org> <371C9388.D0690375@prescod.net> <37219FD6.73061D0@lyra.org>
Message-ID: <372204F6.4A6174E1@prescod.net>

Greg Stein wrote:
> 
> XML 1.0 *defines* PIs. That is very different.

Okay, so you agree that PIs are part of XML document instance data. Let me
ask you this, do you think that Gadfly should dump the parts of the SQL
spec that Aaron doesn't like?

> Per my other email (treatise? :-), I think that I've discovered we are
> operating within two classes of applications:
> 
> * data-oriented use of XML
> * layout-oriented use of XML

This is a false dichotomy. Many of my customers are data-oriented people
who want to style their data. For instance I was at a statistical company
last week.

I gave you four specifications that used PIs: XML, xml-stylesheet, DCD and
DDML. Only one of those four has anything to do with stylesheets or
formatting. The other three are as applicable to data as to traditional
documents.

> For the former, I have not seen a case where a PI is necessary. For the
> latter: yes, you need a PI for stylesheets. Too bad... you get to use
> the DOM :-)

So to keep PIs out we should split the interface and (further) confuse new
Python programmers?

> Instead, the
> client has to reach into the internals of the DOM to set (and get!) the
> namespace info. Bleck!

Well, I've decided to put namespace info into miniDOM even though it made
it significantly less "lightweight." 

> I maintain that the stylesheets are not applicable to certain classes of
> XML processing. So yes, they get punted too.

If there is a class of processing that does not use a feature then the
feature should be removed? Goodbye namespaces. Goodbye sub-elements.

> A simple API of elements and text is more than suitable.

Not data access APIs. XML's semantics are partially defined in the XML
specification itself and will be fully specified in an upcoming
specification called the "XML Information Set."

http://www.w3.org/TR/NOTE-xml-infoset-req

"The XML Information Set will describe these abstract XML objects and
their properties. It will provide a common reference set that other
specifications can use and extend to construct their underlying data
models, and will help to ensure interoperability among the various
XML-based specifications and among XML software tools in general."

Technical and intellectual interoperability is what I'm fighting for.

> Your spec didn't show it. Okay... so it has ChildNodes. How do you get
> the root element? Oops. You have to scan for the thing. Painful!

doc.childNodes
doc.documentElement

> It will *never* be more efficient. Accessing a Python attribute and
> doing a map-fetch will always be faster than a method call. Plain and
> simple.

This gets back to Mike's question: Are we creating a new library here or
defining a new *interface*? If we're defining a library then we know all
of the performance implications in advance.

Because if we are defining an interface then we need to consider
implementations that are implemented in ways that do not use Python hashes
underneath. Generating the hash or map-wrapper could be expensive.

> The origin of qp_xml was for efficiency first, simplicity second. I
> maintain that qp_xml provides both.

first_cdata, following_cdata, non-recursive text dumping? Doesn't seem
very simple to me. It is completely unlike any API I have ever seen, even
in strongly typed programming languages where it would seem more
appropriate.

> Again, back to this "dynamically typed language". That is your point of
> view, rather than a statement of fact. I won't attempt to characterize
> how you derived that point of view (from the DOM maybe?), but it is NOT
> the view that I hold.

The contents of an element are *by definition*, elements, characters and
processing instructions. You can't wish that fact away. That's a
heterogenous  list. 

WD-XML: "PIs are not part of the document's character data, but must be
passed through to the application."

> XML is a means of representing structured data. That structure takes the
> form of elements (with attributes) and contained text. I do not see how
> XML is a programming language, or that it is dynamically typed. It is
> simply a representation in my mind.

XML is not a programming language but it explicitly supports heterogenous
lists.

> And I'll ignore the quote which just seems to be silliness or
> flamebait...

My point: I don't think Python implementors should try to pretend that
Python does not (for example) support heterogenous lists and neither
should XML implementors.

> > > Moreover, it was also a total bitch to simply say "give me the child
> > > elements". Of course, that didn't work since the DOM insisted on returning
> > > a list of a mix of CDATA and elements.
> >
> > It told you what was in your document.
> 
> I also get that from qp_xml with a lot less hassle, so that says to me
> that the DOM is introducing needless complexity/hassle for the client.

It isn't needless complexity if you need the PIs. I could find an
application of XML that doesn't use attributes -- do we now define an API
that dumps those too?

> The only "structure" that I toss are PIs and comments. I do not view
> those as "structure". The contents (elements, attributes, text) are
> retained and can be reconstructed from the structure that qp_xml
> returns.

Fortunately it is not up to us to define XML. The XML specification says
that processors should pass them along to applications.

> Most of the DOM's interface is for *building* a DOM structure. It is
> conceivable that those APIs only exist as a way to respond to parsing
> events, but I believe their existence is due to the fact that people
> want to build a DOM and then generate the resulting XML. 

In some cases they do. In other cases they read a DOM, make a small
modification and then write that. In still other cases, they make a DOM,
edit by hand in a graphical, DOM-based editor and then write that out. In
yet other cases, DOM modifications are performed in order to create a
graphical effect in a browser.

> Otherwise, we
> could have had two levels of the DOM interface: read-only (with private
> construction mechanisms), and read-write (as exemplified by the current
> DOM).

That's exactly what we have. Minidom is the read-only version with private
construction mechanisms and PyDOM/4DOM are read-write. 

> I could care less about compatibility. I'm trying to write an
> application here. 

If you could care less about compatibility, maybe you shouldn't be using
XML. XML is about compatibility.

> Geez... using your viewpoint: if I wanted
> compatibility, then maybe I should use Java or C since everybody else
> uses that.

Slavish adherence to conventions is not a good idea, but neither is
reinventing wheels. From my point of view that's exactly what qp_xml does.

> Goody for them. That doesn't help me write my application.

You have a library. It works for you. What's the problem? Now you want to
make it a standard API. That means that user interface concerns become
important. Here are some important principles of interface design:

 * reuse what people already know
 * do not unnecessarily multiply interfaces

People know, and seem to like, the DOM. A subset can be made about as fast,
convenient and small as qp_xml.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

Company spokeswoman Lana Simon stressed that Interactive 
Yoda is not a Furby. Well, not exactly. 

"This is an interactive toy that utilizes Furby technology," 
Simon said. "It will react to its surroundings and will talk." 
  - http://www.wired.com/news/news/culture/story/19222.html


From ken@bitsko.slc.ut.us  Fri Apr 23 22:53:05 1999
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 23 Apr 1999 16:53:05 -0500
Subject: [XML-SIG] DOM API
In-Reply-To: Paul Prescod's message of "Sat, 24 Apr 1999 12:52:54 -0500"
References: <Pine.LNX.3.95.990420003128.32319A-100000@ns1.lyra.org> <371C9388.D0690375@prescod.net> <37219FD6.73061D0@lyra.org> <372204F6.4A6174E1@prescod.net>
Message-ID: <m3vhen2eda.fsf@biff.bitsko.slc.ut.us>

Paul Prescod <paul@prescod.net> writes:

> Greg Stein wrote:
> > 
> > XML 1.0 *defines* PIs. That is very different.
> 
> Okay, so you agree that PIs are part of XML document instance
> data.

Why not have an option in the DOM tree builder or a method on the
document and element nodes to remove PIs?  Very similar to the
normalize() method for joining consecutive text nodes.
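A minimal sketch of such a method, written against the xml.dom.minidom
module of a current Python 3 rather than the PyDOM under discussion.
The function name remove_pis is made up for illustration; nothing in
the thread defines it.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def remove_pis(node):
    """Remove every processing instruction below node, in place."""
    # Iterate over a copy: removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            node.removeChild(child)
        else:
            remove_pis(child)


doc = parseString("<?top x?><a><?pi data?>text<b><?pi2 more?></b></a>")
remove_pis(doc)
cleaned = doc.documentElement.toxml()
```

Like normalize(), this works on an already-built tree, so the parser
still delivers everything and the application decides what to drop.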

The combination of SAX filters and a SAX DOM tree builder allows one
to choose (write a filter for) any type of tree you want to see.

It seems very important to me that the tree model itself be able to
hold every type of node an application may need (including even
non-DOM, non-XML nodes), but it is also important to be able to
constrain the nodes in a tree to _just_ those nodes an application
wants.
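The filter-plus-builder combination can be sketched with the xml.sax
package of a current Python 3 (an assumption; the 1999 PySAX drivers
differ in detail). DropPIs, ListTreeBuilder and build are hypothetical
names, and the nested-list tree is only a stand-in for a real DOM.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class DropPIs(XMLFilterBase):
    """SAX filter that swallows processing-instruction events."""

    def processingInstruction(self, target, data):
        pass  # do not forward the event downstream


class ListTreeBuilder(ContentHandler):
    """Builds nested lists of the form [name, child, child, ...]."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []

    def startElement(self, name, attrs):
        node = [name]
        if self._stack:
            self._stack[-1].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def processingInstruction(self, target, data):
        if self._stack:
            self._stack[-1].append(("PI", target, data))


def build(xml_bytes, keep_pis):
    """Parse xml_bytes, optionally routing events through DropPIs."""
    reader = xml.sax.make_parser()
    if not keep_pis:
        reader = DropPIs(reader)
    builder = ListTreeBuilder()
    reader.setContentHandler(builder)
    reader.parse(io.BytesIO(xml_bytes))
    return builder.root


source = b"<a><?pi data?><b/></a>"
full_tree = build(source, keep_pis=True)
pruned_tree = build(source, keep_pis=False)
```

The builder itself can hold PIs; whether it ever sees one is decided
by the filter placed in front of it.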


Re. lightweight API, I haven't seen qp yet so I'm not exactly sure
what its API is.  When I think of a ``lightweight'' XML tree I don't
think of an API, per se, at all.  A lightweight XML tree to me is a
nested tree of XML objects.  For example, an element would have
attributes name, attributes (a dictionary), and contents (a list) and
a PI would have attributes target and data.

The core objects are in classes, but there are no (or no strong need
to have) methods in the core classes.  Methods are added to the core
classes by ``extensions''.  Extensions are things like normalize, get
elements by tag name, get elements by id, visitors, filters, writers,
converters, etc.

The effect, though, is that outside of calling methods to act on the
tree all you're doing is working directly with the XML objects and
their attributes.

This pattern works well on any type of tree that has both complex data
types and many categories of functions that may be applied to the
tree, such as 2D and 3D graphics, directed graphs, networks, component
hierarchies, etc.
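A rough sketch of that pattern in current Python 3. The attribute
names (name, attributes, contents, target, data) come from the
description above; the class names and the two extension functions are
illustrative assumptions, not an existing package.

```python
class Element:
    """Core object: plain attributes, no methods beyond construction."""

    def __init__(self, name, attributes=None, contents=None):
        self.name = name
        self.attributes = attributes if attributes is not None else {}
        self.contents = contents if contents is not None else []


class PI:
    def __init__(self, target, data):
        self.target = target
        self.data = data


# "Extensions" live outside the core classes as plain functions.
def normalize(element):
    """Join consecutive strings in contents, recursively."""
    merged = []
    for item in element.contents:
        if isinstance(item, str) and merged and isinstance(merged[-1], str):
            merged[-1] += item
        else:
            merged.append(item)
            if isinstance(item, Element):
                normalize(item)
    element.contents = merged


def elements_by_name(element, name):
    """Return all descendant elements called name, in document order."""
    found = []
    for item in element.contents:
        if isinstance(item, Element):
            if item.name == name:
                found.append(item)
            found.extend(elements_by_name(item, name))
    return found


tree = Element("doc", {"id": "d1"}, [
    "Hello, ", "world",
    PI("render", "fast"),
    Element("p", contents=[Element("p", contents=["inner"])]),
])
normalize(tree)
paragraphs = elements_by_name(tree, "p")
```

Outside of calling the extension functions, client code just reads
tree.name, tree.attributes["id"] and tree.contents directly.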

-- 
  Ken MacLeod
  ken@bitsko.slc.ut.us


From ken@bitsko.slc.ut.us  Fri Apr 23 23:11:01 1999
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 23 Apr 1999 17:11:01 -0500
Subject: [XML-SIG] qp_xml API (was: DOM API)
In-Reply-To: Greg Stein's message of "Mon, 19 Apr 1999 02:28:29 -0700"
References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <wkso9xxdkp.fsf@ifi.uio.no> <371AE5EF.13BA1B8C@lyra.org> <01f101be8a42$d665e830$f29b12c2@pythonware.com> <371AF73D.52254043@lyra.org>
Message-ID: <m3u2u72dje.fsf@biff.bitsko.slc.ut.us>

Greg Stein <gstein@lyra.org> writes:
> Parser.parse(input): input may be a string or an object supporting
> the "read" method (e.g. a file or httplib.HTTPResponse (from my new
> httplib module)). The input must represent a complete XML
> document. It will be fully parsed and a lightweight representation
> will be returned. This method may be called any number of times (for
> multiple documents). The returned object is an instance of
> qp_xml._element.

It was suggested in an earlier thread that multiple builders should be
allowed for.

A technique for implementing this is to take the `parse' function out
of the tree class altogether and put tree builders into their own
classes.

There is very little functional difference between the two (i.e. all
you're doing is moving the `parse' function you have into a different
class, it still returns a tree), but the semantic difference of ``who
can build a tree'' becomes very clear.

This can be very useful for the DOM and DOM-subset packages being
talked about elsewhere.  For example, a DOM-builder that takes SAX
events and calls DOM-factory methods to build a tree can be used to
build any of the DOM trees.
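A minimal sketch of such a builder, assuming the xml.sax and
xml.dom.minidom modules of a current Python 3. DOMBuilder and the
document_factory parameter are hypothetical names; any callable that
returns an object with the DOM createElement, createTextNode and
createProcessingInstruction factory methods should fit.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.dom import minidom


class DOMBuilder(ContentHandler):
    """Turns SAX events into calls on a DOM document's factory methods."""

    def __init__(self, document_factory):
        super().__init__()
        self._document_factory = document_factory
        self.document = None
        self._current = None

    def startDocument(self):
        self.document = self._document_factory()
        self._current = self.document

    def startElement(self, name, attrs):
        element = self.document.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        self._current.appendChild(element)
        self._current = element

    def endElement(self, name):
        self._current = self._current.parentNode

    def characters(self, content):
        self._current.appendChild(self.document.createTextNode(content))

    def processingInstruction(self, target, data):
        pi = self.document.createProcessingInstruction(target, data)
        self._current.appendChild(pi)


def parse(xml_bytes, document_factory=minidom.Document):
    """The parse function lives with the builder, not the tree class."""
    builder = DOMBuilder(document_factory)
    xml.sax.parse(io.BytesIO(xml_bytes), builder)
    return builder.document


doc = parse(b'<a x="1">hi<b/></a>')
serialized = doc.documentElement.toxml()
```

Swapping in another DOM implementation would then only mean passing a
different document_factory.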

-- 
  Ken MacLeod
  ken@bitsko.slc.ut.us


From uche.ogbuji@fourthought.com  Sun Apr 25 16:00:34 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 25 Apr 1999 09:00:34 -0600
Subject: [XML-SIG] SAX2: Parser properties
In-Reply-To: Your message of "09 Apr 1999 22:43:58 +0200."
 <wk4smpo775.fsf@ifi.uio.no>
Message-ID: <199904251500.JAA07692@malatesta.local>

I'm sorry I'm responding so late, but better so than never, I hope.

> The first three properties come from the JavaSAX proposal, while the
> last one was invented by yours truly.
> 
> 
> http://xml.org/sax/properties/namespace-sep <String> (write-only)
>   Set the separator to be used between the URI part of a name and the
>   local part of a name when namespace processing is being performed
>   (see the http://xml.org/sax/features/namespaces feature).  By
>   default, the separator is a single space.  This property may not be
>   set while a parse is in progress (throws a SAXNotSupportedException).
> 
> http://xml.org/sax/properties/dom-node <Node> (read-only)
>   Get the DOM node currently being visited, if the SAX parser is
>   iterating over a DOM tree.  If the parser recognises and supports
>   this property but is not currently visiting a DOM node, it should
>   return null (this is a good way to check for availability before the
>   parse begins).
> 
>   This property doesn't make much sense for Python, but I see no point
>   in leaving it out, either.

Actually, we are planning a SAX writer for a (hopefully near) future version 
of 4DOM, and this could support this property.
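For comparison, a small sketch of the availability check using the
xml.sax package of a current Python 3 (an assumption; it is not the
PySAX being specified here). That package signals "not recognised" and
"not supported right now" with exceptions rather than a null return.
probe_property is a made-up helper name.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import property_dom_node, property_xml_string


def probe_property(reader, name):
    """Classify how reader reacts to a property request before parsing."""
    try:
        reader.getProperty(name)
    except SAXNotRecognizedException:
        return "not recognised"
    except SAXNotSupportedException:
        return "recognised, not available now"
    return "available"


reader = xml.sax.make_parser()
report = {
    name: probe_property(reader, name)
    for name in (
        property_dom_node,
        property_xml_string,
        "http://example.org/no-such-property",  # deliberately bogus
    )
}
```

The result depends on which driver make_parser() returns, so the
sketch classifies the outcome instead of assuming one.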

> http://xml.org/sax/properties/xml-string <String> (read-only)
>   Get the literal string of characters associated with the current
>   event.  If the parser recognises and supports this property but is
>   not currently parsing text, it should return null (this is a good
>   way to check for availability before the parse begins).  I stole
>   this idea from Expat.
> 
> 
> In addition, I think PySAX needs the following property:
> 
> http://python.org/sax/properties/data-encoding <String> (read/write)
>   This property can be used to control which character encoding is
>   used for data events that come from the parser. In Java this is not
>   an issue since all strings are Unicode, but in Python it is. Expat
>   reports UTF-8, while xmlproc/xmllib just pass on whatever they're
>   given.
> 
>   Do we need a special SAXEncodingNotSupportedException for this?
>   Otherwise it may be impossible to tell whether the parser doesn't
>   support this at all or whether it just doesn't support this
>   particular encoding.

I agree that this is the best way to go for now, but I think the question 
should be at least raised as to whether it is better to agree on a normal 
encoding form for parser string output and to enforce this in the SAX drivers 
(by conversion, if necessary).

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Sun Apr 25 16:02:34 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 25 Apr 1999 09:02:34 -0600
Subject: [XML-SIG] SAX2: Handler classes
In-Reply-To: Your message of "09 Apr 1999 22:44:50 +0200."
 <wk3e29o75p.fsf@ifi.uio.no>
Message-ID: <199904251502.JAA07706@malatesta.local>

> This list is just copied from the Java proposal. Does anyone think we
> should skip any of these or add any new ones?
> 
> http://xml.org/sax/handlers/lexical <LexicalHandler>
>   Receive callbacks for comments, CDATA sections, and (possibly)
>   entity references.
> 
> http://xml.org/sax/handlers/dtd-decl <DTDDeclHandler>
>   Receive callbacks for element, attribute, and (possibly) parsed
>   entity declarations.
> 
> http://xml.org/sax/handlers/namespace <NamespaceHandler>
>   Receive callbacks for the start and end of the scope of each
>   namespace declaration.

I think they are all important, and I can't think of any additions.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Sun Apr 25 16:13:37 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 25 Apr 1999 09:13:37 -0600
Subject: [XML-SIG] SAX2: Attribute extensions
In-Reply-To: Your message of "17 Apr 1999 18:06:12 +0200."
 <wkyajr2pvv.fsf@ifi.uio.no>
Message-ID: <199904251513.JAA07722@malatesta.local>

> This posting specifies two interfaces for information needed by the
> DOM (and possibly also others) and also for full XML 1.0 conformance.
> I'm not really sure whether we should actually use all of this, so
> opinions are welcome.
> 
> class AttributeList2:
> 
>   def isSpecified(self,attr):
>     """Returns true if the attribute was explicitly specified in the
>     document and false otherwise. attr can be the attribute name or
>     its index in the AttributeList."""

This is pretty much essential for full DOM support, and thus it would help us 
greatly for the SAX builder in 4DOM.

>   def getEntityRefList(self,attr):
>     """This returns the EntityRefList (see below) for an attribute,
>     which can be specified by name or index."""
> 
> The class below is intended to be used for discovering entity reference
> boundaries inside attribute values. This is needed because the XML 1.0
> recommendation requires parsers to report unexpanded entity references, 
> also inside attribute values. Whether this is really
> something we want is another matter.

I'm not clear on what the alternative is.  For example,

<spam eggs="xx&monty;xx">

if the parser doesn't expand &monty;, do you suggest that it should instead 
just return the literal "xx&monty;xx" as the attribute value, leaving the 
application to spot the "&" and assume an entity reference appropriately?  
This seems rather a shift in burden to the app.  If this is not what you mean, 
then it would seem to make sense for the parser to report unexpanded entity 
refs.

> class EntityRefList:
> 
>   def getLength(self):
>     "Returns the number of entity references inside this attribute value."
> 
>   def getEntityName(self, ix):
>     "Returns the name of entity reference number ix (zero-based index)."
> 
>   def getEntityRefStart(self, ix):
>     """Returns the index of the first character inside the attribute
>     value that stems from entity reference number ix."""
> 
>   def getEntityRefEnd(self, ix):
>     "Returns the index of the last character in entity reference ix."
> 
> 
> One redeeming feature of this interface is that it lives entirely
> outside the attribute value, and so can be ignored entirely by those
> who are not interested.
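To make the quoted interface concrete, here is a rough sketch in
current Python 3. The four method names are the ones proposed above;
the expand() helper, its regular expression and the entities
dictionary are assumptions for illustration, and the sketch ignores
nested entities and character references.

```python
import re


class EntityRefList:
    """Records where entity references ended up in an expanded value."""

    def __init__(self, spans):
        self._spans = spans  # list of (name, start, end), end inclusive

    def getLength(self):
        return len(self._spans)

    def getEntityName(self, ix):
        return self._spans[ix][0]

    def getEntityRefStart(self, ix):
        return self._spans[ix][1]

    def getEntityRefEnd(self, ix):
        return self._spans[ix][2]


_REFERENCE = re.compile(r"&([A-Za-z_][\w.\-]*);")


def expand(raw, entities):
    """Expand known references in raw; return (value, EntityRefList)."""
    parts, spans, pos, out_len = [], [], 0, 0
    for match in _REFERENCE.finditer(raw):
        name = match.group(1)
        if name not in entities:
            continue  # unknown reference: stays in the value as literal text
        literal = raw[pos:match.start()]
        replacement = entities[name]
        parts += [literal, replacement]
        start = out_len + len(literal)
        spans.append((name, start, start + len(replacement) - 1))
        out_len = start + len(replacement)
        pos = match.end()
    parts.append(raw[pos:])
    return "".join(parts), EntityRefList(spans)


value, refs = expand("xx&monty;xx", {"monty": "python"})
```

The attribute value stays an ordinary string; an application that does
not care about boundaries never has to look at refs.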

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Sun Apr 25 16:21:32 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 25 Apr 1999 09:21:32 -0600
Subject: [XML-SIG] SAX2: LexicalHandler
In-Reply-To: Your message of "17 Apr 1999 18:05:20 +0200."
 <wkzp472pxb.fsf@ifi.uio.no>
Message-ID: <199904251521.JAA07736@malatesta.local>

> This handler is supposed to be used by applications that need
> information about lexical details in the document such as comments and
> entity boundaries. Most applications won't need this, but the DOM will
> find it useful. Support for this handler will be optional.
> 
> This handler has the handlerID http://xml.org/sax/handlers/lexical.
> 
> class LexicalHandler:
> 
>   def xmlDecl(self, version, encoding, standalone):
>     """All three parameters are strings. encoding and standalone are not
>     specified on the XML declaration, their values will be None."""

I think you're missing an "If" at the beginning of the last sentence.

>   def startDTD(self, root, publicID, systemID):
>     """This event is reported when the DOCTYPE declaration is
>     encountered. root is the name of the root element type, while the two last
>     parameters are the public and system identifiers of the external
>     DTD subset."""

Excellent.  This would fill a huge hole in SAX -> DOM building.

>   def endDTD(self):
>     "This event is reported after the DTD has been parsed."
> 
>   def startEntity(self, name):
>     """Reports the beginning of a new entity. If the entity is the
>     external DTD subset the name will be '[dtd]'."""
> 
>   def endEntity(self, name):
>     pass
> 
>   def startCDATA(self):
>     pass
> 
>   def endCDATA(self):
>     pass
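A sketch of a handler along these lines, run against the xml.sax
package of a current Python 3 (an assumption; there the handler is
registered through the property_lexical_handler property rather than
the handler ID proposed above). RecordingLexicalHandler is a made-up
name, and xmlDecl and the entity callbacks from the proposal are left
out of the sketch.

```python
import io
import xml.sax
from xml.sax.handler import property_lexical_handler


class RecordingLexicalHandler:
    """Collects lexical events as tuples, in the order they arrive."""

    def __init__(self):
        self.events = []

    def comment(self, content):
        self.events.append(("comment", content))

    def startDTD(self, name, public_id, system_id):
        self.events.append(("startDTD", name, public_id, system_id))

    def endDTD(self):
        self.events.append(("endDTD",))

    def startCDATA(self):
        self.events.append(("startCDATA",))

    def endCDATA(self):
        self.events.append(("endCDATA",))


lexical = RecordingLexicalHandler()
reader = xml.sax.make_parser()
reader.setProperty(property_lexical_handler, lexical)
reader.parse(io.BytesIO(b"<!DOCTYPE a><a><!-- note --><![CDATA[x<y]]></a>"))
event_kinds = [event[0] for event in lexical.events]
```

A DOM builder could use the same callbacks to create DocumentType,
Comment and CDATASection nodes.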

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Sun Apr 25 16:50:21 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 25 Apr 1999 09:50:21 -0600
Subject: [XML-SIG] qp API
In-Reply-To: Your message of "Fri, 23 Apr 1999 16:04:25 CDT."
 <3720E059.28CD59BC@prescod.net>
Message-ID: <199904251550.JAA07778@malatesta.local>

> >         I haven't really formed an opinion about the Minidom module.
> > On the one hand, I don't like adding an interface that resembles
> > another interface; too many similar choices can be confusing.  (But if
> > PyDOM is upward-compatible with Minidom, that may not be a problem.)
> 
> I certainly intend for minidom to be a subset of PyDOM and 4DOM. Any
> extensions I made should be interpreted as suggestions for extensions to
> PyDOM and 4DOM.

And we are watching with great interest.  4DOM already has "DOMFromString", 
"DOMFromURL", and "DOMFromFile" equivalents, although we call them "FromXML" 
and "FromHTML", "From*MLURL", and "From*MLFile".  We also have 
"FromXMLStream" and "FromHTMLStream".  These functions are all in DOM.Ext.

These helper functions have been provided since 4DOM 0.7.1, and so are 
supported by all the versions that come with 4XSL.

> >         (I do like the convenience functions like DOMFromString;
> > something similar should definitely be added, perhaps to dom.utils.)
> 
> Why not in "dom" itself? I don't see them as utilities but as the
> fundamental, commonly used entry points to DOM functionality.

Agreed, but I don't expect this in the near future from the beleaguered DOM WG.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Sun Apr 25 17:31:17 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 25 Apr 1999 10:31:17 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Sat, 24 Apr 1999 03:41:26 PDT."
 <37219FD6.73061D0@lyra.org>
Message-ID: <199904251631.KAA07815@malatesta.local>

I've stayed out of the "justify the DOM" argument because I'm not really 
interested in it.  I like the DOM, I find it powerful and useful, and I use it 
in many places.  I can't help it if others feel the contrary, and I'm not in 
the mood for an emacs/vi, gnome/kde type debate.  However, I am particularly 
puzzled by a couple of comments.

> I believe that the notion of build/generate via the DOM is bogus. It
> seems you agree :-), and that print or file.write is more appropriate.
> Fredrik has some utility objects to do it. All fine. The DOM just blows
> :-)

Build/generate is explicitly outside the scope of the present DOM, so I don't 
see how the latter conclusion follows from the first sentence.

> I could care less about compatibility. I'm trying to write an
> application here. Geez... using your viewpoint: if I wanted
> compatibility, then maybe I should use Java or C since everybody else
> uses that.

It's important to note that many of us _are_ successful building applications 
based on the DOM, and I agree with Paul that the DOM's extraordinary success 
is ample proof against the DOM's being broken for practical use.  For example, 
at FourThought, we've had cause to evaluate commercial Databases with DOM 
support, and the answer is increasingly "all of them".  Now one thing I'll say 
in my observation of DB vendors: one or two of them always adopt the latest 
fad, but you never find such large-scale adoption of a technology in the 
glacial DB world unless there is real merit.

And I say the above even keeping in mind my disappointment with the slow 
adoption of ODMG/OQL.


-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Sun Apr 25 17:14:31 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 25 Apr 1999 10:14:31 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Fri, 23 Apr 1999 16:16:05 CDT."
 <3720E315.A4532E83@prescod.net>
Message-ID: <199904251614.KAA07800@malatesta.local>

> As I said in my other messages, I want minidom to be a subset of PyDOM and 4DOM
> and hopefully the start of a common API. In that vein, minidom makes some
> decisions and extensions that we should discuss:
> 
> dom = DOMFromString( string, SAXbuilder=None )
> dom = DOMFromURL( URL, SAXbuilder=None )
> dom = DOMFromFile( file, SAXbuilder=None )

As I've mentioned, 4DOM already supports these functions, if under different 
names (which we don't mind normalizing to any names that are generally agreed 
upon).  We do have a few additional parameters, though, which I think are 
essential for strict DOM compliance, which I realize is not a key goal of 
PyDOM and minidom, but they're probably fodder for discussion.

def FromXML(      xmlStr,
                  ownerDocument=None,
                  validate=0,
                  keepAllWS=0,
                  catName=None,
                  SAXHandlerClass=XMLDOMGenerator)

* ownerDocument allows us to set this property for generated nodes.  If None, we
create a new Document node from the factory and add the built nodes to the 
document.  If the ownerDocument _is_ set, the new nodes are not added to the
document, and a DocumentFragment is returned instead.  This behavior 
corresponds to most of the use-cases we determined for building.

* validate is to tell the parser whether or not to validate

* keepAllWS basically tells the SAX handler whether to discard 
ignorable_whitespace.

* catName is for Xcatalog support (xmlproc only).  I don't think this needs be 
considered for a unified DOMFromString

* SAXHandlerClass is our equivalent of your SAXBuilder
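
The ownerDocument and keepAllWS behaviour described above can be
approximated with xml.dom.minidom from a current Python 3. This is a
sketch under stated assumptions, not 4DOM's code: dom_from_string and
its parameter names are hypothetical, and dropping whitespace-only
text nodes only approximates true ignorable whitespace, which needs a
DTD to identify.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def _strip_whitespace_text(node):
    """Drop text nodes that contain nothing but whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            _strip_whitespace_text(child)


def dom_from_string(xml_str, owner_document=None, keep_all_ws=False):
    """Return a new Document, or a DocumentFragment owned by owner_document."""
    parsed = parseString(xml_str)
    if not keep_all_ws:
        _strip_whitespace_text(parsed)
    if owner_document is None:
        return parsed
    fragment = owner_document.createDocumentFragment()
    for child in parsed.childNodes:
        if child.nodeType == Node.DOCUMENT_TYPE_NODE:
            continue  # a doctype cannot be imported into another document
        fragment.appendChild(owner_document.importNode(child, True))
    return fragment


doc = dom_from_string("<a> <b/> </a>")
fragment = dom_from_string("<c/>", owner_document=doc)
```

The fragment's nodes belong to doc but are not attached to it, which
matches the "not added to the document" case in the list.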

> The default SAXBuilder would probably be the PyDOM or minidom builder.
> 
> Minidom uses mixed lower-first for property names. For compatibility with
> PyDOM, properties can be requested through get_ methods. My question is:
> do we really need get_ methods? They don't seem very Pythonish to me. Or
> maybe we can use them as implementation mechanism (_get_) but not expose
> them to the client.
> 
> I prefer the class-specific properties to the weird generic ones: tagName
> to nodeName, value to nodeValue and so forth. Obviously PyDOM and 4DOM
> would implement both but I don't see any reason to support that redundancy
> in minidom.
> 
> I made some namespace extensions because we can't wait forever to do
> namespace support.
> 
> getAttribute( "foo", "http://www.blah.bar" )
> 
> Looks up the obvious attribute.
> 
> element.localName gets the second half of the element type name.
> 
> element.uri gets the URI associated with the prefix.
> 
> element.prefix gets the element's prefix. I don't think that the
> namespaces view that prefixes are irrelevant should obviate the XML 1.0
> view that they are NOT. Even if we accept the namespaces view of the world
> entirely, prefixes are chosen to be mnemonic so they shouldn't be
> discarded by software.
> 
> element.attributes returns an attribute mapping object that I think
> behaves exactly like PyDOM's except for namespace support:
> 
> x.attributes["foo", "http://www.blah.bar"]
> 
> This also works, however:
> 
> x.attributes["bar:foo"] (just as in PyDOM)
> 
> Namespace attributes ARE maintained as attributes. keys(), items() and
> values() should be the same as PyDOM.

We might consider this for Namespace support for 4DOM, although we had been 
planning to wait for W3C to jump, so that we could maintain 
standards-compliance.  Right now 4DOM just treats namespaces entirely 
opaquely, i.e. ignores them.  Maybe there is a way to add your above 
suggestions to DOM.Ext.
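
For reference, a sketch of the same lookups using the xml.dom.minidom
that ships with a current Python 3 (an assumption; it is not the 1999
minidom proposed above, and it uses the later DOM Level 2 names such
as getAttributeNS and namespaceURI instead of the two-argument
getAttribute and element.uri).

```python
from xml.dom.minidom import parseString

doc = parseString(
    '<bar:spam xmlns:bar="http://www.blah.bar" bar:foo="1" plain="2"/>'
)
element = doc.documentElement

# Lookup by (namespace URI, local name), and by qualified name.
by_namespace = element.getAttributeNS("http://www.blah.bar", "foo")
by_qname = element.getAttribute("bar:foo")

# The three parts of the element type name discussed above.
name_parts = (element.prefix, element.localName, element.namespaceURI)

# The attribute mapping is keyed by qualified name.
attribute_names = sorted(element.attributes.keys())
```

The prefix survives parsing here, so software that wants the mnemonic
name can still get at it.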

> I should unify my Error class with PyDOM's.
> 
> I am considering the following enhancements:
> 
> element.elements: returns a list of element children.

In full DOM, this is trivial using Level 2 iterators.  We'd have no problem 
adding a wrapper function to DOM.Ext, though.

> element.getText: returns a list of deep list of data from the text nodes.
> Do your own string.join to choose an appropriate join character.

I'm not sure how useful this is if we omit the semantics of nested elements.  
I would see more use for a method that simply returns the XML text within an 
element, including nested tags.

> element.getChild("FOO") returns the first child (not descendant) element
> with specified element type name.

I've never had a need for such a method.  I often need all such elements, in 
which case I just use getElementsByTagName.

> element.getChild("FOO", "http://...") does the obvious thing.
> 
> element.getChild( "#PCDATA" ) gets a list of child text nodes.
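
The proposed helpers are small enough to sketch as wrapper functions
over xml.dom.minidom in a current Python 3. The function names are
hypothetical stand-ins for element.elements and element.getChild, and
the "#PCDATA" case is split into its own function for clarity.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def child_elements(element):
    """Element children only, skipping text, comments and PIs."""
    return [c for c in element.childNodes if c.nodeType == Node.ELEMENT_NODE]


def get_child(element, name):
    """First child (not descendant) element called name, or None."""
    for child in child_elements(element):
        if child.tagName == name:
            return child
    return None


def child_text_nodes(element):
    """Text node children, i.e. the '#PCDATA' case."""
    return [c for c in element.childNodes if c.nodeType == Node.TEXT_NODE]


doc = parseString(
    "<r>one<FOO id='1'/><x><FOO id='nested'/></x>two<FOO id='2'/></r>"
)
root = doc.documentElement
child_names = [e.tagName for e in child_elements(root)]
first_foo = get_child(root, "FOO")
descendant_count = len(root.getElementsByTagName("FOO"))
```

Note the difference from getElementsByTagName, which also returns the
FOO nested inside x.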

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From dieter@handshake.de  Fri Apr 23 19:49:30 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Fri, 23 Apr 1999 18:49:30 +0000 (/etc/localtime)
Subject: [XML-SIG] addition of "encoding='iso8859-1'" in xml prolog
Message-ID: <14112.48598.425955.848660@lindm.dm>

The XML generators in our XML package (0.5.1) do not
generate UTF-8, but use the character set that happens to
be Python's character set.

I think we should allow for an encoding hook
and include a corresponding "encoding" declaration
in the XML prolog.

Tim Lavoie used XBEL (ns_parse.py) on bookmarks with
international (iso8859-1) entries. The resulting
XML was not parsable, because some of the non-ASCII
characters led to invalid UTF-8 codes.

- Dieter


From dieter@handshake.de  Sun Apr 25 22:40:19 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 25 Apr 1999 21:40:19 +0000 (/etc/localtime)
Subject: [XML-SIG] qp API
In-Reply-To: <3720DDB2.2483DD7F@prescod.net>
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720DDB2.2483DD7F@prescod.net>
Message-ID: <14115.35341.379579.36825@lindm.dm>

Paul Prescod writes:
 > The problem with close() is that it is O(N) with the size of your
 > document, isn't it? I'm on the fence about parent pointers...maybe they
 > should be a construction option. They would be off by default.
But there is no difference in runtime behavior (O(N)),
whether the close() is explicit or implicit (i.e. because
the reference count reaches 0).

The real problem with an explicit close() is dangling
references. Assume, the application has a reference to
an inner node in the document tree. The close() would
probably remove all parent pointers from the subtree
(this is very similar (a bit worse) to what would happen,
if weakdicts would be used for parent pointer implementation).
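
The weak-reference variant can be sketched with the weakref module of
a current Python 3 (an assumption; no such module is available to the
code discussed in this thread). TreeNode is a made-up class, and the
behaviour shown relies on CPython's reference counting.

```python
import gc
import weakref


class TreeNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        # A weak reference does not keep the parent alive, so the tree
        # contains no reference cycles and needs no explicit close().
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        """The parent node, or None if there is none (any more)."""
        return self._parent() if self._parent is not None else None


root = TreeNode("root")
inner = TreeNode("inner", parent=root)
leaf = TreeNode("leaf", parent=inner)

parent_while_tree_alive = leaf.parent.name

# The application drops the tree but keeps a reference to one node.
del root, inner
gc.collect()
parent_after_release = leaf.parent
```

The surviving node silently loses its parent pointer once the rest of
the tree is released, which is the dangling-reference hazard described
above in milder form.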

- Dieter


From dieter@handshake.de  Sun Apr 25 22:31:29 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 25 Apr 1999 21:31:29 +0000 (/etc/localtime)
Subject: [XML-SIG] qp API
In-Reply-To: <14112.52252.324863.576300@amarok.cnri.reston.va.us>
References: <14112.51262.675379.668123@weyr.cnri.reston.va.us>
 <14112.52252.324863.576300@amarok.cnri.reston.va.us>
Message-ID: <14115.35045.345868.73082@lindm.dm>

Andrew M. Kuchling writes:
 > 	I haven't really formed an opinion about the Minidom module.
 > On the one hand, I don't like adding an interface that resembles
 > another interface; too many similar choices can be confusing.  (But if
 > PyDOM is upward-compatible with Minidom, that may not be a problem.)
 > On the other hand, PyDOM *is* quite heavyweight, and I can understand
 > the desire for something similar.  Can people please give their
 > opinions about this?  
I am quite happy with PyDOM.
I would be even happier if DOM building and processing would
be faster.

I would not use a different API, if I am not forced to for performance
reasons.

- Dieter


From grove@infotek.no  Mon Apr 26 08:54:54 1999
From: grove@infotek.no (Geir Ove Grønmo)
Date: 26 Apr 1999 09:54:54 +0200
Subject: [XML-SIG] PySAX more pythonish
In-Reply-To: <3720A88E.C041B08B@prescod.net>
References: <3720A88E.C041B08B@prescod.net>
Message-ID: <GROVE-827lqzu88h.fsf@pc-grove.infotek.no>

* Paul Prescod
| I would like the attributes parameter to startElement to be defaulted in
| all SAX implementations.
| 
| I would also like a new method called "text" similar to the one
| implemented by xml.dom.sax_builder. "text" just takes a text string
| instead of a string and offsets. That's a little more pythonish for both
| the caller and callback.
| 
| The default DocumentHandler would re-route "characters" to "text". Someone
| who needed the (potentially) more efficient behavior of "characters" could
| override the implementation and re-route text to characters instead.
| 
| What do you think?

I like this. 

There is a small thing to notice in the current implementations: You are
not guaranteed that a sequence of characters is returned by _one_ event
only. Because of buffering in the parsers/drivers you may end up with
several events. This is very inconvenient at times.

Lars Marius has written some code to make sure that these events are
merged into one. I think this was written as a parser filter. I'm not
sure if he intends to include this in the Python SAX libraries, but it
would be very nice to have it available. 
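
A sketch of what such a merging filter might look like, using
XMLFilterBase from the xml.sax package of a current Python 3. This is
not Lars Marius's code; TextMerger and Recorder are hypothetical names.

```python
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class TextMerger(XMLFilterBase):
    """Buffers character data and forwards it as one event."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []

    def _flush(self):
        if self._pending:
            super().characters("".join(self._pending))
            self._pending = []

    def characters(self, content):
        self._pending.append(content)

    # Any other content event ends the current run of text.
    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

    def endDocument(self):
        self._flush()
        super().endDocument()


class Recorder(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# Drive the filter by hand to imitate a driver that splits the text.
recorder = Recorder()
merger = TextMerger()
merger.setContentHandler(recorder)
merger.startElement("a", {})
for piece in ("split ", "across ", "events"):
    merger.characters(piece)
merger.endElement("a")
```

In real use the filter would wrap a parser, e.g.
TextMerger(xml.sax.make_parser()), and the application's handler would
be set on the filter.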

On the other hand, it would also be nice to have the text method do
this. :-)

All the best,
Geir O.


From grove@infotek.no  Mon Apr 26 09:06:50 1999
From: grove@infotek.no (Geir Ove Grønmo)
Date: 26 Apr 1999 10:06:50 +0200
Subject: [XML-SIG] Re: xml-0.5.1: LICENCE (xmlarch)
In-Reply-To: <yegzp3yr3m3.fsf@luna.gnu.franken.de>
References: <yegzp3yr3m3.fsf@luna.gnu.franken.de>
Message-ID: <GROVE-821zh7u7ol.fsf@pc-grove.infotek.no>

* Karl Eichwalder
| There's room for interpretation.  The LICENCE file says:
| 
|     xmlarch:
|     --------------------------------------------------------------------
|     Copyright (C) 1998 by Geir O. Grønmo, grove@infotek.no
|     
|     Free for commercial and non-commercial use.
|     --------------------------------------------------------------------
| 
| But arch/xmlarch.py says (as it stands, this implies, that it's "unfree"
| under certain circumstances):
| 
|     Copyright (C) 1998 by Geir O. Grønmo, grove@infotek.no
|     
|     It is free for non-commercial use, if you modify it please let me
|     know.

Oops, I'll fix this right away. xmlarch is free for _both_ commercial
and non-commercial use. Sorry about the glitch.

Geir O.


From pmadsen@newbridge.com  Mon Apr 26 13:47:04 1999
From: pmadsen@newbridge.com (Paul Madsen)
Date: Mon, 26 Apr 1999 08:47:04 -0400
Subject: [XML-SIG] Windows compiled version of XML toolkit
Message-ID: <37246048.575FEE4F@newbridge.com>

Hi, the Python/XML HOWTO references a pre-compiled version of the
toolkit for Windows. Is there such a beast available?

Thanks for any info.

Paul


From paul@prescod.net  Mon Apr 26 20:50:14 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 26 Apr 1999 14:50:14 -0500
Subject: [XML-SIG] PySAX more pythonish
References: <3720A88E.C041B08B@prescod.net> <GROVE-827lqzu88h.fsf@pc-grove.infotek.no>
Message-ID: <3724C376.D99E95E0@prescod.net>

"Geir Ove Grønmo" wrote:
> 
> On the other hand, it would also be nice to have the text method do
> this. :-)

I can't think how to implement it easily in HandlerBase.characters. We
would have to implement it in every driver, I think. Lars' filter is
probably the best solution.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco



From paul@prescod.net  Mon Apr 26 21:27:53 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 26 Apr 1999 15:27:53 -0500
Subject: [XML-SIG] Python DOM Unification -- level
Message-ID: <3724CC49.AAB857A5@prescod.net>

Following are some meta-questions on the proposed Python DOM unification.

First, what is the appropriate level of unification? 

 * Module level:

if sys.argv[1]=="fast":
    from xml import minidom
    dom = minidom
elif sys.argv[1]=="complete":
    from xml import dom
elif sys.argv[1]=="distributed":
    from 4thought import dom

 * Builder level:

if sys.argv[1]=="4thought":
    from 4thought.dom import sax_builder
else:
    from xml.dom import sax_builder

xml.dom.FromXML( sax_builder() )

 * Document level:

if sys.argv[1]=="4thought":
    4thought.dom.Gimme.a.document()
else:
    xml.dom.I.need.a.document()

document.doStuff()

My preference is for "Builder level", I think. Portable helper functions
could go into a universal xml.dom package instead of into each package.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco



From paul@prescod.net  Mon Apr 26 21:10:36 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 26 Apr 1999 15:10:36 -0500
Subject: [XML-SIG] DOM API
References: <199904251614.KAA07800@malatesta.local>
Message-ID: <3724C83C.FA5080E@prescod.net>

I'll have to think about some things more than I have time for right now.
Other stuff:

uche.ogbuji@fourthought.com wrote:
> 
> 
> We might consider this for Namespace support for 4DOM, although we had been
> planning to wait for W3C to jump, so that we could maintain
> standards-compliance.  Right now 4DOM just treats namespaces entirely
> opaquely, i.e. ignores them.  Maybe there is a way to add your above
> suggestions to DOM.Ext.

In response to Greg's comments, I'm starting to think that namespace
processing should be a mode: either completely on or completely off. The
complex, scoped namespaces mechanism is more the result of politics than
technology -- this wasn't how namespaces were supposed to turn out.

> > element.getText: returns a list of deep list of data from the text nodes.
> > Do your own string.join to choose an appropriate join character.
> 
> I'm not sure how useful this is if we omit the semantics of 
> nested elements.

Actually, it gets a fair amount of use and is easy to implement. DSSSL,
XSL and the grove paradigm all provide this feature. Consider:

<SECTION>
   <TITLE>This is the <CODE>XSL</CODE> introduction.</TITLE>
...
</SECTION>

Now I'm generating a TOC, index or cross-reference. I don't care about the
CODE element -- I just want to treat it as if the tags don't exist.

I could go either way on this function, though.
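
A minimal sketch of what getText could do, assuming the standard-library
xml.dom.minidom module rather than any of the DOMs under discussion. The
function name get_text is hypothetical, and it returns a flat list rather
than a nested one to keep the example short.

```python
from xml.dom import Node, minidom

def get_text(element):
    """Return the data of every text node below `element`, in document order."""
    parts = []
    for child in element.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            # Descend into nested elements such as CODE and ignore their tags.
            parts.extend(get_text(child))
    return parts

doc = minidom.parseString(
    "<SECTION><TITLE>This is the <CODE>XSL</CODE> introduction.</TITLE></SECTION>"
)
title = doc.documentElement.getElementsByTagName("TITLE")[0]
pieces = get_text(title)
toc_entry = "".join(pieces)  # the caller picks the join character
```

Returning a list and leaving the join to the caller matches the
"do your own string.join" idea quoted above.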

> I would see more use for a method that simply returns the XML text within an
> element, including nested tags.

That's a different feature that is also useful. 

> > element.getChild("FOO") returns the first child (not descendant) element
> > with specified element type name.
> 
> I've never had a need for such a method.  I often need all such elements, in
> which case I just use getElementsByTagName.

I'm surprised that you've never needed it. In Greg's data-ish world it
would be incredibly useful but also in the data-ish subsets of the
document world.

<DOCUMENT>
 <METADATA>
   <TITLE>Blah...</TITLE>
   <AUTHOR>Blah...</AUTHOR>
 </METADATA>
...
</DOCUMENT>

doc.documentElement.getChild( "METADATA" ).getChild( "AUTHOR" )

You can emulate this with getElementsByTagName but you incur the overhead
of building and discarding the node list.
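
A rough sketch of getChild as a free function, assuming the
standard-library xml.dom.minidom module. The name get_child and the sample
title and author values are made up for illustration.

```python
from xml.dom import Node, minidom

def get_child(element, name):
    """Return the first direct child element called `name`, or None."""
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == name:
            return child  # stop at the first match; no node list is built
    return None

doc = minidom.parseString(
    "<DOCUMENT><METADATA>"
    "<TITLE>Some Title</TITLE><AUTHOR>Some Author</AUTHOR>"
    "</METADATA></DOCUMENT>"
)
metadata = get_child(doc.documentElement, "METADATA")
author = get_child(metadata, "AUTHOR")
author_name = author.firstChild.data
missing = get_child(doc.documentElement, "AUTHOR")  # AUTHOR is not a direct child
```

Because the loop returns on the first hit and never looks below the direct
children, it avoids both the node list and the deep traversal.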

> > element.getChild( "#PCDATA" ) gets a list of child text nodes.

I've never needed this one, but Greg seems to...we'll let him defend it
(here or in qp_api) when he gets back.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco



From akuchlin@cnri.reston.va.us  Mon Apr 26 22:26:50 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 26 Apr 1999 17:26:50 -0400 (EDT)
Subject: [XML-SIG] Python DOM Unification -- level
In-Reply-To: <3724CC49.AAB857A5@prescod.net>
References: <3724CC49.AAB857A5@prescod.net>
Message-ID: <14116.55422.189139.235663@amarok.cnri.reston.va.us>

Paul Prescod writes:
> * Builder level:
>
>if sys.argv[1]=="4thought":
>    from 4thought.dom import sax_builder()
>else:
>   from xml.dom import sax_builder()

	I'd lean toward module-level, as long as it's understood that
an implementation can add extra stuff to its module, but builder-level
would also be acceptable.  Note that there isn't that much top-level
stuff required for a DOM module: exception codes, DOMException, the
Node class and its subclasses, NodeList and NamedNodeMap, and 
a createDocument() function.  createDocument is the only thing not
specified by the DOM1 REC, so anyone implementing DOM will have some
version of the above classes and objects.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
The warning message we sent the Russians was a calculated ambiguity that would
be clearly understood.
    -- Alexander Haig


From paul@prescod.net  Mon Apr 26 22:00:42 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 26 Apr 1999 16:00:42 -0500
Subject: [XML-SIG] Py-ish PySax Suggestion #2
Message-ID: <3724D3FA.2E589DB6@prescod.net>

I would like to suggest that we copy the *mllib start_foo convention for
PySAX. Here's what a HandlerBase.startElement would look like for that:

        def startElement( self, tagname, attrs ):
                # dispatch to start_<tagname> if the subclass defines it
                method = getattr( self, "start_"+tagname, None)
                if method:
                        method( attrs )
                else:
                        self.startUnknownElement( tagname, attrs )

        def endElement( self, tagname ):
                # dispatch to end_<tagname> if the subclass defines it
                method = getattr( self, "end_"+tagname, None)
                if method:
                        method()
                else:
                        self.endUnknownElement( tagname )

        def startUnknownElement( self, tagname, attrs ):
                pass

        def endUnknownElement( self, tagname ):
                pass
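
To show how a client would use the convention, here is a self-contained
sketch built on the standard-library xml.sax.ContentHandler instead of
PySAX's HandlerBase. That substitution is an assumption made so the example
runs as-is; DispatchingHandler and TitleCollector are made-up names.

```python
import xml.sax

class DispatchingHandler(xml.sax.ContentHandler):
    """Routes element events to start_<tag> / end_<tag> methods when present."""

    def startElement(self, tagname, attrs):
        method = getattr(self, "start_" + tagname, None)
        if method:
            method(attrs)
        else:
            self.startUnknownElement(tagname, attrs)

    def endElement(self, tagname):
        method = getattr(self, "end_" + tagname, None)
        if method:
            method()
        else:
            self.endUnknownElement(tagname)

    def startUnknownElement(self, tagname, attrs):
        pass

    def endUnknownElement(self, tagname):
        pass

class TitleCollector(DispatchingHandler):
    """Only cares about TITLE; records everything else as unknown."""

    def __init__(self):
        super().__init__()
        self.events = []

    def start_TITLE(self, attrs):
        self.events.append(("start", "TITLE"))

    def end_TITLE(self):
        self.events.append(("end", "TITLE"))

    def startUnknownElement(self, tagname, attrs):
        self.events.append(("unknown", tagname))

handler = TitleCollector()
xml.sax.parseString(b"<SECTION><TITLE>Intro</TITLE></SECTION>", handler)
```

A client that needs raw speed can still override startElement directly and
skip the getattr lookup altogether.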

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco



From paul@prescod.net  Mon Apr 26 23:03:13 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 26 Apr 1999 17:03:13 -0500
Subject: [XML-SIG] Python DOM Unification -- level
References: <3724CC49.AAB857A5@prescod.net> <14116.55422.189139.235663@amarok.cnri.reston.va.us>
Message-ID: <3724E2A1.62223458@prescod.net>

"Andrew M. Kuchling" wrote:
> 
> Paul Prescod writes:
> > * Builder level:
> >
> >if sys.argv[1]=="4thought":
> >    from 4thought.dom import sax_builder()
> >else:
> >   from xml.dom import sax_builder()
> 
>         I'd lean toward module-level, as long as it's understood that
> an implementation can add extra stuff to its module, but builder-level
> would also be acceptable.  Note that there isn't that much top-level
> stuff required for a DOM module: exception codes, DOMException, the
> Node class and its subclasses, NodeList and NamedNodeMap, and
> a createDocument() function.  

Shouldn't the exception objects and class constants be shared between DOM
implementations?

Why do Node, NodeList and NamedNodeMap have to be top-level? Does it make
sense for clients to construct them?

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco



From paul@prescod.net  Mon Apr 26 23:18:27 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 26 Apr 1999 17:18:27 -0500
Subject: [XML-SIG] qp API
References: <Pine.LNX.3.95.990421223551.12908D-100000@ns1.lyra.org>
 <3720DDB2.2483DD7F@prescod.net> <14115.35341.379579.36825@lindm.dm>
Message-ID: <3724E633.58B2A1C1@prescod.net>

Dieter Maurer wrote:
> 
> But there is no difference in runtime behavior (O(N)),
> whether the close() is explicit or implicit (i.e. because
> the reference count reaches 0).

Yeah, I realized that later. Python allows you to forget that it is doing
a lot of work under the covers. Even so, close() is Python code and
refcount cleanup is in the heart of the interpreter.

> The real problem with an explicit close() is dangling
> references. Assume the application has a reference to
> an inner node in the document tree. The close() would
> probably remove all parent pointers from the subtree

You wouldn't really have a dangling reference -- you would have a
reference to a node that no longer knows its parent. But that is still not
ideal.
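
The situation can be reproduced with a toy tree. This is a sketch under the
assumption that close() only clears parent pointers, as Dieter describes;
the Node class here is made up and is not the qp or DOM node class.

```python
class Node:
    """Toy tree node with a parent pointer (so parent and child form a cycle)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def close(self):
        """Break the parent/child cycles below this node, visiting each node once."""
        for child in self.children:
            child.close()
        self.parent = None

root = Node("doc")
section = Node("section", root)
title = Node("title", section)

had_grandparent = title.parent.parent is root  # True before close()
root.close()
# `title` is still a perfectly valid object, but it has lost its parent.
orphaned = title.parent is None
```

The held reference stays usable after close(); what is lost is the ability
to navigate upward from it.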

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco



From mike.olson@fourthought.com  Tue Apr 27 00:45:05 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Mon, 26 Apr 1999 18:45:05 -0500
Subject: [XML-SIG] DOM API
References: <199904251614.KAA07800@malatesta.local> <3724C83C.FA5080E@prescod.net>
Message-ID: <3724FA80.33D46BE0@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------ms96FD3791096FB6D164818BD1
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Paul Prescod wrote:

> > I've never had a need for such a method.  I often need all such elements, in
> > which case I just use getElementsByTagName.
>
> I'm surprised that you've never needed it. In Greg's data-ish world it
> would be incredibly useful but also in the data-ish subsets of the
> document world.
>
> <DOCUMENT>
>  <METADATA>
>    <TITLE>Blah...</TITLE>
>    <AUTHOR>Blah...</AUTHOR>
>  </METADATA>
> ...
> </DOCUMENT>
>
> doc.documentElement.getChild( "METADATA" ).getChild( "AUTHOR" )
>

getElementsByTagName does not stop at the current level; it checks the
element's children, then their children, and so on.  This was a big pain for
us, and I had to implement a getChildren-type method.  To me, that would be
more useful than a getChild.
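
The difference is easy to see with a small example. This is a sketch using
the standard-library xml.dom.minidom module, not 4DOM; get_children is a
hypothetical name for the getChildren-type method mentioned above.

```python
from xml.dom import Node, minidom

def get_children(element, name):
    """Return only the direct child elements called `name`."""
    return [
        child
        for child in element.childNodes
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == name
    ]

doc = minidom.parseString("<A><B>outer</B><C><B>inner</B></C></A>")
root = doc.documentElement

all_b = root.getElementsByTagName("B")  # descends into C as well
direct_b = get_children(root, "B")      # stays at the current level
deep_count = len(all_b)
direct_count = len(direct_b)
```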



--
Mike Olson
Member Consultant
FourThought LLC
http://www.fourthought.com http://opentechnology.org


---

"No program is interesting in itself to a programmer. It's only interesting as
long as there are new challenges and new ideas coming up." --- Linus Torvalds


--------------ms96FD3791096FB6D164818BD1--


From mike.olson@fourthought.com  Tue Apr 27 00:56:12 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Mon, 26 Apr 1999 18:56:12 -0500
Subject: [XML-SIG] Python DOM Unification -- level
References: <3724CC49.AAB857A5@prescod.net>
Message-ID: <3724FD1C.86EBD9E6@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------msC31DF1719CA3CA2D111E3B60
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


I would say at the Builder Level, but handle it differently than you suggest.

if sys.argv[1] == '4th' :
    fac = 4dom.Ext.Factory
    builder = 4dom.Ext.Builder
elif sys.argv[1] == 'pydom':
    fac = pydom.factory
    builder = pydom.builder
else:
    fac = minidom.fac
    builder = minidom.builder

doc = fac.CreateDocument();
doc = builder.XMLFromURL('www.fourthought.com')


where the factory interface defines everything that is not creatable from a
document.

interface DOMFactory {

    Document CreateDocument();
    HTMLDocument CreateHTMLDocument();
    DocType CreateDocType();
    NodeList CreateNodeList(in sequence<Node>);
    ...
};

and builder defines an interface for creating documents from different
streams

interface DOMBuilder {

    Document FromXMLFile(in string URL);
    HTMLDocument FromHTMLFile(in string URL);
    Document FromXMLString(in string XML);
    ...
};
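
In Python terms, the two interfaces and the selection step might look like
the sketch below. Only xml.dom.minidom is real here; the abstract classes,
the MiniDOM* wrappers, the IMPLEMENTATIONS registry and select_dom are all
made-up names, and only two of the IDL operations are shown.

```python
from abc import ABC, abstractmethod
from xml.dom import minidom

class DOMFactory(ABC):
    """Creates objects that cannot be created from an existing document."""

    @abstractmethod
    def CreateDocument(self):
        ...

class DOMBuilder(ABC):
    """Creates documents from some input source."""

    @abstractmethod
    def FromXMLString(self, xml):
        ...

class MiniDOMFactory(DOMFactory):
    def CreateDocument(self):
        return minidom.Document()

class MiniDOMBuilder(DOMBuilder):
    def FromXMLString(self, xml):
        return minidom.parseString(xml)

# Hypothetical registry: each implementation contributes one pair.
IMPLEMENTATIONS = {
    "minidom": (MiniDOMFactory(), MiniDOMBuilder()),
}

def select_dom(name):
    """Return the (factory, builder) pair for `name`, defaulting to minidom."""
    return IMPLEMENTATIONS.get(name, IMPLEMENTATIONS["minidom"])

fac, builder = select_dom("minidom")
empty_doc = fac.CreateDocument()
parsed_doc = builder.FromXMLString("<a><b/></a>")
```

Everything after select_dom is implementation-neutral, so swapping DOMs
means adding one registry entry rather than touching client code.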


Mike

Paul Prescod wrote:

> Following are some meta-questions on the proposed Python DOM unification.
>
> First, what is the appropriate level of unification?
>
>  * Module level:
>
> if sys.argv[1]=="fast":
>     from xml import minidom
>     dom = minidom
> else if sys.argv[1]=="complete":
>     from xml import dom
> else if sys.argv[1]=="distributed":
>     from 4thought import dom
>
>  * Builder level:
>
> if sys.argv[1]=="4thought":
>     from 4thought.dom import sax_builder()
> else:
>    from xml.dom import sax_builder()
>
> xml.dom.FromXML( sax_builder() )
>
>  * Document level:
>
> if sys.argv[1]=="4thought":
>     4thought.dom.Gimme.a.document()
> else:
>     xml.dom.I.need.a.document()
>
> document.doStuff()
>
> My preference is for "Builder level", I think. Portable helper functions
> could go into a universal xml.dom package instead of into each package.

--
Mike Olson
Member Consultant
FourThought LLC
http://www.fourthought.com http://opentechnology.org


---

"No program is interesting in itself to a programmer. It's only interesting
as long as there are new challenges and new ideas coming up." --- Linus Torvalds


--------------msC31DF1719CA3CA2D111E3B60--


From uche.ogbuji@fourthought.com  Tue Apr 27 06:47:01 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 26 Apr 1999 23:47:01 -0600
Subject: [XML-SIG] Python DOM Unification -- level
In-Reply-To: Your message of "Mon, 26 Apr 1999 15:27:53 CDT."
 <3724CC49.AAB857A5@prescod.net>
Message-ID: <199904270547.XAA09432@malatesta.local>

> Following are some meta-questions on the proposed Python DOM unification.
> 
> First, what is the appropriate level of unification? 
> 
>  * Module level:
> 
> if sys.argv[1]=="fast":
>     from xml import minidom
>     dom = minidom
> else if sys.argv[1]=="complete":
>     from xml import dom
> else if sys.argv[1]=="distributed":
>     from 4thought import dom

Hmm.  The last line would throw an exception.  We have thought a bit about 
packaging for 4DOM: currently we use "DOM" as top level, but we understand 
that this might not play nicely with other DOM libs in the path.

>  * Builder level:
> 
> if sys.argv[1]=="4thought":
>     from 4thought.dom import sax_builder()
> else:
>    from xml.dom import sax_builder()
> 
> xml.dom.FromXML( sax_builder() )
> 
>  * Document level:
> 
> if sys.argv[1]=="4thought":
>     4thought.dom.Gimme.a.document()
> else:
>     xml.dom.I.need.a.document()
> 
> document.doStuff()
> 
> My preference is for "Builder level", I think. Portable helper functions
> could go into a universal xml.dom package instead of into each package.

Agreed.  Each implementation would know how to build its own concrete objects, 
and the unified interface (if we're able to pull that off) will allow 
transparent manipulation of heterogeneous nodes within an app.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Tue Apr 27 06:56:19 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 26 Apr 1999 23:56:19 -0600
Subject: [XML-SIG] Py-ish PySax Suggestion #2
In-Reply-To: Your message of "Mon, 26 Apr 1999 16:00:42 CDT."
 <3724D3FA.2E589DB6@prescod.net>
Message-ID: <199904270556.XAA09446@malatesta.local>

> I would like to suggest that we copy the *mllib start_foo convention for
> PySAX. Here's what a HandlerBase.startElement would look like for that:
> 
>         def startElement( self, tagname, attrs ):
>                 method = getattr( self, "start_"+tagname, None)
>                 if method:
>                         method( attrs )
>                 else:
>                         self.startUnknownElement( tagname, attrs )
> 
>         def endElement( self, tagname ):
>                 method = getattr( self, "end_"+tagname, None)
>                 if method:
>                         method()
>                 else:
>                         self.endUnknownElement( tagname )
> 
>         def startUnknownElement( self, tagname, attrs ):
>                 pass
> 
>         def endUnknownElement( self, tagname ):
>                 pass

I don't have a big problem with this, but I'll bet it gives fits to those 
about these parts who are very concerned with every last bit of run-time speed.

And indeed, since this is so easily achieved under the current PySAX, is there 
really a need to enforce the meta-programming?

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Tue Apr 27 07:02:57 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 27 Apr 1999 00:02:57 -0600
Subject: [XML-SIG] Python DOM Unification -- level
In-Reply-To: Your message of "Mon, 26 Apr 1999 18:56:12 CDT."
 <3724FD1C.86EBD9E6@fourthought.com>
Message-ID: <199904270602.AAA09467@malatesta.local>

> I would say at the Builder Level, but handle it differently than you suggest.
> 
> if sys.argv[1] == '4th' :
>     fac = 4dom.Ext.Factory
>     builder = 4dom.Ext.Builder
> elif sys.argv[1] == 'pydom':
>     fac = pydom.factory
>     builder = pydom.builder
> else
>     fac = minidom.fac
>     builder = minidom.builder
> 
> doc = fac.CreateDocument();
> doc = builder.XMLFromURL('www.fourthought.com')

Et tu, Mikhail?  Code that won't run?  (See lines 2 and 3).  And furthermore, 
I know that we do plan to put up the XML source for www.fourthought.com one of 
these days when browsers are sane, but won't that last line produce some funky 
results just now?

> where the factory interface defines everything that is not creatable from a
> document.
> 
> interface DOMFactory {
> 
>     Document CreateDocument();
>     HTMLDocument CreateHTMLDocument();
>     DocType CreateDocType();
>     NodeList CreateNodeList(in sequence<Node>);
>     ...
> };
> 
> and builder defines an interface for creating documents from different
> streams
> 
> interface DOMBuilder {
> 
>     Document FromXMLFile(in string URL);
>     HTMLDocument FromHTMLFile(in string URL);
>     Document FromXMLString(in string XML);
>     ...
> };

Of course, some may say I'm biased, but I think this is the strongest 
proposal.  It also dovetails with those who have been calling for a PyDOM 
factory interface.

The main problem I anticipate is that Paul might consider adding a factory to 
minidom a bit contrary to the "mini" idea.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From paul@prescod.net  Tue Apr 27 10:08:20 1999
From: paul@prescod.net (Paul Prescod)
Date: Tue, 27 Apr 1999 04:08:20 -0500
Subject: [XML-SIG] Py-ish PySax Suggestion #2
References: <199904270556.XAA09446@malatesta.local>
Message-ID: <37257E84.CBE66B@prescod.net>

uche.ogbuji@fourthought.com wrote:
> 
> 
> I don't have a big problem with this, but I'll bet it gives fits to those
> about these parts who are very concerned with every last bit of run-time speed.

For better or worse I think those people have already abandoned SAX.
Actually, the proposal doesn't slow anything down: if you need the speed
of a single startElement method, you just override it and go. Existing SAX
clients should be exactly as fast as they are today.

> And indeed, since this is so easily achieved under the current PySAX, is there
> really a need to enforce the meta-programming?

It isn't so much enforcing it as making it accessible and "standard." It
can help usability to make common idioms a part of the library or even
language.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
  http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp


From akuchlin@cnri.reston.va.us  Tue Apr 27 14:16:41 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Tue, 27 Apr 1999 09:16:41 -0400 (EDT)
Subject: [XML-SIG] Python DOM Unification -- level
In-Reply-To: <3724E2A1.62223458@prescod.net>
References: <3724CC49.AAB857A5@prescod.net>
 <14116.55422.189139.235663@amarok.cnri.reston.va.us>
 <3724E2A1.62223458@prescod.net>
Message-ID: <14117.46989.519563.210317@amarok.cnri.reston.va.us>

Paul Prescod writes:
>Shouldn't the exception objects and class constants be shared between DOM
>implementations?

	Good point; they could be, I suppose.

>Why do Node, NodeList and NamedNodeMap have to be top-level. Does it make
>sense for clients to construct them?

	For code like "if isinstance(obj, Node):..."; otherwise you'd
have no way of telling when a class instance is in fact a DOM node.  I
suppose you could do without NodeList and NamedNodeMap -- they should
simply resemble lists and dictionaries -- but Node is probably
required.
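
A small sketch of why a shared, top-level Node matters for isinstance
checks. All class names below are hypothetical stand-ins: Node and
DOMException play the role of the shared definitions, and MiniElement and
FullElement play the role of two independent implementations.

```python
# Shared definitions that every DOM implementation would import.
class Node:
    ELEMENT_NODE = 1
    TEXT_NODE = 3

class DOMException(Exception):
    """Single exception type that clients can catch for any implementation."""

# Two unrelated implementations that both subclass the shared Node.
class MiniElement(Node):
    nodeType = Node.ELEMENT_NODE

    def __init__(self, tagName):
        self.tagName = tagName

class FullElement(Node):
    nodeType = Node.ELEMENT_NODE

    def __init__(self, tagName):
        self.tagName = tagName
        self.attributes = {}

def is_dom_node(obj):
    """Works for any implementation, because they share one base class."""
    return isinstance(obj, Node)

results = [is_dom_node(MiniElement("a")), is_dom_node(FullElement("b")), is_dom_node("a")]
```

Clients never construct Node themselves; they only need the name to be
importable from one agreed place.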

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
No doubt, a scientist isn't necessarily penalized for being a complex,
versatile, eccentric individual with lots of extra-scientific interests. But
it certainly doesn't help him a bit.
    -- Stephen Toulmin


From uche.ogbuji@fourthought.com  Tue Apr 27 14:35:42 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 27 Apr 1999 07:35:42 -0600
Subject: [XML-SIG] Py-ish PySax Suggestion #2
In-Reply-To: Your message of "Tue, 27 Apr 1999 04:08:20 CDT."
 <37257E84.CBE66B@prescod.net>
Message-ID: <199904271335.HAA10015@malatesta.local>

Paul Prescod:
> > I don't have a big problem with this, but I'll bet it gives fits to those
> > about these parts who are very concerned with every last bit of run-time speed.
> 
> For better or worse I think those people have already abandoned SAX.
> Actually, the proposal doesn't slow anything down: if you need the speed
> of a single startElement method, you just override it and go. Existing SAX
> clients should be exactly as fast as they are today.
> 
> > And indeed, since this is so easily achieved under the current PySAX, is there
> > really a need to enforce the meta-programming?
> 
> It isn't so much enforcing it as making it accessible and "standard." It
> can help usability to make common idioms a part of the library or even
> language.

All true.  And given that, I do think it's a useful conventional idiom for 
many SAX apps, excluding DOM building, of course.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From fdrake@acm.org  Tue Apr 27 15:36:47 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 27 Apr 1999 10:36:47 -0400 (EDT)
Subject: [XML-SIG] Py-ish PySax Suggestion #2
In-Reply-To: <3724D3FA.2E589DB6@prescod.net>
References: <3724D3FA.2E589DB6@prescod.net>
Message-ID: <14117.52095.618817.525406@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > I would like to suggest that we copy the *mllib start_foo convention for
 > PySAX. Here's what a HandlerBase.StartElement would look like for that:

  It was really fun to try to build just this on top of Java SAX; that 
was my first real experience with Java reflection!  ;-)
  I think this makes a lot of sense for use without namespaces, but
not with namespaces.  (I'm not a fan of the "" namespace.)  Perhaps
the startElement() and endElement() should be implemented as a
subclass or filter?  It may even be reasonable to have it as a SAX2
"feature" that can be tested or requested.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From fdrake@acm.org  Tue Apr 27 15:39:02 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 27 Apr 1999 10:39:02 -0400 (EDT)
Subject: [XML-SIG] Python DOM Unification -- level
In-Reply-To: <3724E2A1.62223458@prescod.net>
References: <3724CC49.AAB857A5@prescod.net>
 <14116.55422.189139.235663@amarok.cnri.reston.va.us>
 <3724E2A1.62223458@prescod.net>
Message-ID: <14117.52230.551462.836651@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > Shouldn't the exception objects and class constants be shared between DOM
 > implementations?

  Absolutely!

 > Why do Node, NodeList and NamedNodeMap have to be top-level. Does it make
 > sense for clients to construct them?

  No, but you knew that before I did.  ;-)


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From mike.olson@fourthought.com  Tue Apr 27 16:05:41 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Tue, 27 Apr 1999 10:05:41 -0500
Subject: [XML-SIG] Python DOM Unification -- level
References: <3724CC49.AAB857A5@prescod.net>
 <14116.55422.189139.235663@amarok.cnri.reston.va.us>
 <3724E2A1.62223458@prescod.net> <14117.52230.551462.836651@weyr.cnri.reston.va.us>
Message-ID: <3725D245.C703951B@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------ms821B3B6B4F987290D459635B
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


"Fred L. Drake" wrote:

>
>  > Why do Node, NodeList and NamedNodeMap have to be top-level. Does it make
>  > sense for clients to construct them?
>
>   No, but you knew that before I did.  ;-)
>
>

Node, no, but NodeList and NamedNodeMap are just containers and I see no reason
why a client should not be able to create them.

Maybe they are doing some post-processing on top of the DOM but want to keep a
DOMish interface.  Then they will need to create NodeLists and NamedNodeMaps to
repackage nodes.

Mike


>
> --
> Fred L. Drake, Jr.           <fdrake@acm.org>
> Corporation for National Research Initiatives
>
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Mike Olson
Member Consultant
FourThought LLC
http://www.fourthought.com http://opentechnology.org


---

"No program is interesting in itself to a programmer. It's only interesting as
long
as there are new challenges and new ideas coming up." --- Linus Torvalds


--------------ms821B3B6B4F987290D459635B--


From fdrake@acm.org  Tue Apr 27 16:47:28 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 27 Apr 1999 11:47:28 -0400 (EDT)
Subject: [XML-SIG] Python DOM Unification -- level
In-Reply-To: <3725D245.C703951B@fourthought.com>
References: <3724CC49.AAB857A5@prescod.net>
 <14116.55422.189139.235663@amarok.cnri.reston.va.us>
 <3724E2A1.62223458@prescod.net>
 <14117.52230.551462.836651@weyr.cnri.reston.va.us>
 <3725D245.C703951B@fourthought.com>
Message-ID: <14117.56336.632205.967452@weyr.cnri.reston.va.us>

Mike Olson writes:
 > Node, no, but NodeList and NamedNodeMap are just containers and I see no
 > reason why a client should not be able to create them.
 > 
 > Maybe they are doing some post-processing on top of the DOM but want to
 > keep a DOMish interface.  Then they will need to create NodeLists and
 > NamedNodeMaps to repackage nodes.

Mike,
  Do you think this would be doable in a way portable across DOM
implementations?  I've not looked at 4DOM (even though I intended to
;), so I don't know how much it differs from PyDOM under the hood.  I
would expect that if building these is important, factory methods
should be created on the Document object in the same way that there
are factory methods for elements, etc.
  It's not that I object to having the classes available, it's that I
don't see any requirement that they be available or that different DOM 
implementations share the implementation, even as a base class.
  I'm not convinced of Andrew's claim that having Node available for
type tests would be useful, either.  ;-)  That would also make it
difficult to create an all-C implementation of the DOM.  (No, I don't
have one in the works.)
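
A minimal sketch of what "factory methods on the Document object" looks
like from the client side.  It uses xml.dom.minidom from a current
Python 3 as a stand-in (an assumption; neither PyDOM nor 4DOM as they
existed in this thread), and the element names are invented.  The client
never names a Node, NodeList or NamedNodeMap class, which is what would
keep such code portable across implementations.

```python
from xml.dom import minidom

# Arguments are namespaceURI, qualifiedName, doctype.
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "catalog", None)

# Every node is obtained from a factory method on the document.
item = doc.createElement("item")
item.setAttribute("id", "1")
item.appendChild(doc.createTextNode("first"))
doc.documentElement.appendChild(item)

# childNodes is whatever NodeList type the implementation hands back;
# the client only relies on its DOM interface (length, item, indexing).
nodes = doc.documentElement.childNodes
xml_text = doc.documentElement.toxml()
```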


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From mike.olson@fourthought.com  Tue Apr 27 17:59:49 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Tue, 27 Apr 1999 11:59:49 -0500
Subject: [XML-SIG] Python DOM Unification -- level
References: <3724CC49.AAB857A5@prescod.net>
 <14116.55422.189139.235663@amarok.cnri.reston.va.us>
 <3724E2A1.62223458@prescod.net>
 <14117.52230.551462.836651@weyr.cnri.reston.va.us>
 <3725D245.C703951B@fourthought.com> <14117.56336.632205.967452@weyr.cnri.reston.va.us>
Message-ID: <3725ED05.1FF4DA9A@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------ms95352A88316FA873F0E7C460
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


"Fred L. Drake" wrote:

> Mike Olson writes:
>  > Node, no, but NodeList and NamedNodeMap are just containers and I see no
>  > reason why a client should not be able to create them.
>  >
>  > Maybe they are doing some post-processing on top of the DOM but want to
>  > keep a DOMish interface.  Then they will need to create NodeLists and
>  > NamedNodeMaps to repackage nodes.
>
> Mike,
>   Do you think this would be doable in a way portable across DOM
> implementations?  I've not looked at 4DOM (even though I intended to
> ;), so I don't know how much it differs from PyDOM under the hood.  I
> would expect that if building these is important, factory methods
> should be created on the Document object in the same way that there
> are factory methods for elements, etc.
>   It's not that I object to having the classes available, it's that I
> don't see any requirement that they be available or that different DOM
> implementations share the implementation, even as a base class.
>   I'm not convinced of Andrew's claim that having Node available for
> type tests would be useful, either.  ;-)  That would also make it
> difficult to create an all-C implementation of the DOM.  (No, I don't
> have one in the works.)
>

We didn't want to pollute the Document API with all of these extra factory
methods.  We moved all of the stuff that you cannot build from a document into
our Factory interface.  We also put in all of the other node types so there is
one common factory for Nodes.  In 4DOM a document has an internal member
"factory" where it really creates all of its stuff.  This allows us to have a
"remote" factory if needed.

Note we added the idea of an HTMLDocument.  An HTMLDocument is a Document, but
with added functionality to meet a bunch of the DOM-imposed requirements, for
example that a document must always have a head and body.  It also overrides
createElement to create DOM HTML classes of the required tag.

I don't think anything is gained by exposing Node.  I see Andrew's point that at
the Node level, appendChild must check to make sure that only Nodes are being
added.  But further down the hierarchy another check must be made anyway, to
make sure that:

a) only one Element is added to a document
b) no text is added to a document
c) etc

so there is already validation that the object derives from Node.
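
The document-level checks listed above can be observed in a present-day
implementation.  This is a sketch against xml.dom.minidom in Python 3
(an assumption; it is not 4DOM), and try_append is a hypothetical helper.
The exception classes come from the shared xml.dom package rather than
from the implementation module.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "root", None)


def try_append(parent, child):
    """Return None on success, or the name of the DOM exception raised."""
    try:
        parent.appendChild(child)
    except xml.dom.DOMException as exc:
        return type(exc).__name__
    return None


# a) a document may hold only one element
second_root = try_append(doc, doc.createElement("another"))
# b) text may not be added directly to a document
stray_text = try_append(doc, doc.createTextNode("loose text"))
# the same text node is fine as a child of an element
ok_child = try_append(doc.documentElement, doc.createTextNode("fine here"))
```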

I think the factory methods would have to be DOM implementation specific.  We
might be able to have one factory that creates Python DOM implementation
NodeList etc but I don't see much gained.
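
A rough sketch of the arrangement described in this message: the
document keeps an internal "factory" member and forwards node creation
to it, so a different (for example remote) factory can be swapped in.
Every class here is hypothetical and greatly simplified; it is not the
4DOM code, only an illustration of the delegation.

```python
class Node:
    def __init__(self, owner_document):
        self.ownerDocument = owner_document
        self.childNodes = []


class Element(Node):
    def __init__(self, owner_document, tag_name):
        super().__init__(owner_document)
        self.tagName = tag_name


class Text(Node):
    def __init__(self, owner_document, data):
        super().__init__(owner_document)
        self.data = data


class NodeFactory:
    """Implementation-specific place where all nodes are really created."""

    def createElement(self, owner_document, tag_name):
        return Element(owner_document, tag_name)

    def createTextNode(self, owner_document, data):
        return Text(owner_document, data)

    def createNodeList(self, nodes):
        return list(nodes)


class Document(Node):
    def __init__(self, factory=None):
        super().__init__(None)
        self.factory = factory if factory is not None else NodeFactory()

    # The public DOM methods supply the ownerDocument for the caller.
    def createElement(self, tag_name):
        return self.factory.createElement(self, tag_name)

    def createTextNode(self, data):
        return self.factory.createTextNode(self, data)


class CountingFactory(NodeFactory):
    """Stand-in for a specialised factory, e.g. one that talks to a server."""

    def __init__(self):
        self.created = 0

    def createElement(self, owner_document, tag_name):
        self.created += 1
        return super().createElement(owner_document, tag_name)


doc = Document(CountingFactory())
paragraph = doc.createElement("p")
```

Client code only ever calls doc.createElement(); which factory did the
work is invisible to it.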

I don't think that all Python implementations should share base classes and
NodeLists, et al.  Each should have its own implementation tailored to its
purpose, i.e. speed, orbed, lightweight.


NodeFactory.idl
#pragma prefix "fourthought.com"

#include "../../DOM.idl"
#include "../../HTML/HTML.idl"

module NodeFactoryIF {

        typedef sequence<DOMIF::Node> listofnodes;

        interface NodeFactory {

                //The user should only call these four methods
                HTMLIF::HTMLDocument createHTMLDocument();
                DOMIF::Document createDocument();
                HTMLIF::HTMLElement createHTMLElement(
                        in HTMLIF::HTMLDocument parent, in string tag);
                void releaseNode(in DOMIF::Node node);

                //Non public interface: user shouldn't call these
                //All require the ownerDocument, but when called from
                //Document.py, this is provided for the user
                DOMIF::DOMImplementation createDOMImplementation(
                        in string feature, in string version);
                DOMIF::NodeList createNodeList(in listofnodes nodes);
                DOMIF::NamedNodeMap createNamedNodeMap();
                DOMIF::Element createElement(
                        in DOMIF::Document ownerDocument, in string tagName);
                DOMIF::DocumentFragment createDocumentFragment(
                        in DOMIF::Document ownerDocument);
                DOMIF::DocumentType createDocumentType(
                        in DOMIF::Document ownerDocument, in string name,
                        in DOMIF::NamedNodeMap entities,
                        in DOMIF::NamedNodeMap notations);
                DOMIF::Text createTextNode(
                        in DOMIF::Document ownerDocument, in string data);
                DOMIF::Comment createComment(
                        in DOMIF::Document ownerDocument, in string data);
                DOMIF::CDATASection createCDATASection(
                        in DOMIF::Document ownerDocument, in string data);
                DOMIF::ProcessingInstruction createProcessingInstruction(
                        in DOMIF::Document ownerDocument, in string target,
                        in string data);
                DOMIF::Attr createAttribute(
                        in DOMIF::Document ownerDocument, in string name);
                DOMIF::Entity createEntity(
                        in DOMIF::Document ownerDocument, in string publicId,
                        in string systemId, in string notationName);
                DOMIF::EntityReference createEntityReference(
                        in DOMIF::Document ownerDocument, in string name);
                DOMIF::Notation createNotation(
                        in DOMIF::Document ownerDocument, in string publicId,
                        in string systemId, in string name);
                DOMIF::NodeIterator createNodeIterator(
                        in DOMIF::Node start_node);
                DOMIF::NodeIterator createSelectiveNodeIterator(
                        in DOMIF::Node start_node,
                        in unsigned short what_to_show);
                DOMIF::NodeIterator createFilteredNodeIterator(
                        in DOMIF::Node start_node,
                        in DOMIF::NodeFilter filter);
                DOMIF::NodeIterator createSelectiveFilteredNodeIterator(
                        in DOMIF::Node start_node,
                        in unsigned short what_to_show,
                        in DOMIF::NodeFilter filter);
                HTMLIF::HTMLCollection createHTMLCollection(
                        in listofnodes nodes);
        };
};

Mike

>
>   -Fred
>
> --
> Fred L. Drake, Jr.           <fdrake@acm.org>
> Corporation for National Research Initiatives

--
Mike Olson
Member Consultant
FourThought LLC
http://www.fourthought.com http://opentechnology.org


---

"No program is interesting in itself to a programmer. It's only interesting as
long
as there are new challenges and new ideas coming up." --- Linus Torvalds


--------------ms95352A88316FA873F0E7C460--


From skip@mojam.com  Tue Apr 27 18:23:13 1999
From: skip@mojam.com (Skip Montanaro)
Date: Tue, 27 Apr 1999 13:23:13 -0400
Subject: [XML-SIG] XML package speed (or lack thereof...)?
Message-ID: <199904271723.NAA27170@cm-29-94-14.nycap.rr.com>

I'm using XML-RPC to provide an over-the-net API to Python, Perl and Java
clients on my server.  I'm currently using a hacked up version of Fredrik
Lundh's xmlrpclib module.  The hacking part involved writing a C module to
do the low-level encoding and decoding so it was fast enough for my
purposes.  This library only does XML-RPC, nothing else.

Ideally, I'd like to dump my XML-RPC-specific code in favor of something
more general, robust and better supported.  Accordingly, I downloaded the
0.5.1 version of the xml-sig package today and gave it a whirl.  After
making a couple small mods to marshal/generic/test:

    def test(load, loads, dump, dumps, test_values,
	     do_assert = 1):
	# Try all the above bits of data
	try: from cStringIO import StringIO
	except ImportError: from StringIO import StringIO
	import time

	t = time.time()
	for i in range(10):
	    for item in test_values:
		s = dumps(item)
		#print item, s
		output = loads(s)
		# Try it from a file
		file = StringIO()
		dump(item, file)
		file.seek(0)
		output2 = load(file)

		if do_assert:
		    assert item==output and item==output2 and output==output2
	t = time.time() - t
	print "total time: %.2f seconds" % t
	print "time per pass: %.2f seconds" % (t/10)

and commenting out a print statement in marshal/xmlrpc/XMLRPCUnmarshaller/
um_end_dictionary, I was able to run the test without any spurious messages.
I got the following output on my 100 MHz Pentium (Python 1.5.1, RH Linux
5.0):

    >>> xml.marshal.xmlrpc.runtests ()
    Testing XML-RPC marshalling...
    total time: 9.77 seconds
    time per pass: 0.98 seconds

This is hardly what I would call blazing speed (perhaps 30-100x slower than
what I currently get), especially considering the small size of the test
data, so I thought perhaps I was missing something - an optional C library
perhaps?  I see that Fredrik's sgmlop module was built and installed, but my
guess is that it's not being used.
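
For comparison on current systems, the same kind of round-trip timing
can be sketched with the xmlrpc.client module that ships with Python 3.
This is an assumption-laden stand-in: it does not exercise the xml-sig
marshalling code measured above, and the sample values and the function
name time_round_trips are invented.

```python
import time
import xmlrpc.client


def time_round_trips(values, passes=10):
    """Encode and decode every value `passes` times; return (total, per pass)."""
    start = time.perf_counter()
    for _ in range(passes):
        for item in values:
            encoded = xmlrpc.client.dumps((item,))
            # loads() returns (params tuple, method name or None)
            (decoded,), _method = xmlrpc.client.loads(encoded)
            assert decoded == item
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / passes


sample = [1, 2.5, "a string", [1, 2, 3], {"key": "value", "n": 7}, True]
total, per_pass = time_round_trips(sample)
print("total time: %.4f seconds" % total)
print("time per pass: %.4f seconds" % per_pass)
```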

Thx,

Skip Montanaro	| Mojam: "Uniting the World of Music" http://www.mojam.com/
skip@mojam.com  | Musi-Cal: http://www.musi-cal.com/
518-372-5583


From fdrake@acm.org  Tue Apr 27 19:04:02 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 27 Apr 1999 14:04:02 -0400 (EDT)
Subject: [XML-SIG] Python DOM Unification -- level
In-Reply-To: <3725ED05.1FF4DA9A@fourthought.com>
References: <3724CC49.AAB857A5@prescod.net>
 <14116.55422.189139.235663@amarok.cnri.reston.va.us>
 <3724E2A1.62223458@prescod.net>
 <14117.52230.551462.836651@weyr.cnri.reston.va.us>
 <3725D245.C703951B@fourthought.com>
 <14117.56336.632205.967452@weyr.cnri.reston.va.us>
 <3725ED05.1FF4DA9A@fourthought.com>
Message-ID: <14117.64530.986628.424144@weyr.cnri.reston.va.us>

Mike Olson writes:
 > We didn't want to pollute the Document API with all of these extra factory
 > methods.  We moved all of the stuff that you cannot build from a document
...
 > I think the factory methods would have to be DOM implementation specific.
...
 > I don't think that all Python implementations should share base classes and
 > NodeLists, et al.  Each should have its own implementation tailored to its
 > purpose, i.e. speed, orbed, lightweight.

Mike,
  I think we agree.  ;-)  I'm happy with using a factory object to
gain access to node construction, and don't really care that much if
it's a separate object from the document object.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


From Jeff.Johnson@icn.siemens.com  Tue Apr 27 22:31:55 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Tue, 27 Apr 1999 17:31:55 -0400
Subject: [XML-SIG] DOM normalize() broken? entity refs lost?
Message-ID: <85256760.007644BA.00@li01.lm.ssc.siemens.com>


Entity references and any other tags covered by
xml.dom.writer.Walker.doOtherNode() are thrown away when written to a file using
XmlWriter or its subclass HtmlWriter.  XmlWriter does not define .doOtherNode()
so nothing gets written.  I noticed it when bullets, registration marks, and
apostrophes started disappearing from my HTML files.  I haven't tried to write
the code for XmlWriter.doOtherNode() yet, maybe you gurus could do it much
better than I can... :)

Last week I asked how to find simple strings in adjacent text nodes and was
advised to use Element.normalize().  I tried it and unless I'm doing it wrong,
it doesn't seem to work.

I've included a test script that demonstrates both problems:

#============== SCRIPT STARTS HERE ===========================
import sys, os
from xml.dom.utils import FileReader
from xml.dom.writer import HtmlWriter
from StringIO import StringIO

html = """
<HTML>
<!-- Comments blah blah blah -->
<HEAD>
<TITLE>test</TITLE>
</HEAD>
<BODY >
<P>Registered entity gets thrown away: &reg;</P>
<P>Text on multiple
lines and with extra white         space in the
raw HTML doesn't change when dom.get_documentElement().normalize() is called.
</P>
</BODY>
</HTML>
"""

fr = FileReader()
dom = fr.readStream(StringIO(html),'HTML')
dom.get_documentElement().normalize()
w = HtmlWriter()
w.write(dom)


From bslesins@best.com  Wed Apr 28 02:26:04 1999
From: bslesins@best.com (Brian Slesinsky)
Date: Tue, 27 Apr 1999 18:26:04 -0700 (PDT)
Subject: [XML-SIG] checking syntax with xmllib
Message-ID: <Pine.BSF.4.10.9904271814500.22068-100000@shell7.ba.best.com>

Hi, I tried using xmllib to check if an XML document is well-formed and
found some bugs.

If I use xmllib from Python 1.5.2, it complains about invalid characters.
However, I'm fairly sure I'm using correct UTF8 encoding (the document
contains European characters and was converted to Unicode from
ISO-8859-1). It looks like the 'illegal' regular expression in xmllib is
incorrect.

I also tried xml.parsers.xmllib from Python/XML 0.5.1, but it doesn't seem
to be doing any syntax checking at all - I tried a file with one close tag
and it didn't complain.

Here's the script I'm using to do the tests:

#!/nuvo/bin/python

import sys
from xml.parsers.xmllib import XMLParser

def check_xml(file):
    x = XMLParser()
    f = open(file)

    while 1:
        line = f.readline()
        if line=="": break
        x.feed(line)

check_xml(sys.argv[1])
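
A minimal sketch of a well-formedness check on a current Python 3, using
the xml.parsers.expat bindings instead of xmllib (an assumption; it says
nothing about the xmllib behaviour reported above).  The function name
check_well_formed is made up.  Note the explicit "final chunk" flag: an
incremental parser generally has to be told that the input has ended
before it can complain about unclosed elements.

```python
import xml.parsers.expat


def check_well_formed(data):
    """Return None if data is well-formed XML, else a short error description."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as exc:
        return "line %d, column %d: %s" % (exc.lineno, exc.offset, exc)
    return None


good = check_well_formed(b"<a><b/></a>")
mismatched = check_well_formed(b"<a></b>")
lone_close = check_well_formed(b"</a>")
unclosed = check_well_formed(b"<a>")
utf8_text = check_well_formed("<a>caf\u00e9</a>".encode("utf-8"))
```

Without an encoding declaration the bytes are treated as UTF-8, so
ISO-8859-1 text has to be converted (or declared) first.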


- Brian Slesinsky


From akuchlin@cnri.reston.va.us  Wed Apr 28 03:41:53 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Tue, 27 Apr 1999 22:41:53 -0400
Subject: [XML-SIG] DOM normalize() broken? entity refs lost?
In-Reply-To: <85256760.007644BA.00@li01.lm.ssc.siemens.com>
References: <85256760.007644BA.00@li01.lm.ssc.siemens.com>
Message-ID: <199904280241.WAA00900@207-172-184-212.s212.tnt23.brd.va.dialup.rcn.com>

Jeff.Johnson@icn.siemens.com writes:
 > XmlWriter does not define .doOtherNode()
 > so nothing gets written.  

	Eek! You're right.  Try this patch:

Index: writer.py
===================================================================
RCS file: /home/cvsroot/xml/dom/writer.py,v
retrieving revision 1.8
diff -C2 -r1.8 writer.py
*** writer.py	1999/04/08 00:14:29	1.8
--- writer.py	1999/04/28 02:29:42
***************
*** 119,123 ****
          self.stream.write(node.toxml())
  
! 
  class XmlLineariser(XmlWriter):
  
--- 119,125 ----
          self.stream.write(node.toxml())
  
!     def doOtherNode(self, node):
!         self.stream.write( node.toxml() )
!         
  class XmlLineariser(XmlWriter):
  
 > <P>Text on multiple
 > lines and with extra white         space in the
 > raw HTML doesn't change when dom.get_documentElement().normalize()

	Careful; that isn't what normalize() does.  Add another Text
node as a child of the TITLE element, to produce two Text nodes next
to each other.  dom.dump() will then output:

<DOM Document; root=<Element 'HTML'> >
 ...
   <Element 'TITLE'>
    <Text node 'test'>
    <Text node 'ADDED TEXT'>
   <Text node '\012'>

After calling normalize:
<DOM Document; root=<Element 'HTML'> >
 ...
   <Element 'TITLE'>
    <Text node 'testADDED TEXT'>
   <Text node '\012'>

See how the two text nodes have been merged?  It doesn't do anything
about whitespace.

To strip out whitespace, look at strip_whitespace or
collapse_whitespace in xml.dom.utils; after collapse_whitespace(dom,
WS_INTERNAL), runs of whitespace are collapsed down to a single space.
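
The difference between the two operations can be sketched with
xml.dom.minidom on a current Python 3 (an assumption; this is not the
PyDOM code discussed here, and collapse() is a hypothetical stand-in for
the whitespace helpers in xml.dom.utils).

```python
from xml.dom import minidom

doc = minidom.parseString("<TITLE>test</TITLE>")
title = doc.documentElement

# Appending a second Text node leaves two adjacent Text children.
title.appendChild(doc.createTextNode("ADDED TEXT"))
before = [node.data for node in title.childNodes]

# normalize() merges adjacent Text nodes and does nothing else.
title.normalize()
after = [node.data for node in title.childNodes]


def collapse(text):
    """Collapse every run of whitespace to a single space."""
    return " ".join(text.split())


collapsed = collapse("Text on multiple\nlines and with extra white         space")
```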

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Guards! Guards! Stop this madman! He's turning everyone into monkeys!
    -- A sudden intrusion, in ZOT! #1


From paul@prescod.net  Wed Apr 28 17:19:26 1999
From: paul@prescod.net (Paul Prescod)
Date: Wed, 28 Apr 1999 11:19:26 -0500
Subject: [XML-SIG] Another SAX Suggestion
References: <Pine.BSF.4.10.9904271814500.22068-100000@shell7.ba.best.com>
Message-ID: <3727350E.6B51E1ED@prescod.net>

I would like to suggest the default error handlers do something useful:

    def error(self, exception):
        "Handle a recoverable error."
        sys.stderr.write( "Error: "+ str(exception) + "\n" )

    def fatalError(self, exception):
        "Handle a non-recoverable error."
        sys.stderr.write( "Fatal Error: "+ str(exception) + "\n" )

    def warning(self, exception):
        "Handle a warning."
        sys.stderr.write( "Warning: "+ str(exception) + "\n" )

Of course if that's not what a particular implementation wants, they can
override it, but I think that the current lack of behavior is
non-intuitive. Maybe I'm corrupted by working with SGML tools but I expect
the defaults to be as above.
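
The same idea as a self-contained sketch for the xml.sax module of a
current Python 3 (an assumption; the 1999 PySAX base classes may differ).
ReportingErrorHandler is a hypothetical name, and the optional stream
argument is an addition so the output can be captured instead of going
to sys.stderr.

```python
import io
import sys
import xml.sax
import xml.sax.handler


class ReportingErrorHandler(xml.sax.handler.ErrorHandler):
    """Write every warning and error to a stream instead of staying silent."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stderr

    def error(self, exception):
        "Handle a recoverable error."
        self.stream.write("Error: %s\n" % exception)

    def fatalError(self, exception):
        "Handle a non-recoverable error."
        self.stream.write("Fatal Error: %s\n" % exception)

    def warning(self, exception):
        "Handle a warning."
        self.stream.write("Warning: %s\n" % exception)


# Capture the output in memory to show the format of one message.
buffer = io.StringIO()
handler = ReportingErrorHandler(buffer)
handler.warning(xml.sax.SAXException("something looks odd"))
report = buffer.getvalue()
```

An instance would be passed as the errorHandler argument of
xml.sax.parse() or xml.sax.parseString().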

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
  http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp


From Jeff.Johnson@icn.siemens.com  Wed Apr 28 18:21:04 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Wed, 28 Apr 1999 13:21:04 -0400
Subject: [XML-SIG] DOM normalize() broken? entity refs lost?
Message-ID: <85256761.005F477A.00@li01.lm.ssc.siemens.com>


Thanks for the entity reference fix Andrew.  It now saves "&reg;" but it still
loses things like "&#8217;".  I think this is Unicode generated from the RTF to
HTML filter I'm using, and while I can change the RTF to HTML character
translation table to convert RTF "quoteright" to "'" instead of "&#8217;", I'm
curious where the entity ref is going.  I put some debug statements in
HtmlBuilder.handle_entityref() but it never gets called.  I know there is
controversy over Unicode support but I don't know enough about it to know what
to expect in this case.
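
One detail that may matter here: "&#8217;" is a numeric character
reference, not a named entity reference like "&reg;", so parsers
normally resolve it themselves and report it as character data (the
sgmllib-style parsers route it to handle_charref rather than
handle_entityref).  The sketch below shows the behaviour of
xml.dom.minidom on a current, Unicode-aware Python 3; that is an
assumption and does not show what PyDOM 0.5.1 does with a character
above 255.

```python
from xml.dom import minidom

doc = minidom.parseString("<P>Don&#8217;t</P>")

# The parser has already replaced the reference with U+2019.
text = doc.documentElement.firstChild.data

# Writing to a narrow encoding: keep the character as a reference...
ascii_bytes = text.encode("ascii", "xmlcharrefreplace")
# ...or lose it, which is roughly the symptom described above.
latin1_bytes = text.encode("latin-1", "replace")
```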

A new script is included:

import sys, os
from StringIO import StringIO

from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter

html = """
<P>Don&#8217;t</P>
"""
# This works with Andrew's patch but the unicode single quote still vanishes
# without a trace.
#<P>Registered &reg;</P>

fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()
w.write(dom)


From akuchlin@cnri.reston.va.us  Wed Apr 28 18:39:42 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 28 Apr 1999 13:39:42 -0400 (EDT)
Subject: [XML-SIG] Another SAX Suggestion
In-Reply-To: <3727350E.6B51E1ED@prescod.net>
References: <Pine.BSF.4.10.9904271814500.22068-100000@shell7.ba.best.com>
 <3727350E.6B51E1ED@prescod.net>
Message-ID: <14119.17665.211348.533470@amarok.cnri.reston.va.us>

Paul Prescod writes:
>I would like to suggest the default error handlers do something useful:

	Agreed; the general Python philosophy is to make noise when
something unexpected happens, rather than making some assumption and
charging onward.  Printing an error message seems to be the right
level of noise for parsing errors; they could raise an exception and
terminate further processing (and actually I wouldn't mind that
either), but printing a message seems sufficient.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
    Principally I played pedants, idiots, old fathers, and drunkards.
    As you see, I had a narrow escape from becoming a professor.
    -- Robertson Davies, "Shakespeare over the Port"


From Lutz.Ehrlich@EMBL-Heidelberg.de  Fri Apr 30 10:56:51 1999
From: Lutz.Ehrlich@EMBL-Heidelberg.de (Lutz.Ehrlich@EMBL-Heidelberg.de)
Date: Fri, 30 Apr 1999 11:56:51 +0200 (MDT)
Subject: [XML-SIG] XQL: Somebody working on it?
Message-ID: <14121.31687.843895.101080@cuckoo.EMBL-Heidelberg.DE>

G'day all,


as I didn't find anything in the recent CVS source for the xml
package, I wondered whether somebody is currently working on
implementing XQL (http://metalab.unc.edu/xql/) ? Before I start doing
anything myself, I would like to hear your opinion about such a
thing. Would implementation be a big thing? Have you guys discussed
implementing any of the query language proposals already? 

Any comments are most welcome,

	Lutz

______________________________________________________________________

Lutz Ehrlich		 web  : http://www.embl-heidelberg.de/~ehrlich
			 email:	lutz.ehrlich@embl-heidelberg.de

European Molecular Biology Laboratory		phone: +49-6221-387-140
Meyerhofstr. 1					fax  : +49-6221-387-517
D-69012 Heidelberg, Germany				


From Jeff.Johnson@icn.siemens.com  Fri Apr 30 15:13:16 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Fri, 30 Apr 1999 10:13:16 -0400
Subject: [XML-SIG] unicode entitie refs
Message-ID: <85256763.004E13CB.00@li01.lm.ssc.siemens.com>


Sorry to be a pest but I never got a response on the following email and was
hoping someone had an answer as to why unicode entity refs disappear in PyDom.

After I write this I'll start looking at the SAX code, maybe I have to install
error handlers?  Any suggestions?

Thanks,
Jeff


---------------------- Forwarded by Jeff Johnson/Service/ICN on 04/30/99 10:07
AM ---------------------------


Jeff Johnson
04/28/99 01:21 PM

To:   akuchlin@cnri.reston.va.us
cc:   xml-sig@python.org
Subject:  Re: [XML-SIG] DOM normalize() broken? entity refs lost?  (Document
      link not converted)

Thanks for the entity reference fix Andrew.  It now saves "&reg;" but it still
loses things like "&#8217;".  I think this is Unicode generated from the RTF to
HTML filter I'm using, and while I can change the RTF to HTML character
translation table to convert RTF "quoteright" to "'" instead of "&#8217;", I'm
curious where the entity ref is going.  I put some debug statements in
HtmlBuilder.handle_entityref() but it never gets called.  I know there is
controversy over Unicode support but I don't know enough about it to know what
to expect in this case.

A new script is included:

import sys, os
from StringIO import StringIO

from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter

html = """
<P>Don&#8217;t</P>
"""
# This works with Andrew's patch but the unicode single quote still vanishes
# without a trace.
#<P>Registered &reg;</P>

fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()
w.write(dom)
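
One thing that may be worth checking: "&#8217;" is a numeric character
reference, not a named entity reference like "&reg;", and SGML-style HTML
parsers commonly deliver the two through different callbacks.  That would be
consistent with handle_entityref() never being called, though whether it
explains the loss inside PyDOM is an assumption.  A minimal sketch follows,
using the present-day standard library html.parser module as a stand-in
rather than PyDOM's HtmlBuilder; the RefLogger class is hypothetical.

```python
from html.parser import HTMLParser


class RefLogger(HTMLParser):
    """Record which callback each kind of reference arrives through."""

    def __init__(self):
        # convert_charrefs=False keeps references out of handle_data()
        # so that the two reference callbacks are actually invoked.
        super().__init__(convert_charrefs=False)
        self.seen = []

    def handle_entityref(self, name):
        # Named references such as &reg; arrive here.
        self.seen.append(("entityref", name))

    def handle_charref(self, name):
        # Numeric references such as &#8217; arrive here instead.
        self.seen.append(("charref", name))


logger = RefLogger()
logger.feed("<P>Don&#8217;t</P><P>Registered &reg;</P>")
logger.close()
print(logger.seen)
```

If a builder's character-reference callback ignores values it cannot
represent, a reference like this one could be dropped without any message.
That is a guess about the cause, not something confirmed here.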


From paul@prescod.net  Fri Apr 30 15:09:49 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 30 Apr 1999 09:09:49 -0500
Subject: [XML-SIG] XQL: Somebody working on it?
References: <14121.31687.843895.101080@cuckoo.EMBL-Heidelberg.DE>
Message-ID: <3729B9AD.C2D86911@prescod.net>

Lutz.Ehrlich@EMBL-Heidelberg.de wrote:
> 
> G'day all,
> 
> as I didn't find anything in the recent CVS source for the xml
> package, I wondered whether somebody is currently working on
> implementing XQL (http://metalab.unc.edu/xql/) ? Before I start doing
> anything myself, I would like to hear your opinion about such a
> thing. Would implementation be a big thing? Have you guys discussed
> implementing any of the query language proposals already?

XSL implicitly depends on a query language. It isn't defined separately
from XSL but it is defined in the XSL specification. That query language
actually has W3C standardization status and is needed for the Python XSL
implementation that is under development.

XQL is sort of like that language -- but not quite, and not standardized.
I think that before XQL becomes any kind of standard it would have to be
aligned with XSL's query language. Therefore you can choose yourself
whether you want to implement it in the meantime or not. It all depends on
whether you want to work on something that will likely be obsolete in a
year or not....in the XML world a year is a lifetime so maybe that's a
good tradeoff.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
  http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp


From wunder@infoseek.com  Fri Apr 30 16:51:19 1999
From: wunder@infoseek.com (Walter Underwood)
Date: Fri, 30 Apr 1999 08:51:19 -0700
Subject: [XML-SIG] Another SAX Suggestion
In-Reply-To: <3727350E.6B51E1ED@prescod.net>
References: <Pine.BSF.4.10.9904271814500.22068-100000@shell7.ba.best.com>
Message-ID: <3.0.5.32.19990430085119.00ad0c50@corp>

At 11:19 AM 4/28/99 -0500, Paul Prescod wrote:
>I would like to suggest the default error handlers do something useful:
>
>    def error(self, exception):
>        "Handle a recoverable error."
>        sys.stderr.write( "Error: "+ exception )

Since we write servers, we consider output to stderr from a library
to be a defect. Anybody else remember "RANGE ERROR" from the
C math library?

I had to rip out some stderr writes from pyexpat, too.

I wouldn't mind having a stderr error handler provided as part 
of the module, with sample code that uses that error handler.
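
A rough sketch of that arrangement, written against the xml.sax module in
current Python 3 rather than the 1999 package: the default-style handler
records problems silently, and the stderr version is something a script
opts into.  The class names and the check() helper are hypothetical.

```python
import sys
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Record problems silently; nothing is written to stderr."""

    def __init__(self):
        self.problems = []

    def warning(self, exception):
        self.problems.append(("warning", str(exception)))

    def error(self, exception):
        self.problems.append(("error", str(exception)))

    def fatalError(self, exception):
        self.problems.append(("fatal", str(exception)))
        raise exception  # stop parsing; the caller decides what to do


class StderrErrorHandler(CollectingErrorHandler):
    """Opt-in variant for scripts: also echo recoverable errors to stderr."""

    def error(self, exception):
        sys.stderr.write("Error: %s\n" % exception)
        CollectingErrorHandler.error(self, exception)


def check(data, handler):
    """Parse data; return True if it parsed without a fatal error."""
    try:
        xml.sax.parseString(data, ContentHandler(), handler)
    except xml.sax.SAXParseException:
        return False
    return True


quiet = CollectingErrorHandler()
ok = check(b"<a><b></a>", quiet)  # mismatched tag, so not well-formed
print(ok, quiet.problems)
```

A server would pass CollectingErrorHandler and inspect problems afterwards;
a command-line tool could pass StderrErrorHandler instead.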

Also along this line, does the SAX adaptor for expat catch all
exceptions raised in a handler? The Expat core doesn't know how
to propagate exceptions, so they need to be caught and reported 
locally. This is an interesting behavior difference between SAX
over different parser implementations (a pure-Python parser would
propagate the exceptions).
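
For comparison, with the xml.sax and pyexpat modules in current Python 3
releases, an exception raised inside a content handler does reach the code
that called the parser.  The sketch below shows only that present-day
behavior and says nothing about the 1999 adaptor; the Exploding handler and
the sample documents are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Exploding(ContentHandler):
    """Raise from inside a callback invoked by the expat-based reader."""

    def startElement(self, name, attrs):
        if name == "bad":
            raise ValueError("handler failed on <%s>" % name)


def parse_and_report(data):
    """Return the handler exception that reached the caller, or None."""
    try:
        xml.sax.parseString(data, Exploding())
    except ValueError as exc:
        return exc
    return None


result = parse_and_report(b"<doc><bad/></doc>")
print(repr(result))
```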

Sorry for the ignorance of SAX details -- our XML support shipped 
last September and I haven't gone back and re-coded to the portable 
interface.

wunder


--
Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946