From akuchlin@cnri.reston.va.us Thu Apr 1 01:36:07 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 31 Mar 1999 20:36:07 -0500
Subject: [XML-SIG] PyXML 0.5.1 prerelease 1
Message-ID: <199904010136.UAA14036@207-172-38-113.s113.tnt8.ann.va.dialup.rcn.com>

I've put up a pre-release of version 0.5.1 of the XML package. Please
try it out and report any minor errors, glitches, or installation
nits. After one or two iterations, I'll remove the "pre-release"
designation and announce it more widely.

It's available in .tgz and .zip format:
http://www.python.org/sigs/xml-sig/files/xml-0.5.1pre1.tgz
http://www.python.org/sigs/xml-sig/files/xml051pre1.zip

(Also available at the python.org mirrors, of course.)

I haven't written up a list of the changes yet, but will do that for
the next pre-release.

--
A.M. Kuchling http://starship.python.net/crew/amk/
For non-deterministic read "Inhabited by pixies."
    -- Anonymous

From gstein@lyra.org Thu Apr 1 02:51:12 1999
From: gstein@lyra.org (Greg Stein)
Date: Wed, 31 Mar 1999 18:51:12 -0800
Subject: [XML-SIG] updated "quick parser" ... qp_xml.py
Message-ID: <3702DF20.5C10EB14@lyra.org>

This is a multi-part message in MIME format.
--------------6625E7BC3519C1A232100425
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hey there...

From that speed test thing that I posted a few days ago, I extracted an
actual module. At the same time, I also simplified some of the namespace
stuff and corrected several bugs w.r.t. default namespaces.

As I mentioned, this guy is about 12x faster than using the DOM to
parse XML (when both are using pyexpat). It also handles namespaces and
xml:lang properly.

Comments/patches are encouraged.
thx
-g

--
Greg Stein, http://www.lyra.org/
--------------6625E7BC3519C1A232100425
Content-Type: text/plain; charset=us-ascii; name="qp_xml.py"
Content-Disposition: inline; filename="qp_xml.py"
Content-Transfer-Encoding: 7bit

#
# qp_xml: Quick Parsing for XML
#

import string

try:
  import pyexpat
except ImportError:
  from xml.parsers import pyexpat

error = __name__ + '.error'


#
# The parsing class. Instantiate and pass a string/file to .parse()
#
class Parser:
  def __init__(self):
    self.reset()

  def reset(self):
    self.root = None
    self.cur_elem = None
    self.error = None

  def find_prefix(self, prefix):
    elem = self.cur_elem
    while elem:
      if elem.ns_scope.has_key(prefix):
        return elem.ns_scope[prefix]
      elem = elem.parent

    if prefix == '':
      return ''  # empty URL for "no namespace"

    return None

  def process_prefix(self, ob, use_default):
    idx = string.find(ob.name, ':')
    if idx == -1:
      if use_default:
        ob.ns = self.find_prefix('')
      else:
        ob.ns = ''  # no namespace
    elif string.lower(ob.name[:3]) == 'xml':
      ob.ns = ''  # name is reserved by XML. don't break out a NS.
    else:
      ob.ns = self.find_prefix(ob.name[:idx])
      ob.name = ob.name[idx+1:]
    if ob.ns is None:
      self.error = 'namespace prefix not found'
      return

  def start(self, name, attrs):
    if self.error:
      return

    elem = _element(name=name, lang=None, parent=None,
                    children=[], ns_scope={}, attrs=[],
                    first_cdata='', following_cdata='')
    if self.cur_elem:
      elem.parent = self.cur_elem
      elem.parent.children.append(elem)
      self.cur_elem = elem
    else:
      self.cur_elem = self.root = elem

    # scan for namespace declarations (and xml:lang while we're at it)
    for i in range(0, len(attrs), 2):
      name = attrs[i]
      value = attrs[i+1]
      if name == 'xmlns':
        elem.ns_scope[''] = value
      elif name[:6] == 'xmlns:':
        elem.ns_scope[name[6:]] = value
      elif name == 'xml:lang':
        elem.lang = value
      else:
        attr = _attribute(name=name, value=value)
        elem.attrs.append(attr)

    # inherit xml:lang from parent
    if elem.lang is None and elem.parent:
      elem.lang = elem.parent.lang

    # process prefix of the element name
    self.process_prefix(elem, 1)

    # process attributes' namespace prefixes
    for attr in elem.attrs:
      self.process_prefix(attr, 0)

  def end(self, name):
    if self.error:
      return

    parent = self.cur_elem.parent

    del self.cur_elem.ns_scope
    del self.cur_elem.parent

    self.cur_elem = parent

  def cdata(self, data):
    if self.error:
      return

    elem = self.cur_elem
    if elem.children:
      last = elem.children[-1]
      last.following_cdata = last.following_cdata + data
    else:
      elem.first_cdata = elem.first_cdata + data

  def parse(self, input):
    self.reset()

    p = pyexpat.ParserCreate()
    p.StartElementHandler = self.start
    p.EndElementHandler = self.end
    p.CharacterDataHandler = self.cdata

    try:
      if type(input) == type(''):
        rv = p.Parse(input, 1)
      else:
        while 1:
          s = input.read(_BLOCKSIZE)
          if not s:
            rv = p.Parse('', 1)
            break

          rv = p.Parse(s, 0)
          if rv == 0 or self.error:
            break

      if rv == 0:
        s = pyexpat.ErrorString(p.ErrorCode)
        raise error, 'expat parsing error: ' + s

      if self.error:
        raise error, self.error

    finally:
      _clean_tree(self.root)

    return self.root


#
# handy function for dumping a tree that is
returned by Parser # def dump(f, root): f.write('\n') namespaces = _collect_ns(root) _dump_recurse(f, root, namespaces, 1) f.write('\n') # # This function returns the element's CDATA. Note: this is not recursive -- # it only returns the CDATA immediately within the element, excluding the # CDATA in child elements. # def textof(elem): s = elem.first_cdata for child in elem.children: s = s + child.following_cdata return s ######################################################################### # # private stuff for qp_xml # _BLOCKSIZE = 16384 # chunk size for parsing input class _blank: def __init__(self, **kw): self.__dict__.update(kw) class _element(_blank): pass class _attribute(_blank): pass def _clean_tree(elem): elem.parent = None del elem.parent map(_clean_tree, elem.children) def _collect_recurse(elem, dict): dict[elem.ns] = None for attr in elem.attrs: dict[attr.ns] = None for child in elem.children: _collect_recurse(child, dict) def _collect_ns(elem): "Collect all namespaces into a NAMESPACE -> PREFIX mapping." d = { '' : None } _collect_recurse(elem, d) del d[''] # make sure we don't pick up no-namespace entries keys = d.keys() for i in range(len(keys)): d[keys[i]] = i return d def _dump_recurse(f, elem, namespaces, dump_ns=0): if elem.ns: f.write('' + elem.first_cdata) for child in elem.children: _dump_recurse(f, child, namespaces) f.write(child.following_cdata) if elem.ns: f.write('' % (namespaces[elem.ns], elem.name)) else: f.write('' % elem.name) else: f.write('/>') --------------6625E7BC3519C1A232100425-- From Jeff.Johnson@icn.siemens.com Thu Apr 1 20:38:15 1999 From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com) Date: Thu, 1 Apr 1999 15:38:15 -0500 Subject: [XML-SIG] HtmlBuilder - uses sgmllib, can it use sax/pyexpat? Message-ID: <85256746.007108C1.00@li01.lm.ssc.siemens.com> Now that I've delivered my Beta CD's to reproduction, I can take a breath and try to optimize my conversion programs. 
I was reading Greg Stein's quick XML parser and started to see if I could use it. That was when I realized that most of what I do is read HTML files via xml.dom.html_builder.HtmlBuilder and it uses sgmllib. Assuming that pure python sgmllib is slower than pyexpat which uses C code, I wondered if there was a way to make HtmlBuilder use SAX and the default pyexpat parser. After taking a *very* quick look at the SAX and sgmllib parser interfaces, it seems like a trivial matter to modify HtmlBuilder to use SAX. Is this true and would it be faster? I know very little about these parsers so forgive me if my suggestion is just plain stupid :) To Greg: Most of my code uses DOM so I'm not sure if I could use your parser. Would it be possible to add a DOM interface (or subset) to the objects it creates? To Andrew: I've found a bug in the XML 0.5.1 package: The xml/CREDITS file lists me (which I was pleasantly surprised to see) and ONLY me. I figure the guys that wrote the library (you included) might also be included in the credits. Thanks for putting me in there though :) Cheers, Jeff From gstein@lyra.org Thu Apr 1 21:00:39 1999 From: gstein@lyra.org (Greg Stein) Date: Thu, 01 Apr 1999 13:00:39 -0800 Subject: [XML-SIG] Re: HtmlBuilder - uses sgmllib, can it use sax/pyexpat? References: <85256746.007108C1.00@li01.lm.ssc.siemens.com> Message-ID: <3703DE77.1BB9DA6A@lyra.org> Jeff.Johnson@icn.siemens.com wrote: >... > To Greg: Most of my code uses DOM so I'm not sure if I could use your parser. > Would it be possible to add a DOM interface (or subset) to the objects it > creates? It would be possible, but it is important to note that DOM compatibility was specifically excluded from its design principles. That's how come it can go so much faster :-). Basically, it just presents an alternative data representation for XML. Actually, if it *just* exported the API, but no change was made to how the structure is built, then it would probably work fine. 
Note that Andrew is testing a similar technique for DOM building: skip
the API and smack the underlying data structure.

Ooh. And I just saw a way to make qp_xml a bit faster. The attribute
handling shouldn't create separate objects. I should create a mapping of
(ns, name) -> value. That will help during lookup, too.

Andrew: I was thinking this might be a nice alternative mechanism that
can go into the XML package. Where would it go? Maybe call it
xml.parsers.quick or something. Of course, it isn't as quick as plain
pyexpat :-), but then it also isn't a parser in the same sense as those.
Under util?

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From Jeff.Johnson@icn.siemens.com Mon Apr 5 18:18:17 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Mon, 5 Apr 1999 13:18:17 -0400
Subject: [XML-SIG] dom.utils.FileReader & HtmlBuilder
Message-ID: <8525674A.005EB60E.00@li01.lm.ssc.siemens.com>

Could we have FileReader.readHtml() ignore mismatched end tags by default? At
the moment, there is no way to ignore them at all using FileReader. One of the
problems with FileReader is that there aren't a lot of ways to customize it
without subclassing it. Since it is made to be extremely simple to use, I
figure it should fix up mismatched end tags by default.

Is the fixup for the parser not being freed still required? Has that been
fixed?

    def readHtml(self,stream,ignore_mismatched_end_tags=1):
        from xml.dom import html_builder
        b = html_builder.HtmlBuilder(ignore_mismatched_end_tags)
        b.feed(stream.read())
        b.close()
        doc = b.document
        # There was some bug that prevents the builder from
        # freeing itself (maybe it has already been fixed?).
        # The next two lines break its references to the DOM
        # tree so that it can be freed.
        b.document = None
        b.current_element = None
        return doc

Thanks,
Jeff

From Jeff.Johnson@icn.siemens.com Tue Apr 6 19:07:11 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Tue, 6 Apr 1999 14:07:11 -0400
Subject: [XML-SIG] raising exceptions in dom.core
Message-ID: <8525674B.00632EE5.00@li01.lm.ssc.siemens.com>

The following code shows two class based exceptions but in the first, the
message is passed along as an argument to 'raise' while in the second, the
message is given in the constructor of the exception. Should this be changed to
use the constructor in both cases?

        if self.readonly:
            raise NoModificationAllowedException, "Read-only node "+repr(self)
        self._checkChild(newChild, self)
        if newChild._document != self._document:
            raise WrongDocumentException("newChild %s created from a "
                                         "different document" % (repr(newChild),) )

From Jeff.Johnson@icn.siemens.com Wed Apr 7 18:22:27 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Wed, 7 Apr 1999 13:22:27 -0400
Subject: [XML-SIG] xml.dom.writer doesn't work anymore
Message-ID: <8525674C.005F1478.00@li01.lm.ssc.siemens.com>

The following line:
    self.file.write(re.sub('\n+', '\n', s))
was removed from:

class OutputStream:
    def write(self, s):
        #print 'write', `s`
        self.file.write(re.sub('\n+', '\n', s))
        if s and s[-1] == '\n':
            self.new_line = 1
        else:
            self.new_line = 0

I figure it was removed to get rid of the re.sub() but not the self.file.write()
itself :) Currently, HtmlWriter and XmlWriter just create 0 byte files...

This is from the XML 0.5.1 zip file...

Cheers,
Jeff

From akuchlin@cnri.reston.va.us Thu Apr 8 01:27:36 1999
From: akuchlin@cnri.reston.va.us (A.M.
Kuchling)
Date: Wed, 7 Apr 1999 20:27:36 -0400
Subject: [XML-SIG] xml.dom.writer doesn't work anymore
In-Reply-To: <8525674C.005F1478.00@li01.lm.ssc.siemens.com>
References: <8525674C.005F1478.00@li01.lm.ssc.siemens.com>
Message-ID: <199904080027.UAA00447@207-172-56-204.s204.tnt12.ann.va.dialup.rcn.com>

Jeff.Johnson@icn.siemens.com writes:
> The following line:
>     self.file.write(re.sub('\n+', '\n', s))
> was removed from

This is what's called a brown-bag bug (because it makes the person who
made it want to wear a bag over their head). Fixed in the CVS.

Has anyone noted other problems with the pre-release of 0.5.1? If
not, I'll make new .tgz and .zip files with the above correction, and
call it 0.5.1 final.

--
A.M. Kuchling http://starship.python.net/crew/amk/
If it wasn't for the fact that a monster called the Head was plunging a metal
pipe up his nose preparatory to sucking his brains out, Michael Smith could
almost laugh.
    -- Opening sentence of ENIGMA #2: "The Truth"

From akuchlin@cnri.reston.va.us Thu Apr 8 01:33:46 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 7 Apr 1999 20:33:46 -0400
Subject: [XML-SIG] raising exceptions in dom.core
In-Reply-To: <8525674B.00632EE5.00@li01.lm.ssc.siemens.com>
References: <8525674B.00632EE5.00@li01.lm.ssc.siemens.com>
Message-ID: <199904080033.UAA00457@207-172-56-204.s204.tnt12.ann.va.dialup.rcn.com>

Jeff.Johnson@icn.siemens.com writes:
> The following code shows two class based exceptions but in the first, the
> message is passed along as an argument to 'raise' while in the second, the
> message is given in the constructor of the exception. Should this
> be changed to use the constructor in both cases?

It doesn't really matter; "raise exception, argument" is
equivalent to "raise exception(argument)". See GvR's essay on exceptions at
http://www.python.org/doc/essays/stdexceptions.html .
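
A minimal sketch of the constructor form, not taken from the DOM sources:
the exception class and the Node stub below are stand-ins that only borrow
names from Jeff's snippet. One caveat from outside this thread: the
"raise exception, argument" spelling is accepted by Python 1.5 and 2.x
only; Python 3 rejects it as a syntax error, so the constructor form is
also the one that keeps working.

```python
class NoModificationAllowedException(Exception):
    """Stand-in for the DOM exception of the same name (hypothetical here)."""


class Node:
    """Tiny stub for illustration; the real xml.dom.core node is much larger."""

    def __init__(self, readonly=False):
        self.readonly = readonly
        self.children = []

    def appendChild(self, new_child):
        if self.readonly:
            # Constructor form: the message is handed to the exception class.
            raise NoModificationAllowedException("Read-only node " + repr(self))
        self.children.append(new_child)
        return new_child


def try_append(node, child):
    """Return the error message, or None if the append succeeded."""
    try:
        node.appendChild(child)
    except NoModificationAllowedException as exc:
        return str(exc)
    return None
```

Calling try_append(Node(readonly=True), "x") returns the "Read-only node ..."
message, while the same call on a writable node returns None.
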
For consistency, the DOM code should probably pick one of the two forms and stick with it; the exception(argument) form is probably the one to choose. Added to the TODO list. -- A.M. Kuchling http://starship.python.net/crew/amk/ "I didn't know that there was a downstairs, here." "There's a downstairs in everybody. That's where we live." -- Lyta and the youngest of the Three, in SANDMAN #58: "The Kindly Ones:2" From hgv@nsg0.network.com Thu Apr 8 18:24:02 1999 From: hgv@nsg0.network.com (Harry Varnis) Date: Thu, 08 Apr 1999 12:24:02 -0500 Subject: [XML-SIG] dtd error handling Message-ID: <370CE632.2741CCB3@network.com> Sorry if this isn't an appropriate forum for this, but here goes... I can't seem to get my ErrorHandler to be used for dtd errors. I'm using SAX + validating xmlproc. My ErrorHandler gets xml document errors OK, but for dtd errors, the methods of xmlproc's default Application get used. I've tried to sort through the module code (xml-0.5) but I quickly got tangled up :-) Can anyone help, please? Thanks, Harry Varnis Here is a traceback and some code snippets: Traceback (innermost last): File "/usr/local/apache/fastcgi-bin/serviceapp.py", line 236, in ? 
    app.load(path)
  File "/usr/local/apache/fastcgi-bin/serviceapp.py", line 75, in load
    self.servicedata = servicedataparse.fromFile(f)
  File "/home/hgv/SSM/servicedataparse.py", line 103, in fromFile
    p.parseFile(file)
  File "/usr/lib/python1.5/site-packages/xml/sax/drivers/drv_xmlproc.py", line 29, in parseFile
    self.parser.read_from(file)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlval.py", line 120, in read_from
    self.parser.read_from(file)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 143, in read_from
    self.feed(buf)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 189, in feed
    self.do_parse()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 288, in do_parse
    self.parse_doctype()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 657, in parse_doctype
    self.app.handle_doctype(rootname,pub_id,sys_id)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlval.py", line 288, in handle_doctype
    p.parse_resource(sys_id)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 71, in parse_resource
    self.report_error(3000,sysID)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 374, in report_error
    self.err.fatal(msg)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlapp.py", line 134, in fatal
    sys.exit(1)
SystemExit: 1

class ServiceDataDocumentHandler(saxlib.HandlerBase):
    def __init__(self):
        saxlib.HandlerBase.__init__(self)
        self.serviceData = None

    def startElement(self, name, attrs):
        .
        .

    def endElement(self, name):
        .
        .

    def characters(self, ch, start, length):
        .
        .

    def error(self, exception):
        message = "Recoverable error: %s" % str(exception)
        .
        .

    def fatalError(self, exception):
        message = "Non-recoverable error: %s" % str(exception)
        .
        .
        raise exception

    def warning(self, exception):
        message = "Warning: %s" % str(exception)
        .
        .
def fromFile(file): p = saxexts.XMLValParserFactory.make_parser() h = ServiceDataDocumentHandler() p.setDocumentHandler(h) p.setErrorHandler(h) p.setDTDHandler(h) p.parseFile(file) p.close() return h.serviceData From larsga@ifi.uio.no Fri Apr 9 21:43:58 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 09 Apr 1999 22:43:58 +0200 Subject: [XML-SIG] SAX2: Parser properties Message-ID: The first three properties come from the JavaSAX proposal, while the last one was invented by yours truly. http://xml.org/sax/properties/namespace-sep (write-only) Set the separator to be used between the URI part of a name and the local part of a name when namespace processing is being performed (see the http://xml.org/sax/features/namespaces feature). By default, the separator is a single space. This property may not be set while a parse is in progress (throws a SAXNotSupportedException). http://xml.org/sax/properties/dom-node (read-only) Get the DOM node currently being visited, if the SAX parser is iterating over a DOM tree. If the parser recognises and supports this property but is not currently visiting a DOM node, it should return null (this is a good way to check for availability before the parse begins). This property doesn't make much sense for Python, but I see no point in leaving it out, either. http://xml.org/sax/properties/xml-string (read-only) Get the literal string of characters associated with the current event. If the parser recognises and supports this property but is not currently parsing text, it should return null (this is a good way to check for availability before the parse begins). I stole this idea from Expat. In addition, I think PySAX needs the following property: http://python.org/sax/properties/data-encoding (read/write) This property can be used to control which character encoding is used for data events that come from the parser. In Java this is not an issue since all strings are Unicode, but in Python it is. 
Expat reports UTF-8, while xmlproc/xmllib just pass on whatever they're given. Do we need a special SAXEncodingNotSupportedException for this? Otherwise it may be impossible to tell whether the parser doesn't support this at all or whether it just doesn't support this particular encoding. --Lars M. From larsga@ifi.uio.no Fri Apr 9 21:44:50 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 09 Apr 1999 22:44:50 +0200 Subject: [XML-SIG] SAX2: Handler classes Message-ID: This list is just copied from the Java proposal. Does anyone think we should skip any of these or add any new ones? http://xml.org/sax/handlers/lexical Receive callbacks for comments, CDATA sections, and (possibly) entity references. http://xml.org/sax/handlers/dtd-decl Receive callbacks for element, attribute, and (possibly) parsed entity declarations. http://xml.org/sax/handlers/namespace Receive callbacks for the start and end of the scope of each namespace declaration. --Lars M. From Fred L. Drake, Jr." References: Message-ID: <14094.27329.189328.339983@weyr.cnri.reston.va.us> Lars Marius Garshol writes: > http://xml.org/sax/handlers/lexical > Receive callbacks for comments, CDATA sections, and (possibly) > entity references. Undecided; there are times when I think it would be nice to have these things, especially when trying to make minimal edits. > http://xml.org/sax/handlers/dtd-decl > Receive callbacks for element, attribute, and (possibly) parsed > entity declarations. > > http://xml.org/sax/handlers/namespace > Receive callbacks for the start and end of the scope of each > namespace declaration. Yes to both. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fred L. Drake, Jr." References: Message-ID: <14094.27411.350840.911404@weyr.cnri.reston.va.us> Lars Marius Garshol writes: > http://python.org/sax/properties/data-encoding (read/write) ... > Do we need a special SAXEncodingNotSupportedException for this? 
> Otherwise it may be impossible to tell whether the parser doesn't

Yes; this needs to be available for reporting to the user.

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From dieter@handshake.de Sat Apr 10 19:42:49 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Sat, 10 Apr 1999 20:42:49 +0200
Subject: [XML-SIG] [Ann] XSL-Pattern 0.03 released
Message-ID: <199904101842.UAA01329@lindm.dm>

I have released version 0.03 of my XSL-Pattern package.

It implements the pattern sublanguage of the XSL working draft
specification as of Dec 16, 1998. The package provides
pattern matching and selection on HTML/XML/SGML document trees.

Changes:

 * several bugs fixed:
   - "test patterns with value" threw an exception
   - "ancestor in match pattern" threw an exception
   - ...//OtherNode started above rather than at OtherNode
     when used as match pattern. (was correct as select pattern).

 * pattern objects now have a patternstring attribute;
   it is the string the object has been built from.

 * allows for customized pattern factories.
   This is interesting if you want to use the parser
   infrastructure to build customized parsers.
   Such parsers build customized XSL pattern objects
   (by means of the factory). They can e.g. change
   the matching algorithm or work on a sequence
   of SAX events rather than DOM trees for selection.

More information and download:
URL:http://www.handshake.de/~dieter/pyprojects/xslpattern.html

- Dieter

From paul@prescod.net Sun Apr 11 05:19:56 1999
From: paul@prescod.net (Paul Prescod)
Date: Sat, 10 Apr 1999 23:19:56 -0500
Subject: [XML-SIG] DOM API
Message-ID: <371022EC.2E0A1F6@prescod.net>

Am I right that there is a semi-official, portably implemented SAX API for
Python but there is no such beast for the DOM?

* Is it reasonable to unify a subset of their interfaces?
* Could 4XSL be written to use that interface so that it would work with
  both DOM implementations or do performance issues make that impossible?
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

By lumping computers and televisions together, as if they exerted a
single malign influence, pessimists have tried to argue that the
electronic revolution spells the end of the sort of literate culture
that began with Gutenberg's press. On several counts, that now seems
the reverse of the truth.

http://www.economist.com/editorial/freeforall/19-12-98/index_xm0015.html

From Jean-Michel.Bruel@univ-pau.fr Wed Apr 14 17:13:23 1999
From: Jean-Michel.Bruel@univ-pau.fr (Jean-Michel BRUEL)
Date: Wed, 14 Apr 1999 18:13:23 +0200 (MET DST)
Subject: [XML-SIG] [CFP] <<UML>>'99
Message-ID: <199904141613.SAA26301@crisv4.univ-pau.fr>

[apologies if you receive multiple copies of this announcement]

=================================================================
                      3rd Call for Papers
                          <<UML>>'99
=================================================================
Second International Conference on the Unified Modeling Language
        October 28-30, 1999, Fort Collins, Colorado, USA
                      (just before OOPSLA)
=================================================================
               http://www.cs.colostate.edu/UML99
=================================================================

Important dates (deadlines are hard!):
  Deadline for abstract               05 May 1999
  Deadline for submission             15 May 1999
  Notification to authors             15 July 1999
  Final version of accepted papers    25 August 1999

Submissions:
  Submit your 10-15 page manuscript electronically in Postscript or
  pdf using the Springer LNCS style. Details are available at the
  conference web page. The <<UML>>'99 proceedings will be published
  by Springer-Verlag in the LNCS series.

Further Information:
  Robert B. France               E-mail: france@cs.colostate.edu
  Computer Science Department    Tel: 970-491-6356
  Colorado State University      Fax: 970-491-2466
  Fort Collins, CO 80523, USA

  Bernhard Rumpe                 E-mail: rumpe@in.tum.de
  Institut fuer Informatik       Tel: 0049-89-289-28129
  T.
Universitaet Muenchen Fax: 0049-89-289-28183 80290 Muenchen, Germany Sponsored by IEEE Computer Society Technical Committee on Complexity in Computing In Cooperation with ACM SIGSOFT With the Support of OMG From akuchlin@cnri.reston.va.us Wed Apr 14 17:42:44 1999 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 14 Apr 1999 12:42:44 -0400 (EDT) Subject: [XML-SIG] PyXML 0.5.1final available, and T-shirts Message-ID: <199904141642.MAA10932@amarok.cnri.reston.va.us> I've put up the final release of PyXML 0.5.1: http://www.python.org/sigs/xml-sig/files/xml-0.5.1.tgz http://www.python.org/sigs/xml-sig/files/xml051.zip I won't start posting announcements until tomorrow; today is a busy day. On an unrelated note, I'd like to get a cool T-shirt design that links Python and XML. This is sparked by the T-shirt I got from filling out IBM's XML survey some months ago, which says ", you're it!". So, does anyone have a suggestion for a Python/XML design? -- A.M. Kuchling http://starship.python.net/crew/amk/ Whatever women do they must do twice as well as men to be thought half as good... luckily, it's not difficult. -- Charlotte Whitton From sean@digitome.com Thu Apr 15 10:53:34 1999 From: sean@digitome.com (Sean Mc Grath) Date: Thu, 15 Apr 1999 10:53:34 +0100 Subject: [XML-SIG] PyXML 0.5.1final available, and T-shirts In-Reply-To: <199904141642.MAA10932@amarok.cnri.reston.va.us> Message-ID: <3.0.6.32.19990415105334.0098ea10@gpo.iol.ie> [Andrew Kuchling] >On an unrelated note, I'd like to get a cool T-shirt design that links >Python and XML. This is sparked by the T-shirt I got from filling out >IBM's XML survey some months ago, which says ", you're it!". So, >does anyone have a suggestion for a Python/XML design? 
>

How about:-

"Algorithms + Data Structures = Programs" (Niklaus Wirth)
"Python + XML = Programs" (Andrew Kuchling)

Or how about:-

"Python gives XML something to do"

(This is a reworking of Jon Bosak's famous remark that XML gives Java
something to do)

From akuchlin@cnri.reston.va.us Thu Apr 15 16:50:29 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu, 15 Apr 1999 11:50:29 -0400 (EDT)
Subject: [XML-SIG] T-shirts
In-Reply-To: <3.0.6.32.19990415105334.0098ea10@gpo.iol.ie>
References: <199904141642.MAA10932@amarok.cnri.reston.va.us> <3.0.6.32.19990415105334.0098ea10@gpo.iol.ie>
Message-ID: <14102.1720.394944.571290@amarok.cnri.reston.va.us>

Sean Mc Grath writes:
> "Algorithms + Data Structures = Programs" (Niklaus Wirth)
> "Python + XML = Programs" (Andrew Kuchling)

Problem: I never said that. :) That does spark a thought,
though; poking through the Python-quotes file, "PYTHON = (P)rogrammers
(Y)earning (T)o (H)omestead (O)ur (N)oosphere." from one of your old
.sigs is pretty good, though not XML-specific. The two XML-related
quotes from Paul Prescod aren't really suitable and they're too long
for T-shirts, anyway.

Here's an idea derived from the XML/SGML use of the word
"element". (I vaguely recall someone proposing this at IPC7; anyone
remember who?) The design looks like a corner of the periodic table
of the elements. The columns, instead of being titled "Group III",
"Group VIII", etc. are labeled with various DTDs and standards; the
elements in each column are then various element names from that DTD.
In the middle is a large 2x2 square with a big red "Py" in it; below
it we might put regular-sized boxes with "Jv", "Pl", "Tcl", in them.
Something like:

   XML    HTML                  MathML   ...
   ---    ----                  ----
                +-----------+
   Dtd    Em    |Py         |   Cn
                |           |
   Wfc    H1    +           +   Fn
                |           |
   Pi     Cite  +-----------+   Eq
   ...
                 Jv  Pl  Tcl

We can argue about *which* DTDs and element names later...
If the design is 8 inches wide, then each column is 1.6 inches wide,
which is hopefully large enough to make it readable. That's
important; for example, the IBM shirt isn't really readable because
the design is only about 10 cm across; those FSF shirts that include
the whole preamble to the GPL on the back suffer from the same
illegibility.

--
A.M. Kuchling http://starship.python.net/crew/amk/
Well, there are these two people here, Sir. The man says he drank wine with
you somewhere called Babylon, and the lady... she's making little frogs.
    -- The receptionist, in SANDMAN #43: "Brief Lives:3"

From uche.ogbuji@fourthought.com Sat Apr 17 15:25:42 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sat, 17 Apr 1999 08:25:42 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Sat, 10 Apr 1999 23:19:56 CDT." <371022EC.2E0A1F6@prescod.net>
Message-ID: <199904171425.IAA03919@malatesta.local>

> Am I right that there is a semi-official, portably implemented SAX API for
> Python but
> there is no such beast for the DOM?

SAX is mostly portably implemented because of LMG's work on the drivers.

> * Is it reasonable to unify a subset of their interfaces?

This does make sense. The first question would be philosophical: should such a
unified interface stick closely to the W3C's IDL, or should it be more faithful
to Python (i.e. returning PyLists instead of NodeList objects)? This is the
main difference between the two Python DOM implementations. We could build an
adapter accordingly (most of the work is already done with
DOM.Ext.NodeList2PyList, etc), but I'd like to hear people's opinions first.

> * Could 4XSL be written to use that interface so that it would work with
> both DOM implementations or do performance issues make that impossible?

If such an interface were agreed upon, it would make sense to write 4DOM
accordingly.
I've already had to port LMG's xll module to 4DOM, and even though
the differences are subtle, porting can still be a bit of a chore. A standard
interface would help.

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com (970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com http://OpenTechnology.org

From larsga@ifi.uio.no Sat Apr 17 16:41:34 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1999 17:41:34 +0200
Subject: [XML-SIG] SAX2: General issues
In-Reply-To: <199903280047.RAA09559@malatesta.local>
References: <199903280047.RAA09559@malatesta.local>
Message-ID:

* Lars Marius Garshol
|
| The last question is, which package should we place the new stuff in?
| xml.sax2? xml.sax?

* uche ogbuji
|
| Well, I know that on xml-dev, there's a lot of talk about not
| stomping all over SAX 1.0, but IMO, once the drivers are ported,
| there are not likely to be a lot of people depending on SAX 1.0, and
| even for those who don't want to break things by changing, they can
| always just stick to the older XML packages.

I agree with this, and I also think that those who use a SAX 1.0
interface also can use SAX 2 with no modifications at all. At least we
should try to make it so. (Except for the fixes, but those are pretty
marginal.)

| In other words, I think we should use
|
|     xml.sax
|
| even for SAX2.

Agreed.

--Lars M.

From larsga@ifi.uio.no Sat Apr 17 16:44:51 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1999 17:44:51 +0200
Subject: [XML-SIG] SAX2: Handler classes
In-Reply-To: <14094.27329.189328.339983@weyr.cnri.reston.va.us>
References: <14094.27329.189328.339983@weyr.cnri.reston.va.us>
Message-ID:

* Lars Marius Garshol
|
| http://xml.org/sax/handlers/lexical
|   Receive callbacks for comments, CDATA sections, and (possibly)
|   entity references.

* Fred L.
Drake | | Undecided; there are times when I think it would be nice to have | these things, especially when trying to make minimal edits. Personally, I think we should have this, partly since it's needed for full DOM support. Also, some applications will need this. And in any case support for it will be optional. --Lars M. From larsga@ifi.uio.no Sat Apr 17 16:54:13 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 17 Apr 1999 17:54:13 +0200 Subject: [XML-SIG] SAX2: Parser properties In-Reply-To: <14094.27411.350840.911404@weyr.cnri.reston.va.us> References: <14094.27411.350840.911404@weyr.cnri.reston.va.us> Message-ID: * Lars Marius Garshol | | http://python.org/sax/properties/data-encoding (read/write) | Do we need a special SAXEncodingNotSupportedException for this? | Otherwise it may be impossible to tell whether the parser doesn't * Fred L. Drake | | Yes; this needs to be available for reporting to the user. I agree. I've added this to my draft now. --Lars M. From larsga@ifi.uio.no Sat Apr 17 17:05:20 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 17 Apr 1999 18:05:20 +0200 Subject: [XML-SIG] SAX2: LexicalHandler Message-ID: This handler is supposed to be used by applications that need information about lexical details in the document such as comments and entity boundaries. Most applications won't need this, but the DOM will find it useful. Support for this handler will be optional. This handler has the handlerID http://xml.org/sax/handlers/lexical.

class LexicalHandler:

    def xmlDecl(self, version, encoding, standalone):
        """All three parameters are strings. If encoding and standalone
        are not specified on the XML declaration, their values will be
        None."""

    def startDTD(self, root, publicID, systemID):
        """This event is reported when the DOCTYPE declaration is
        encountered. root is the name of the root element type, while
        the two last parameters are the public and system identifiers
        of the external DTD subset."""

    def endDTD(self):
        "This event is reported after the DTD has been parsed."

    def startEntity(self, name):
        """Reports the beginning of a new entity. If the entity is the
        external DTD subset the name will be '[dtd]'."""

    def endEntity(self, name):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

From larsga@ifi.uio.no Sat Apr 17 17:06:12 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 17 Apr 1999 18:06:12 +0200 Subject: [XML-SIG] SAX2: Attribute extensions Message-ID: This posting specifies two interfaces for information needed by the DOM (and possibly also others) and also for full XML 1.0 conformance. I'm not really sure whether we should actually use all of this, so opinions are welcome.

class AttributeList2:

    def isSpecified(self, attr):
        """Returns true if the attribute was explicitly specified in the
        document and false otherwise. attr can be the attribute name or
        its index in the AttributeList."""

    def getEntityRefList(self, attr):
        """This returns the EntityRefList (see below) for an attribute,
        which can be specified by name or index."""

The class below is intended to be used for discovering entity reference boundaries inside attribute values. This is needed because the XML 1.0 recommendation requires parsers to report unexpanded entity references, also inside attribute values. Whether this is really something we want is another matter.

class EntityRefList:

    def getLength(self):
        "Returns the number of entity references inside this attribute value."

    def getEntityName(self, ix):
        "Returns the name of entity reference number ix (zero-based index)."

    def getEntityRefStart(self, ix):
        """Returns the index of the first character inside the attribute
        value that stems from entity reference number ix."""

    def getEntityRefEnd(self, ix):
        "Returns the index of the last character in entity reference ix."
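For illustration only, here is a minimal sketch of how a parser might back the EntityRefList interface with a plain list of tuples. The class name SimpleEntityRefList, its constructor, and the (name, start, end) tuple layout are assumptions made for this example; they are not part of the draft.

```python
class SimpleEntityRefList:
    """Hypothetical EntityRefList backed by (name, start, end) tuples."""

    def __init__(self, refs):
        # refs: iterable of (entity_name, first_index, last_index),
        # given in the order the references occur in the attribute value.
        self._refs = list(refs)

    def getLength(self):
        return len(self._refs)

    def getEntityName(self, ix):
        return self._refs[ix][0]

    def getEntityRefStart(self, ix):
        return self._refs[ix][1]

    def getEntityRefEnd(self, ix):
        # Index of the LAST character, so slicing needs end + 1.
        return self._refs[ix][2]


# Expanded attribute value in which "(c)" is assumed to come from &copy;
value = "Copyright (c) 1999"
refs = SimpleEntityRefList([("copy", 10, 12)])
expanded = value[refs.getEntityRefStart(0):refs.getEntityRefEnd(0) + 1]
```

Because the end index is inclusive, a caller slicing the attribute value has to add one to it, as the last line does.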
One redeeming feature of this interface is that it lives entirely outside the attribute value, and so can be ignored entirely by those who are not interested. From stuart.hungerford@webone.com.au Sun Apr 18 14:00:44 1999 From: stuart.hungerford@webone.com.au (Stuart Hungerford) Date: Sun, 18 Apr 1999 23:00:44 +1000 Subject: [XML-SIG] Literate XML? Message-ID: <000701be899b$79288f30$0301a8c0@restless.com> Hi all, Even though (as I keep repeating to myself) "a markup language is not a programming language", I believe there are issues that affect XML content creators as much as programmers. Over time, a large body of folklore, rules, heuristics has been developed for making programs "readable". This covers issues like choice of names, layout, indenting, comment styles and content etc. etc. Can anyone tell me if there is a similar body of experience for markup languages--particularly XML? I understand that a lot of XML may be automatically generated and processed, but for the rest of the time, does anyone have any experiences on making XML text readable? There is no spoon... It's the smell... From stuart.hungerford@webone.com.au Sun Apr 18 14:05:20 1999 From: stuart.hungerford@webone.com.au (Stuart Hungerford) Date: Sun, 18 Apr 1999 23:05:20 +1000 Subject: [XML-SIG] Looking for namespace examples... Message-ID: <000d01be899c$1d810210$0301a8c0@restless.com> Two messages in one day! It must be a full moon or something. This one is a bit more prosaic: I'm looking for some realistic examples of DTD's and XML documents that make use of namespaces. My understanding is that a validating parser will not treat namespace prefixes as "special" in any way in a DTD. I've seen short examples where the xmlns:foo attribute is defined as a FIXED attribute (in David Megginson's "19 questions" document), and now I'm a bit confused. Can anyone point me to some good learning examples? 
Thanks, Stu From jday@picard.csihq.com Sun Apr 18 15:17:15 1999 From: jday@picard.csihq.com (John Day) Date: Sun, 18 Apr 1999 10:17:15 -0400 Subject: [XML-SIG] Literate XML? In-Reply-To: <000701be899b$79288f30$0301a8c0@restless.com> Message-ID: <3.0.6.32.19990418101715.01340d90@mail.csihq.com> A similar issue could be made for RTF (or any other text-based, "human readable" encoding including, I guess, assembly language mnemonics for computer programming). Nobody (except me maybe) writes with RTF tags. (I needed to learn it in order to write a parser for it). I wrote a pretty-printer for the RTF so I could read it better, but RTF is sensitive to inserted newlines etc and the output was ruined except for viewing. I think most people who use XML (like the millions of people who use RTF) will never see the XML tags in their raw format. Some nice friendly, bi-directional authoring tool (word processor) will allow us to "see what we got" and make changes to it. Tags will become little icons that you drag out of various DTD objects which we have OPENed or we can do a NEW DTD and set various properties. It all gets converted to more or less usable (note I didn't say flawless) XML or whatever. The underlying semantics will be preserved and made understandable. It will all boil down to how much trust we can place in such tools. Most of us didn't trust compilers a decade or so ago and wrote all of our 'critical' code in assembler. How many of us still code in assembler? We have learned to trust the compilers. Though they probably are not absolutely flawless, on average they're better than most of us. But to answer your question, write XML just like you would write your favorite HLL code: balanced indents, lots of white space and breaks to catch the eye. You might want to write a pretty printer, so you can read anybody's code without having to rewrite it yourself. (Maybe someone has already written a Python pretty printer).
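A pretty printer of the kind suggested here can be sketched in a few lines. This example assumes the xml.dom.minidom module from the current Python standard library, which is not the 1999 XML package discussed in this thread; the function name pretty and the sample document are made up for illustration.

```python
from xml.dom import minidom


def pretty(xml_text, indent="  "):
    """Return xml_text re-indented for reading.

    Only safe for viewing, or for documents where whitespace between
    elements is not significant: the added newlines become part of
    the output.
    """
    doc = minidom.parseString(xml_text)
    return doc.documentElement.toprettyxml(indent=indent)


sample = ("<quotes><quote><author>Anonymous</author>"
          "<text>Inhabited by pixies.</text></quote></quotes>")
print(pretty(sample))
```

As with the RTF experience described above, the re-indented text is meant for reading; feeding it back to an application that treats whitespace as content may change the document's meaning.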
-jday At 11:00 PM 4/18/99 +1000, you wrote: >Hi all, > >Even though (as I keep repeating to myself) "a markup >language is not a programming language", I believe there >are issues that affect XML content creators as much as >programmers. > >Over time, a large body of folklore, rules, heuristics has >been developed for making programs "readable". This >covers issues like choice of names, layout, indenting, >comment styles and content etc. etc. > >Can anyone tell me if there is a similar body of experience >for markup languages--particularly XML? > >I understand that a lot of XML may be automatically >generated and processed, but for the rest of the time, >does anyone have any experiences on making XML >text readable? > > > > There is no spoon... > > > >It's the smell... > > > > > > >_______________________________________________ >XML-SIG maillist - XML-SIG@python.org >http://www.python.org/mailman/listinfo/xml-sig > > > From kevin_ng@xoommail.com Mon Apr 19 07:45:10 1999 From: kevin_ng@xoommail.com (Kevin Ng) Date: Sun, 18 Apr 1999 23:45:10 -0700 Subject: [XML-SIG] bug report(+fix) : Python/XML release 0.5.1 Message-ID: <199904190645.XAA09897@www2.xoommail.com> One of the demos supplied, xml-0.5.1/demo/quotes/qtfmt.py, the line p=saxexts.XMLParserFactory.make_parser("pyexpat") raises an exception as saxexts.py cannot import the required module, I fixed it by changing the above line to : p=saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat") and the demo works ok. Rgds Kevin I use Linux at home. ______________________________________________________ Get your free web-based email at http://www.xoom.com Birthday? Anniversary? 
Send FREE animated greeting cards for any occassion at http://greetings.xoom.com From gstein@lyra.org Mon Apr 19 07:51:43 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 18 Apr 1999 23:51:43 -0700 Subject: [XML-SIG] DOM API References: <199904171425.IAA03919@malatesta.local> Message-ID: <371AD27F.7E0334A0@lyra.org> uche.ogbuji@fourthought.com wrote: >... > > * Is it reasonable to unify a subset of their interfaces? > > This does make sense. The first question would be philisohical: should such a > unified interface stick closely to the W3C's IDL, or should it be more > faithful to Python (i.e. returning PyLists instead of NodeList objects). This > is the main difference between the two Python DOM implementation. We could > build an adapter accordingly (most of the work is already don with > DOM.Ext.NodeList2PyList, etc), but I'd like to hear people's opinions first. Speaking of DOM implementations, I had posted a couple weeks ago about including my qp_xml.py module in the XML distribution. It effectively presents another DOM for Python users to consume XML input (it does NOT handle output, tho). Didn't hear back on that, though... does anybody have any feelings one way or another about including the module? I think it is quite nice for lightweight XML parsing. I haven't ever found a need for the W3C DOM (since I simply need a Python representation of the input, and all output is via "print"), so I'm presuming others will find this useful. Thoughts? thx -g -- Greg Stein, http://www.lyra.org/ From larsga@ifi.uio.no Mon Apr 19 08:40:54 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 19 Apr 1999 09:40:54 +0200 Subject: [XML-SIG] DOM API In-Reply-To: <371AD27F.7E0334A0@lyra.org> References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> Message-ID: * Greg Stein | | Didn't hear back on that, though... does anybody have any feelings | one way or another about including the module? 
I think it makes sense to have something a bit more lightweight and easier to use than the DOM. However, why not build it on top of SAX instead of pyexpat? No reason to restrict ourselves to just one parser, is there? --Lars M. From gstein@lyra.org Mon Apr 19 08:43:19 1999 From: gstein@lyra.org (Greg Stein) Date: Mon, 19 Apr 1999 00:43:19 -0700 Subject: [XML-SIG] Looking for namespace examples... References: <000d01be899c$1d810210$0301a8c0@restless.com> Message-ID: <371ADE97.71B6ECD5@lyra.org> Stuart Hungerford wrote: >... > This one is a bit more prosaic: I'm looking for some > realistic examples of DTD's and XML documents that > make use of namespaces. Not very realistic, but it provides numerous examples: the XML Namespaces specification. http://www.w3.org/TR/REC-xml-names/ A realistic application of namespaces can be seen in the WebDAV specification: ftp://ftp.isi.edu/in-notes/rfc2518.txt > My understanding is that a validating parser will not > treat namespace prefixes as "special" in any way in > a DTD. If a validating parser does not understand namespaces, then it will not be able to validate an XML document that uses them. For example, it sees "" and "" as different elements, and no fudging of the "foo" prefix will fix that. The only workaround is to use a default namespace for the WHOLE document so that the DTD refers to and the document uses (and the element in the doc falls into the appropriate namespace via an xmlns="..." attribute). Note that this implies only one namespace per document. > I've seen short examples where the xmlns:foo > attribute is defined as a FIXED attribute (in David > Megginson's "19 questions" document), and now > I'm a bit confused. I'm not familiar with DTD terminology, so I don't know what FIXED is attempting to state. > Can anyone point me to some good learning > examples? Hopefully the two docs above will provide ample information. 
I posted a module to this list a couple weeks ago which will properly and quickly parse XML documents with namespaces (I don't believe the xml package has a parser/DOM capable of doing so, although it appears Python's xmllib.py can (albeit slowly)). My module is available at: http://www.lyra.org/greg/python/qp_xml.py Note that it is based on top of pyexpat, so you'll need that module, too. Cheers, -g -- Greg Stein, http://www.lyra.org/ From larsga@ifi.uio.no Mon Apr 19 09:13:04 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 19 Apr 1999 10:13:04 +0200 Subject: [XML-SIG] Looking for namespace examples... In-Reply-To: <371ADE97.71B6ECD5@lyra.org> References: <000d01be899c$1d810210$0301a8c0@restless.com> <371ADE97.71B6ECD5@lyra.org> Message-ID: * Greg Stein | | [ns & validation] The only workaround is to use a default namespace | for the WHOLE document so that the DTD refers to and the | document uses (and the element in the doc falls into the | appropriate namespace via an xmlns="..." attribute). Note that this | implies only one namespace per document. You can also use FIXED attribute declarations and, by implication, always use the same prefix for the same namespace. This essentially leaves you with the first Namespace WD, although in a different syntax. * Stuart Hungerford | | I've seen short examples where the xmlns:foo attribute is defined as | a FIXED attribute (in David Megginson's "19 questions" document), | and now I'm a bit confused. * Greg Stein | | I'm not familiar with DTD terminology, so I don't know what FIXED is | attempting to state. That the element will always have the attribute with the specified value, whether the user bothered to explicitly add it in the document or not. | Hopefully the two docs above will provide ample information. 
I posted a | module to this list a couple weeks ago which will properly and quickly | parse XML documents with namespaces (I don't believe the xml package has | a parser/DOM capable of doing so, although it appears Python's xmllib.py | can (albeit slowly)). Both xmllib and xmlproc can handle namespaces. In xmlproc this requires you to use an extra module, which comes with the parser. --Lars M. From gstein@lyra.org Mon Apr 19 09:14:39 1999 From: gstein@lyra.org (Greg Stein) Date: Mon, 19 Apr 1999 01:14:39 -0700 Subject: [XML-SIG] DOM API References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> Message-ID: <371AE5EF.13BA1B8C@lyra.org> Lars Marius Garshol wrote: > > * Greg Stein > | > | Didn't hear back on that, though... does anybody have any feelings > | one way or another about including the module? > > I think it makes sense to have something a bit more lightweight and > easier to use than the DOM. However, why not build it on top of SAX > instead of pyexpat? No reason to restrict ourselves to just one > parser, is there? No particular reason, although it will be somewhat slower if based on SAX. I see in drv_pyexpat.py that the startElement handler does a good bit of work before getting to the "real" start handler. It would be nice to skip that :-) (honestly, though, I don't know what kind of overhead it creates). It might be nice to switch it to SAX and bench the pure pyexpat version against the SAX version. I do agree that SAX-based would be the Right Thing, but I'm also willing to trade that for speed since people can always use the DOM if they need to use a different, underlying parser (such as xmlproc). 
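To make the "build it on top of SAX" option concrete, here is a rough sketch of a lightweight tree builder driven by SAX callbacks. It assumes the xml.sax module from the current Python standard library rather than the 1999 drivers, and the Element and TreeBuilder classes are hypothetical; this is not how qp_xml itself is written.

```python
import xml.sax


class Element:
    """Hypothetical lightweight node: name, attribute dict, children, text."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.children = []
        self.text = ""


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a tree of Element objects from SAX events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []          # currently open elements

    def startElement(self, name, attrs):
        elem = Element(name, dict(attrs.items()))
        if self._stack:
            self._stack[-1].children.append(elem)
        else:
            self.root = elem
        self._stack.append(elem)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Text is attached to the innermost open element.
        if self._stack:
            self._stack[-1].text += content


def parse(data):
    builder = TreeBuilder()
    xml.sax.parseString(data, builder)
    return builder.root


root = parse(b'<doc id="1"><item>one</item><item>two</item></doc>')
```

Any parser with a SAX driver could feed the same builder, which is the portability argument made above; the cost is the extra layer of calls between the parser and the handler.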
Cheers, -g -- Greg Stein, http://www.lyra.org/ From larsga@ifi.uio.no Mon Apr 19 09:32:49 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 19 Apr 1999 10:32:49 +0200 Subject: [XML-SIG] DOM API In-Reply-To: <371AE5EF.13BA1B8C@lyra.org> References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <371AE5EF.13BA1B8C@lyra.org> Message-ID: * Lars Marius Garshol | | I think it makes sense to have something a bit more lightweight and | easier to use than the DOM. However, why not build it on top of SAX | instead of pyexpat? No reason to restrict ourselves to just one | parser, is there? * Greg Stein | | No particular reason, although it will be somewhat slower if based | on SAX. It will, so maybe we should consider making two builders? | I see in drv_pyexpat.py that the startElement handler does a good | bit of work before getting to the "real" start handler. It would be | nice to skip that :-) (honestly, though, I don't know what kind of | overhead it creates). If you have a lot of attributes I guess it will be slow, but I think applications using your qp_xml will essentially have to redo that work (and quite possibly in a less efficient manner), since they can't just do a simple lookup to get the attribute values. So your qp_xml would be nicer if it had a hash of attributes instead of a list, and applications based on it would very likely be faster. Also, I think it might make sense to modify pyexpat to create a hash in the PyAPI wrapping instead of a list as it does now. That would most likely be both the fastest and the nicest solution. | It might be nice to switch it to SAX and bench the pure pyexpat | version against the SAX version. Feel free. I don't have the time, I'm afraid. | I do agree that SAX-based would be the Right Thing, but I'm also | willing to trade that for speed since people can always use the DOM | if they need to use a different, underlying parser (such as | xmlproc). Or sgmlop, or htmllib, or sgmllib. 
Or, when I get round to it, SP or Java parsers under JPython. Maybe also RXP. --Lars M. From fredrik@pythonware.com Mon Apr 19 09:58:46 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 19 Apr 1999 10:58:46 +0200 Subject: [XML-SIG] DOM API References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <371AE5EF.13BA1B8C@lyra.org> Message-ID: <01f101be8a42$d665e830$f29b12c2@pythonware.com> Greg wrote: > No particular reason, although it will be somewhat slower if based on > SAX. I see in drv_pyexpat.py that the startElement handler does a good > bit of work before getting to the "real" start handler. It would be nice > to skip that :-) (honestly, though, I don't know what kind of overhead > it creates). > > It might be nice to switch it to SAX and bench the pure pyexpat version > against the SAX version. > > I do agree that SAX-based would be the Right Thing, but I'm also willing > to trade that for speed since people can always use the DOM if they need > to use a different, underlying parser (such as xmlproc). one could imagine that once we've settled on an API, there could be different implementations of the tree builder... perhaps the "qp API" could be turned into a "standard python light-weight dom-like interface"? and to get that process started, maybe you could post an interface summary? Cheers /F fredrik@pythonware.com http://www.pythonware.com From gstein@lyra.org Mon Apr 19 09:47:39 1999 From: gstein@lyra.org (Greg Stein) Date: Mon, 19 Apr 1999 01:47:39 -0700 Subject: [XML-SIG] DOM API References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <371AE5EF.13BA1B8C@lyra.org> Message-ID: <371AEDAB.25CE6C7C@lyra.org> Lars Marius Garshol wrote: >... > * Greg Stein > | > | No particular reason, although it will be somewhat slower if based > | on SAX. > > It will, so maybe we should consider making two builders? I have no motivation to do so :-), but will certainly accept the changes from somebody who is. 
> | I see in drv_pyexpat.py that the startElement handler does a good > | bit of work before getting to the "real" start handler. It would be > | nice to skip that :-) (honestly, though, I don't know what kind of > | overhead it creates). > > If you have a lot of attributes I guess it will be slow, but I think > applications using your qp_xml will essentially have to redo that work > (and quite possibly in a less efficient manner), since they can't just > do a simple lookup to get the attribute values. > > So your qp_xml would be nicer if it had a hash of attributes instead > of a list, and applications based on it would very likely be faster. Yes, I thought of this one, but looking at the code, I see that I haven't actually done that yet. heh. I'm out for about two weeks, but will change this when I return. I intend to do { (URI, name) : value }. > Also, I think it might make sense to modify pyexpat to create a hash > in the PyAPI wrapping instead of a list as it does now. That would > most likely be both the fastest and the nicest solution. Yup. Should ask Jack about his intentions here. Keep it close to Expat, or provide a little more Python-ish version. There is also the backwards-compat issue :-) > | It might be nice to switch it to SAX and bench the pure pyexpat > | version against the SAX version. > > Feel free. I don't have the time, I'm afraid. Not me. As I said... I'm not motivated to do so :-). I believe that multi-parser support is handled by DOM. If you want quick and light-weight, then use qp_xml and pyexpat. 
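The planned change from a list of attribute objects to a { (URI, name) : value } mapping could look roughly like this. The Attr class and the attrs_to_mapping helper are hypothetical stand-ins written for this example; only the ns, name and value fields and the (URI, name) key shape come from the discussion.

```python
class Attr:
    """Hypothetical stand-in for a per-attribute object (ns, name, value)."""

    def __init__(self, ns, name, value):
        self.ns = ns
        self.name = name
        self.value = value


def attrs_to_mapping(attr_list):
    """Convert a list of attribute objects to {(URI, name): value}."""
    return {(a.ns, a.name): a.value for a in attr_list}


attrs = [Attr("", "id", "42"), Attr("DAV:", "depth", "infinity")]
mapping = attrs_to_mapping(attrs)

# A single dictionary lookup replaces a scan over the list.
depth = mapping.get(("DAV:", "depth"))
missing = mapping.get(("DAV:", "nosuch"))
```

Keying on the namespace URI rather than the prefix means the lookup gives the same answer whatever prefix the document author happened to choose.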
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Mon Apr 19 10:28:29 1999 From: gstein@lyra.org (Greg Stein) Date: Mon, 19 Apr 1999 02:28:29 -0700 Subject: [XML-SIG] qp_xml API (was: DOM API) References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <371AE5EF.13BA1B8C@lyra.org> <01f101be8a42$d665e830$f29b12c2@pythonware.com> Message-ID: <371AF73D.52254043@lyra.org> Fredrik Lundh wrote: > one could imagine that once we've settled on an API, > there could be different implementations of the tree > builder... Seems reasonable. > perhaps the "qp API" could be turned into a "standard > python light-weight dom-like interface"? and to get that > process started, maybe you could post an interface > summary? All right. Below is the summary. This is also the first opportunity for public review, so I will welcome any suggestions for change.

qp_xml.error: a string for exceptions. [ed. this "should" become a class]

qp_xml.Parser: the parser class. Typical use is: instantiate and call the parse() method. The class is not thread-safe, but one-per-thread is fine.

Parser.parse(input): input may be a string or an object supporting the "read" method (e.g. a file or httplib.HTTPResponse (from my new httplib module)). The input must represent a complete XML document. It will be fully parsed and a lightweight representation will be returned. This method may be called any number of times (for multiple documents). The returned object is an instance of qp_xml._element.

_element.name: element ("tag") name

_element.ns: a Python string. The namespace URI this element's name belongs to, or the empty string for "no namespace".

_element.lang: the xml:lang value that applies to this element's attributes and content. It is inherited from the parent, pulled from this element's attributes, or is None if no xml:lang is in scope.

_element.children: a Python list of the child elements, in order

_element.attrs: ### currently a list of objects representing attributes, each object containing ns, name, value attributes. this will change to a mapping of { (URI, name) : value }. ###

_element.first_cdata: a Python string which contains the element's contents that are between the start tag and the first child element (if present, otherwise the contents between the start/end tags). This will be the empty string both for an empty-element tag and for a start/end tag pair with nothing between them.

_element.following_cdata: a Python string containing the PARENT element's content which follows this element's end tag (up to the next child element of the parent, or the parent's end tag).

qp_xml.dump(f, element): uses f.write() to dump the element as XML. Namespaces and xml:lang values will be inserted. Automatic selection of namespace prefixes will be used as appropriate.

qp_xml.textof(element): return this element's contents (non-recursively).

The *_cdata fields are reasonably "interesting" ... Here is a sample of a few elements and how the cdata fields are filled in:

    <elem1>
      elem1.first_cdata contents
      <elem2>
        elem2.first_cdata contents
      </elem2>
      elem2.following_cdata contents
      <elem3/>
      elem3.following_cdata contents
    </elem1>

The textof(elem1) function will return elem1.first_cdata + elem2.following_cdata + elem3.following_cdata. The *_cdata fields preserve whitespace.

Commentary: Note that clients only need to import qp_xml, instantiate qp_xml.Parser(), and call parse() (which returns an object). They only deal with one object type in the return value (qp_xml._element), and they directly access the fields in it. The object defines no methods. Most clients will use .name, .attrs, and .children. qp_xml.textof(elem) will return the element's text contents. Certain clients may use .ns to test if the element is in the namespace they are looking for; a few clients will use .lang to interpret attribute values and element contents.
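The way textof() stitches the *_cdata fields together can be sketched as follows. The Element class here is a hypothetical stand-in carrying only the fields named in the summary, and the textof function is a reimplementation from the description above, not the module's actual source.

```python
class Element:
    """Hypothetical stand-in for qp_xml._element (only the fields used here)."""

    def __init__(self, name, first_cdata="", following_cdata=""):
        self.name = name
        self.children = []
        self.first_cdata = first_cdata          # text before the first child
        self.following_cdata = following_cdata  # parent's text after this element


def textof(elem):
    """Non-recursive text of elem, as described in the summary:
    its own first_cdata plus each child's following_cdata."""
    return elem.first_cdata + "".join(c.following_cdata for c in elem.children)


elem1 = Element("elem1", first_cdata="A")
elem2 = Element("elem2", first_cdata="not included", following_cdata="B")
elem3 = Element("elem3", following_cdata="C")
elem1.children = [elem2, elem3]

result = textof(elem1)
```

Note that elem2.first_cdata does not appear in the result: it belongs to elem2's own content, which is why the function is described as non-recursive.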
Cheers, -g -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Mon Apr 19 18:17:19 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 19 Apr 1999 12:17:19 -0500 Subject: [XML-SIG] DOM API References: <199904171425.IAA03919@malatesta.local> Message-ID: <371B651E.A1771FEB@prescod.net> uche.ogbuji@fourthought.com wrote: > > This does make sense. The first question would be philisohical: should such a > unified interface stick closely to the W3C's IDL, or should it be more > faithful to Python (i.e. returning PyLists instead of NodeList objects). This > is the main difference between the two Python DOM implementation. We could > build an adapter accordingly (most of the work is already don with > DOM.Ext.NodeList2PyList, etc), but I'd like to hear people's opinions first. Do we really have to choose? If, for example, a NodeList object can act as a Python sequence then don't we have the best of both worlds? I mean if you really need a PyList then you can use "map" to generate one. I would like to think that Python is sufficiently flexible that most of these choices could be made in a DOM compatible AND Python compatible way. The downside of doing both is that a Java or C++ implementation of a "raw" DOM accessed over CORBA or COM would not be compatible -- but we could write Python wrappers that would make them so. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "The Excursion [Sport Utility Vehicle] is so large that it will come equipped with adjustable pedals to fit smaller drivers and sensor devices that warn the driver when he or she is about to back into a Toyota or some other object." 
-- Dallas Morning News From Tim Lavoie Mon Apr 19 18:46:24 1999 From: Tim Lavoie (Tim Lavoie) Date: Mon, 19 Apr 1999 12:46:24 -0500 Subject: [XML-SIG] XBEL questions Message-ID: <19990419124624.A15472@beyondtv.net> I've just started tinkering with the XBEL package and its sample scripts, converting from Netscape (4.5) to XBEL format. The script needed tags converted to upper-case to recognize what Communicator had written, no big deal. What did puzzle me was the output; the DTD lists tags in lower case, but the bookmark.py script generated everything in upper case. Since XML is case-sensitive, isn't this wrong? The other thing I noticed is that the output gags the xmlwf test program which accompanies James Clark's expat parser. The offending line contains a URL with multiple CGI parameters, with the error message pointing to the second "=" character. This character follows the second parameter name, which as in all HTML is preceded by a "&" character. Could the problem be that tag contents need to be encoded first? The tag looks like: http://foo.domain/cgi/some.cgi?Appl=param1&Section=param2 Cheers, Tim From larsga@ifi.uio.no Mon Apr 19 20:42:05 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Mon, 19 Apr 1999 21:42:05 +0200 (MET DST) Subject: [XML-SIG] xmlproc: Version 0.61 released\! Message-ID: <199904191942.VAA08945@ifi.uio.no> Changes since version 0.60: - the parser is now even faster, especially when validating. The parser should now be several times faster for very large DTDs. 
- various minor bug fixes, plus an embarrassing one in xvcmd.py - some API extensions: - catalog.CatalogParser now accepts an error language parameter - catalog.xmlproc_catalog now accepts an optional error handler parameter - added a utils module with a ready-made error handler that prints error messages to a file-like object - added a new method get_valid_elements to xmldtd.ElementType, so that it's now possible to find out which elements are allowed in a given state (or point) in the content model of an element This version is mainly released to fix the bug in xvcmd.py, which was too glaring to be overlooked. Experiments with DTD caching have been performed and it has turned out to be feasible, but surprisingly subtle. The speed benefits also seem to be disappointingly small. If anyone really wants this feature, let me know and I'll implement it. --Lars M. From paul@prescod.net Mon Apr 19 19:51:46 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 19 Apr 1999 13:51:46 -0500 Subject: [XML-SIG] DOM API References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <371AE5EF.13BA1B8C@lyra.org> <01f101be8a42$d665e830$f29b12c2@pythonware.com> Message-ID: <371B7B42.64E93DE4@prescod.net> Fredrik Lundh wrote: > > perhaps the "qp API" could be turned into a "standard > python light-weight dom-like interface"? and to get that > process started, maybe you could post an interface > summary? I'm going to propose instead a light-weight DOM subset. I would rather not require PyXML users to memorize two different APIs depending on whether they are doing light-weight work or heavy-weight work. Apart from my decision to suggest a DOM subset, I have made my subset a little more functional in some places and a little less in others. My bias is to expose *more* of the underlying XML structure (processing instructions, attributes) and relegate handling for lang and namespace to the more complex APIs (or extensions to this API).
--
error (like qp_xml.error)
Parser (like qp_xml.Parser)
Parser.parse(input) (like qp_xml.parse but returns a document object)
Node.ChildNodes (a sequence of nodes property)
Node.NodeType (an integer a la DOM property)
Document.DocumentElement (an element node property)
Element.Attributes (a map of names to attribute objects property)
Element.GetAttribute (returns an attribute's value)
Element.TagName
Element.PreviousSibling
Element.NextSibling
CharacterData.Data (a PyString property)
Attribute.Name
Attribute.Value
ProcessingInstruction.Target (string property)
ProcessingInstruction.Data (string property)
--
Note that I use the words "sequence" and "map" in their Python sense above. Either a PyList or a NodeList Object could be a sequence. Either a PyDict or a NamedNodeList Object could be a map. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "The Excursion [Sport Utility Vehicle] is so large that it will come equipped with adjustable pedals to fit smaller drivers and sensor devices that warn the driver when he or she is about to back into a Toyota or some other object." -- Dallas Morning News From Jeffrey Chang Tue Apr 20 00:50:55 1999 From: Jeffrey Chang (Jeffrey Chang) Date: Mon, 19 Apr 1999 16:50:55 -0700 (PDT) Subject: [XML-SIG] ElementType.content_model interpretation of '*' Message-ID: I am using xmlproc.dtdparser.DTDParser and xmlproc.xmldtd.CompleteDTD to parse and store the contents of a DTD file (xmlproc v0.60). I have a question about the interpretation of the contents within a DTD element. I load a DTD definition into a variable 'd'. The definition contains an element: Then, when I look at the content model of test:

>>> d.elems['test'].content_model
{                 # I've reformatted this for readability
  'start': 1L,
  1L: [(6L, 'a')],
  4L: [(4L, 'b')],
  6L: [(4L, 'b')],
  'final': 4L
}

According to this content model, 'test' must contain 1 'a' and at least 1 'b' before reaching the final state.
I believe the 'b' should be optional, and would have expected a content model more like:

  'start': 1L,
  1L: [(4L, 'a')],
  4L: [(4L, 'b')],
  'final': 4L

I also tested this with the following element:

In this case, I get a content model that looks reasonable:

{
  'start': 1L,
  1L: [(2L, 'a')],
  2L: [(4L, 'b')],
  4L: [(4L, 'b')],
  'final': 4L,
}

Please let me know if my interpretation of the XML specs, or the content_model data structure is incorrect.

BTW, Lars, thanks very much for xmlproc! It is a much-needed tool.

Jeff

From gstein@lyra.org Tue Apr 20 08:51:35 1999
From: gstein@lyra.org (Greg Stein)
Date: Tue, 20 Apr 1999 00:51:35 -0700 (PDT)
Subject: [XML-SIG] DOM API
In-Reply-To: <371B7B42.64E93DE4@prescod.net>
Message-ID:

On Mon, 19 Apr 1999, Paul Prescod wrote:
> I'm going to propose instead a light-weight DOM subset. I would rather not
> require PyXML users to memorize two different APIs depending on whether
> they are doing light-weight work or heavy-weight work. Apart from my decision
> to suggest a DOM subset, I have made my subset a little more functional in
> some places and a little less in others. My bias is to expose *more* of
> the underlying XML structure (processing instructions, attributes) and
> relegate handling for lang and namespace to the more complex APIs (or
> extensions to this API).

euh... I can definitely state that in the applications that I've been working with, that PIs are bogus, but namespaces are absolutely required. (that's how my code came to be!)

A general comment about your "subset" -- it is still heavyweight! Details below...

> Parser.parse(input) (like qp_xml.parse but returns a document object)

How is a "document" different in your mind, than an element that happens to be the root of a tree? I don't understand from your post. IMO, if you want simple, then just give the user a tree... that's all the dumb XML is anyhow.

> Node.ChildNodes (a sequence of nodes property)
> Node.NodeType (an integer a la DOM property)

NodeType is bogus.
It should be absolutely obvious from the context what a Node is. If you have so many objects in your system that you need NodeType to distinguish them, then you are certainly not a light-weight solution.

> Document.DocumentElement (an element node property)

If Document has no other properties, then it is totally bogus. Just return the root Element. Why the hell return an object with a single property that refers to another object? Just return that object!

> Element.Attributes (a map of names to attribute objects property)
> Element.GetAttribute (returns an attribute's value)

If you want light-weight, then GetAttribute is bogus given that the same concept is easily handled via the .Attributes value. Why introduce a method to simply do Element.Attributes.get(foo) ??

> Element.TagName
> Element.PreviousSibling
> Element.NextSibing

These Sibling things mean one of two things:

1) you have introduced loops in your data structure
2) you have introduced the requirement for the proxy crap that the current DOM is dealing with (the Node vs _nodeData thing).

(1) is mildly unacceptable in a light-weight solution (you don't want people to do a quick parse of data, and then require them to follow it up with .close()). (2) throws the whole notion of "light" out the window. You no longer have a simple, direct model of the parsed XML data.

> CharacterData.Data (a PyString property)

How do you get one of these objects? As soon as you say that an Element.ChildNodes can return one of these, then you have complicated the model. To keep things simple, .ChildNodes should return objects of the *same* type. Otherwise, all the clients are going to need to test the contents. Clients will also have a hard time finding the right data.

Case in point: I wrote a first draft davlib.py against the DOM. Damn it was a serious bitch to simply extract the CDATA contents of an element! Moreover, it was also a total bitch to simply say "give me the child elements".
Of course, that didn't work since the DOM insisted on returning a list of a mix of CDATA and elements. The whole notion of mixing "node types" in a list is completely bogus if you want direct simplicity in a model. It is one of my biggest problems with the DOM thing. Some yahoos over in the XML DOM world want all this nifty OO crap, yet they have built something that is hardly usable in a practical application. Ergo, we have all kinds of filters and walking solutions just to deal with mapping the complicated DOM structure into something that is even marginally useful. IMO, the XML DOM model is a neat theoretical expression of OO modelling of an XML document. For all practical purposes, it is nearly useless. (again: IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML? Screw that -- I use "print". I can't imagine generating XML using the DOM. Complicated and processing intensive. Sorry to go off here, but the DOM really bugs me. I think it is actually a net-negative for the XML community to deal with the beast. I would love to be educated on the positive benefits for expressing an XML document thru the DOM model. > Attribute.Name > Attribute.Value Use a mapping. Toss the intermediate object. If you just have name and value, then you don't need separate objects. Present the attributes as a mapping. > ProcessingInstruction.Target (string property) > ProcessingInstruction.Data (string property) I have yet to see a specification related to XML that depends on PIs. Until that happens, then I don't see how these are relevant. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fredrik@pythonware.com Tue Apr 20 10:40:32 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 20 Apr 1999 11:40:32 +0200 Subject: [XML-SIG] DOM API References: Message-ID: <008b01be8b11$e19fe550$f29b12c2@pythonware.com> Greg wrote: > On Mon, 19 Apr 1999, Paul Prescod wrote: > > I'm going to propose instead a light-weight DOM subset. 
I would rather not
> > require PyXML users to memorize two different APIs depending on whether
> > they are doing light-weight work or heavy-weight work.

the downside with Paul's line of reasoning is that it makes it impossible to come up with something that is light-weight also from the CPU's perspective... not good.

> euh... I can definitely state that in the applications that I've been
> working with, that PIs are bogus, but namespaces are absolutely required.
> (that's how my code came to be!)

as far as I can tell, *all* upcoming XML standards use namespaces. for a layman like me, they're pretty much part of the standard, so having them in the core API is a good thing...

...

> Case in point: I wrote a first draft davlib.py against the DOM. Damn it
> was a serious bitch to simply extract the CDATA contents of an element!
> Moreover, it was also a total bitch to simply say "give me the child
> elements". Of course, that didn't work since the DOM insisted on returning
> a list of a mix of CDATA and elements.
>
> The whole notion of mixing "node types" in a list is completely bogus if
> you want direct simplicity in a model.

well, our internal coreXML system returns a list consisting of Element and plain old strings (for CDATA). the Element class has helpers to deal with elements that contain only strings, and elements that contain only child elements. most code uses these helpers, and automatically flags "bad" XML documents.

I'm not yet convinced that your solution is easier to use -- but I might change my mind... just give me some time to think about it.

> It is one of my biggest problems with the DOM thing. Some yahoos
> over in the XML DOM world want all this nifty OO crap, yet they
> have built something that is hardly usable in a practical application.
> IMO, the XML DOM model is a neat theoretical expression of OO
> modelling of an XML document. For all practical purposes, it is
> nearly useless.
Am I the only one who thinks this year's W3C April Fools' joke was really scary...

> I mean hey: does anybody actually use the DOM to *generate* XML?
> Screw that -- I use "print". I can't imagine generating XML using the DOM.
> Complicated and processing intensive.

... as an aside, here's an excerpt from Garnet, using our light-weight XML builder... root is a parent element, package is an "archive handler" that takes care of "external entities" (if XML had been designed by real programmers, it would have supported binary data from the start ;-)

    def dump(self, root, package=None):
        stack = root.addelement("stack")
        if self.pcs:
            stack.addelement("pcs", self.pcs.tag)
        for i in self.stack:
            item = stack.addelement("item")
            title = i.gettitle()
            if title:
                item.addelement("title", title)
            extent = string.join(map(str, i.getextent()))
            item.addelement("extent", extent)
            i.dump(item, package)

doing this with print statements is quite a bit more error prone. this model is also interface-driven -- there's nothing in here that deals directly with the file format.

...

I want something really light-weight, and highly pythonish, and I don't care the slightest about TLA compatibility. the "qp" API is pretty close to what I want, but I think I can make it even simpler. more on that later.

Cheers /F

From Fred L. Drake, Jr." References: <371B7B42.64E93DE4@prescod.net>
Message-ID: <14108.31202.340223.456144@weyr.cnri.reston.va.us>

Greg Stein writes:
> an XML document. For all practical purposes, it is nearly useless. (again:
> IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
> Screw that -- I use "print". I can't imagine generating XML using the DOM.

Perhaps I missed some context. I use the DOM to edit structured data; the input is essentially LaTeX, and the output is SGML/XML. I perform fairly large, structured edits before writing the data back out.
I agree the DOM would be painful for generating a small amount of XML from a source structured very differently from the output, but my application leads me to believe that though the DOM is fairly tedious, it gives me the ability to control the output in ways that support my nit-picky approach to documents.

Don't get me wrong: I'm sure a substantially better API could be designed, especially if it wasn't intended to translate cleanly (or at all) to languages other than Python. But I don't have time or interest in that; the DOM works well enough in its absence.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From Jeff.Johnson@icn.siemens.com Tue Apr 20 16:00:41 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Tue, 20 Apr 1999 11:00:41 -0400
Subject: [XML-SIG] How can I search for a string of text
Message-ID: <85256759.0051CA54.00@li01.lm.ssc.siemens.com>

Hello everyone,

I need to remove a string from my HTML files but I don't know the best way to find it. There are usually line feeds in the HTML between the string so the string does not appear as one DOM text node. Does anyone know the best way to find contiguous text that spans multiple DOM nodes?

While I'm at it, is there a good way to remove blank lines from ?ML files? As I read and rewrite my XML files, I find that extra line feeds accumulate. I've tried a few different approaches but have never been fully satisfied with them.

Thanks much,
Jeff

From paul@prescod.net Tue Apr 20 15:47:36 1999
From: paul@prescod.net (Paul Prescod)
Date: Tue, 20 Apr 1999 09:47:36 -0500
Subject: [XML-SIG] DOM API
References:
Message-ID: <371C9388.D0690375@prescod.net>

Greg Stein wrote:
> A general comment about your "subset" -- it is still heavyweight!

It wasn't clear what I was optimizing for: performance or simplicity. They aren't always the same thing.

> euh...
I can definitely state that in the applications that I've been
> working with, that PIs are bogus, but namespaces are absolutely required.
> (that's how my code came to be!)

> I have yet to see a specification related to XML that depends on PIs.
> Until that happens, then I don't see how these are relevant.

http://www.w3.org/TR/REC-xml
http://www.w3.org/TR/xml-stylesheet
http://www.w3.org/TR/NOTE-dcd
http://www.w3.org/TR/NOTE-ddml

Well let's put it this way: XML 1.0 uses PIs. So does the stylesheet binding extension (for CSS and XSL). I don't doubt that namespaces are important but they can easily be viewed as an extension of (or layer on top of) the minimal API.

> How is a "document" different in your mind, than an element that happens
> to be the root of a tree? I don't understand from your post. IMO, if you
> want simple, then just give the user a tree... that's all the dumb XML is
> anyhow.

Consider the "canonical Web-enabled XML document":

There are four objects there. If we want it to be a tree we need a wrapper object that contains them. You could argue that in the lightweight API the version and doctype information could disappear but surely we want to allow people to figure out what stylesheets are attached to their documents!

> NodeType is bogus. It should be absolutely obvious from the context what a
> Node is. If you have so many objects in your system that you need NodeType
> to distinguish them, then you are certainly not a light-weight solution.

XML is a dynamically typed language, like Python. If I have a mix of elements, characters and processing instructions then I need some way of differentiating them. I don't feel like it is the place of an API to decide that XML is a strongly typed language and silently throw away important information from the document.

> > Document.DocumentElement (an element node property)
>
> If Document has no other properties, then it is totally bogus. Just return
> the root Element.
Why the hell return an object with a single property
> that refers to another object? Just return that object!

Document should also have ChildNodes.

> If you want light-weight, then GetAttribute is bogus given that the same
> concept is easily handled via the .Attributes value. Why introduce a
> method to simply do Element.Attributes.get(foo) ??

GetAttribute is simpler, more direct and maybe more efficient in some cases. It works with simple strings and not attribute objects.

> > Element.TagName
> > Element.PreviousSibling
> > Element.NextSibing
>
> These Sibling things mean one of two things:
>
> 1) you have introduced loops in your data structure
> 2) you have introduced the requirement for the proxy crap that the current
> DOM is dealing with (the Node vs _nodeData thing).
>
> (1) is mildly unacceptable in a light-weight solution (you don't want
> people to do a quick parse of data, and then require them to follow it up
> with .close()).

I don't see this as a big deal. This is an efficiency versus simplicity issue. These functions are extremely convenient in a lot of situations.

> Case in point: I wrote a first draft davlib.py against the DOM. Damn it
> was a serious bitch to simply extract the CDATA contents of an element!

XML is a dynamically typed language. "I've implemented Java and now I'm trying to implement Python and I notice that you guys throw these PyObject things around and they make my life harder. I'm going to dump them from my implementation."

> Moreover, it was also a total bitch to simply say "give me the child
> elements". Of course, that didn't work since the DOM insisted on returning
> a list of a mix of CDATA and elements.

It told you what was in your document. If you want to include helper functions to do this stuff then I say fine: but if you want to throw away the real structure of the document then I don't think that that is appropriate.

> IMO, the XML DOM model is a neat theoretical expression of OO modelling of
> an XML document.
For all practical purposes, it is nearly useless. (again:
> IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
> Screw that -- I use "print". I can't imagine generating XML using the DOM.
> Complicated and processing intensive.

I'm not sure what your point is here. I wouldn't use the DOM *or* qp_xml to generate XML in most cases. As you point out "print" or "file.write" is sufficient in most applications. This has nothing to do with the DOM and everything to do with the fact that writing to a file is inherently a streaming operation so a tree usually gets in the way.

> Sorry to go off here, but the DOM really bugs me. I think it is actually a
> net-negative for the XML community to deal with the beast. I would love to
> be educated on the positive benefits for expressing an XML document thru
> the DOM model.

I think that the DOM is broken for a completely different set of reasons than you do. But the DOM is also hugely popular and more widely implemented than many comparable APIs in other domains. I'm told that Microsoft's DOM implementation is referenced in dozens of their products and throughout many upcoming technologies. Despite its flaws, the DOM is an unqualified success and some people like it more than XML itself. They are building DOM interfaces to non-XML data!

> Use a mapping. Toss the intermediate object. If you just have name and
> value, then you don't need separate objects. Present the attributes as a
> mapping.

In this case I am hamstrung by DOM compatibility. This is a small price to pay as long as we keep the simpler GetAttribute methods. The only reason to get the attribute objects is when you want to iterate over all attributes which is probably relatively rare.
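
The two access styles argued over in this exchange can be sketched as follows. This is only an illustration under stated assumptions: it uses the present-day standard-library xml.dom.minidom as a stand-in, whose names are lowerCamelCase (getAttribute, attributes) rather than the capitalised names in the proposal, and the sample document is made up.

```python
from xml.dom import minidom

# Made-up sample input, for illustration only.
doc = minidom.parseString('<item id="42" lang="en">text</item>')
elem = doc.documentElement

# Direct lookup by name: works with plain strings, no attribute objects.
item_id = elem.getAttribute("id")

# Iterating over every attribute goes through the attribute map instead.
all_attrs = {name: value for name, value in elem.attributes.items()}

print(item_id)    # 42
print(all_attrs)  # {'id': '42', 'lang': 'en'}
```

The direct getter covers the common single-attribute case, and the map is only needed when walking all attributes.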
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News

From paul@prescod.net Tue Apr 20 17:11:15 1999
From: paul@prescod.net (Paul Prescod)
Date: Tue, 20 Apr 1999 11:11:15 -0500
Subject: [XML-SIG] DOM API
References: <008b01be8b11$e19fe550$f29b12c2@pythonware.com>
Message-ID: <371CA723.FAEBF6AA@prescod.net>

Fredrik Lundh wrote:
>
> the downside with Paul's line of reasoning is that it makes it
> impossible to come up with something that is light-weight
> also from the CPU's perspective... not good.

That isn't true. I tend to think that usability is more important than performance but if we decide to optimize for performance then we can make a DOM-compatible API that is as fast as "qp". I mean the only thing that is harder to implement in the miniDOM is siblings -- where I chose convenience over efficiency. We can make the opposite choice.

In fact, I think that the namespace and language support in qp already makes it relatively "heavyweight".

> I want something really light-weight, and highly pythonish, and I
> don't care the slightest about TLA compatibility.

It isn't a question of TLA compatibility. It's about using the data models used everywhere else in the world. Python conforms to posix conventions for file and socket operations, C conventions for string interpolation, Perl conventions for regular expressions, Unix conventions for globbing and so forth. If I wanted idiosyncratic invented-just-for-us interfaces I would go and use Perl.

To me, this is the central issue: Guido's genius lies in the fact that he usually chooses to adapt something before re-inventing it. This makes learning Python easy.
"Oh yeah, I recognize that from the other languages I use." Well, SAX and DOM are what the other languages use.

Anyhow, "qp" is hardly more "Pythonic" than the lightweight DOM API. The following_cdata stuff is not like any API I've ever seen in Python or elsewhere. The call for "pythonic-ness" is mostly a strawman. The DOM works better in Python than in almost any other language: Nodelists are lists, NamedNodeLists are maps, object types are instance classes, lists can be heterogeneous, etc.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News

From jday@picard.csihq.com Tue Apr 20 17:53:44 1999
From: jday@picard.csihq.com (John Day)
Date: Tue, 20 Apr 1999 12:53:44 -0400
Subject: [XML-SIG] How can I search for a string of text
Message-ID: <4.2.0.32.19990420125203.00a45bf0@mail.csihq.com>

Use a SAX interface to access the characters of the text file. Sounds like you might know the enclosing tag names too ( ... ), so you might be able to narrow the search somewhat. In any case, SAX will present the characters as a stream for filtering.

-jday

At 11:00 AM 4/20/99 -0400, you wrote:
>Hello everyone,
>I need to remove a string from my HTML files but I don't know the best way to
>find it. There are usually line feeds in the HTML between the string so the
>string does not appear as one DOM text node. Does anyone know the best way to
>find contiguous text that spans multiple DOM nodes?
>While I'm at it, is there a good way to remove blank lines from ?ML files?
>As I
>read and rewrite my XML files, I find that extra line feeds accumulate. I've
>tried a few different approaches but have never been fully satisfied with
>them.
>Thanks much, >Jeff _______________________________________________ XML-SIG maillist - XML-SIG@python.org http://www.python.org/mailman/listinfo/xml-sig From akuchlin@cnri.reston.va.us Thu Apr 22 00:01:46 1999 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 21 Apr 1999 19:01:46 -0400 (EDT) Subject: [XML-SIG] How can I search for a string of text In-Reply-To: <85256759.0051CA54.00@li01.lm.ssc.siemens.com> References: <85256759.0051CA54.00@li01.lm.ssc.siemens.com> Message-ID: <14110.22394.584206.529707@amarok.cnri.reston.va.us> Jeff.Johnson@icn.siemens.com writes: >I need to remove a string from my HTML files but I don't know the best way to >find it. There are usually line feeds in the HTML between the string so the >string does not appear as one DOM text node. Does anyone know the best way to >find contiguous text that spans multiple DOM nodes? The normalize() method on an Element node consolidates the subtree so there are no adjacent Text nodes, merging Text nodes that are next to each other into a single node. So you could do document.rootElement.normalize(), and then rely on the string being contained within one node. That won't catch tricky cases -- do you need to find it if an entity expands to the string, or to part of the string? if the string had a PI in the middle of it, would it still count as a match? -- but it'll certainly help with the simple case. -- A.M. Kuchling http://starship.python.net/crew/amk/ It is not that I wanted to know a great deal, in order to acquire what is now called expertise, and which enables one to become an expert-tease to people who don't know as much as you do about the tiny corner you have made your own. 
-- Robertson Davies, _The Rebel Angels_

From mike.olson@fourthought.com Thu Apr 22 00:36:45 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Wed, 21 Apr 1999 18:36:45 -0500
Subject: [XML-SIG] DOM API
References: <008b01be8b11$e19fe550$f29b12c2@pythonware.com> <371CA723.FAEBF6AA@prescod.net>
Message-ID: <371E610D.BA7861AE@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------msE1C4377628C886A3F4622D9F
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

First, for those of you that think PyDOM is heavyweight, you obviously haven't tried 4DOM :) But, then that is its purpose. We built it to follow the W3C spec to the letter so it is big and can be not very user friendly at times. However, it is completely usable over an ORB or COM with clients in Java, C++, COBOL, whatever. It fits our needs nicely. And as Paul mentions, with so many other people starting to use it I imagine the CORBA functionality will be useful to others as well.

I see PyDOM as its lightweight brother. It gets all of its speed increase from not having to wrap every list in a NodeList, and not having to wrap every dict in a NamedNodeMap. Last I looked it also does not create some of the "not so important" nodes such as Attrs.

Do we really need a third interface? qp sounds like a great optimization if you don't care about the original document structure or PIs. (you could never get 4XSL to use this interface). In some cases you may care about the document structure but not care about namespaces, or attributes, or elements... I think if we try to make it any more lightweight we will only satisfy 1/2 of us because we will need to start dropping "non important" parts of the original document. If you really need more speed, then grab SAX or expat and go, or post the super-modified classes as libraries, not standard APIs.

Back to the subject at hand.
I think that PyDOM and 4DOM could have a standard interface for applications built on top, i.e. 4XSL. I think we would have to come both ways a little though. For example, I think it would be very hard to get rid of the idea of an Attr class from 4DOM, while I think we could very easily extend the NodeList class to support native python list manipulation. This would only be the case if the application is never intended for use over an ORB (cannot call __getitem__ over an ORB very easily).

If we do this though, I don't see that a lot will be gained. You can swap in and out DOM implementations whenever you like, but you would not be able to use the 4DOM implementation and expect 4XSL to function over an ORB. If applications that use the DOM use 4DOM's new "quick" API then the speed will be about equivalent to that of PyDOM (probably still a bit slower but not by much). Your choice would be down to, do I like Andy better, or Mike...:)

Later

Paul Prescod wrote:
> Fredrik Lundh wrote:
> >
> > the downside with Paul's line of reasoning is that it makes it
> > impossible to come up with something that is light-weight
> > also from the CPU's perspective... not good.
>
> That isn't true. I tend to think that usability is more important than
> performance but if we decide to optimize for performance then we can make
> a DOM-compatible API that is as fast as "qp". I mean the only thing that
> is harder to implement in the miniDOM is siblings -- where I chose
> convenience over efficiency. We can make the opposite choice.
>
> In fact, I think that the namespace and language support in qp already
> makes it relatively "heavyweight".
>
> > I want something really light-weight, and highly pythonish, and I
> > don't care the slightest about TLA compatibility.
>
> It isn't a question of TLA compatibility. It's about using the data models
> used everywhere else in the world.
Python conforms to posix conventions > for file and socket operations, C conventions for string interpolation, > Perl conventions for regular expressions, Unix conventions for globbing > and so forth. If I wanted idiosyncratic invented-just-for-us interfaces I > would go and use Perl. > > To me, this is the central issue: to me, the Guido's genious lies in the > fact that he usually chooses adapt something before re-inventing it. This > makes learning Python easy. "Oh yeah, I recognize that from the other > languages I use." Well, SAX and DOM are what the other languages use. > > Anyhow, "qp" is hardly more "Pythonic" than in the lightweight DOM API. > The following_cdata stuff is not like any API I've ever seen in Python or > elsewhere. The call for "pythonic-ness" is mostly a strawman. The DOM > works better in Python than in almost any other language: Nodelists are > lists, NamedNodeLists are maps, object types are instance classes, lists > can be heterogenous, etc. > > -- > Paul Prescod - ISOGEN Consulting Engineer speaking for only himself > http://itrc.uwaterloo.ca/~papresco > > "The Excursion [Sport Utility Vehicle] is so large that it will come > equipped with adjustable pedals to fit smaller drivers and sensor > devices that warn the driver when he or she is about to back into a > Toyota or some other object." -- Dallas Morning News > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Member Consultant FourThought LLC http://www.fourthought.com http://opentechnology.org --- "No program is interesting in itself to a programmer. It's only interesting as long as there are new challenges and new ideas coming up." 
--- Linus Torvalds
--------------msE1C4377628C886A3F4622D9F--

From akuchlin@cnri.reston.va.us Thu Apr 22 03:28:50 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 21 Apr 1999 22:28:50 -0400
Subject: [XML-SIG] XBEL questions
In-Reply-To: <19990419124624.A15472@beyondtv.net>
References: <19990419124624.A15472@beyondtv.net>
Message-ID: <199904220228.WAA13517@207-172-49-2.s2.tnt14.ann.va.dialup.rcn.com>

Tim Lavoie writes:
> I've just started tinkering with the XBEL package and its sample
> scripts, converting from Netscape (4.5) to XBEL format. The script
> needed tags converted to upper-case to recognize what Communicator had
> written, no big deal. What did puzzle me was the output; the DTD lists
> tags in lower case, but the bookmark.py script generated everything in
> upper case. Since XML is case-sensitive, isn't this wrong?

How were you running it? bookmark.py doesn't have a block of code
that runs if __name__ == '__main__', so you can't run bookmark.py
directly; you must have been running ns_parse.py, and that seems to
produce lower-case output as it should. I also don't understand the
requirement for uppercase input, because in ns_parse.py the
startElement() and endElement() both convert the element name to
lowercase. Can you provide the exact command you ran, and perhaps a
sample bookmark file as well? (Privately to me is fine.)

> message pointing to the second "=" character. This character follows
> the second parameter name, which as in all HTML is preceded by a "&"
> character. Could the problem be that tag contents need to be encoded
> first? The tag looks like:

Good catch; one of the dump_xbel() methods should have read
escape(href) instead of just href. I've fixed this in the CVS tree;
thanks!

--
A.M. Kuchling			http://starship.python.net/crew/amk/
Been there, Remiel. Done that, wore the tee-shirt, ate the burger, bought the
original cast album, choreographed the legions of the damned and orchestrated
the screaming...
    -- Lucifer, in SANDMAN #60: "The Kindly Ones:4"

From uche.ogbuji@fourthought.com Thu Apr 22 06:08:50 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 21 Apr 1999 23:08:50 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Mon, 19 Apr 1999 12:17:19 CDT." <371B651E.A1771FEB@prescod.net>
Message-ID: <199904220508.XAA03662@malatesta.local>

> > This does make sense. The first question would be philosophical: should such a
> > unified interface stick closely to the W3C's IDL, or should it be more
> > faithful to Python (i.e. returning PyLists instead of NodeList objects). This
> > is the main difference between the two Python DOM implementations. We could
> > build an adapter accordingly (most of the work is already done with
> > DOM.Ext.NodeList2PyList, etc), but I'd like to hear people's opinions first.
>
> Do we really have to choose? If, for example, a NodeList object can act as
> a Python sequence then don't we have the best of both worlds? I mean if
> you really need a PyList then you can use "map" to generate one.

Dieter Maurer already pointed out to me that my memory was fuzzy, and
that PyDOM already provides combined (NodeList and PyList) interfaces.

The main problem with 4DOM's overloading NodeList with PyList behavior,
besides our desire to remain close to the spec except in clearly-marked
exceptions, is the fact that you can't invoke methods of the form
"__method__" across an ORB. In fact, strictly speaking, you can't
encode them into IDL.

I know that this brings up yet again the question of why we insist on
ORB-enabling 4DOM, but it has to do with much of the work we have been
doing with 4DOM: some for clients, and some hopefully to become separate
open products soon. Interfacing to object-database adapters, for
instance, is a lot easier if one can directly take advantage of the
ODMG-OMG bindings.
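One way to picture the adapter idea under discussion: the object on the far side keeps only the IDL-style item()/length interface, and the double-underscore methods are added by a wrapper that lives in the local Python process. What follows is only a rough sketch in present-day Python, not code from 4DOM or PyDOM; RawNodeList and SequenceAdapter are hypothetical names invented for the illustration.

```python
class RawNodeList:
    """Stand-in for a NodeList that offers only the DOM IDL interface.

    Hypothetical: in the discussion this would be a remote (ORB-side) object.
    """

    def __init__(self, nodes):
        self._nodes = list(nodes)

    @property
    def length(self):
        return len(self._nodes)

    def item(self, index):
        # DOM semantics: an out-of-range index yields null (None), not an error.
        if 0 <= index < len(self._nodes):
            return self._nodes[index]
        return None


class SequenceAdapter:
    """Local wrapper that adds Python sequence behaviour via item()/length."""

    def __init__(self, node_list):
        self._node_list = node_list

    def __len__(self):
        return self._node_list.length

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            # Raising IndexError is what lets for-loops and list() terminate.
            raise IndexError(index)
        return self._node_list.item(index)


# Plain strings stand in for DOM nodes here.
nodes = SequenceAdapter(RawNodeList(["a", "b", "c"]))
```

Because only item() and length cross the boundary, nothing with double underscores has to be expressed in IDL; the cost is one extra local object per list.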
We have considered a lightweight 4DOM that isn't so ORB-fanatic, but we
don't really have the time for this, and besides, PyDOM fills that niche
quite well. I know, I know, the problem remains that PyDOM and 4DOM are
different enough to complicate portable Python DOM applications. I hope
this conversation can lead us to a way around that.

> I would like to think that Python is sufficiently flexible that most of
> these choices could be made in a DOM compatible AND Python compatible way.

Python is, of course, but not CORBA, helas.

> The downside of doing both is that a Java or C++ implementation of a "raw"
> DOM accessed over CORBA or COM would not be compatible -- but we could
> write Python wrappers that would make them so.

Well, we do provide the NodeListToPylist and PyListToNodeList, and we do
often use these wrappers in our own apps. It appears you're asking for
more, though.

BTW, we're preparing a pretty neat demo of interchanging DOM between
Java and Python over an ORB with 4DOM on the Python side. We've already
found that you can do powerful things with this arrangement. Now if
only Fnorb would play more nicely with other ORBs via IIOP, or if ILU
would be less insistent on explicit server-side reference of objects.

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org

From paul@prescod.net Thu Apr 22 05:54:54 1999
From: paul@prescod.net (Paul Prescod)
Date: Wed, 21 Apr 1999 23:54:54 -0500
Subject: [XML-SIG] DOM API
References: <008b01be8b11$e19fe550$f29b12c2@pythonware.com> <371CA723.FAEBF6AA@prescod.net> <371E610D.BA7861AE@fourthought.com>
Message-ID: <371EAB9E.5B70AEE@prescod.net>

Mike Olson wrote:
>
> I see PyDOM as its lightweight brother. It gets all of its speed increase
> from not having to wrap every list in a NodeList, and not having to wrap every
> dict in a NamedNodeMap.
> Last I looked it also does not create some of the
> "not so important" nodes such as Attrs.

If those are created lazily then there is no runtime cost if you don't
use them.

> In some cases you may care about the document structure but not care about
> name spaces, or attributes, or elements... I think if we try to make it any
> more lightweight we will only satisfy 1/2 of us because we will need to start
> dropping "non important" parts of the original document.

Yes, this is what worries me. XML is XML. If you want to have a parser
flag to turn off parts of XML then that's cool but a standard API should
not throw away document content without a flag.

> If you really need
> more speed, then grab SAX or expat and go.

That's what I was thinking earlier today: if these applications need
speed so much then why are they using a tree API at all? My rule of
thumb is: filter==fast, tree==convenient. That's why I instinctively put
in sibling pointers in my "minidom".

> Back to the subject at hand. I think that PyDOM and 4DOM could have a
> standard interface for applications built on top ie 4XSL. I think we would
> have to come both ways a little though. Ex. I think it would be very hard to
> get rid of the idea of an Attr class from 4DOM, while I think we could very
> easily extend the NodeList class to support native python list manipulation.

If you make 4DOM more Pythonish and PyDOM gets closer to DOM conformance
then it seems to me that everybody wins unless some outright
incompatibilities are found.

> This would only be the case if the application is never intended for use over
> an ORB (cannot call __getitem__ over an ORB very easily).
>
> If we do this though, I don't see that a lot will be gained. You can swap in
> and out DOM implementations whenever you like, but you would not be able to
> use the 4DOM implementation and expect 4XSL to function over an ORB.

Sorry, I don't get that.
> If
> applications that use the DOM use 4DOM's new "quick" API then the speed will be
> about equivalent to that of PyDOM

4DOM has a quick API? Or are you saying that the Python-ish extensions
would *be* the quick API? Does 4DOM still require the installation of an
ORB?

Your choice would be down to, do I like Andy better, or Mike...:)

This part I understand. I like Mike. We're talking about Michael Jordan,
right?

Anyhow, the main thing I prefer about PyDOM is not performance but the
fact that my customers don't have to install an ORB to use it.

---

I don't quite follow the last paragraph but I'll take a stab:

* we could make an API that used Python-ish features without being
incompatible with the DOM.

* 4DOM could add the Python-ish features and PyDOM could fix any outright
incompatibilities (e.g. attrs)

* maybe we could add a few convenience functions to make life easier
(e.g. getText, getChildElements)

* programs that used some DOM features would be incompatible with PyDOM
because it would have optimized some of them away.

* programs that used the Python-ish features would not work over an ORB.

Actually, I don't buy that last point (maybe it's a straw person). We can
easily make a bridge that adds the Pythonish features to objects on the
other side of an ORB (4DOM, Java, whatever). Of course when you are using
4DOM in the same process space you wouldn't use the bridge.

In summary, I think that unifying the APIs of these libraries is the right
thing to do and will give real benefits.

--
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor
devices that warn the driver when he or she is about to back into a
Toyota or some other object."
-- Dallas Morning News

From Greg Stein Thu Apr 22 06:52:23 1999
From: Greg Stein (Greg Stein)
Date: Wed, 21 Apr 1999 22:52:23 -0700 (PDT)
Subject: [XML-SIG] qp API
In-Reply-To: <371EAB9E.5B70AEE@prescod.net>
Message-ID:

All right.... it seems apparent that something like the qp-api that I
proposed (in response to Fredrik) isn't going to really satisfy a number
of people for a "lightweight" API. It seems that a tendency exists to
push towards the DOM facilities.

What is the approach from here? Can we really examine the qp-api
interface with the intent of a lightweight system?

Actually: that is a good point.... what is "lightweight" ? I define that
as something that is fast, has a small set of objects, and has a small
interface (few objects/methods).

A question was asked: do we need Yet Another Interface? I believe that we
do. IMO, the qp interface is very well tuned towards apps being able to
interpret what is really going on when an XML doc arrives (yes, within
certain constraints). IMO, the DOM is great for translations of input XML
to output XML. But something like qp is handy for grabbing input and
dealing with it (I was never able to really do that well with the DOM).

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From uche.ogbuji@fourthought.com Thu Apr 22 07:35:32 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 22 Apr 1999 00:35:32 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Wed, 21 Apr 1999 23:54:54 CDT." <371EAB9E.5B70AEE@prescod.net>
Message-ID: <199904220635.AAA03840@malatesta.local>

> 4DOM has a quick API? Or are you saying that the Python-ish extensions
> would *be* the quick API? Does 4DOM still require the installation of an
> ORB?

No. We removed that restriction several versions ago. You can just use
the "make orbless" configuration to run without an ORB.

> Anyhow, the main thing I prefer about PyDOM is not performance but the
> fact that my customers don't have to install an ORB to use it.
Well, by that criterion, now you have a choice.

I have to second your summary to Fred. I'm not as hung up with
performance. We've used 4DOM for some heavy lifting (not to mention
Mike's diversion writing a graphics-heavy Web-based solitaire game in
Fnorb: we need to find him more to do). We tend to run into bottlenecks
elsewhere before the DOM.

> * we could make an API that used Python-ish features without being
> incompatible with the DOM.

The only problem for 4DOM is those double-underscore methods. Maybe
there's a way to wrap this that escapes me.

> * 4DOM could add the Python-ish features and PyDOM could fix any outright
> incompatibilities (e.g. attrs)
>
> * maybe we could add a few convenience functions to make life easier
> (e.g. getText, getChildElements)

I have no problem with this. We already add many such convenient methods
to our 4DOM Ext package: GetElementsByTagName, GetElementsById, Strip,
etc. These mostly use the DOM level 2 NodeIterator stuff, BTW.

> * programs that used some DOM features would be incompatible with PyDOM
> because it would have optimized some of them away.

As long as it's documented, I don't see this as a problem.

> * programs that used the Python-ish features would not work over an ORB.
>
> Actually, I don't buy that last point (maybe it's a straw person). We can
> easily make a bridge that adds the Pythonish features to objects on the
> other side of an ORB (4DOM, Java, whatever). Of course when you are using
> 4DOM in the same process space you wouldn't use the bridge.

Yes, but some Pythonish features would be quite a bear to get by an IDL
compiler, and it's nice being able to (theoretically) just plug into any
Java/C++/etc. module across the ORB without first adding a tricky
"Pythonic" adapter. This adapter would also have to be re-written for
each remote implementation.

> In summary, I think that unifying the APIs of these libraries is the right
> thing to do and will give real benefits.
Well, we're certainly interested, and I think that if we can sort out the
double-underscore thing, we're most of the way there.

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org

From stuart.hungerford@webone.com.au Thu Apr 22 13:32:56 1999
From: stuart.hungerford@webone.com.au (Stuart Hungerford)
Date: Thu, 22 Apr 1999 22:32:56 +1000
Subject: [XML-SIG] New XSL draft -- any Python plans?
Message-ID: <000c01be8cbc$4056c040$0301a8c0@restless.com>

Hi all,

There's a new draft of the XSL proposal available (19990421) and it
includes some interesting features for calling "functions" in other
notations--a wonderful use for Python if I've understood the proposal
right.

Can anyone working on Python XSL tools (e.g. 4XSL) tell us your
plans on supporting the new draft?

Stu

From uche.ogbuji@fourthought.com Thu Apr 22 14:14:19 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 22 Apr 1999 07:14:19 -0600
Subject: [XML-SIG] ANN: 4XSL 0.6.1
Message-ID: <199904221314.HAA04564@malatesta.local>

4XSL 0.6.0 was, as we found out, not very usable outside our purposes.
We've been more careful testing and packaging 4XSL 0.6.1:

* We've added command line options to specify the style-sheets, ignore
  PI-specified style-sheets, and validate the XML file
* Now debugging info is only printed if you set a special environment
  variable
* All XSL templates except for xsl:counter and its dependents have been
  implemented.
* We've applied the usual and numerous bug-fixes

Thanks to all who gave feedback, and were patient with 4XSL 0.6.0. We're
announcing this version only to the python-xml list for now until we
complete xsl:counter and update against the latest XSL(T) draft. However,
please feel free to distribute, discuss and use it as you wish, according
to the license (which is unchanged from 0.6.0 and much like Python).
===============================================================================

4XSL is an XSL processor written in Python, using 4DOM. This is really an
alpha-level release, although we have used it successfully to render our
Web site (www.FourThought.com), which quite thoroughly exercises the
features.

You can download the 4XSL file from

ftp://starship.python.net/pub/crew/uche/4XSL/4XSL-0.6.1.tar.gz

See the README in the archive to get started. Feedback welcome (to
4Web@fourthought.com).

All templates except for xsl:counter and dependents are supported, and
the full set of patterns.

Thanks for all the interest from this group.

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org

From uche.ogbuji@fourthought.com Thu Apr 22 14:22:24 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 22 Apr 1999 07:22:24 -0600
Subject: [XML-SIG] Re: ANN: 4XSL 0.6.1
Message-ID: <199904221322.HAA04593@malatesta.local>

Please note that the 4XSL 0.6.1 package comes bundled with, and requires
4DOM 0.7.2. You do not need to install a CORBA environment to install
4DOM versions after 0.7.0. 4DOM 0.7.2 is not yet widely distributed, but
it has been tested internally, against 4XSL and other DOM applications.
Many optimizations have been applied since 4DOM 0.7.0.

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org

From Jeff.Johnson@icn.siemens.com Thu Apr 22 16:10:37 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Thu, 22 Apr 1999 11:10:37 -0400
Subject: [XML-SIG] How can I search for a string of text
Message-ID: <8525675B.00536154.00@li01.lm.ssc.siemens.com>

normalize... haven't tried that one before. Sounds like it might do the trick.
The string in question should not have any entities or elements mixed
into it so I don't need to worry about that. I'll give it a shot.

Thanks :)

Jeff.Johnson@icn.siemens.com writes:
>I need to remove a string from my HTML files but I don't know the best way to
>find it. There are usually line feeds in the HTML between the string so the
>string does not appear as one DOM text node. Does anyone know the best way to
>find contiguous text that spans multiple DOM nodes?

>"Andrew M. Kuchling" writes:
>    The normalize() method on an Element node consolidates the
>subtree so there are no adjacent Text nodes, merging Text nodes that
>are next to each other into a single node. So you could do
>document.rootElement.normalize(), and then rely on the string being
>contained within one node. That won't catch tricky cases -- do you
>need to find it if an entity expands to the string, or to part of the
>string? if the string had a PI in the middle of it, would it still
>count as a match? -- but it'll certainly help with the simple case.

From paul@prescod.net Thu Apr 22 16:58:13 1999
From: paul@prescod.net (Paul Prescod)
Date: Thu, 22 Apr 1999 10:58:13 -0500
Subject: [XML-SIG] DOM API
References: <199904220635.AAA03840@malatesta.local>
Message-ID: <371F4715.2971DF2D@prescod.net>

uche.ogbuji@fourthought.com wrote:
>
> No. We removed that restriction several versions ago. You can just
> use the "make orbless" configuration to run without an ORB.

Neat. And presumably after I do a "make orbless" I can ship the resulting
package so that it doesn't have to be re-made on the client side. Maybe
you guys should make that the default so that it works like other
Python-written DOMs that do not have to be configured.

> Yes, but some Pythonish features would be quite a bear to get by an IDL
> compiler, and it's nice being able to (theoretically) just plug into any
> Java/C++/etc. module across the ORB without first adding a trickly "Pythonic"
> adapter.
> This adapter would also have to be re-written for each remote
> implementation.

This is the central issue. Why would it have to be rewritten for each
remote implementation? Presumably we expect all CORBA/DOM compliant
implementations to supply the same interface. Can't we wrap that
interface so that the same wrappers work for all of them? I mean I know
that some things like document creation and parsing are non-standard but
we should be able to uniformly wrap the rest.

--
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News

From paul@prescod.net Thu Apr 22 17:04:03 1999
From: paul@prescod.net (Paul Prescod)
Date: Thu, 22 Apr 1999 11:04:03 -0500
Subject: [XML-SIG] ANN: 4XSL 0.6.1
References: <199904221314.HAA04564@malatesta.local>
Message-ID: <371F4873.D68D7EB2@prescod.net>

uche.ogbuji@fourthought.com wrote:
>
> We're
> announcing this version only to the python-xml list for now until
> we complete xsl:counter and update against the latest XSL(T) draft.

Which is it? Counters were removed from XSLT. :)

For those who are interested:

E. Changes from Previous Public Working Draft

The following is a summary of changes since the previous public working
draft.

Select patterns, string expressions and boolean expressions have been
combined and generalized into an expression language with multiple data
types (see [6 Expressions and Patterns]).

xsl:strip-space and xsl:preserve-space have an elements attribute which
specifies a list of element types, rather than a element attribute
specifying a single element type.

The id() function has been split into id() and idref().
xsl:id has been replaced by the xsl:key element (see [6.4.1 Declaring
Keys]), and associated key() and keyref() functions.

The doc() and docref() have been added to support multiple source
documents.

Namespace wildcards (ns:*) have been added.

ancestor() and ancestor-or-self() have been replaced by a more general
facility for addressing different axes.

Positional qualifiers (first-of-type(), first-of-any(), last-of-type(),
last-of-any()) have been replaced by the position() and last() functions
and numeric expressions inside [].

Counters have been removed. An expr attribute has been added to
xsl:number which in conjunction with the position() allows numbering of
sorted node lists.

Multiple adjacent uses of [] are allowed.

Macros and templates have been unified by allowing templates to be named
and have parameters.

xsl:constant has been replaced by xsl:variable which allows variables to
be typed and local.

The default for priority on xsl:template has changed (see [7.4 Conflict
Resolution for Template Rules]).

An extension mechanism has been added (see [6.4.2 Declaring Extension
Functions]).

The namespace URIs have been changed.

xsl:copy-of has been added (see [9.5 Copying]).

An error recovery mechanism to allow forwards-compatibility has been
added (see [3 Forwards-compatible Processing]).

A namespace attribute has been added to xsl:element and xsl:attribute.

--
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News

From dieter@handshake.de Thu Apr 22 20:58:22 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Thu, 22 Apr 1999 19:58:22 +0000 (/etc/localtime)
Subject: [XML-SIG] New XSL draft -- any Python plans?
In-Reply-To: <000c01be8cbc$4056c040$0301a8c0@restless.com>
References: <000c01be8cbc$4056c040$0301a8c0@restless.com>
Message-ID: <14111.32515.571734.849028@lindm.dm>

Stuart Hungerford writes:
 > Can anyone working on Python XSL tools (e.g. 4XSL) tell us your
 > plans on supporting the new draft?

I plan to support the new draft in XSL-Pattern 0.4. However, it will
only be available in a few weeks.

- Dieter

From dieter@handshake.de Thu Apr 22 21:56:17 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Thu, 22 Apr 1999 20:56:17 +0000 (/etc/localtime)
Subject: [XML-SIG] DOM API
In-Reply-To: <199904220508.XAA03662@malatesta.local>
References: <371B651E.A1771FEB@prescod.net> <199904220508.XAA03662@malatesta.local>
Message-ID: <14111.35252.250202.333651@lindm.dm>

uche.ogbuji@fourthought.com writes:
 > The main problem with 4DOM's overloading NodeList with PyList behavior,
 > besides our desire to remain close to the spec except in clearly-marked
 > exceptions, is the fact that you can't invoke methods of the form "__method__"
 > across an ORB. In fact, strictly speaking, you can't encode them into IDL.

ILU has a nice extension called "custom surrogates". It allows the raw
CORBA objects to be wrapped by custom objects providing additional
functionality, e.g. Python list or dictionary emulation.

I do not know how widespread this (or a similar) feature is in other
ORBs. It is very useful, though.

- Dieter

From paul@prescod.net Thu Apr 22 22:37:05 1999
From: paul@prescod.net (Paul Prescod)
Date: Thu, 22 Apr 1999 16:37:05 -0500
Subject: [XML-SIG] ANN: Minidom 0.6
Message-ID: <371F9681.D0C314AC@prescod.net>

This is a multi-part message in MIME format.
--------------5BA374B008AE171CF0613077
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Attached is a miniature, lightweight subset of the DOM with a few
extensions for namespace handling. (I guess an extended subset is a
contradiction in terms but you get the idea!)
I propose that

 * this become part of the xml package

 * we consider the DOM-creation functions and namespaces extensions for
adoption in a standard Python DOM API

 * DOM-haters try this out and clearly describe where it falls down in
their applications

 * we try to figure out the right set of convenience functions to make the
DOM more palatable for everybody (if possible).

--
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News
--------------5BA374B008AE171CF0613077
Content-Type: text/plain; charset=us-ascii; name="minidom.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="minidom.py"

"""
minidom.py -- a lightweight DOM implementation based on SAX.

Version 0.6

Usage:
======

dom = DOMFromString( string )
dom = DOMFromURL( URL, builder=None )
dom = DOMFromFile( file, builder=None )

Actually, the three constructor methods work with PyDOM as well as
minidom. Use xml.dom.sax_builder.SaxBuilder() for PyDOM.

Classes:
=======

The main classes are Document, Element and Text

Document:
    childNodes: heterogeneous Python list
    documentElement: root element

Element:
    # main properties
    tagName: element type name (with colon, if it has one)
    childNodes: heterogeneous Python list

    # attribute getting methods
    getAttribute( "foo" ): string value of foo attribute
    getAttribute( "foo", "someURI" ): string value of foo attribute in
        namespace named by URI

    # namespaces stuff:
    prefix: type name prefix
    localName: type name following colon
    uri: uri associated with prefix

    #advanced attribute stuff
    attributes: returns attribute mapping object

Text:
    data: get the text data

Todo:
=====
 * convenience methods for getting elements and text.
 * more testing
 * bring some of the writer and linearizer code into conformance with
   this interface
"""

from xml.sax import saxexts
from xml.sax.saxlib import HandlerBase
import string
import types
from StringIO import StringIO

import dom.core

class Node:
    inGetAttr=None
    def __getattr__( self, key ):
        # node.get_foo() is answered with a function returning node.foo
        if self.inGetAttr:
            raise AttributeError, key
        elif key[0:4]=="get_":
            return (lambda self=self, key=key: getattr( self, key[4:] ))
        else:
            raise AttributeError, key
#        self.inGetAttr=1
#        func = getattr( self, "get_"+key )
#        del self.inGetAttr
#        return func()

class Document( Node ):
    nodeType=dom.core.DOCUMENT_NODE
    def __init__( self ):
        self.childNodes=[]
        self.documentElement=None

# Attributes are stored as { (prefix, local): (uri, value) }; these are
# the tuple indexes. A single leading underscore is used because names
# with two leading underscores are mangled inside class bodies.
_URI=0
_VALUE=1

_PREFIX=0
_LOCAL=1

def _qname2String( key ):
    if key[_PREFIX]:
        return string.join( key, ":" )
    else:
        return key[_LOCAL]

def _getVal( val ):
    return val[_VALUE]

class Attribute(Node):
    def __init__( self, name, value ):
        self.name=name
        self.value=value

class AttributeList:
    def __init__( self, attrs ):
        self.__attrs=attrs

    def items( self ):
        names = map( _qname2String, self.__attrs.keys() )
        values = map( _getVal, self.__attrs.values() )
        return map( None, names, map( Attribute, names, values ) )

    def keys( self ):
        return map( _qname2String, self.__attrs.keys() )

    def values( self ):
        return map( _getVal, self.__attrs.values() )

    def __getitem__( self, attname ):
        # "prefix:local" or "local" string, or a (local, uri) tuple
        if type( attname )==types.StringType:
            parts = string.split( attname, ":")
            if len(parts)==1:
                tup = self.__attrs[(None,parts[0])]
            else:
                tup = self.__attrs[tuple(parts)]
            return tup[_VALUE]
        elif type(attname)==types.TupleType and len( attname ) == 2:
            local,uri=attname
            for key,val in self.__attrs.items():
                if val[_URI]==uri and key[_LOCAL]==local:
                    return val[_VALUE]
            raise KeyError, attname
        else:
            raise TypeError, attname

class Element( Node ):
    nodeType=dom.core.ELEMENT_NODE
    def __init__( self, tagName ):
        self.tagName = tagName
        self.childNodes=[]
        # replaced by an AttributeList in SaxBuilder.startElement
        self.attributes=None

    def getAttribute( self, attname, uri=None ):
        if uri:
            return self.attributes[(attname,uri)]
        else:
            return self.attributes[attname]

class Comment( Node ):
    nodeType=dom.core.COMMENT_NODE
    def __init__(self, data ):
        self.data=data

class ProcessingInstruction( Node ):
    nodeType=dom.core.PROCESSING_INSTRUCTION_NODE
    def __init__(self, target, data ):
        self.target = target
        self.data = data

class Text( Node ):
    nodeType=dom.core.TEXT_NODE
    def __init__(self, data ):
        self.data = data

class Error( Node ):
    def __init__(self, *args ):
        self.message = string.join( map( repr, args ) )

    def __repr__( self ):
        return self.message

class SaxBuilder( HandlerBase ):
    def __init__(self ):
        HandlerBase.__init__(self)
        self.cur_node = self.document = Document()
        self.cur_node.namespace={"xml": "http://www.w3.org/XML/1998/namespace",
                                 None:None,
                                 "xmlns":None}
        self.cur_node.parent=None

    def addChild( self, node ):
        self.cur_node.childNodes.append( node )

    def nssplit( self, qname ):
        # split a qualified name and look the prefix up in enclosing scopes
        if string.find( qname, ":" )!=-1:
            prefix,local = string.split( qname, ":" )
        else:
            prefix,local = None,qname
        node = self.cur_node
        while node:
            if node.namespace.has_key(prefix):
                uri = node.namespace[prefix]
                return (prefix,local,uri)
            node=node.parent
        raise Error, "Namespace def not found for "+prefix

    def handleAttrs( self, attrs ):
        outattrs = {}
        handleLater = []
        # namespace declarations first, so other attributes can use them
        for (attrname,value) in attrs.items():
            if attrname[0:6]=="xmlns:":
                prefix,local=string.split( attrname, ":" )
                outattrs[(prefix,local)]=(None,value)
                self.cur_node.namespace[local]=value
            elif attrname=="xmlns":
                prefix,local=(None,"xmlns")
                outattrs[(prefix,local)]=(None,value)
                self.cur_node.namespace[None]=value
            else:
                handleLater.append( (attrname, value ) )

        for (attrname,value) in handleLater:
            (prefix,local,uri)=self.nssplit( attrname )
            outattrs[(prefix, local)]=(uri,value)
        return outattrs

    def startElement( self, tagname , attrs={} ):
        node = Element( tagname )
        self.addChild( node )
        node.parent = self.cur_node
        self.cur_node = node
        self.cur_node.namespace = {None:None,"xmlns":None}
        node.attributes = AttributeList( self.handleAttrs( attrs ) )
        node.tagname = tagname
        (node.prefix, node.localName, node.uri)= self.nssplit( tagname )

    def endElement( self, name, attrs={} ):
        del self.cur_node.namespace
        node = self.cur_node
        self.cur_node = node.parent
        del node.parent

    def comment( self, s):
        self.addChild( Comment( s ) )

    def processingInstruction( self, target, data ):
        node = ProcessingInstruction( target, data )
        self.addChild( node )

    def characters( self, chars, start, length ):
        node = Text( chars[start:start+length] )
        self.addChild( node )

    def endDocument( self ):
        assert( not self.cur_node.parent )
        del self.cur_node.parent
        for node in self.cur_node.childNodes:
            if node.nodeType==dom.core.ELEMENT_NODE:
                self.document.documentElement = node
        if not self.document.documentElement:
            raise Error, "No document element"
        del self.cur_node.namespace

# public constructors
def DOMFromString( string ):
    return DOMFromFile( StringIO( string ) )

def DOMFromURL( URL, builder=None ):
    builder = builder or SaxBuilder()
    p=saxexts.make_parser()
    p.setDocumentHandler( builder )
    p.parse( URL )
    return builder.document

def DOMFromFile( file, builder=None ):
    builder = builder or SaxBuilder()
    p=saxexts.make_parser()
    p.setDocumentHandler( builder )
    p.parseFile( file )
    return builder.document

if __name__=="__main__":
    import sys, os
    file = os.path.join( os.path.dirname( sys.argv[0] ), "test/quotes.xml" )
    docs=[]
    docs.append( DOMFromURL( file ) )
    docs.append( DOMFromFile( open( file ) ) )
    docs.append( DOMFromString( open( file ).read() ) )

    from xml.dom.writer import XmlWriter
    import xml.dom.sax_builder

    # test against PyDOM
    docs.append( DOMFromURL( file, xml.dom.sax_builder.SaxBuilder() ) )

    outputs=[]
    for doc in docs:
        outputs.append( StringIO() )
        XmlWriter(outputs[-1]).walk( doc )

    for output in outputs[1:]:
        assert output.getvalue() == outputs[0].getvalue()

    print output.getvalue()

# I don't like modules that export their imported modules
for key,value in locals().items():
    if `type( value )` =="<type 'module'>":
        del locals()[key]
del key, value

--------------5BA374B008AE171CF0613077--
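The Todo list in the attached module mentions convenience methods for getting elements and text. As a rough idea of what such helpers might look like, here is a minimal sketch. It is written against the xml.dom.minidom module found in the present-day Python standard library, not against the 0.6 code posted above, and the helper names child_elements, find_child and deep_text are made up for the illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def child_elements(node):
    """Return the direct children of node that are elements."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def find_child(node, tag_name):
    """Return the first direct child element named tag_name, or None."""
    for child in child_elements(node):
        if child.tagName == tag_name:
            return child
    return None


def deep_text(node):
    """Concatenate the data of every text node below node, in document order."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(deep_text(child))
    return "".join(parts)


# A made-up sample document.
doc = parseString(
    "<quote><who>Yoda</who><said>Do or <em>do not</em>.</said></quote>"
)
root = doc.documentElement
```

With these, deep_text(find_child(root, "said")) gives the text of the said element with the inline em markup flattened away. Helpers of this kind are plain functions over childNodes, so they should not care which DOM implementation built the tree as long as it exposes nodeType, childNodes, tagName and data.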
From paul@prescod.net Fri Apr 23 17:55:08 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 23 Apr 1999 11:55:08 -0500 Subject: [XML-SIG] qp API References: Message-ID: <3720A5EC.D79A464@prescod.net> Greg Stein wrote: > > Actually: that is a good point.... what is "lightweight" ? I define that > as something that is fast, has a small set of objects, and has a small > interface (few objects/methods). > > A question was asked: do we need Yet Another Interface? I believe that we > do. IMO, the qp interface is very well tuned towards apps being able to > interpret what is really going on when an XML doc arrives (yes, within > certain constraints). IMO, the DOM is great for translations of input XML > to output XML. But someting like qp is handy for grabbing input and > dealing with it (I was never able to really do that well with the DOM). I hear three different issues: * performance * size of interface * walking-around convenience I think that a lightweight DOM implementation can go a long way toward meeting these requirements. Performance: If we take out parent and sibling pointers, I see know reason that a DOM implementation should be more than a few percent slower than qp_xml. In the minidom implementation I am working on, 60% of the code and probably a big chunk of the runtime is dedicated to the stupid^H^H^H^H^H^H inconvenient namespace processing. If we're both doing namespace processing we will both incur that overhead. Even with namespaces, whole thing is less than 300 lines of code! Size of interface: Minidom has 3 builder methods (building from strings, files and filenames) and 6 runtime classes -- only one of which is even mildly complex (again, because of namespace handling) If you are handling simple documents without PIs and comments then you only need to deal with three classes: document, element and text. In other words the interface that most people will use is rather small. 
convenience: We can add convenience functions that allow people with different interests to find the information that they need in an XML document. Convenience functions add to the interface but they don't really affect performance much. So far I have identified a need to be able to iterate over the elements, find a child element by its type name and deeply conctenate the text of a node. What else? -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "The Excursion [Sport Utility Vehicle] is so large that it will come equipped with adjustable pedals to fit smaller drivers and sensor devices that warn the driver when he or she is about to back into a Toyota or some other object." -- Dallas Morning News From paul@prescod.net Fri Apr 23 18:06:22 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 23 Apr 1999 12:06:22 -0500 Subject: [XML-SIG] PySAX more pythonish Message-ID: <3720A88E.C041B08B@prescod.net> I would like the attributes parameter to startElement to be defaulted in all SAX implementations. I would also like a new method called "text" similar to the one implemented by xml.dom.sax_builder. "text" just takes a text string instead of a string and offsets. That's a little more pythonish for both the caller and callback. The default DocumentHandler would re-route "characters" to "text". Someone who needed the (potentially) more efficient behavior of "characters" could override the implementation and re-route text to characters instead. What do you think? -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Company spokeswoman Lana Simon stressed that Interactive Yoda is not a Furby. Well, not exactly. "This is an interactive toy that utilizes Furby technology," Simon said. "It will react to its surroundings and will talk." - http://www.wired.com/news/news/culture/story/19222.html From Fred L. Drake, Jr." 
References: <3720A88E.C041B08B@prescod.net> Message-ID: <14112.51082.402231.654127@weyr.cnri.reston.va.us> Paul Prescod writes: > I would like the attributes parameter to startElement to be defaulted in > all SAX implementations. ... > I would also like a new method called "text" similar to the one > implemented by xml.dom.sax_builder. "text" just takes a text string Sounds good to me! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fred L. Drake, Jr." References: <3720A5EC.D79A464@prescod.net> Message-ID: <14112.51262.675379.668123@weyr.cnri.reston.va.us> Paul Prescod writes: > If we take out parent and sibling pointers, I see know reason that a DOM > implementation should be more than a few percent slower than qp_xml. In I'd like to see the parent pointer kept, but I'm also fine with an explicit destroy() or close() method instead of those damnable proxies. I haven't actually needed sibling pointers, so I'm not sure I care about them. They can be computed easily enough if someone wants the data on an "occaisional" basis. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin@cnri.reston.va.us Fri Apr 23 21:01:02 1999 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 23 Apr 1999 16:01:02 -0400 (EDT) Subject: [XML-SIG] qp API In-Reply-To: <14112.51262.675379.668123@weyr.cnri.reston.va.us> References: <3720A5EC.D79A464@prescod.net> <14112.51262.675379.668123@weyr.cnri.reston.va.us> Message-ID: <14112.52252.324863.576300@amarok.cnri.reston.va.us> Fred L. Drake writes: > I'd like to see the parent pointer kept, but I'm also fine with an >explicit destroy() or close() method instead of those damnable >proxies. What problems do the proxies present? It would be possible to remove them and use an explicit destroy() if they present technical problems of their own. > I haven't actually needed sibling pointers, so I'm not sure I care >about them. 
They can be computed easily enough if someone wants the >data on an "occaisional" basis. If you have parent and child pointers, you don't need sibling pointers since you just go up to the parent & retrieve its children. I haven't really formed an opinion about the Minidom module. On the one hand, I don't like adding an interface that resembles another interface; too many similar choices can be confusing. (But if PyDOM is upward-compatible with Minidom, that may not be a problem.) On the other hand, PyDOM *is* quite heavyweight, and I can understand the desire for something similar. Can people please give their opinions about this? (I do like the convenience functions like DOMFromString; something similar should definitely be added, perhaps to dom.utils.) -- A.M. Kuchling http://starship.python.net/crew/amk/ I don't believe in an afterlife, so I don't have to spend my whole life fearing hell, or fearing heaven even more. For whatever the tortures of hell, I think the boredom of heaven would be even worse. -- Isaac Asimov 1920-1992 RIP From Fred L. Drake, Jr." References: <3720A5EC.D79A464@prescod.net> <14112.51262.675379.668123@weyr.cnri.reston.va.us> <14112.52252.324863.576300@amarok.cnri.reston.va.us> Message-ID: <14112.54696.546630.72873@weyr.cnri.reston.va.us> I wrote: > I'd like to see the parent pointer kept, but I'm also fine with an >explicit destroy() or close() method instead of those damnable >proxies. Andrew M. Kuchling writes: > What problems do the proxies present? It would be possible to > remove them and use an explicit destroy() if they present technical They require a lot of object creation, and slow things down a lot for tree walking and generally ensuring you have sufficiently "current" references. > If you have parent and child pointers, you don't need sibling > pointers since you just go up to the parent & retrieve its children. That's what I meant about them being easily computable. 
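To make "easily computable" concrete, here is a minimal sketch in present-day Python. The Node class and the two helper functions are hypothetical names chosen for illustration; they are not PyDOM's or minidom's API. Each lookup scans the parent's child list, so its cost grows with the number of siblings.

```python
class Node:
    """Hypothetical tree node that stores only parent and child pointers."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def _position(node):
    """Return the index of node in its parent's child list (linear scan)."""
    for index, sibling in enumerate(node.parent.children):
        if sibling is node:
            return index
    raise ValueError("node is not listed among its parent's children")


def next_sibling(node):
    """Compute the following sibling on demand, or None if there is none."""
    if node.parent is None:
        return None
    siblings = node.parent.children
    index = _position(node) + 1
    return siblings[index] if index < len(siblings) else None


def previous_sibling(node):
    """Compute the preceding sibling on demand, or None if there is none."""
    if node.parent is None:
        return None
    index = _position(node) - 1
    return node.parent.children[index] if index >= 0 else None


root = Node("root")
first = Node("first", root)
second = Node("second", root)
third = Node("third", root)
print(next_sibling(first).name)
print(previous_sibling(third).name)
```

Nothing is stored per node beyond the parent pointer, which is the trade being discussed: less bookkeeping in the tree in exchange for a scan whenever a sibling is actually needed.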
> PyDOM is upward-compatible with Minidom, that may not be a problem.) > On the other hand, PyDOM *is* quite heavyweight, and I can understand > the desire for something similar. Can people please give their > opinions about this? I think sufficient compatibility can be kept. While what I've been doing isn't performance critical, it can be a real nuissance. I'd like it to be fast for the same reasons I want a compiler to be fast: sometimes I'm actually waiting in blocking mode. ;-( I may have a more interesting need for performance in the future, but I'm not sure yet. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From paul@prescod.net Fri Apr 23 21:53:06 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 23 Apr 1999 15:53:06 -0500 Subject: [XML-SIG] qp API References: <3720A5EC.D79A464@prescod.net> <14112.51262.675379.668123@weyr.cnri.reston.va.us> Message-ID: <3720DDB2.2483DD7F@prescod.net> "Fred L. Drake" wrote: > > Paul Prescod writes: > > If we take out parent and sibling pointers, I see know reason that a DOM > > implementation should be more than a few percent slower than qp_xml. In > > I'd like to see the parent pointer kept, but I'm also fine with an > explicit destroy() or close() method instead of those damnable > proxies. The problem with close() is that it is O(N) with the size of your document, isn't it? I'm on the fence about parent pointers...maybe they should be a construction option. They would be off by default. > I haven't actually needed sibling pointers, so I'm not sure I care > about them. They can be computed easily enough if someone wants the > data on an "occaisional" basis. True. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Company spokeswoman Lana Simon stressed that Interactive Yoda is not a Furby. Well, not exactly. "This is an interactive toy that utilizes Furby technology," Simon said. "It will react to its surroundings and will talk." 
- http://www.wired.com/news/news/culture/story/19222.html From Fred L. Drake, Jr." References: <3720A5EC.D79A464@prescod.net> <14112.51262.675379.668123@weyr.cnri.reston.va.us> <3720DDB2.2483DD7F@prescod.net> Message-ID: <14112.60197.369126.212959@weyr.cnri.reston.va.us> Paul Prescod writes: > The problem with close() is that it is O(N) with the size of your > document, isn't it? I'm on the fence about parent pointers...maybe they > should be a construction option. They would be off by default. O(N) is right, but the constant is small enough to make up for it with any measure of real work going on while the tree is live. Having it be optional would be quite sufficient for me. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From paul@prescod.net Fri Apr 23 22:04:25 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 23 Apr 1999 16:04:25 -0500 Subject: [XML-SIG] qp API References: <3720A5EC.D79A464@prescod.net> <14112.51262.675379.668123@weyr.cnri.reston.va.us> <14112.52252.324863.576300@amarok.cnri.reston.va.us> Message-ID: <3720E059.28CD59BC@prescod.net> "Andrew M. Kuchling" wrote: > > If you have parent and child pointers, you don't need sibling > pointers since you just go up to the parent & retrieve its children. Well, yes and no. If you have 10,000 nodes how do you get the next and previous node easily? (easily is the key word here) > I haven't really formed an opinion about the Minidom module. > On the one hand, I don't like adding an interface that resembles > another interface; too many similar choices can be confusing. (But if > PyDOM is upward-compatible with Minidom, that may not be a problem.) I certainly intend for minidom to be a subset of PyDOM and 4DOM. Any extensions I made should be interpreted as suggestions for extensions to PyDOM and 4DOM. > (I do like the convenience functions like DOMFromString; > something similar should definitely be added, perhaps to dom.utils.) Why not in "dom" itself? 
I don't see them as utilities but as the fundamental, commonly used entry
points to DOM functionality.

--
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

Company spokeswoman Lana Simon stressed that Interactive Yoda is
not a Furby. Well, not exactly.

"This is an interactive toy that utilizes Furby technology," Simon
said. "It will react to its surroundings and will talk."
  - http://www.wired.com/news/news/culture/story/19222.html

From paul@prescod.net Fri Apr 23 22:16:05 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 23 Apr 1999 16:16:05 -0500
Subject: [XML-SIG] DOM API
References: <3720A5EC.D79A464@prescod.net> <14112.51262.675379.668123@weyr.cnri.reston.va.us> <14112.52252.324863.576300@amarok.cnri.reston.va.us>
Message-ID: <3720E315.A4532E83@prescod.net>

"Andrew M. Kuchling" wrote:
>
>         (I do like the convenience functions like DOMFromString;
> something similar should definitely be added, perhaps to dom.utils.)

As I said in my other messages, I want minidom to be a subset of PyDOM and
4DOM and hopefully the start of a common API. In that vein, minidom makes
some decisions and extensions that we should discuss:

dom = DOMFromString( string, SAXbuilder=None )
dom = DOMFromURL( URL, SAXbuilder=None )
dom = DOMFromFile( file, SAXbuilder=None )

The default SAXBuilder would probably be the PyDOM or minidom builder.

Minidom uses mixed-case, lower-first names for properties. For compatibility
with PyDOM, properties can be requested through get_ methods. My question
is: do we really need get_ methods? They don't seem very Pythonish to me.
Or maybe we can use them as an implementation mechanism (_get_) but not
expose them to the client.

I prefer the class-specific properties to the weird generic ones: tagName
to nodeName, value to nodeValue and so forth. Obviously PyDOM and 4DOM
would implement both but I don't see any reason to support that redundancy
in minidom.

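One way the "_get_ as implementation mechanism" idea could look in present-day Python is sketched below. The Element class here is hypothetical and is not the minidom code posted to the list; it only assumes that getters are named _get_<property> and that clients read plain attributes.

```python
class Element:
    """Hypothetical element whose getters stay private to the implementation."""

    def __init__(self, tag_name):
        self._tag_name = tag_name

    def _get_tagName(self):
        return self._tag_name

    # The generic DOM name can reuse the class-specific getter.
    _get_nodeName = _get_tagName

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so ordinary instance attributes are not slowed down.
        getter = getattr(type(self), "_get_" + name, None)
        if getter is None:
            raise AttributeError(name)
        return getter(self)


element = Element("quote")
print(element.tagName)
print(element.nodeName)
```

The client only ever writes element.tagName; whether a redundant nodeName is offered at all becomes a one-line decision in the class rather than an extra public method.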
I made some namespace extensions because we can't wait forever to do
namespace support.

getAttribute( "foo", "http://www.blah.bar" )

Looks up the obvious attribute.

element.localName gets the second half of the element type name.

element.uri gets the URI associated with the prefix.

element.prefix gets the element's prefix. I don't think that the
namespaces view that prefixes are irrelevant should obviate the XML 1.0
view that they are NOT. Even if we accept the namespaces view of the world
entirely, prefixes are chosen to be mnemonic so they shouldn't be discarded
by software.

element.attributes returns an attribute mapping object that I think
behaves exactly like PyDOM's except for namespace support:

x.attributes["foo", "http://www.blah.bar"]

This also works, however:

x.attributes["bar:foo"] (just as in PyDOM)

Namespace attributes ARE maintained as attributes.

keys(), items() and values() should be the same as PyDOM.

I should unify my Error class with PyDOM's.

I am considering the following enhancements:

element.elements: returns a list of element children.

element.getText: returns a deep list of the data from the text nodes. Do
your own string.join to choose an appropriate join character.

element.getChild("FOO") returns the first child (not descendant) element
with the specified element type name.

element.getChild("FOO", "http://...") does the obvious thing.

element.getChild( "#PCDATA" ) gets a list of child text nodes.

--
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

Company spokeswoman Lana Simon stressed that Interactive Yoda is
not a Furby. Well, not exactly.

"This is an interactive toy that utilizes Furby technology," Simon
said. "It will react to its surroundings and will talk."
- http://www.wired.com/news/news/culture/story/19222.html From ke@gnu.franken.de Sat Apr 24 06:26:12 1999 From: ke@gnu.franken.de (Karl Eichwalder) Date: 24 Apr 1999 07:26:12 +0200 Subject: [XML-SIG] xml-0.5.1: LICENCE (xmlarch) Message-ID: There's room for interpretation. The LICENCE file says: xmlarch: -------------------------------------------------------------------- Copyright (C) 1998 by Geir O. Grønmo, grove@infotek.no Free for commercial and non-commercial use. -------------------------------------------------------------------- But arch/xmlarch.py says (as it stands, this implies, that it's "unfree" under certain circumstances): Copyright (C) 1998 by Geir O. Grønmo, grove@infotek.no It is free for non-commercial use, if you modify it please let me know. aTdHvAaNnKcSe for clarification! -- Karl Eichwalder From gstein@lyra.org Sat Apr 24 11:21:02 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 24 Apr 1999 03:21:02 -0700 Subject: [XML-SIG] DOM Considered Harmful :-) Message-ID: <37219B0E.3123A201@lyra.org> All right... I've been slow to respond and only minimal because I was out this past week (but still had minimal access). I'm leaving in about three hours to Mexico... I'll have zero access for a week. Of course, this means that I have the privilege of posting something highly controversial with the hope that an argument will continue for the next seven days and I can rejoin it at that time :-) Okay... seriously, though, I'd like to state my opposition to a DOM, a subset, or a DOM-like API for a "lightweight" XML parsing solution. 
Here are my assumptions/requirements/etc:

1) lightweight means:
   a) fast as possible
   b) conceptually simple for the user
   c) narrow interface (somewhat related to (b))
2) 1b, 1c imply simple doc, so a non-DOM interface is not a hurdle
3) this API is only for consuming XML
4) it is fine to "fall back" to the DOM if the lightweight API doesn't
   meet a client's needs
   a) corollary: the ability to swap in alternative parsers is not
      required
   b) corollary: ORB compat is not required
   c) corollary: stylistic compatibility (with other language's XML
      libraries) is not required

A couple items have been discussed on the list which I'd like to call
out and respond to:

1) the DOM concept of node types

IMO, this is one of the most broken things about the DOM. The child
nodes end up being some random mixture of various element types. Any
client trying to deal with this must *test* each node before they use it
to see if they're looking at the right thing. This is very troublesome.

As a real-world example, when I coded davlib against the DOM and I
needed the first (only) child element of my element, there was no easy
and evident approach to this. I knew that child element would be a , but
what happened was that Text nodes were mixed in. "oh, well do a
findByTagName" or whatever. That wouldn't help on the next case, where I
needed each of the child elements for the . Also, look at that answer:
"use findByTagName" ... that is simply a mechanism to get around the
fact that the DOM has introduced a hard-to-use structure.

Paul recently followed up to his original proposal with another proposal
to add new methods to his element objects. Specifically, the getChild()
method -- again, this was introduced *solely* due to the fact that the
DOM has a heterogeneous list of children. The client must apply various
filters and other processing to get useful information. The system must
apply tests "is this the right node type?" here and there.

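The node-type test described above can be seen with the xml.dom.minidom module that ships in today's Python standard library, used here only as a stand-in for a DOM-style API (it is not the PyDOM or draft minidom code under discussion). The element names a, b and c are made up for the example.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Indented input, as a human or a pretty-printer would write it.
document = parseString("<a>\n  <b/>\n  <c/>\n</a>")
root = document.documentElement

# The child list mixes whitespace Text nodes with the Element nodes.
node_types = [child.nodeType for child in root.childNodes]
print(node_types)

# Getting "just the child elements" means testing every node first.
element_children = [
    child for child in root.childNodes
    if child.nodeType == Node.ELEMENT_NODE
]
print([child.tagName for child in element_children])
```

The filter is short, but every caller that wants child elements has to repeat it or wrap it in a helper, which is the objection being raised.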
In one of Paul's original responses to my post, he listed "convenience" as part of the definition of "lightweight". It sure is, but his response to making a DOM subset convenient was to introduce helper functions. I think this is quite broken. As a comparison, the qp_xml module returns an element that has *only* elements for children. There is no filtering or other things to get past. The list items are *known*. The text is stored outside of that list so that you don't have to manually separate the two all the time. Essentially, qp_xml is easy/convenient *inherently* rather than patched-up via convenience functions. In summary, I maintain that any DOM-style system is not inherently simple or easy to use because of its heterogeneous node lists. I further believe that something like qp_xml is much nicer all around because its simplicity/ease/etc originates right from the bottom, rather than being hidden behind a second layer of API. Disclaimer: qp_xml does have a convenience function (the textof() function, which could/should be a method instead). The existence of the function is based solely on the underlying representation of text contents, where that design was chosen to be able to retain the document structure (insofar as elems/text are retained). 2) the close() method and parent/sibling relationships Adding parent/sibling relationships introduces loops unless you use proxies or introduce a close() method (if there is another way, then I'd like to learn it). Proxies are out for efficiency reasons -- objects get constructed every time you simply want to peek into the data structure. While the complexity is (mostly) hidden from the client, it is still there. You don't end up with simple data structures... instead, you get a lot of "mechanism" in there to deal with intercepting accesses so that you can create a proxy to bundle up the necessary data. A close() type method introduces other problems. If you aren't careful, then it is easy to leak the entire parse tree. 
What happens if you pass a subset of the tree to another subsystem? You will have one of two problems: 1) the client avoids calling close() so the subsystem can use parent references (this leaks the whole tree); or 2) the client calls close() so the subsystem only retains its subtree, but now its (expected) parent/sibling relationsips no longer work. It has a set of objects that don't fully respond to their published API. Other alternatives: ways to detach the subtree or specifying that the elements have two defined states (with and without parent/sibling relationships). Gee... now we're getting into complex APIs for the client to deal with. I'm tremendously in favor of the model returned by qp_xml. You get a set of simple objects that have no methods. They are really just attribute retainers. Inside these, you have a *Python* list of children, and a *Python* mapping of attributes. Nothing fancy. Simple and easy. Note: personally, I believe that the client can operate quite fine without parent or sibling pointers. If a function needs an element's parent, then whoever passed the element should pass the parent, too. From a conceptual level, I am also a bit shaky on an element knowing anything about its parents or siblings. It would seem that anything dealing with a particular element should do so in a context-free manner. Note 2: if you really need parents/siblings (i.e. it is difficult to structure your app to avoid them), then you can always fall back to the DOM. Okay... now a couple other issues: * processing instructions. (thanx Paul for the links) I looked at the three specs that Paul linked (didn't need the XML spec.. I knew what they were! :-). Two of them, the DDML and DCD specs, use PIs only as a means of checking the conformance of a document. The document can be parsed and handled with or without the PIs. The third: style sheets. Ick. The PI contains actual data, rather than conformance issues. I note that a Rationale has been appended to the spec. 
I bet that was added because the PI is used for more than document processing (i.e. it alters semantics). A minimal approach to PIs might be to include *only* the PIs that occur in the prolog into a list. Since the xml-stylesheet PI can only occur in the prolog, this approach would pick them up. (not that I like it though :-) * note to Paul: the code you posted is broken :-). You apply the default namespace to attributes that have no prefix. The XML Namespaces spec states that no prefix on an attribute means "no namespace". You also fail to distinguish between "no namespace" in the original state of beginning to parse, and when somebody resets it using xmlns="". In addition, you reset the default namespace to "no namespace" inside each startElement. [ and a Q: why do you have the "xmlns" prefix defined in startElement? ] [ design comment: I don't think you want to retain prefixes... if clients believe they can use a prefix that you provide, then problems will develop *very* quickly. If the client isn't careful, they could end up with conflicting prefixes. trust me on this one... mod_dav has a *bitch* of a time dealing with namespace prefixes. I highly recommend that you drop them; similarly, I believe you should filter the xmlns* attributes. ] [ design comment: you should probably index your attributes by (uri, local) rather than prefix. the client does not know the prefixes ahead of time, so they will be unable to fetch the attributes. ] * comments on loss of information (also on "why not use SAX?") The tree form is very useful. Without it, then an application would need to implement a state machine to effectively process parsed XML. Seeing a element means nothing in itself. When you post-process the tree and step down thru the tree, the parent will place you into the proper state. For programmers/clients, the tree model is also very handy. It exists *outside* of the parsing event. 
Clients may not be able to structure their responses to the input to be part of the parsing event stream. Regarding loss of info: for many applications, the client only needs to know the contents. The finer details of the document structure are pointless. These applications are typically using XML as a data transfer mechanism, rather than a layout mechanism. DAV and XML-RPC are two examples. PIs and comments are not useful. I'll send some individual replies to the other emails. This email, however, is my overall summary and argument against DOM-like APIs. I maintain that an API such as that provided by qp_xml is very useful for a particular class of applications. Further, I maintain that it would be a Good Thing to include qp_xml (or whatever name and with whatever API/code tweaks) be included into the XML distribution. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Apr 24 11:41:26 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 24 Apr 1999 03:41:26 -0700 Subject: [XML-SIG] DOM API References: <371C9388.D0690375@prescod.net> Message-ID: <37219FD6.73061D0@lyra.org> Paul Prescod wrote: > ... > http://www.w3.org/TR/REC-xml > http://www.w3.org/TR/xml-stylesheet > http://www.w3.org/TR/NOTE-dcd > http://www.w3.org/TR/NOTE-ddml > > Well let's put it this way: XML 1.0 uses PIs. XML 1.0 *defines* PIs. That is very different. > So does the stylesheet > binding extension (for CSS and XSL). This is what I was looking for: the *use* of a PI. Per my other email (treatise? :-), I think that I've discovered we are operating within two classes of applications: * data-oriented use of XML * layout-oriented use of XML For the former, I have not seen a case where a PI is necessary. For the latter: yes, you need a PI for stylesheets. Too bad... you get to use the DOM :-) > I don't doubt that namespaces are important but they can easily be viewed > as an extension of (or layer on top of) the minimal API. Nope. 
Namespaces are critical, as Fredrik has pointed out. My endeavors to use namespaces within the DOM style of programming has also led me to believe that it isn't a simple extension or layer on top of a minimal API. Why? Well... if you attempt to post-process the namespace information, then where do you store it? The client that is doing the post-processing only receives *proxy* objects. It cannot drop the information there since those objects are *not* persistent. Instead, the client has to reach into the internals of the DOM to set (and get!) the namespace info. Bleck! > There are four objects there. If we want it to be a tree we need a wrapper > object that contains them. You could argue that in the lightweight API the > version and doctype information could disappear but surely we want to > allow people to figure out what stylesheets are attached to their > documents! I maintain that the stylesheets are not applicable to certain classes of XML processing. So yes, they get punted too. A simple API of elements and text is more than suitable. > > NodeType is bogus. It should be absolutely obvious from the context what a > > Node is. If you have so many objects in your system that you need NodeType > > to distinguish them, then you are certainly not a light-weight solution. > > XML is a dynamically typed language, like XML. If I have a mix of > elements, characters and processing instructions then I need some way of > differentiating them. I don't feel like it is the place of an API to > decide that XML is a strongly typed language and silently throw away > important information from the document. Hello? It *is* the place of the API to define semantics. That is what APIs do. I can understand if you don't like this particular semantic, but I feel your argument is deeply flawed. > > > Document.DocumentElement (an element node property) > > > > If Document has no other properties, then it is totally bogus. Just return > > the root Element. 
Why the hell return an object with a single property > > that refers to another object? Just return that object! > > Document should also have ChildNodes. Your spec didn't show it. Okay... so it has ChildNodes. How do you get the root element? Oops. You have to scan for the thing. Painful! > > If you want light-weight, then GetAttribute is bogus given that the same > > concept is easily handled via the .Attributes value. Why introduce a > > method to simply do Element.Attributes.get(foo) ?? > > GetAttribute is simpler, more direct and maybe more efficient in some > cases. It works with simple strings and not attribute objects. It will *never* be more efficient. Accessing a Python attribute and doing a map-fetch will always be faster than a method call. Plain and simple. (caveat: as I mentioned in prior posts, qp_xml should be using a mapping rather than a list of objects... dunno what I was thinking) > > > Element.TagName > > > Element.PreviousSibling > > > Element.NextSibing > > > > These Sibling things mean one of two things: > > > > 1) you have introduced loops in your data structure > > 2) you have introduced the requirement for the proxy crap that the current > > DOM is dealing with (the Node vs _nodeData thing). > > > > (1) is mildly unacceptable in a light-weight solution (you don't want > > people to do a quick parse of data, and then require them to follow it up > > with .close()). > > I don't see this as a big deal. > > This is an efficiency versus simplicity issue. These functions are > extremely convenient in a lot of situations. The origin of qp_xml was for efficiency first, simplicity second. I maintain that qp_xml provides both. I will agree to disagree that parents and siblings are useful. (IMO, they are not, and only serve to complicate the system). > > Case in point: I wrote a first draft davlib.py against the DOM. Damn it > > was a serious bitch to simply extract the CDATA contents of an element! > > XML is a dynamically typed language. 
"I've implemented Java and now I'm > trying to implement Python and I notice that you guys through these > PyObject things around and they make my life harder. I'm going to dump > them from my implementation." Again, back to this "dynamically typed language". That is your point of view, rather than a statement of fact. I won't attempt to characterize how you derived that point of view (from the DOM maybe?), but it is NOT the view that I hold. XML is a means of representing structured data. That structure takes the form of elements (with attributes) and contained text. I do not see how XML is a programming langauge, or that it is dynamically typed. It is simply a representation in my mind. And I'll ignore the quote which just seems to be silliness or flamebait... > > Moreover, it was also a total bitch to simply say "give me the child > > elements". Of course, that didn't work since the DOM insisted on returning > > a list of a mix of CDATA and elements. > > It told you what was in your document. I also get that from qp_xml with a lot less hassle, so that says to me that the DOM is introducing needless complexity/hassle for the client. > If you want to include helper functions to do this stuff then I say fine: > but if you want to throw away the real structure of the document then I > don't think that that is appropriate. Helper functions are simply a mechanism to patch the inherent complexity introduced by the DOM. It does not need to be so complicated. Python has excellent mechanisms to hold structured data; qp_xml uses them to provide excellent benefit (relative to the DOM). The only "structure" that I toss are PIs and comments. I do not view those as "structure". The contents (elements, attributes, text) are retained and can be reconstructed from the structure that qp_xml returns. > > IMO, the XML DOM model is a neat theoretical expression of OO modelling of > > an XML document. For all practical purposes, it is nearly useless. (again: > > IMO) ... 
I mean hey: does anybody actually use the DOM to *generate* XML? > > Screw that -- I use "print". I can't imagine generating XML using the DOM. > > Complicated and processing intensive. > > I'm not sure what your point is here. I wouldn't use the DOM *or* qp_xml > to generate XML in most cases. As you point out "print" or "file.write" is > sufficient in most applications. This has nothing to do with the DOM and > everything to do with the fact that writing to a file is inherently a > streaming operation so a tree usually gets in the way. Most of the DOM's interface is for *building* a DOM structure. It is conceivable that those APIs only exist as a way to respond to parsing events, but I believe their existence is due to the fact that people want to build a DOM and then generate the resulting XML. Otherwise, we could have had two levels of the DOM interface: read-only (with private construction mechanisms), and read-write (as exemplified by the current DOM). I believe that the notion of build/generate via the DOM is bogus. It seems you agree :-), and that print or file.write is more appropriate. Fredrik has some utility objects to do it. All fine. The DOM just blows :-) > > Sorry to go off here, but the DOM really bugs me. I think it is actually a > > net-negative for the XML community to deal with the beast. I would love to > > be educated on the positive benefits for expressing an XML document thru > > the DOM model. > > I think that the DOM is broken for a completely different set of reasons > than you do. But the DOM is also hugely popular and more widely > implemented than many comparable APIs in other domains. I'm told that I could care less about compatibility. I'm trying to write an application here. Geez... using your viewpoint: if I wanted compatibility, then maybe I should use Java or C since everybody else uses that. > Microsoft's DOM implementation is referenced in dozens of their products > and throughout many upcoming technologies.
Despite its flaws, the DOM is > an unqualified success and some people like it more than XML itself. They > are building DOM interfaces to non-XML data! Goody for them. That doesn't help me write my application. > > Use a mapping. Toss the intermediate object. If you just have name and > > value, then you don't need separate objects. Present the attributes as a > > mapping. > > In this case I am hamstrung by DOM compatibility. This is a small price to > pay as long as we keep the simpler GetAttribute methods. The only reason > to get the attribute objects is when you want to iterate over all > attributes which is probably relatively rare. This is why I say "toss the DOM". Help your client programmers, rather than be subservient to the masses' distorted view of XML programming :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Apr 24 11:52:12 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 24 Apr 1999 03:52:12 -0700 Subject: [XML-SIG] DOM API References: <008b01be8b11$e19fe550$f29b12c2@pythonware.com> <371CA723.FAEBF6AA@prescod.net> Message-ID: <3721A25C.6DD5DA3D@lyra.org> Paul Prescod wrote: > > Fredrik Lundh wrote: > > > > the downside with Paul's line of reasoning is that it makes it > > impossible to come up with something that is light-weight > > also from the CPU's perspective... not good. > > That isn't true. I tend to think that usability is more important than > performance but if we decide to optimize for performance then we can make > a DOM-compatible API that is as fast as "qp". I mean the only thing that > is harder to implement in the miniDOM is siblings -- where I chose > convenience over efficiency. We can make the opposite choice. I maintain that qp_xml is both highly performant and highly usable. Per my other emails, I do not believe that the DOM is highly usable. I also tend to believe that being slaved to the DOM API will always hamper your performance when you *access* the data structure. Sure...
you might be able to build it nearly as fast (nearly! you may have more objects to create), but you are constraining access to be through methods rather than Python data structures. > In fact, I think that the namespace and language support in qp already > makes it relatively "heavyweight". Those are necessary to retain all information from the input XML. Toss those and you *really* toss out information. IMO, they do not introduce any "heaviness". They are two attributes that you can totally ignore. If your document is unconcerned with namespaces, then ignore the .ns attribute. If you don't care about language-specific handling in your app, then ignore the .lang attribute. *Nothing* forces you to use those attributes, so that means they do *not* impinge upon your client. The only thing their presence does is to add some descriptive text in the API specification and introduce some overhead in the parsing process. > > I want something really light-weight, and highly pythonish, and I > > don't care the slightest about TLA compatibility. Go Fredrik! :-) My kinda guy :-) > It isn't a question of TLA compatibility. It's about using the data models > used everywhere else in the world. Python conforms to posix conventions Hello!?!?! Fredrik just said he DOES NOT CARE. Why are you stating that he SHOULD? He gets to program according to whatever guidelines *he* wants. > To me, this is the central issue: to me, Guido's genius lies in the > fact that he usually chooses to adapt something before re-inventing it. This > makes learning Python easy. "Oh yeah, I recognize that from the other > languages I use." Well, SAX and DOM are what the other languages use. As I've said before, "Goody for them." You cannot be a slave to a single API when it does not fulfill your needs. If I may speak for Fredrik, the two of us want a Pythonish and *fast* way to parse XML, and we don't care what other languages do because our application is in *PYTHON*.
The DOM does not satisfy our requirements. > Anyhow, "qp" is hardly more "Pythonic" than the lightweight DOM API. > The following_cdata stuff is not like any API I've ever seen in Python or > elsewhere. The call for "pythonic-ness" is mostly a strawman. The DOM > works better in Python than in almost any other language: Nodelists are > lists, NamedNodeLists are maps, object types are instance classes, lists > can be heterogeneous, etc. The qp stuff uses native Python lists, mappings, and strings. The DOM uses NodeLists, NamedNodeLists, and TextElements. following_cdata is a design choice to model the underlying XML. My implementation of it as a string attribute on an object is very Pythonish. You just happen to disagree with my design choice. I don't feel that the choice is not-Python, but is actually an interesting and unique way to model the XML. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Apr 24 11:55:42 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 24 Apr 1999 03:55:42 -0700 Subject: [XML-SIG] ANN: Minidom 0.6 References: <371F9681.D0C314AC@prescod.net> Message-ID: <3721A32E.6A7C1AD@lyra.org> I posted a general commentary along with a few items in my big email note. I'll just wrap up with a few extra items here: Paul Prescod wrote: > > Attached is a miniature, lightweight subset of the DOM with a few > extensions for namespace handling. (I guess an extended subset is a > contradiction in terms but you get the idea!) > > I propose that > > * this become part of the xml package This would be fine, but I do not believe it is okay to include minidom to the exclusion of qp_xml (or a similar model). > * we consider the DOM-creation functions and namespaces extensions for > adoption in a standard Python DOM API Agreed, although the model you propose may need some work, per my other email. > * DOM-haters try this out and clearly describe where it falls down in > their applications The DOM model itself is hard to work with.
This follows the same pattern. > * we try to figure out the right set of convenience functions to make the > DOM more palatable for everybody (if possible). The convenience functions are simply a mechanism to avoid the inherent complexity. The convenience functions will also reduce the speed benefits that we are trying to achieve. If you don't reduce complexity or increase speed, then why go this route? Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Apr 24 11:59:04 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 24 Apr 1999 03:59:04 -0700 Subject: [XML-SIG] qp API References: <3720A5EC.D79A464@prescod.net> Message-ID: <3721A3F8.6B4A37F2@lyra.org> Paul Prescod wrote: >... > If we take out parent and sibling pointers, I see no reason that a DOM > implementation should be more than a few percent slower than qp_xml. In Building it will be about the same. Accessing it will be slower and harder. > Size of interface: > > Minidom has 3 builder methods (building from strings, files and filenames) > and 6 runtime classes -- only one of which is even mildly complex (again, > because of namespace handling) If you are handling simple documents > without PIs and comments then you only need to deal with three classes: > document, element and text. In other words the interface that most people > will use is rather small. Clients will still need to learn and understand the interface (to then discover they only need a subset). The presence of the other classes adds complexity to the situation.
> convenience: > > We can add convenience functions that allow people with different No additional comments here :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Apr 24 12:01:05 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 24 Apr 1999 04:01:05 -0700 Subject: [XML-SIG] DOM API References: <3720A5EC.D79A464@prescod.net> <14112.51262.675379.668123@weyr.cnri.reston.va.us> <14112.52252.324863.576300@amarok.cnri.reston.va.us> <3720E315.A4532E83@prescod.net> Message-ID: <3721A471.19D88620@lyra.org> Paul Prescod wrote: >... > element.prefix gets the element's prefix. I don't think that the > namespaces view that prefixes are irrelevant should obviate the XML 1.0 > view that they are NOT. Even if we accept the namespaces view of the world > entirely, prefixes are chosen to be mnemonic so they shouldn't be > discarded by software. I discussed this in my other email, but wanted to emphasize the point: Retaining the prefix is a *very* bad idea. I will elaborate if necessary when I return from Mexico, but I will earnestly ask that any further progress on miniDOM should first remove this from the API. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Apr 24 12:06:13 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 24 Apr 1999 04:06:13 -0700 Subject: [XML-SIG] punchy and belligerent? :-) Message-ID: <3721A5A5.38DA68E7@lyra.org> My apologies to all, and especially Paul, for any feelings of defensiveness or insult that my recent series of posts may have engendered. That certainly is not my intent. My issue is with the DOM API itself, and with my views on what a clean/simple/lean API should look like. Needless to say, I have strong convictions here :-) (that isn't to say that I'm against changes to qp_xml, but simply that I want to avoid certain characteristics of the DOM... in particular, I'd like to ask Fredrik to post his suggestions/changes/alternative module code) Paul: please don't take any of my comments personally.
They are all based against the DOM. You just happen to be the person posting commentary on the DOM, so you (unfortunately) have borne the brunt of my posts. Let's all hope that if I go and absorb many liters of tequila over the next week that I'll return without my DOM-bashing crusade :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Sat Apr 24 18:52:54 1999 From: paul@prescod.net (Paul Prescod) Date: Sat, 24 Apr 1999 12:52:54 -0500 Subject: [XML-SIG] DOM API References: <371C9388.D0690375@prescod.net> <37219FD6.73061D0@lyra.org> Message-ID: <372204F6.4A6174E1@prescod.net> Greg Stein wrote: > > XML 1.0 *defines* PIs. That is very different. Okay, so you agree that PIs are part of XML document instance data. Let me ask you this, do you think that Gadfly should dump the parts of the SQL spec that Aaron doesn't like? > Per my other email (treatise? :-), I think that I've discovered we are > operating within two classes of applications: > > * data-oriented use of XML > * layout-oriented use of XML This is a false dichotomy. Many of my customers are data-oriented people who want to style their data. For instance I was at a statistical company last week. I gave you four specifications that used PIs: XML, xml-stylesheet, DCD and DDML. Only one of those four has anything to do with stylesheets or formatting. The other three are as applicable to data as to traditional documents. > For the former, I have not seen a case where a PI is necessary. For the > latter: yes, you need a PI for stylesheets. Too bad... you get to use > the DOM :-) So to keep PIs out we should split the interface and (further) confuse new Python programmers? > Instead, the > client has to reach into the internals of the DOM to set (and get!) the > namespace info. Bleck! Well, I've decided to put namespace info into miniDOM even though it made it significantly less "lightweight." > I maintain that the stylesheets are not applicable to certain classes of > XML processing.
So yes, they get punted too. If there is a class of processing that does not use a feature then the feature should be removed? Goodbye namespaces. Goodbye sub-elements. > A simple API of elements and text is more than suitable. Not data access APIs. XML's semantics are partially defined in the XML specification itself and will be fully specified in an upcoming specification called the "XML Information Set." http://www.w3.org/TR/NOTE-xml-infoset-req "The XML Information Set will describe these abstract XML objects and their properties. It will provide a common reference set that other specifications can use and extend to construct their underlying data models, and will help to ensure interoperability among the various XML-based specifications and among XML software tools in general." Technical and intellectual interoperability is what I'm fighting for. > Your spec didn't show it. Okay... so it has ChildNodes. How do you get > the root element? Oops. You have to scan for the thing. Painful! doc.childNodes doc.documentElement > It will *never* be more efficient. Accessing a Python attribute and > doing a map-fetch will always be faster than a method call. Plain and > simple. This gets back to Mike's question: Are we creating a new library here or defining a new *interface*? If we're defining a library then we know all of the performance implications in advance. Because if we are defining an interface then we need to consider implementations that are implemented in ways that do not use Python hashes underneath. Generating the hash or map-wrapper could be expensive. > The origin of qp_xml was for efficiency first, simplicity second. I > maintain that qp_xml provides both. first_cdata, following_cdata, non-recursive text dumping? Doesn't seem very simple to me. It is completely unlike any API I have ever seen, even in strongly typed programming languages where it would seem more appropriate. > Again, back to this "dynamically typed language". 
That is your point of > view, rather than a statement of fact. I won't attempt to characterize > how you derived that point of view (from the DOM maybe?), but it is NOT > the view that I hold. The contents of an element are *by definition*, elements, characters and processing instructions. You can't wish that fact away. That's a heterogenous list. WD-XML: "PIs are not part of the document's character data, but must be passed through to the application." > XML is a means of representing structured data. That structure takes the > form of elements (with attributes) and contained text. I do not see how > XML is a programming langauge, or that it is dynamically typed. It is > simply a representation in my mind. XML is not a programming language but it explicitly supports heterogenous lists. > And I'll ignore the quote which just seems to be silliness or > flamebait... My point: I don't think Python implementors should try to pretend that Python does not (for example) support heterogenous lists and neither should XML implementors. > > > Moreover, it was also a total bitch to simply say "give me the child > > > elements". Of course, that didn't work since the DOM insisted on returning > > > a list of a mix of CDATA and elements. > > > > It told you what was in your document. > > I also get that from qp_xml with a lot less hassle, so that says to me > that the DOM is introducing needless complexity/hassle for the client. It isn't needless complexity if you need the PIs. I could find an application of XML that doesn't use attributes -- do we now define an API that dumps those too? > The only "structure" that I toss are PIs and comments. I do not view > those as "structure". The contents (elements, attributes, text) are > retained and can be reconstructed from the structure that qp_xml > returns. Fortunately it is not up to us to define XML. The XML specification says that processors should pass them along to applications. 
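A small sketch of the two points made above (the root element is reachable without scanning, and element content is a heterogeneous list that includes PIs), written against the xml.dom.minidom module found in present-day Python. Treat that choice as an assumption: it is not necessarily the minidom 0.6 attachment discussed in this thread. The sample document and the two helper functions are made up for illustration and are not DOM methods.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample: a stylesheet PI before the root element, and
# mixed content (text, element, PI, comment) inside the root.
SOURCE = (
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    '<doc>hello <b>world</b><?app hint?><!-- note --></doc>'
)

doc = parseString(SOURCE)

# The document node keeps the leading PI as a child, yet the root
# element is available directly; no scanning of childNodes is needed.
root = doc.documentElement

# The content of the root is one list holding different node types.
kinds = [child.nodeType for child in root.childNodes]


def child_elements(node):
    """Illustrative helper (not a DOM method): element children only."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def child_pis(node):
    """Illustrative helper (not a DOM method): PI children only."""
    return [c for c in node.childNodes
            if c.nodeType == Node.PROCESSING_INSTRUCTION_NODE]
```

An application that only wants elements pays one list comprehension; an application that needs the PIs still finds them in document order.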
> Most of the DOM's interface is for *building* a DOM structure. It is > conceivable that those APIs only exist as a way to respond to parsing > events, but I believe their existence is due to the fact that people > want to build a DOM and then generate the resulting XML. In some cases they do. In other cases they read a DOM, make a small modification and then write that. In still other cases, they make a DOM, edit by hand in a graphical, DOM-based editor and then write that out. In yet other cases, DOM modifications are performed in order to create a graphical effect in a browser. > Otherwise, we > could have had two levels of the DOM interface: read-only (with private > construction mechanisms), and read-write (as exemplified by the current > DOM). That's exactly what we have. Minidom is the read-only version with private construction mechanisms and PyDOM/4DOM are read-write. > I could care less about compatibility. I'm trying to write an > application here. If you could care less about compatibility, maybe you shouldn't be using XML. XML is about compatibility. > Geez... using your viewpoint: if I wanted > compatibility, then maybe I should use Java or C since everybody else > uses that. Slavish adherence to conventions is not a good idea, but neither is reinventing wheels. From my point of view that's exactly what qp_xml does. > Goody for them. That doesn't help me write my application. You have a library. It works for you. What's the problem? Now you want to make it a standard API. That means that user interface concerns become important. Some important principles of interface design are: * reuse what people already know * do not unnecessarily multiply interfaces People know, and seem to like, the DOM. A subset can be made about as fast, convenient and small as qp_xml. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Company spokeswoman Lana Simon stressed that Interactive Yoda is not a Furby.
Well, not exactly. "This is an interactive toy that utilizes Furby technology," Simon said. "It will react to its surroundings and will talk." - http://www.wired.com/news/news/culture/story/19222.html From ken@bitsko.slc.ut.us Fri Apr 23 22:53:05 1999 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 23 Apr 1999 16:53:05 -0500 Subject: [XML-SIG] DOM API In-Reply-To: Paul Prescod's message of "Sat, 24 Apr 1999 12:52:54 -0500" References: <371C9388.D0690375@prescod.net> <37219FD6.73061D0@lyra.org> <372204F6.4A6174E1@prescod.net> Message-ID: Paul Prescod writes: > Greg Stein wrote: > > > > XML 1.0 *defines* PIs. That is very different. > > Okay, so you agree that PIs are part of XML document instance > data. Why not have an option in the DOM tree builder or a method on the document and element nodes to remove PIs? Very similar to the normalize() method for joining consecutive text nodes. The combination of SAX filters and a SAX DOM tree builder allows one to choose (write a filter for) any type of tree you want to see. It seems very important to me that the tree model itself be able to hold every type of node an application may need (including even non-DOM, non-XML nodes), but it is also important to be able to constrain the nodes in a tree to _just_ those nodes an application wants. Re. lightweight API, I haven't seen qp yet so I'm not exactly sure what its API is. When I think of a ``lightweight'' XML tree I don't think of an API, per se, at all. A lightweight XML tree to me is a nested tree of XML objects. For example, an element would have attributes name, attributes (a dictionary), and contents (a list) and a PI would have attributes target and data. The core objects are in classes, but there are no (or no strong need to have) methods in the core classes. Methods are added to the core classes by ``extensions''. Extensions are things like normalize, get elements by tag name, get elements by id, visitors, filters, writers, converters, etc.
The effect, though, is that outside of calling methods to act on the tree all you're doing is working directly with the XML objects and their attributes. This pattern works well on any type of tree that has both complex data types and many categories of functions that may be applied to the tree, such as 2D and 3D graphics, directed graphs, networks, component hierarchies, etc. -- Ken MacLeod ken@bitsko.slc.ut.us From ken@bitsko.slc.ut.us Fri Apr 23 23:11:01 1999 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 23 Apr 1999 17:11:01 -0500 Subject: [XML-SIG] qp_xml API (was: DOM API) In-Reply-To: Greg Stein's message of "Mon, 19 Apr 1999 02:28:29 -0700" References: <199904171425.IAA03919@malatesta.local> <371AD27F.7E0334A0@lyra.org> <371AE5EF.13BA1B8C@lyra.org> <01f101be8a42$d665e830$f29b12c2@pythonware.com> <371AF73D.52254043@lyra.org> Message-ID: Greg Stein writes: > Parser.parse(input): input may be a string or an object supporting > the "read" method (e.g. a file or httplib.HTTPResponse (from my new > httplib module)). The input must represent a complete XML > document. It will be fully parsed and a lightweight representation > will be returned. This method may be called any number of times (for > multiple documents). The returned object is an instance of > qp_xml._element. It was suggested in an earlier thread that multiple builders should be allowed for. A technique for implementing this is to take the `parse' function out of the tree class altogether and put tree builders into their own classes. There is very little functional difference between the two (i.e. all you're doing is moving the `parse' function you have into a different class, it still returns a tree), but the semantic difference of ``who can build a tree'' becomes very clear. This can be very useful for the DOM and DOM-subset packages being talked about elsewhere. For example, a DOM-builder that takes SAX events and calls DOM-factory methods to build a tree can be used to build any of the DOM trees. 
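A minimal sketch of the two ideas in the messages above: core node classes that carry only data (an element's name, attributes dictionary and contents list; a PI's target and data), and a tree builder that lives in its own class. Every class, function and parameter name here is hypothetical and chosen for illustration; this is not qp_xml, the DOM, or any existing package. The builder sits on the expat bindings (xml.parsers.expat in present-day Python), and its keep_pis switch is one assumed way of constraining the tree to just the nodes an application wants.

```python
from xml.parsers import expat


class Element:
    """Plain data: a name, an attribute dict and a mixed contents list."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.contents = []      # str, Element and PI objects, in order


class PI:
    """Plain data: a processing instruction's target and data."""

    def __init__(self, target, data):
        self.target = target
        self.data = data


class ExpatTreeBuilder:
    """One possible builder; others could produce the same tree classes."""

    def __init__(self, keep_pis=True):
        self.keep_pis = keep_pis

    def parse(self, text):
        holder = Element(None, {})      # artificial parent of the root
        stack = [holder]

        def start(name, attrs):
            elem = Element(name, dict(attrs))
            stack[-1].contents.append(elem)
            stack.append(elem)

        def end(name):
            stack.pop()

        def chars(data):
            contents = stack[-1].contents
            # expat may deliver text in pieces: merge adjacent strings
            if contents and isinstance(contents[-1], str):
                contents[-1] += data
            else:
                contents.append(data)

        def pi(target, data):
            if self.keep_pis:
                stack[-1].contents.append(PI(target, data))

        parser = expat.ParserCreate()
        parser.StartElementHandler = start
        parser.EndElementHandler = end
        parser.CharacterDataHandler = chars
        parser.ProcessingInstructionHandler = pi
        parser.Parse(text, True)
        # hand back the root element, skipping any prolog PIs
        for node in holder.contents:
            if isinstance(node, Element):
                return node


def elements_by_name(elem, name):
    """An 'extension': a function applied to the tree from outside."""
    found = []
    for node in elem.contents:
        if isinstance(node, Element):
            if node.name == name:
                found.append(node)
            found.extend(elements_by_name(node, name))
    return found
```

Client code then works with plain attributes, for example tree.attributes["x"] or a loop over tree.contents, and reaches for functions such as elements_by_name only when it needs them.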
-- Ken MacLeod ken@bitsko.slc.ut.us From uche.ogbuji@fourthought.com Sun Apr 25 16:00:34 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 25 Apr 1999 09:00:34 -0600 Subject: [XML-SIG] SAX2: Parser properties In-Reply-To: Your message of "09 Apr 1999 22:43:58 +0200." Message-ID: <199904251500.JAA07692@malatesta.local> I'm sorry I'm responding so late, but better so than never, I hope. > The first three properties come from the JavaSAX proposal, while the > last one was invented by yours truly. > > > http://xml.org/sax/properties/namespace-sep (write-only) > Set the separator to be used between the URI part of a name and the > local part of a name when namespace processing is being performed > (see the http://xml.org/sax/features/namespaces feature). By > default, the separator is a single space. This property may not be > set while a parse is in progress (throws a SAXNotSupportedException). > > http://xml.org/sax/properties/dom-node (read-only) > Get the DOM node currently being visited, if the SAX parser is > iterating over a DOM tree. If the parser recognises and supports > this property but is not currently visiting a DOM node, it should > return null (this is a good way to check for availability before the > parse begins). > > This property doesn't make much sense for Python, but I see no point > in leaving it out, either. Actually, we are planning a SAX writer for a (hopefully near) future version of 4DOM, and this could support this property. > http://xml.org/sax/properties/xml-string (read-only) > Get the literal string of characters associated with the current > event. If the parser recognises and supports this property but is > not currently parsing text, it should return null (this is a good > way to check for availability before the parse begins). I stole > this idea from Expat. 
> > > In addition, I think PySAX needs the following property: > > http://python.org/sax/properties/data-encoding (read/write) > This property can be used to control which character encoding is > used for data events that come from the parser. In Java this is not > an issue since all strings are Unicode, but in Python it is. Expat > reports UTF-8, while xmlproc/xmllib just pass on whatever they're > given. > > Do we need a special SAXEncodingNotSupportedException for this? > Otherwise it may be impossible to tell whether the parser doesn't > support this at all or whether it just doesn't support this > particular encoding. I agree that this is the best way to go for now, but I think the question should be at least raised as to whether it is better to agree on a normal encoding form for parser string output and enforcing this in the SAX drivers (by conversion, if necessary). -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Sun Apr 25 16:02:34 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 25 Apr 1999 09:02:34 -0600 Subject: [XML-SIG] SAX2: Handler classes In-Reply-To: Your message of "09 Apr 1999 22:44:50 +0200." Message-ID: <199904251502.JAA07706@malatesta.local> > This list is just copied from the Java proposal. Does anyone think we > should skip any of these or add any new ones? > > http://xml.org/sax/handlers/lexical > Receive callbacks for comments, CDATA sections, and (possibly) > entity references. > > http://xml.org/sax/handlers/dtd-decl > Receive callbacks for element, attribute, and (possibly) parsed > entity declarations. > > http://xml.org/sax/handlers/namespace > Receive callbacks for the start and end of the scope of each > namespace declaration. I think they are all important, and I can't think of any additions. 
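For readers who want to try the lexical callbacks listed above, here is a hedged sketch against the xml.sax package in present-day Python, which may differ from the SAX2 proposal under discussion. One difference is certain: the current package exposes the hook as a reader property (xml.sax.handler.property_lexical_handler) and not under the handler ID quoted above. The two classes and the function below are illustrative names only, and the example assumes the default expat-based reader returned by make_parser().

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class RecordingLexicalHandler:
    """Records lexical events; a plain object with the expected methods."""

    def __init__(self):
        self.events = []

    def comment(self, content):
        self.events.append(("comment", content))

    def startCDATA(self):
        self.events.append(("startCDATA",))

    def endCDATA(self):
        self.events.append(("endCDATA",))

    def startDTD(self, name, public_id, system_id):
        self.events.append(("startDTD", name, public_id, system_id))

    def endDTD(self):
        self.events.append(("endDTD",))


class TextCollector(ContentHandler):
    """Ordinary content handler: gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_lexical_events(xml_bytes):
    """Parse bytes; return (lexical events, concatenated character data)."""
    lexical = RecordingLexicalHandler()
    content = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(content)
    # Assumption: the reader supports this property (expat's reader does).
    parser.setProperty(property_lexical_handler, lexical)
    parser.parse(io.BytesIO(xml_bytes))
    return lexical.events, "".join(content.chunks)
```

The content handler still sees the CDATA text as ordinary characters; the lexical handler only marks where the section starts and ends, so applications that ignore it lose nothing.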
-- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Sun Apr 25 16:13:37 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 25 Apr 1999 09:13:37 -0600 Subject: [XML-SIG] SAX2: Attribute extensions In-Reply-To: Your message of "17 Apr 1999 18:06:12 +0200." Message-ID: <199904251513.JAA07722@malatesta.local> > This posting specifies two interfaces for information needed by the > DOM (and possibly also others) and also for full XML 1.0 conformance. > I'm not really sure whether we should actually use all of this, so > opinions are welcome. > > class AttributeList2: > > def isSpecified(self,attr): > """Returns true if the attribute was explicitly specified in the > document and false otherwise. attr can be the attribute name or > its index in the AttributeList.""" This is pretty much essential for full DOM support, and thus it would help us greatly for the SAX builder in 4DOM. > def getEntityRefList(self,attr): > """This returns the EntityRefList (see below) for an attribute, > which can be specified by name or index.""" > > The class below is intended to be used for discovering entity reference > boundaries inside attribute values. This is needed because the XML 1.0 > recommendation requires parsers to report unexpanded entity references, > also inside attribute values. Whether this is really > something we want is another matter. I'm not clear on what the alternative is. For example, if the parser doesn't expand &monty;, do you suggest that it should instead just return the literal "xx&monty;xx" as the attribute value, leaving the application to spot the "&" and assume an entity reference appropriately? This seems rather a shift in burden to the app.
If this is not what you mean, then it would seem to make sense for the parser to report unexpanded entity refs. > class EntityRefList: > > def getLength(self): > "Returns the number of entity references inside this attribute value." > > def getEntityName(self, ix): > "Returns the name of entity reference number ix (zero-based index)." > > def getEntityRefStart(self, ix): > """Returns the index of the first character inside the attribute > value that stems from entity reference number ix.""" > > def getEntityRefEnd(self, ix): > "Returns the index of the last character in entity reference ix." > > > One redeeming feature of this interface is that it lives entirely > outside the attribute value, and so can be ignored entirely by those > who are not interested. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Sun Apr 25 16:21:32 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 25 Apr 1999 09:21:32 -0600 Subject: [XML-SIG] SAX2: LexicalHandler In-Reply-To: Your message of "17 Apr 1999 18:05:20 +0200." Message-ID: <199904251521.JAA07736@malatesta.local> > This handler is supposed to be used by applications that need > information about lexical details in the document such as comments and > entity boundaries. Most applications won't need this, but the DOM will > find it useful. Support for this handler will be optional. > > This handler has the handerID http://xml.org/sax/handlers/lexical. > > class LexicalHandler: > > def xmlDecl(self, version, encoding, standalone): > """All three parameters are strings. encoding and standalone are not > specified on the XML declaration, their values will be None.""" I think you're missing an "If" at the beginning of the last sentence. 
> def startDTD(self, root, publicID, systemID): > """This event is reported when the DOCTYPE declaration is > encountered. root is the name of the root element type, while the two last > parameters are the public and system identifiers of the external > DTD subset.""" Excellent. This would fill a huge hole in SAX -> DOM building. > def endDTD(self): > "This event is reported after the DTD has been parsed." > > def startEntity(self, name): > """Reports the beginning of a new entity. If the entity is the > external DTD subset the name will be '[dtd]'.""" > > def endEntity(self, name): > pass > > def startCDATA(self): > pass > > def endCDATA(self): > pass -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Sun Apr 25 16:50:21 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 25 Apr 1999 09:50:21 -0600 Subject: [XML-SIG] qp API In-Reply-To: Your message of "Fri, 23 Apr 1999 16:04:25 CDT." <3720E059.28CD59BC@prescod.net> Message-ID: <199904251550.JAA07778@malatesta.local> > > I haven't really formed an opinion about the Minidom module. > > On the one hand, I don't like adding an interface that resembles > > another interface; too many similar choices can be confusing. (But if > > PyDOM is upward-compatible with Minidom, that may not be a problem.) > > I certainly intend for minidom to be a subset of PyDOM and 4DOM. Any > extensions I made should be interpreted as suggestions for extensions to > PyDOM and 4DOM. And we are watching with great interest. 4DOM already has "DOMFromString", "DOMFromURL", and "DOMFromFile" equivalents, although we call them "FromXML" and "FromHTML", "From*MLURL", and "From*MLFile". We also have "FromXMLStream" and "FromHTMLStream". These functions are all in DOM.Ext.
These helper functions have been provided since 4DOM 0.7.1, and so are supported by all the versions that come with 4XSL.

> > (I do like the convenience functions like DOMFromString;
> > something similar should definitely be added, perhaps to dom.utils.)
>
> Why not in "dom" itself? I don't see them as utilities but as the
> fundamental, commonly used entry points to DOM functionality.

Agreed, but I don't expect this in the near future from the beleaguered DOM WG.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com (970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com http://OpenTechnology.org

From uche.ogbuji@fourthought.com Sun Apr 25 17:31:17 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 25 Apr 1999 10:31:17 -0600
Subject: [XML-SIG] DOM API
In-Reply-To: Your message of "Sat, 24 Apr 1999 03:41:26 PDT." <37219FD6.73061D0@lyra.org>
Message-ID: <199904251631.KAA07815@malatesta.local>

I've stayed out of the "justify the DOM" argument because I'm not really interested in it. I like the DOM, I find it powerful and useful, and I use it in many places. I can't help it if others feel the contrary, and I'm not in the mood for an emacs/vi, gnome/kde type debate.

However, I am particularly puzzled by a couple of comments.

> I believe that the notion of build/generate via the DOM is bogus. It
> seems you agree :-), and that print or file.write is more appropriate.
> Fredrik has some utility objects to do it. All fine. The DOM just blows
> :-)

Build/generate is explicitly outside the scope of the present DOM, so I don't see how the latter conclusion follows from the first sentence.

> I could care less about compatibility. I'm trying to write an
> application here. Geez... using your viewpoint: if I wanted
> compatibility, then maybe I should use Java or C since everybody else
> uses that.
It's important to note that many of us _are_ successful building applications based on the DOM, and I agree with Paul that the DOM's extraordinary success is ample proof against the DOM's being broken for practical use. For example, at FourThought, we've had cause to evaluate commercial Databases with DOM support, and the answer is increasingly "all of them". Now one thing I'll say in my observation of DB vendors: one or two of them always adopt the latest fad, but you never find such large-scale adoption of a technology in the glacial DB world unless there is real merit. And I say the above even keeping in mind my disappointment with the slow adoption of ODMG/OQL. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Sun Apr 25 17:14:31 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 25 Apr 1999 10:14:31 -0600 Subject: [XML-SIG] DOM API In-Reply-To: Your message of "Fri, 23 Apr 1999 16:16:05 CDT." <3720E315.A4532E83@prescod.net> Message-ID: <199904251614.KAA07800@malatesta.local> > As I said in my other messages, I want minidom to be a of PyDOM and 4DOM > and hopefully the start of a common API. In that vein, minidom makes some > decisions and extensions that we should discuss: > > dom = DOMFromString( string, SAXbuilder=None ) > dom = DOMFromURL( URL, SAXbuilder=None ) > dom = DOMFromFile( file, SAXbuilder=None ) As I've mentioned, 4DOM already supports these functions, if under different names (which we don't mind normalizing to any names that are generally agreed upon). We do have a few additional parameters, though, which I think are essential for strict DOM compliance, which I realize is not a key goal of PyDOM and minidom, but they're probably fodder for discussion. 
def FromXML( xmlStr, ownerDocument=None, validate=0, keepAllWS=0, catName=None, SAXHandlerClass=XMLDOMGenerator) * ownerDocument alows us to set this property for generated nodes. If None, we create a new Document node from the factory and add the built nodes to the document. If the ownerDocument _is_ set, the new nodes are not added to the document, and a DocumentFragment is returned instead. This behavior corresponds to most of the use-cases we determined for building. * validate is to tell the parser whether or not to validate * keepAllWS basically tells the SAX handler whether to discard ignorable_whitespace. * catName is for Xcatalog support (xmlproc only). I don't think this needs be considered for a unified DOMFromString * SAXHandlerClass is our equivalent of your SAXBuilder > The default SAXBuilder would probably be the PyDOM or minidom builder. > > Minidom uses mixed lower-first for property names. For compatibility with > PyDOM, properties can be requested through get_ methods. My question is: > do we really need get_ methods? They don't seem very Pythonish to me. Or > maybe we can use them as implementation mechanism (_get_) but not expose > them to the client. > > I prefer the class-specific properties to the weird generic ones: tagName > to nodeName, value to nodeValue and so forth. Obviously PyDOM and 4DOM > would implement both but I don't see any reason to support that redundancy > in minidom. > > I made some namespace extensions because we can't wait forever to do > namespace support. > > getAttribute( "foo", "http://www.blah.bar" ) > > Looks up the obvious attribute. > > element.localName gets the second have of the element type name. > > element.uri gets the URI associated with the prefix. > > element.prefix gets the element's prefix. I don't think that the > namespaces view that prefixes are irrelevant should obviate the XML 1.0 > view that they are NOT. 
Even if we accept the namespaces view of the world > entirely, prefixes are chosen to be mmenonmic so they shouldn't be > discared by software. > > element.attributes returns an attribute mapping object that I think > behaves exactly like PyDOMs except for namespace support: > > x.attributes["foo", "http://www.blah.bar"] > > This also works, however: > > x.attributes["bar:foo"] (just as in PyDOM) > > Namespace attributes ARE maintained as attributes. keys(), items() and > values() should be the same as PyDOM. We might consider this for Namespace support for 4DOM, although we had been planning to wait for W3C to jump, so that we could maintain standards-compliance. Right now 4DOM just treats namespaces entirely opaquely, i.e. ignores them. Maybe there is a way to add your above suggestions to DOM.Ext. > I should unify my Error class with PyDOM's. > > I am considering the following enhancements: > > element.elements: returns a list of element children. In full DOM, this is trivial using Level 2 iterators. We'd have no problem adding a wrapper function to DOM.Ext, though. > element.getText: returns a list of deep list of data from the text nodes. > Do your own string.join to choose an appropriate join character. I'm not sure how useful this is if we omit the semantics of nested elements. I would see more use for a method that simply returns the XML text within an element, including nested tags. > element.getChild("FOO") returns the first child (not descendant) element > with specified element type name. I've never had a need for such a method. I often need all such elements, in which case I just use getElementsByTagName. > element.getChild("FOO", "http://...") does the obvious thing. > > element.getChild( "#PCDATA" ) gets a list of child text nodes. 
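
For readers following the thread, the three convenience accessors under discussion can be sketched roughly as below. This is only an illustration under stated assumptions: it is written against the present-day standard-library xml.dom.minidom rather than the minidom, PyDOM, or 4DOM code of the time, and the function names child_elements, get_text and get_child are hypothetical stand-ins for the proposed element.elements, element.getText and element.getChild. The sample document is invented for the example.

```python
from xml.dom import Node, minidom


def child_elements(element):
    """Return the element children (not descendants) of `element`."""
    return [n for n in element.childNodes if n.nodeType == Node.ELEMENT_NODE]


def get_text(element):
    """Return a flat list of character data from all text nodes under
    `element`, in document order, ignoring nested tags."""
    parts = []
    for node in element.childNodes:
        if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(node.data)
        elif node.nodeType == Node.ELEMENT_NODE:
            parts.extend(get_text(node))
    return parts


def get_child(element, tag_name):
    """Return the first child element named `tag_name`, or None.
    Unlike getElementsByTagName, this never looks below the child level."""
    for child in child_elements(element):
        if child.tagName == tag_name:
            return child
    return None


# Hypothetical sample document, not taken from the thread.
doc = minidom.parseString(
    "<DOC><METADATA><AUTHOR>Blah</AUTHOR></METADATA>"
    "<H1>This is the <CODE>XSL</CODE> introduction.</H1></DOC>"
)
root = doc.documentElement
author = get_child(get_child(root, "METADATA"), "AUTHOR")
heading = "".join(get_text(get_child(root, "H1")))
```

The caller picks the join character for get_text, as the quoted proposal suggests; get_child returns None rather than raising when nothing matches, which is a design choice of this sketch and not something the thread settles.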
-- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From dieter@handshake.de Fri Apr 23 19:49:30 1999 From: dieter@handshake.de (Dieter Maurer) Date: Fri, 23 Apr 1999 18:49:30 +0000 (/etc/localtime) Subject: [XML-SIG] addition of "encoding='iso8859-1'" in xml prolog Message-ID: <14112.48598.425955.848660@lindm.dm> The XML generators in our XML package (0.5.1) do not generate UTF-8, but use the character set that happens to be Pythons character set. I think, we should allow for an encoding hook and include a corresponding "encoding" declaration in the XML prolog. Tim Lavoie used XBEL (ns_parse.py) on bookmarks with international (iso8859-1) entries. The resulting XML was not parsable, because some of the non-ASCII characters led to invalid UTF-8 codes. - Dieter From dieter@handshake.de Sun Apr 25 22:40:19 1999 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 25 Apr 1999 21:40:19 +0000 (/etc/localtime) Subject: [XML-SIG] qp API In-Reply-To: <3720DDB2.2483DD7F@prescod.net> References: <3720DDB2.2483DD7F@prescod.net> Message-ID: <14115.35341.379579.36825@lindm.dm> Paul Prescod writes: > The problem with close() is that it is O(N) with the size of your > document, isn't it? I'm on the fence about parent pointers...maybe they > should be a construction option. They would be off by default. But there is no difference in runtime behavior (O(N)), whether the close() is explicite or implicite (i.e. because the reference count reaches 0). The real problem with an explicite close() are dangling references. Assume, the application has a reference to an inner node in the document tree. The close() would probably remove all parent pointers from the subtree (this is very similar (a bit worse) to what would happen, if weakdicts would be used for parent pointer implementation). 
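
The concern about an explicit close() can be illustrated with a small sketch. The Node class below is hypothetical (it is not the qp_xml, PyDOM, or minidom node type) and assumes that close() only clears parent pointers, as described above.

```python
class Node:
    """A toy tree node with a parent pointer, which creates a
    parent <-> child reference cycle."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def close(self):
        """Clear parent pointers in the whole subtree.
        This visits every node once, so it is O(N) in the subtree size."""
        for child in self.children:
            child.close()
        self.parent = None


root = Node("doc")
section = Node("section", root)
para = Node("para", section)

inner = section          # the application keeps a reference to an inner node
root.close()             # explicit cleanup of the whole tree

# The inner node is still alive and still has its children,
# but it can no longer find its way back up the tree.
parent_after_close = inner.parent
first_child = inner.children[0]
```

After root.close() the application's reference remains valid as an object, but upward navigation from it silently stops working, which is the failure mode being described.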
- Dieter From dieter@handshake.de Sun Apr 25 22:31:29 1999 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 25 Apr 1999 21:31:29 +0000 (/etc/localtime) Subject: [XML-SIG] qp API In-Reply-To: <14112.52252.324863.576300@amarok.cnri.reston.va.us> References: <14112.51262.675379.668123@weyr.cnri.reston.va.us> <14112.52252.324863.576300@amarok.cnri.reston.va.us> Message-ID: <14115.35045.345868.73082@lindm.dm> Andrew M. Kuchling writes: > I haven't really formed an opinion about the Minidom module. > On the one hand, I don't like adding an interface that resembles > another interface; too many similar choices can be confusing. (But if > PyDOM is upward-compatible with Minidom, that may not be a problem.) > On the other hand, PyDOM *is* quite heavyweight, and I can understand > the desire for something similar. Can people please give their > opinions about this? I am quite happy with PyDOM. I would be even happier if DOM building and processing would be faster. I would not use a different API, if I am not forced to for performance reasons. - Dieter From grove@infotek.no Mon Apr 26 08:54:54 1999 From: grove@infotek.no (Geir Ove Grønmo) Date: 26 Apr 1999 09:54:54 +0200 Subject: [XML-SIG] PySAX more pythonish In-Reply-To: <3720A88E.C041B08B@prescod.net> References: <3720A88E.C041B08B@prescod.net> Message-ID: * Paul Prescod | I would like the attributes parameter to startElement to be defaulted in | all SAX implementations. | | I would also like a new method called "text" similar to the one | implemented by xml.dom.sax_builder. "text" just takes a text string | instead of a string and offsets. That's a little more pythonish for both | the caller and callback. | | The default DocumentHandler would re-route "characters" to "text". Someone | who needed the (potentially) more efficient behavior of "characters" could | override the implementation and re-route text to characters instead. | | What do you think? I like this. 
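
The re-routing proposed in the quoted text could be sketched as follows. This is a self-contained illustration, not the PySAX source: HandlerBase here is a stand-in class, and the characters(ch, start, length) signature is assumed from the "string and offsets" description in the proposal.

```python
class HandlerBase:
    """Stand-in for the default DocumentHandler in this sketch."""

    def characters(self, ch, start, length):
        # Default behavior: slice out the reported run and hand it to text().
        self.text(ch[start:start + length])

    def text(self, data):
        # Applications override this to receive plain strings.
        pass


class Collector(HandlerBase):
    """Example application handler that only implements text()."""

    def __init__(self):
        self.chunks = []

    def text(self, data):
        self.chunks.append(data)


handler = Collector()
buffer = "<p>Hello, world</p>"
# A driver would report the character run by offset into its buffer.
handler.characters(buffer, 3, 12)
```

A handler that wants to avoid the extra slice can override characters() directly and never call text(), which matches the opt-out described in the proposal.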
There is a small thing to notice in the current implementations: You are not guaranteed that a sequence of characters is returned by _one_ event only. Because of buffering in the parsers/drivers you may end up with several events. This is very inconvenient at times. Lars Marius has written some code to make sure that these events are merged into one. I think this was written as a parser filter. I'm not sure if he intends to include this in the Python SAX libraries, but it would be very nice to have it available. On the other hand, it would also be nice to have the text method do this. :-) All the best, Geir O. From grove@infotek.no Mon Apr 26 09:06:50 1999 From: grove@infotek.no (Geir Ove Grønmo) Date: 26 Apr 1999 10:06:50 +0200 Subject: [XML-SIG] Re: xml-0.5.1: LICENCE (xmlarch) In-Reply-To: References: Message-ID: * Karl Eichwalder | There's room for interpretation. The LICENCE file says: | | xmlarch: | -------------------------------------------------------------------- | Copyright (C) 1998 by Geir O. Grønmo, grove@infotek.no | | Free for commercial and non-commercial use. | -------------------------------------------------------------------- | | But arch/xmlarch.py says (as it stands, this implies, that it's "unfree" | under certain circumstances): | | Copyright (C) 1998 by Geir O. Grønmo, grove@infotek.no | | It is free for non-commercial use, if you modify it please let me | know. Oops, I'll fix this right away. xmlarch is free for _both_ commercial and non-commercial use. Sorry about the glitch. Geir O. From pmadsen@newbridge.com Mon Apr 26 13:47:04 1999 From: pmadsen@newbridge.com (Paul Madsen) Date: Mon, 26 Apr 1999 08:47:04 -0400 Subject: [XML-SIG] Windows compiled version of XML toolkit Message-ID: <37246048.575FEE4F@newbridge.com> This is a multi-part message in MIME format. 
--------------9EC7A2E1913B90CEB7E22F31 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi, the Python/XML HOWTO references a pre-compiled version of the toolkit for Windows. Is there such a beast available? Thanks for any info. Paul --------------9EC7A2E1913B90CEB7E22F31 Content-Type: text/x-vcard; charset=us-ascii; name="pmadsen.vcf" Content-Transfer-Encoding: 7bit Content-Description: Card for Paul Madsen Content-Disposition: attachment; filename="pmadsen.vcf" begin:vcard n:Madsen;Paul tel;work:599-3600 x6589 x-mozilla-html:FALSE url:http://eis.ca.newbridge.com org:Newbridge Networks;Electronic Information Services adr:;;;;;; version:2.1 email;internet:pmadsen@newbridge.com title:Structured Information Analyst x-mozilla-cpt:;-1 fn:Paul Madsen end:vcard --------------9EC7A2E1913B90CEB7E22F31-- From paul@prescod.net Mon Apr 26 20:50:14 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Apr 1999 14:50:14 -0500 Subject: [XML-SIG] PySAX more pythonish References: <3720A88E.C041B08B@prescod.net> Message-ID: <3724C376.D99E95E0@prescod.net> "Geir Ove Grønmo" wrote: > > On the other hand, it would also be nice to have the text method do > this. :-) I can't think how to implement it easily in HandlerBase.characters. We would have to implement it in every driver, I think. Lars' filter is probably the best solution. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Company spokeswoman Lana Simon stressed that Interactive Yoda is not a Furby. Well, not exactly. "This is an interactive toy that utilizes Furby technology," Simon said. "It will react to its surroundings and will talk." 
- http://www.wired.com/news/news/culture/story/19222.html From paul@prescod.net Mon Apr 26 21:27:53 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Apr 1999 15:27:53 -0500 Subject: [XML-SIG] Python DOM Unification -- level Message-ID: <3724CC49.AAB857A5@prescod.net> Following are some meta-questions on the proposed Python DOM unification. First, what is the appropriate level of unification? * Module level: if sys.argv[1]=="fast": from xml import minidom dom = minidom else if sys.argv[1]=="complete": from xml import dom else if sys.argv[1]=="distributed": from 4thought import dom * Builder level: if sys.argv[1]=="4thought": from 4thought.dom import sax_builder() else: from xml.dom import sax_builder() xml.dom.FromXML( sax_builder() ) * Document level: if sys.argv[1]=="4thought": 4thought.dom.Gimme.a.document() else: xml.dom.I.need.a.document() document.doStuff() My preference is for "Builder level", I think. Portable helper functions could go into a universal xml.dom package instead of into each package. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Company spokeswoman Lana Simon stressed that Interactive Yoda is not a Furby. Well, not exactly. "This is an interactive toy that utilizes Furby technology," Simon said. "It will react to its surroundings and will talk." - http://www.wired.com/news/news/culture/story/19222.html From paul@prescod.net Mon Apr 26 21:10:36 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Apr 1999 15:10:36 -0500 Subject: [XML-SIG] DOM API References: <199904251614.KAA07800@malatesta.local> Message-ID: <3724C83C.FA5080E@prescod.net> I'll have to think about some things more than I have time for right now. Other stuff: uche.ogbuji@fourthought.com wrote: > > > We might consider this for Namespace support for 4DOM, although we had been > planning to wait for W3C to jump, so that we could maintain > standards-compliance. 
Right now 4DOM just treats namespaces entirely > opaquely, i.e. ignores them. Maybe there is a way to add your above > suggestions to DOM.Ext. In response to Greg's comments, I'm starting to think that namespace processing should be a mode: either completely on or completely off. The complex, scoped namespaces mechanism is more the result of politics than technology -- this wasn't how namespaces were supposed to turn out. > > element.getText: returns a list of deep list of data from the text nodes. > > Do your own string.join to choose an appropriate join character. > > I'm not sure how useful this is if we omit the semantics of > nested elements. Actually, it gets a fair amount of use and is easy to implement. DSSSL, XSL and the grove paradigm all provide this feature. Consider:
This is the <CODE>XSL</CODE> introduction. ...
Now I'm generating a TOC, index or cross-reference. I don't care about the CODE element -- I just want to treat it as if the tags don't exist. I could go either way on this function, though.

> I would see more use for a method that simply returns the XML text within an
> element, including nested tags.

That's a different feature that is also useful.

> > element.getChild("FOO") returns the first child (not descendant) element
> > with specified element type name.
>
> I've never had a need for such a method. I often need all such elements, in
> which case I just use getElementsByTagName.

I'm surprised that you've never needed it. In Greg's data-ish world it would be incredibly useful but also in the data-ish subsets of the document world.

Blah... Blah... ...

doc.documentElement.getChild( "METADATA" ).getChild( "AUTHOR" )

You can emulate this with getElementsByTagName but you incur the overhead of building and discarding the node list.

> > element.getChild( "#PCDATA" ) gets a list of child text nodes.

I've never needed this one, but Greg seems to...we'll let him defend it (here or in qp_api) when he gets back.

-- 
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

Company spokeswoman Lana Simon stressed that Interactive
Yoda is not a Furby. Well, not exactly.

"This is an interactive toy that utilizes Furby technology,"
Simon said. "It will react to its surroundings and will talk."
- http://www.wired.com/news/news/culture/story/19222.html

From akuchlin@cnri.reston.va.us Mon Apr 26 22:26:50 1999
From: akuchlin@cnri.reston.va.us (Andrew M.
Kuchling) Date: Mon, 26 Apr 1999 17:26:50 -0400 (EDT) Subject: [XML-SIG] Python DOM Unification -- level In-Reply-To: <3724CC49.AAB857A5@prescod.net> References: <3724CC49.AAB857A5@prescod.net> Message-ID: <14116.55422.189139.235663@amarok.cnri.reston.va.us> Paul Prescod writes: > * Builder level: > >if sys.argv[1]=="4thought": > from 4thought.dom import sax_builder() >else: > from xml.dom import sax_builder() I'd lean toward module-level, as long as it's understood that an implementation can add extra stuff to its module, but builder-level would also be acceptable. Note that there isn't that much top-level stuff required for a DOM module: exception codes, DOMException, the Node class and its subclasses, NodeList and NamedNodeMap, and a createDocument() function. createDocument is the only thing not specified by the DOM1 REC, so anyone implementing DOM will have some version of the above classes and objects. -- A.M. Kuchling http://starship.python.net/crew/amk/ The warning message we sent the Russians was a calculated ambiguity that would be clearly understood. -- Alexander Haig From paul@prescod.net Mon Apr 26 22:00:42 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Apr 1999 16:00:42 -0500 Subject: [XML-SIG] Py-ish PySax Suggestion #2 Message-ID: <3724D3FA.2E589DB6@prescod.net> I would like to suggest that we copy the *mllib start_foo convention for PySAX. 
Here's what a HandlerBase.startElement would look like for that:

    def startElement( self, tagname, attrs ):
        # Dispatch to start_<tagname> if the subclass defines it.
        method = getattr( self, "start_"+tagname, None)
        if method:
            method( attrs )
        else:
            self.startUnknownElement( tagname, attrs )

    def endElement( self, tagname ):
        # Dispatch to end_<tagname> if the subclass defines it.
        method = getattr( self, "end_"+tagname, None)
        if method:
            method()
        else:
            self.endUnknownElement( tagname )

    def startUnknownElement( self, tagname, attrs ):
        pass

    def endUnknownElement( self, tagname ):
        pass

-- 
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

Company spokeswoman Lana Simon stressed that Interactive
Yoda is not a Furby. Well, not exactly.

"This is an interactive toy that utilizes Furby technology,"
Simon said. "It will react to its surroundings and will talk."
- http://www.wired.com/news/news/culture/story/19222.html

From paul@prescod.net Mon Apr 26 23:03:13 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 26 Apr 1999 17:03:13 -0500
Subject: [XML-SIG] Python DOM Unification -- level
References: <3724CC49.AAB857A5@prescod.net> <14116.55422.189139.235663@amarok.cnri.reston.va.us>
Message-ID: <3724E2A1.62223458@prescod.net>

"Andrew M. Kuchling" wrote:
>
> Paul Prescod writes:
> > * Builder level:
> >
> >if sys.argv[1]=="4thought":
> >    from 4thought.dom import sax_builder()
> >else:
> >    from xml.dom import sax_builder()
>
> I'd lean toward module-level, as long as it's understood that
> an implementation can add extra stuff to its module, but builder-level
> would also be acceptable. Note that there isn't that much top-level
> stuff required for a DOM module: exception codes, DOMException, the
> Node class and its subclasses, NodeList and NamedNodeMap, and
> a createDocument() function.

Shouldn't the exception objects and class constants be shared between DOM implementations? Why do Node, NodeList and NamedNodeMap have to be top-level? Does it make sense for clients to construct them?
-- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Company spokeswoman Lana Simon stressed that Interactive Yoda is not a Furby. Well, not exactly. "This is an interactive toy that utilizes Furby technology," Simon said. "It will react to its surroundings and will talk." - http://www.wired.com/news/news/culture/story/19222.html From paul@prescod.net Mon Apr 26 23:18:27 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Apr 1999 17:18:27 -0500 Subject: [XML-SIG] qp API References: <3720DDB2.2483DD7F@prescod.net> <14115.35341.379579.36825@lindm.dm> Message-ID: <3724E633.58B2A1C1@prescod.net> Dieter Maurer wrote: > > But there is no difference in runtime behavior (O(N)), > whether the close() is explicite or implicite (i.e. because > the reference count reaches 0). Yeah, I realized that later. Python allows you to forget that it is doing a lot of work under the covers. Even so, close() is Python code and refcount cleanup is in the heart of the interpreter. > The real problem with an explicite close() are dangling > references. Assume, the application has a reference to > an inner node in the document tree. The close() would > probably remove all parent pointers from the subtree You wouldn't really have a dangling reference -- you would have a reference to a node that no longer knows its parent. But that is still not ideal. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Company spokeswoman Lana Simon stressed that Interactive Yoda is not a Furby. Well, not exactly. "This is an interactive toy that utilizes Furby technology," Simon said. "It will react to its surroundings and will talk." 
- http://www.wired.com/news/news/culture/story/19222.html

From mike.olson@fourthought.com Tue Apr 27 00:45:05 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Mon, 26 Apr 1999 18:45:05 -0500
Subject: [XML-SIG] DOM API
References: <199904251614.KAA07800@malatesta.local> <3724C83C.FA5080E@prescod.net>
Message-ID: <3724FA80.33D46BE0@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------ms96FD3791096FB6D164818BD1
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Paul Prescod wrote:

> > I've never had a need for such a method. I often need all such elements, in
> > which case I just use getElementsByTagName.
>
> I'm surprised that you've never needed it. In Greg's data-ish world it
> would be incredibly useful but also in the data-ish subsets of the
> document world.
>
> Blah...
> Blah...
>
> ...
>
> doc.documentElement.getChild( "METADATA" ).getChild( "AUTHOR" )
>

getElementsByTagName does not stop at the current level; it checks the element's children, then their children, and so on. This was a big pain for us, and I had to implement a getChildren-type method. To me, that would be more useful than a getChild.

>
> --
> Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
> http://itrc.uwaterloo.ca/~papresco
>
> Company spokeswoman Lana Simon stressed that Interactive
> Yoda is not a Furby. Well, not exactly.
>
> "This is an interactive toy that utilizes Furby technology,"
> Simon said. "It will react to its surroundings and will talk."
> - http://www.wired.com/news/news/culture/story/19222.html
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

-- 
Mike Olson
Member Consultant
FourThought LLC
http://www.fourthought.com
http://opentechnology.org
---
"No program is interesting in itself to a programmer. It's only interesting as long as there are new challenges and new ideas coming up."
--- Linus Torvalds
--------------ms96FD3791096FB6D164818BD1--

From mike.olson@fourthought.com Tue Apr 27 00:56:12 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Mon, 26 Apr 1999 18:56:12 -0500
Subject: [XML-SIG] Python DOM Unification -- level
References: <3724CC49.AAB857A5@prescod.net>
Message-ID: <3724FD1C.86EBD9E6@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------msC31DF1719CA3CA2D111E3B60
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I would say at the Builder Level, but handle it differently than you suggest.

if sys.argv[1] == '4th':
    fac = 4dom.Ext.Factory
    builder = 4dom.Ext.Builder
elif sys.argv[1] == 'pydom':
    fac = pydom.factory
    builder = pydom.builder
else:
    fac = minidom.fac
    builder = minidom.builder

doc = fac.CreateDocument()
doc = builder.XMLFromURL('www.fourthought.com')

where the factory interface defines everything that is not creatable from a document.

interface DOMFactory {
    Document CreateDocument();
    HTMLDocument CreateHTMLDocument();
    DocType CreateDocType();
    NodeList CreateNodeList(in sequence);
    ...
};

and builder defines an interface for creating documents from different streams

interface DOMBuilder {
    Document FromXMLFile(in string URL);
    HTMLDocument FromHTMLFile(in string URL);
    Document FromXMLString(in string XML);
    ...
};

Mike

Paul Prescod wrote:

> Following are some meta-questions on the proposed Python DOM unification.
>
> First, what is the appropriate level of unification?
> > * Module level: > > if sys.argv[1]=="fast": > from xml import minidom > dom = minidom > else if sys.argv[1]=="complete": > from xml import dom > else if sys.argv[1]=="distributed": > from 4thought import dom > > * Builder level: > > if sys.argv[1]=="4thought": > from 4thought.dom import sax_builder() > else: > from xml.dom import sax_builder() > > xml.dom.FromXML( sax_builder() ) > > * Document level: > > if sys.argv[1]=="4thought": > 4thought.dom.Gimme.a.document() > else: > xml.dom.I.need.a.document() > > document.doStuff() > > My preference is for "Builder level", I think. Portable helper functions > could go into a universal xml.dom package instead of into each package. > > -- > Paul Prescod - ISOGEN Consulting Engineer speaking for only himself > http://itrc.uwaterloo.ca/~papresco > > Company spokeswoman Lana Simon stressed that Interactive > Yoda is not a Furby. Well, not exactly. > > "This is an interactive toy that utilizes Furby technology," > Simon said. "It will react to its surroundings and will talk." > - http://www.wired.com/news/news/culture/story/19222.html > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Member Consultant FourThought LLC http://www.fourthought.com http://opentechnology.org --- "No program is interesting in itself to a programmer. It's only interesting as long as there are new challenges and new ideas coming up." 
--- Linus Torvalds
--------------msC31DF1719CA3CA2D111E3B60-- From uche.ogbuji@fourthought.com Tue Apr 27 06:47:01 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 26 Apr 1999 23:47:01 -0600 Subject: [XML-SIG] Python DOM Unification -- level In-Reply-To: Your message of "Mon, 26 Apr 1999 15:27:53 CDT." <3724CC49.AAB857A5@prescod.net> Message-ID: <199904270547.XAA09432@malatesta.local> > Following are some meta-questions on the proposed Python DOM unification. > > First, what is the appropriate level of unification? > > * Module level: > > if sys.argv[1]=="fast": > from xml import minidom > dom = minidom > else if sys.argv[1]=="complete": > from xml import dom > else if sys.argv[1]=="distributed": > from 4thought import dom Hmm. The last line would throw an exception. We have thought a bit about packaging for 4DOM: currently we use "DOM" as top level, but we understand that this might not play nicely with other DOM libs in the path. > * Builder level: > > if sys.argv[1]=="4thought": > from 4thought.dom import sax_builder() > else: > from xml.dom import sax_builder() > > xml.dom.FromXML( sax_builder() ) > > * Document level: > > if sys.argv[1]=="4thought": > 4thought.dom.Gimme.a.document() > else: > xml.dom.I.need.a.document() > > document.doStuff() > > My preference is for "Builder level", I think. Portable helper functions > could go into a universal xml.dom package instead of into each package. Agreed. Each implementation would know how to build its own concrete objects, and the unified interface (if we're able to pull that off) will allow transparent manipulation of heterogenous nodes within an app. 
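A minimal sketch of the builder-level idea in present-day Python, not code from this thread. It assumes the standard library's xml.dom.minidom as the only registered implementation; the BUILDERS registry, build() and element_names() are hypothetical names made up for illustration. The point it tries to show is that client code touches only the standard Node interface (nodeType, childNodes, tagName), so it need not know which builder produced the tree.

```python
from xml.dom import Node, minidom


def minidom_builder(xml_text):
    """Build a document with the standard library's minidom."""
    return minidom.parseString(xml_text)


# Hypothetical registry of builders keyed by a name such as sys.argv[1].
# Only "minidom" is real here; another implementation would register
# its own callable under its own name.
BUILDERS = {"minidom": minidom_builder}


def build(xml_text, implementation="minidom"):
    """Pick a builder by name and return whatever document it produces."""
    try:
        builder = BUILDERS[implementation]
    except KeyError:
        raise ValueError("no DOM builder registered as %r" % implementation)
    return builder(xml_text)


def element_names(node):
    """Collect tag names in document order via the generic Node interface."""
    names = []
    if node.nodeType == Node.ELEMENT_NODE:
        names.append(node.tagName)
    for child in node.childNodes:
        names.extend(element_names(child))
    return names


if __name__ == "__main__":
    doc = build("<a><b/><c><d/></c></a>")
    print(element_names(doc))
```

Keeping the choice of implementation inside build() means the traversal code never imports a concrete DOM package, which is roughly what "builder level" unification would buy.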
--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com     (970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com          http://OpenTechnology.org

From uche.ogbuji@fourthought.com Tue Apr 27 06:56:19 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 26 Apr 1999 23:56:19 -0600
Subject: [XML-SIG] Py-ish PySax Suggestion #2
In-Reply-To: Your message of "Mon, 26 Apr 1999 16:00:42 CDT."
    <3724D3FA.2E589DB6@prescod.net>
Message-ID: <199904270556.XAA09446@malatesta.local>

> I would like to suggest that we copy the *mllib start_foo convention for
> PySAX. Here's what a HandlerBase.StartElement would look like for that:
>
> def startElement( self, tagname, attrs ):
>     method = getattr( self, "start_"+tagname, None)
>     if method:
>         method( attrs )
>     else:
>         self.startUnknownElement( tagname, attrs )
>
> def endElement( self, tagname ):
>     method = getattr( self, "end_"+tagname, None)
>     if method:
>         method()
>     else:
>         self.endUnknownElement( tagname )
>
>
> def startUnknownElement( self, tagname, attrs ):
>     pass
>
> def endUnknownElement( self, tagname ):
>     pass

I don't have a big problem with this, but I'll bet it gives fits to those
about these parts who are very concerned with every last bit of run-time speed.

And indeed, since this is so easily achieved under the current PySAX, is there
really a need to enforce the meta-programming?

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com     (970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com          http://OpenTechnology.org

From uche.ogbuji@fourthought.com Tue Apr 27 07:02:57 1999
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 27 Apr 1999 00:02:57 -0600
Subject: [XML-SIG] Python DOM Unification -- level
In-Reply-To: Your message of "Mon, 26 Apr 1999 18:56:12 CDT."
    <3724FD1C.86EBD9E6@fourthought.com>
Message-ID: <199904270602.AAA09467@malatesta.local>

> I would say at the Builder Level, but handle it differently than you suggest.
>
> if sys.argv[1] == '4th' :
>     fac = 4dom.Ext.Factory
>     builder = 4dom.Ext.Builder
> elif sys.argv[1] == 'pydom':
>     fac = pydom.factory
>     builder = pydom.builder
> else
>     fac = minidom.fac
>     builder = minidom.builder
>
> doc = fac.CreateDocument();
> doc = builder.XMLFromURL('www.fourthought.com')

Et tu, Mikhail? Code that won't run? (See lines 2 and 3). And furthermore, I
know that we do plan to put up the XML source for www.fourthought.com one of
these days when browsers are sane, but won't that last line produce some funky
results just now?

> where the factory interface defines everything that is not creatable from a
> document.
>
> interface DOMFactory {
>
>     Document CreateDocument();
>     HTMLDocument CreateHTMLDocument();
>     DocType CreateDocType();
>     NodeList CreateNodeList(in sequence);
>     ...
> };
>
> and builder defines an interface for creating documents from different
> streams
>
> interface DOMBuilder {
>
>     Document FromXMLFile(in string URL);
>     HTMLDocument FromHTMLFile(in string URL);
>     Document FromXMLString(in string XML);
>     ...
> };

Of course, some may say I'm biased, but I think this is the strongest
proposal. It also dovetails with those who have been calling for a PyDOM
factory interface. The main problem I anticipate is that Paul might consider
adding a factory to minidom a bit contrary to the "mini" idea.
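One way the factory and builder interfaces quoted above might look as Python classes. This is only a sketch under stated assumptions: the method names follow the IDL in the quoted proposal, the backing implementation is the standard library's xml.dom.minidom rather than 4DOM or PyDOM, the root_name argument is an addition that minidom's createDocument() needs, and FromXMLFile takes a local path instead of a URL.

```python
from abc import ABC, abstractmethod
from xml.dom import minidom


class DOMFactory(ABC):
    """Creates objects that cannot be created from an existing document."""

    @abstractmethod
    def CreateDocument(self, root_name):
        ...


class DOMBuilder(ABC):
    """Creates documents from different kinds of input."""

    @abstractmethod
    def FromXMLString(self, xml_text):
        ...

    @abstractmethod
    def FromXMLFile(self, path):
        ...


class MiniDOMFactory(DOMFactory):
    # Assumption: minidom stands in for any concrete implementation.
    def CreateDocument(self, root_name):
        impl = minidom.getDOMImplementation()
        return impl.createDocument(None, root_name, None)


class MiniDOMBuilder(DOMBuilder):
    def FromXMLString(self, xml_text):
        return minidom.parseString(xml_text)

    def FromXMLFile(self, path):
        return minidom.parse(path)


if __name__ == "__main__":
    fac, builder = MiniDOMFactory(), MiniDOMBuilder()
    empty = fac.CreateDocument("site")
    parsed = builder.FromXMLString("<site><page/></site>")
    print(empty.documentElement.tagName)
    print(len(parsed.documentElement.childNodes))
```

A caller would pick the factory and builder pair once, for instance from sys.argv[1] as in the quoted snippet, and use only the abstract methods afterwards.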
-- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From paul@prescod.net Tue Apr 27 10:08:20 1999 From: paul@prescod.net (Paul Prescod) Date: Tue, 27 Apr 1999 04:08:20 -0500 Subject: [XML-SIG] Py-ish PySax Suggestion #2 References: <199904270556.XAA09446@malatesta.local> Message-ID: <37257E84.CBE66B@prescod.net> uche.ogbuji@fourthought.com wrote: > > > I don't have a big problem with this, but I'll bet it gives fits to those > about these parts who are very concerned with every last bit of run-time speed. For better or worse I think those people have already abandoned SAX. Actually, the proposal doesn't slow anything down: if you need the speed of a single startElement method, you just override it and go. Existing SAX clients should be exactly as fast as they are today. > And indeed, since this is so easily achieved under the current PySAX, is there > really a need to enforce the meta-programming? It isn't so much enforcing it as making it accessible and "standard." It can help usability to make common idioms a part of the library or even language. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Microsoft spokesman Ian Hatton admits that the Linux system would have performed better had it been tuned." "Future press releases on the issue will clearly state that the research was sponsored by Microsoft." http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp From akuchlin@cnri.reston.va.us Tue Apr 27 14:16:41 1999 From: akuchlin@cnri.reston.va.us (Andrew M. 
Kuchling) Date: Tue, 27 Apr 1999 09:16:41 -0400 (EDT) Subject: [XML-SIG] Python DOM Unification -- level In-Reply-To: <3724E2A1.62223458@prescod.net> References: <3724CC49.AAB857A5@prescod.net> <14116.55422.189139.235663@amarok.cnri.reston.va.us> <3724E2A1.62223458@prescod.net> Message-ID: <14117.46989.519563.210317@amarok.cnri.reston.va.us> Paul Prescod writes: >Shouldn't the exception objects and class constants be shared between DOM >implementations? Good point; they could be, I suppose. >Why do Node, NodeList and NamedNodeMap have to be top-level. Does it make >sense for clients to construct them? For code like "if isinstance(obj, Node):..."; otherwise you'd have no way of telling when a class instance is in fact a DOM node. I suppose you could do without NodeList and NamedNodeMap -- they should simply resemble lists and dictionaries -- but Node is probably required. -- A.M. Kuchling http://starship.python.net/crew/amk/ No doubt, a scientist isn't necessarily penalized for being a complex, versatile, eccentric individual with lots of extra-scientific interests. But it certainly doesn't help him a bit. -- Stephen Toulmin From uche.ogbuji@fourthought.com Tue Apr 27 14:35:42 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 27 Apr 1999 07:35:42 -0600 Subject: [XML-SIG] Py-ish PySax Suggestion #2 In-Reply-To: Your message of "Tue, 27 Apr 1999 04:08:20 CDT." <37257E84.CBE66B@prescod.net> Message-ID: <199904271335.HAA10015@malatesta.local> Paul Prescod: > > I don't have a big problem with this, but I'll bet it gives fits to those > > about these parts who are very concerned with every last bit of run-time speed. > > For better or worse I think those people have already abandoned SAX. > Actually, the proposal doesn't slow anything down: if you need the speed > of a single startElement method, you just override it and go. Existing SAX > clients should be exactly as fast as they are today. 
> > > And indeed, since this is so easily achieved under the current PySAX, is there > > really a need to enforce the meta-programming? > > It isn't so much enforcing it as making it accessible and "standard." It > can help usability to make common idioms a part of the library or even > language. All true. And given that, I do think it's a useful conventional idiom for many SAX apps, excluding DOM building, of course. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From Fred L. Drake, Jr." References: <3724D3FA.2E589DB6@prescod.net> Message-ID: <14117.52095.618817.525406@weyr.cnri.reston.va.us> Paul Prescod writes: > I would like to suggest that we copy the *mllib start_foo convention for > PySAX. Here's what a HandlerBase.StartElement would look like for that: It was really fun to try to build just this on top of Java SAX; that was my first real experience with Java reflection! ;-) I think this makes a lot of sense for use without namespaces, but not with namespaces. (I'm not a fan of the "" namespace.) Perhaps the startElement() and endElement() should be implemented as a subclass or filter? It may even be reasonable to have it as SAX2 "feature" that can be tested or requested. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fred L. Drake, Jr." References: <3724CC49.AAB857A5@prescod.net> <14116.55422.189139.235663@amarok.cnri.reston.va.us> <3724E2A1.62223458@prescod.net> Message-ID: <14117.52230.551462.836651@weyr.cnri.reston.va.us> Paul Prescod writes: > Shouldn't the exception objects and class constants be shared between DOM > implementations? Absolutely! > Why do Node, NodeList and NamedNodeMap have to be top-level. Does it make > sense for clients to construct them? No, but you knew that before I did. ;-) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives

From mike.olson@fourthought.com Tue Apr 27 16:05:41 1999
From: mike.olson@fourthought.com (Mike Olson)
Date: Tue, 27 Apr 1999 10:05:41 -0500
Subject: [XML-SIG] Python DOM Unification -- level
References: <3724CC49.AAB857A5@prescod.net>
    <14116.55422.189139.235663@amarok.cnri.reston.va.us>
    <3724E2A1.62223458@prescod.net>
    <14117.52230.551462.836651@weyr.cnri.reston.va.us>
Message-ID: <3725D245.C703951B@fourthought.com>

This is a cryptographically signed message in MIME format.

--------------ms821B3B6B4F987290D459635B
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

"Fred L. Drake" wrote:

>
> > Why do Node, NodeList and NamedNodeMap have to be top-level. Does it make
> > sense for clients to construct them?
>
> No, but you knew that before I did. ;-)
>
>

Node, no, but NodeList and NamedNodeMap are just containers and I see no
reason why a client should not be able to create them.

Maybe they are doing some post-processing on top of the DOM but want to keep a
DOMish interface. Then they will need to create NodeLists and NamedNodeMaps to
repackage nodes.

Mike

>
> --
> Fred L. Drake, Jr.
> Corporation for National Research Initiatives
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Mike Olson
Member Consultant
FourThought LLC
http://www.fourthought.com
http://opentechnology.org

---
"No program is interesting in itself to a programmer. It's only interesting as
long as there are new challenges and new ideas coming up."
--- Linus Torvalds
--------------ms821B3B6B4F987290D459635B-- From Fred L. Drake, Jr." References: <3724CC49.AAB857A5@prescod.net> <14116.55422.189139.235663@amarok.cnri.reston.va.us> <3724E2A1.62223458@prescod.net> <14117.52230.551462.836651@weyr.cnri.reston.va.us> <3725D245.C703951B@fourthought.com> Message-ID: <14117.56336.632205.967452@weyr.cnri.reston.va.us> Mike Olson writes: > Node, no, but NodeList and NamedNodeMap are just containers and I see no > reason why a client should not be able to create them. > > Maybe they are doing some post processing ontop of the DOM but want to > keep a DOMish interface. Then they will need to create NodeLists and > NamedNodeMaps to repackage nodes. Mike, Do you think this would be doable in a way portable across DOM implementations? I've not looked at 4DOM (even though I intended to ;), so I don't know how much it differs from PyDOM under the hood. I would expect that if building these is important, factory methods should be created on the Document object in the same way that there are factory methods for elements, etc. It's not that I object to having the classes available, it's that I don't see any requirement that they be available or that different DOM implementations share the implementation, even as a base class. I'm not convinced of Andrew's claim that having Node available for type tests would be useful, either. ;-) That would also make it difficult to create an all-C implementation of the DOM. (No, I don't have one in the works.) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From mike.olson@fourthought.com Tue Apr 27 17:59:49 1999 From: mike.olson@fourthought.com (Mike Olson) Date: Tue, 27 Apr 1999 11:59:49 -0500 Subject: [XML-SIG] Python DOM Unification -- level References: <3724CC49.AAB857A5@prescod.net> <14116.55422.189139.235663@amarok.cnri.reston.va.us> <3724E2A1.62223458@prescod.net> <14117.52230.551462.836651@weyr.cnri.reston.va.us> <3725D245.C703951B@fourthought.com> <14117.56336.632205.967452@weyr.cnri.reston.va.us> Message-ID: <3725ED05.1FF4DA9A@fourthought.com> This is a cryptographically signed message in MIME format. --------------ms95352A88316FA873F0E7C460 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit "Fred L. Drake" wrote: > Mike Olson writes: > > Node, no, but NodeList and NamedNodeMap are just containers and I see no > > reason why a client should not be able to create them. > > > > Maybe they are doing some post processing ontop of the DOM but want to > > keep a DOMish interface. Then they will need to create NodeLists and > > NamedNodeMaps to repackage nodes. > > Mike, > Do you think this would be doable in a way portable across DOM > implementations? I've not looked at 4DOM (even though I intended to > ;), so I don't know how much it differs from PyDOM under the hood. I > would expect that if building these is important, factory methods > should be created on the Document object in the same way that there > are factory methods for elements, etc. > It's not that I object to having the classes available, it's that I > don't see any requirement that they be available or that different DOM > implementations share the implementation, even as a base class. > I'm not convinced of Andrew's claim that having Node available for > type tests would be useful, either. ;-) That would also make it > difficult to create an all-C implementation of the DOM. (No, I don't > have one in the works.) 
>

We didn't want to pollute the Document API with all of these extra factory
methods. We moved all of the stuff that you cannot build from a document into
our Factory interface. We also put in all of the other node types so there is
one common factory for Nodes. In 4DOM a document has an internal member
"factory" where it really creates all of its stuff. This allows us to have a
"remote" factory if needed.

Note we added the idea of an HTMLDocument. An HTMLDocument is a Document, but
with added functionality to meet a bunch of the DOM-imposed requirements, i.e.
a document must always have a head and body. It also overrides createElement
to create DOM HTML classes of the required tag.

I don't think anything is gained exposing Node. I see Andrew's point that at
the Node level, appendChild must check to make sure that only Nodes are being
added. But down the hierarchy chain another check must be made; this is to
make sure that:
  a) only one Element is added to a document
  b) no text is added to a document
  c) etc
so there is already validation that the object derives from Node.

I think the factory methods would have to be DOM implementation specific. We
might be able to have one factory that creates Python DOM implementation
NodeList etc but I don't see much gained.

I don't think that all python implementations should share base classes and
NodeLists, et al.
Each should have their own implementation tailored to its purpose, i.e. speed,
orbed, lightweight.

NodeFactory.idl

#pragma prefix "fourthought.com"

#include "../../DOM.idl"
#include "../../HTML/HTML.idl"

module NodeFactoryIF {

  typedef sequence listofnodes;

  interface NodeFactory {

    //The user should only call these four methods
    HTMLIF::HTMLDocument createHTMLDocument();
    DOMIF::Document createDocument();
    HTMLIF::HTMLElement createHTMLElement(in HTMLIF::HTMLDocument parent,
                                          in string tag);
    void releaseNode(in DOMIF::Node node);

    //Non public interface: user shouldn't call these
    //All require the ownerDocument, but when called from
    //Document.py, this is provided for the user
    DOMIF::DOMImplementation createDOMImplementation(in string feature,
                                                     in string version);
    DOMIF::NodeList createNodeList(in listofnodes nodes);
    DOMIF::NamedNodeMap createNamedNodeMap();
    DOMIF::Element createElement(in DOMIF::Document ownerDocument,
                                 in string tagName);
    DOMIF::DocumentFragment createDocumentFragment(in DOMIF::Document ownerDocument);
    DOMIF::DocumentType createDocumentType(in DOMIF::Document ownerDocument,
                                           in string name,
                                           in DOMIF::NamedNodeMap entities,
                                           in DOMIF::NamedNodeMap notations);
    DOMIF::Text createTextNode(in DOMIF::Document ownerDocument,
                               in string data);
    DOMIF::Comment createComment(in DOMIF::Document ownerDocument,
                                 in string data);
    DOMIF::CDATASection createCDATASection(in DOMIF::Document ownerDocument,
                                           in string data);
    DOMIF::ProcessingInstruction createProcessingInstruction(in DOMIF::Document ownerDocument,
                                                             in string target,
                                                             in string data);
    DOMIF::Attr createAttribute(in DOMIF::Document ownerDocument,
                                in string name);
    DOMIF::Entity createEntity(in DOMIF::Document ownerDocument,
                               in string publicId,
                               in string systemId,
                               in string notationName);
    DOMIF::EntityReference createEntityReference(in DOMIF::Document ownerDocument,
                                                 in string name);
    DOMIF::Notation createNotation(in DOMIF::Document ownerDocument,
                                   in string publicId,
                                   in string systemId,
                                   in string name);
    DOMIF::NodeIterator createNodeIterator(in DOMIF::Node start_node);
    DOMIF::NodeIterator createSelectiveNodeIterator(in DOMIF::Node start_node,
                                                    in unsigned short what_to_show);
    DOMIF::NodeIterator createFilteredNodeIterator(in DOMIF::Node start_node,
                                                   in DOMIF::NodeFilter filter);
    DOMIF::NodeIterator createSelectiveFilteredNodeIterator(in DOMIF::Node start_node,
                                                            in unsigned short what_to_show,
                                                            in DOMIF::NodeFilter filter);
    HTMLIF::HTMLCollection createHTMLCollection(in listofnodes nodes);
  };
};

Mike

>
> -Fred
>
> --
> Fred L. Drake, Jr.
> Corporation for National Research Initiatives

--
Mike Olson
Member Consultant
FourThought LLC
http://www.fourthought.com
http://opentechnology.org

---
"No program is interesting in itself to a programmer. It's only interesting as
long as there are new challenges and new ideas coming up."
--- Linus Torvalds

--------------ms95352A88316FA873F0E7C460--

From skip@mojam.com (Skip Montanaro) Tue Apr 27 18:23:13 1999
From: skip@mojam.com (Skip Montanaro) (skip@mojam.com (Skip Montanaro))
Date: Tue, 27 Apr 1999 13:23:13 -0400
Subject: [XML-SIG] XML package speed (or lack thereof...)?
Message-ID: <199904271723.NAA27170@cm-29-94-14.nycap.rr.com>

I'm using XML-RPC to provide an over-the-net API to Python, Perl and Java
clients on my server. I'm currently using a hacked up version of Fredrik
Lundh's xmlrpclib module. The hacking part involved writing a C module to do
the low-level encoding and decoding so it was fast enough for my purposes.
This library only does XML-RPC, nothing else.

Ideally, I'd like to dump my XML-RPC-specific code in favor of something more
general, robust and better supported. Accordingly, I downloaded the 0.5.1
version of the xml-sig package today and gave it a whirl.
After making a couple small mods to marshal/generic/test:

    def test(load, loads, dump, dumps, test_values, do_assert = 1):
        # Try all the above bits of data
        try:
            from cStringIO import StringIO
        except ImportError:
            from StringIO import StringIO
        import time
        t = time.time()
        for i in range(10):
            for item in test_values:
                s = dumps(item)
                #print item, s
                output = loads(s)
                # Try it from a file
                file = StringIO()
                dump(item, file)
                file.seek(0)
                output2 = load(file)
                if do_assert:
                    assert item==output and item==output2 and output==output2
        t = time.time() - t
        print "total time: %.2f seconds" % t
        print "time per pass: %.2f seconds" % (t/10)

and commenting out a print statement in
marshal/xmlrpc/XMLRPCUnmarshaller/um_end_dictionary, I was able to run the
test without any spurious messages.  I got the following output on my
100 MHz Pentium (Python 1.5.1, RH Linux 5.0):

    >>> xml.marshal.xmlrpc.runtests ()
    Testing XML-RPC marshalling...
    total time: 9.77 seconds
    time per pass: 0.98 seconds

This is hardly what I would call blazing speed (perhaps 30-100x slower than
what I currently get), especially considering the small size of the test
data, so I thought perhaps I was missing something - an optional C library
perhaps?  I see that Fredrik's sgmlop module was built and installed, but my
guess is that it's not being used.

Thx,

Skip Montanaro  | Mojam: "Uniting the World of Music" http://www.mojam.com/
skip@mojam.com  | Musi-Cal: http://www.musi-cal.com/
518-372-5583

From Fred L. Drake, Jr."
References: <3724CC49.AAB857A5@prescod.net> <14116.55422.189139.235663@amarok.cnri.reston.va.us> <3724E2A1.62223458@prescod.net> <14117.52230.551462.836651@weyr.cnri.reston.va.us> <3725D245.C703951B@fourthought.com> <14117.56336.632205.967452@weyr.cnri.reston.va.us> <3725ED05.1FF4DA9A@fourthought.com>
Message-ID: <14117.64530.986628.424144@weyr.cnri.reston.va.us>

Mike Olson writes:
 > We didn't want to pollute the Document API with all of these extra factory
 > methods.  We moved all of the stuff that you cannot build from a document
...
 > I think the factory methods would have to be DOM implementation specific.
...
 > I don't think that all python implementations should share base classes and
 > NodeLists, et al.  Each should have their own implementation tailored to its
 > purpose, ie speed, orbed, lightweight

Mike,
  I think we agree.  ;-)  I'm happy with using a factory object to
gain access to node construction, and don't really care that much if
it's a separate object from the document object.

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From Jeff.Johnson@icn.siemens.com Tue Apr 27 22:31:55 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Tue, 27 Apr 1999 17:31:55 -0400
Subject: [XML-SIG] DOM normalize() broken? entity refs lost?
Message-ID: <85256760.007644BA.00@li01.lm.ssc.siemens.com>

Entity references and any other tags covered by
xml.dom.writer.Walker.doOtherNode() are thrown away when written to a file
using XmlWriter or its subclass HtmlWriter.  XmlWriter does not define
.doOtherNode() so nothing gets written.  I noticed it when bullets,
registration marks, and apostrophes started disappearing from my HTML files.
I haven't tried to write the code for XmlWriter.doOtherNode() yet, maybe you
gurus could do it much better than I can... :)

Last week I asked how to find simple strings in adjacent text nodes and was
advised to use Element.normalize().  I tried it and unless I'm doing it
wrong, it doesn't seem to work.

I've included a test script that demonstrates both problems:

#============== SCRIPT STARTS HERE ===========================
import sys, os
from xml.dom.utils import FileReader
from xml.dom.writer import HtmlWriter
from StringIO import StringIO

html = """ test

Registered entity gets thrown away: &reg;

Text on multiple lines and with extra white space in the raw HTML doesn't change when dom.get_documentElement().normalize() is called.

""" fr = FileReader() dom = fr.readStream(StringIO(html),'HTML') dom.get_documentElement().normalize() w = HtmlWriter() w.write(dom) From bslesins@best.com Wed Apr 28 02:26:04 1999 From: bslesins@best.com (Brian Slesinsky) Date: Tue, 27 Apr 1999 18:26:04 -0700 (PDT) Subject: [XML-SIG] checking syntax with xmllib Message-ID: Hi, I tried using xmllib to check if an XML document is well-formed and found some bugs. If I use xmllib from Python 1.5.2, it complains about invalid characters. However, I'm fairly sure I'm using correct UTF8 encoding (the document contains European characters and was converted to Unicode from ISO-8859-1). It looks like the 'illegal' regular expression in xmllib is incorrect. I also tried xml.parsers.xmllib from Python/XML 0.5.1, but it doesn't seem to be doing any syntax checking at all - I tried a file with one close tag and it didn't complain. Here's the script I'm using to do the tests: #!/nuvo/bin/python import sys from xml.parsers.xmllib import XMLParser def check_xml(file): x = XMLParser() f = open(file) while 1: line = f.readline() if line=="": break x.feed(line) check_xml(sys.argv[1]) - Brian Slesinsky From akuchlin@cnri.reston.va.us Wed Apr 28 03:41:53 1999 From: akuchlin@cnri.reston.va.us (A.M. Kuchling) Date: Tue, 27 Apr 1999 22:41:53 -0400 Subject: [XML-SIG] DOM normalize() broken? entity refs lost? In-Reply-To: <85256760.007644BA.00@li01.lm.ssc.siemens.com> References: <85256760.007644BA.00@li01.lm.ssc.siemens.com> Message-ID: <199904280241.WAA00900@207-172-184-212.s212.tnt23.brd.va.dialup.rcn.com> Jeff.Johnson@icn.siemens.com writes: > XmlWriter does not define .doOtherNode() > so nothing gets written. Eek! You're right. 
Try this patch:

Index: writer.py
===================================================================
RCS file: /home/cvsroot/xml/dom/writer.py,v
retrieving revision 1.8
diff -C2 -r1.8 writer.py
*** writer.py   1999/04/08 00:14:29     1.8
--- writer.py   1999/04/28 02:29:42
***************
*** 119,123 ****
          self.stream.write(node.toxml())
  
!     
  
  class XmlLineariser(XmlWriter):
--- 119,125 ----
          self.stream.write(node.toxml())
  
!     def doOtherNode(self, node):
!         self.stream.write( node.toxml() )
! 
  
  class XmlLineariser(XmlWriter):

> Text on multiple
> lines and with extra white space in the
> raw HTML doesn't change when dom.get_documentElement().normalize()

Careful; that isn't what normalize() does.  Add another Text node as a
child of the TITLE element, to produce two Text nodes next to each
other.  dom.dump() will then output:

> ...

After calling normalize:

> ...

See how the two text nodes have been merged?  It doesn't do anything
about whitespace.  To strip out whitespace, look at strip_whitespace or
collapse_whitespace in xml.dom.utils; after collapse_whitespace(dom,
WS_INTERNAL), runs of whitespace are collapsed down to a single space.

--
A.M. Kuchling                   http://starship.python.net/crew/amk/
Guards! Guards! Stop this madman! He's turning everyone into monkeys!
    -- A sudden intrusion, in ZOT! #1

From paul@prescod.net Wed Apr 28 17:19:26 1999
From: paul@prescod.net (Paul Prescod)
Date: Wed, 28 Apr 1999 11:19:26 -0500
Subject: [XML-SIG] Another SAX Suggestion
References:
Message-ID: <3727350E.6B51E1ED@prescod.net>

I would like to suggest the default error handlers do something useful:

    def error(self, exception):
        "Handle a recoverable error."
        sys.stderr.write( "Error: "+ str(exception) )

    def fatalError(self, exception):
        "Handle a non-recoverable error."
        sys.stderr.write( "Fatal Error: "+ str(exception) )

    def warning(self, exception):
        "Handle a warning."
        sys.stderr.write( "Warning: "+ str(exception) )

Of course if that's not what a particular implementation wants, they can
override it, but I think that the current lack of behavior is
non-intuitive.  Maybe I'm corrupted by working with SGML tools but I expect
the defaults to be as above.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp

From Jeff.Johnson@icn.siemens.com Wed Apr 28 18:21:04 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Wed, 28 Apr 1999 13:21:04 -0400
Subject: [XML-SIG] DOM normalize() broken? entity refs lost?
Message-ID: <85256761.005F477A.00@li01.lm.ssc.siemens.com>

Thanks for the entity reference fix Andrew.  It now saves "&reg;" but it
still loses things like "’".  I think this is Unicode generated from the RTF
to HTML filter I'm using, and while I can change the RTF to HTML character
translation table to convert RTF "quoteright" to "'" instead of "’", I'm
curious where the entity ref is going.  I put some debug statements in
HtmlBuilder.handle_entityref() but it never gets called.  I know there is
controversy over Unicode support but I don't know enough about it to know
what to expect in this case.

A new script is included:

import sys, os
from StringIO import StringIO
from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter

html = """

Don’t

""" # This works with Andrew's patch but the unicode single quote still vanishes without a trace. #

Registered ®

fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()
w.write(dom)

From akuchlin@cnri.reston.va.us Wed Apr 28 18:39:42 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 28 Apr 1999 13:39:42 -0400 (EDT)
Subject: [XML-SIG] Another SAX Suggestion
In-Reply-To: <3727350E.6B51E1ED@prescod.net>
References: <3727350E.6B51E1ED@prescod.net>
Message-ID: <14119.17665.211348.533470@amarok.cnri.reston.va.us>

Paul Prescod writes:
>I would like to suggest the default error handlers do something useful:

Agreed; the general Python philosophy is to make noise when something
unexpected happens, rather than making some assumption and charging
onward.  Printing an error message seems to be the right level of noise
for parsing errors; they could raise an exception and terminate further
processing (and actually I wouldn't mind that either), but printing a
message seems sufficient.

--
A.M. Kuchling                   http://starship.python.net/crew/amk/
Principally I played pedants, idiots, old fathers, and drunkards. As you see,
I had a narrow escape from becoming a professor.
    -- Robertson Davies, "Shakespeare over the Port"

From Lutz.Ehrlich@EMBL-Heidelberg.de Fri Apr 30 10:56:51 1999
From: Lutz.Ehrlich@EMBL-Heidelberg.de (Lutz.Ehrlich@EMBL-Heidelberg.de)
Date: Fri, 30 Apr 1999 11:56:51 +0200 (MDT)
Subject: [XML-SIG] XQL: Somebody working on it?
Message-ID: <14121.31687.843895.101080@cuckoo.EMBL-Heidelberg.DE>

G'day all,

as I didn't find anything in the recent CVS source for the xml
package, I wondered whether somebody is currently working on
implementing XQL (http://metalab.unc.edu/xql/) ? Before I start doing
anything myself, I would like to hear your opinion about such a
thing. Would implementation be a big thing? Have you guys discussed
implementing any of the query language proposals already?
Any comments are most welcome,

        Lutz

______________________________________________________________________
Lutz Ehrlich                            web  : http://www.embl-heidelberg.de/~ehrlich
                                        email: lutz.ehrlich@embl-heidelberg.de
European Molecular Biology Laboratory   phone: +49-6221-387-140
Meyerhofstr. 1                          fax  : +49-6221-387-517
D-69012 Heidelberg, Germany

From Jeff.Johnson@icn.siemens.com Fri Apr 30 15:13:16 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Fri, 30 Apr 1999 10:13:16 -0400
Subject: [XML-SIG] unicode entitie refs
Message-ID: <85256763.004E13CB.00@li01.lm.ssc.siemens.com>

Sorry to be a pest but I never got a response on the following email and was
hoping someone had an answer as to why unicode entity refs disappear in
PyDom.  After I write this I'll start looking at the SAX code, maybe I have
to install error handlers?  Any suggestions?

Thanks,
Jeff

---------------------- Forwarded by Jeff Johnson/Service/ICN on 04/30/99
10:07 AM ---------------------------

Jeff Johnson
04/28/99 01:21 PM

To:   akuchlin@cnri.reston.va.us
cc:   xml-sig@python.org
Subject:  Re: [XML-SIG] DOM normalize() broken? entity refs lost?
      (Document link not converted)

Thanks for the entity reference fix Andrew.  It now saves "&reg;" but it
still loses things like "’".  I think this is Unicode generated from the RTF
to HTML filter I'm using, and while I can change the RTF to HTML character
translation table to convert RTF "quoteright" to "'" instead of "’", I'm
curious where the entity ref is going.  I put some debug statements in
HtmlBuilder.handle_entityref() but it never gets called.  I know there is
controversy over Unicode support but I don't know enough about it to know
what to expect in this case.

A new script is included:

import sys, os
from StringIO import StringIO
from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter

html = """

Don’t

""" # This works with Andrew's patch but the unicode single quote still vanishes without a trace. #

Registered ®

fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()
w.write(dom)

From paul@prescod.net Fri Apr 30 15:09:49 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 30 Apr 1999 09:09:49 -0500
Subject: [XML-SIG] XQL: Somebody working on it?
References: <14121.31687.843895.101080@cuckoo.EMBL-Heidelberg.DE>
Message-ID: <3729B9AD.C2D86911@prescod.net>

Lutz.Ehrlich@EMBL-Heidelberg.de wrote:
>
> G'day all,
>
> as I didn't find anything in the recent CVS source for the xml
> package, I wondered whether somebody is currently working on
> implementing XQL (http://metalab.unc.edu/xql/) ? Before I start doing
> anything myself, I would like to hear your opinion about such a
> thing. Would implementation be a big thing? Have you guys discussed
> implementing any of the query language proposals already?

XSL implicitly depends on a query language.  It isn't defined separately
from XSL but it is defined in the XSL specification.  That query language
actually has W3C standardization status and is needed for the Python XSL
implementation that is under development.

XQL is sort of like that language -- but not quite, and not standardized.
I think that before XQL becomes any kind of standard it would have to be
aligned with XSL's query language.  Therefore you can choose yourself
whether you want to implement it in the meantime or not.  It all depends on
whether you want to work on something that will likely be obsolete in a
year or not....in the XML world a year is a lifetime so maybe that's a good
tradeoff.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp

From wunder@infoseek.com Fri Apr 30 16:51:19 1999
From: wunder@infoseek.com (Walter Underwood)
Date: Fri, 30 Apr 1999 08:51:19 -0700
Subject: [XML-SIG] Another SAX Suggestion
In-Reply-To: <3727350E.6B51E1ED@prescod.net>
References:
Message-ID: <3.0.5.32.19990430085119.00ad0c50@corp>

At 11:19 AM 4/28/99 -0500, Paul Prescod wrote:
>I would like to suggest the default error handlers do something useful:
>
>    def error(self, exception):
>        "Handle a recoverable error."
>        sys.stderr.write( "Error: "+ str(exception) )

Since we write servers, we consider output to stderr from a library to be a
defect.  Anybody else remember "RANGE ERROR" from the C math library?  I had
to rip out some stderr writes from pyexpat, too.

I wouldn't mind having a stderr error handler provided as part of the
module, with sample code that uses that error handler.

Also along this line, does the SAX adaptor for expat catch all exceptions
raised in a handler?  The Expat core doesn't know how to propagate
exceptions, so they need to be caught and reported locally.  This is an
interesting behavior difference between SAX over different parser
implementations (a pure-Python parser would propagate the exceptions).

Sorry for the ignorance of SAX details -- our XML support shipped last
September and I haven't gone back and re-coded to the portable interface.

wunder
--
Walter R. Underwood
wunder@infoseek.com
wunder@best.com  (home)
http://software.infoseek.com/cce/  (my product)
http://www.best.com/~wunder/
1-408-543-6946
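
The "error handler provided as part of the module" idea from the message above could look roughly like the following. This is only a sketch under stated assumptions: it targets the `xml.sax` package in a current Python 3 standard library, not the 1999 saxlib interfaces discussed in this thread, and the names `StreamErrorHandler` and `check` are hypothetical, chosen for illustration. The output stream is a constructor argument, so a server can send reports somewhere other than stderr.

```python
import io
import sys
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class StreamErrorHandler(ErrorHandler):
    """Write parse reports to a stream instead of raising.

    Hypothetical helper, not part of xml.sax. The stream defaults to
    sys.stderr, matching the handlers proposed earlier in the thread.
    """

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stderr

    def warning(self, exception):
        self.stream.write("Warning: %s\n" % exception)

    def error(self, exception):
        self.stream.write("Error: %s\n" % exception)

    def fatalError(self, exception):
        self.stream.write("Fatal Error: %s\n" % exception)


def check(xml_bytes, stream=None):
    """Parse xml_bytes with a do-nothing ContentHandler, reporting to stream."""
    handler = StreamErrorHandler(stream)
    xml.sax.parseString(xml_bytes, ContentHandler(), handler)
    return handler.stream


# Usage: capture the report in memory rather than on stderr.
report = io.StringIO()
check(b"<a><b></a>", report)  # mismatched tags, so not well-formed
```

Because none of the three methods raises, the parse call returns normally and the caller decides what to do with the collected text. A caller that prefers the "raise and terminate" behavior mentioned earlier in the thread could re-raise the exception from `fatalError` instead.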