From Markup To Object Model

From faassen@vet.uu.nl Thu Jun 1 00:47:50 2000
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Thu, 1 Jun 2000 01:47:50 +0200
Subject: [XML-SIG] XML serialization / marshalling via DTD
In-Reply-To: <383153385.959802355030.JavaMail.root@web135-mc.mail.com>
References: <383153385.959802355030.JavaMail.root@web135-mc.mail.com>
Message-ID: <20000601014750.A28400@vet.uu.nl>

george willis wrote:
[snip]
> Does the code in the python SOAP package perform serialization and deserialization? How does this code compare to other ser/deser code found used in XMLDocument, XMLWidgets, and ZODB? It would seem to me that since this is needed for ZODB, we might have some good code there?

As far as I'm aware, XMLWidgets doesn't do any serialization or deserialization. I should know, as I wrote XMLWidgets. :)

(XMLWidgets is a Zope product, by the way, for those listening to this on the XML-SIG mailing list. It provides a user interface system (HTML) allowing you to attach an HTML user interface and user interface events to classes of XML nodes.)

ZODB automatically serializes and deserializes arbitrary Python objects in a transparent fashion, using Python's 'pickle' facility (which I suggest you check out if you haven't heard of it yet). In fact, it does more, offering transparent persistence for Python objects (with just a few restrictions to support the transaction logic).

Since Zope's XMLDocument is built on top of the ZODB, it automatically gains its persistence from that. (This applies to both versions of XMLDocument: the original version now in use and the new one being developed by FourThought.) Of course XMLDocument also includes a parser so that you can edit and upload XML, but the internal format is Python objects.

> Has anyone compared these codebases that must perform the ser/deser to see which might have the best code? Shouldn't we study them and put forth a "best-of-breed" ser/deser mechanism that can then be used by all these consumers?
In the Zope codebase there is in fact no actual serialization of XML specifically. There's just the ZODB's pickle serialization.

Regards,

Martijn

From bjorn@roguewave.com Thu Jun 1 03:58:05 2000
From: bjorn@roguewave.com (Bjorn Pettersen)
Date: Wed, 31 May 2000 20:58:05 -0600
Subject: [XML-SIG] speed question re DOM parsing
References:
Message-ID: <3935D13D.F4EAD64B@roguewave.com>

Greg Stein wrote:
>
> On Wed, 24 May 2000, Bjorn Pettersen wrote:
> > I'm just starting to work with XML, so be gentle
> >
> > The problem is that I'm reading in a 280K xml file using the sample code from the XML howto:
> >
> >     def getXmlDomDocument(name):
> >         p = saxexts.make_parser()
> >         dh = SaxBuilder()
> >         p.setDocumentHandler(dh)
> >         p.parseFile(open(name))
> >         p.close()
> >         doc = dh.document
> >         xml.dom.utils.strip_whitespace(doc)
> >         return doc
> >
> > it takes about five seconds to read and parse the file...
> >
> > Is there a better way to read the file (or is there updated code that is faster)?
>
> If you want a DOM for the output, then no... you'll have to deal with the speed. If you have simple requirements for the Python representation of the XML, then take a look at xml.utils.qp_xml.
>
> Cheers,
> -g

Ok, time for an update ;-)

I've been using the qp_xml.Parser class for a couple of days with good results. With xml files of ~500K parsing takes less than 2 secs. I just got a 1.2Mb xml file however, and the parsing time went up to a little over 50 secs...

After some profiling, I found that most of the time was going into the else branch in the cdata method. This branch is growing a string character by character by saying:

    elem.first_cdata = elem.first_cdata + data

testing my assumption I switched elem.first_cdata to be a cStringIO.StringIO object (I was lazy enough to not implement a __getattr__). With only this change, the parsing time went down to about 2.5 secs(!).

Question: does using StringIO (or perhaps array) and __getattr__ sound like the right thing to do?
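
The buffering idea described above can be sketched roughly as follows. This is only an illustration for current Python, not qp_xml code: the `Element` class and the `collect` helper are hypothetical stand-ins, `io.StringIO` plays the role that cStringIO.StringIO played in Python 1.5, and a property is assumed in place of the `__getattr__` hook mentioned in the message.

```python
import io


class Element:
    """Hypothetical stand-in for a parsed element (not qp_xml's real class)."""

    def __init__(self):
        # Character data is appended to a buffer instead of rebuilding
        # a new string object on every callback.
        self._cdata = io.StringIO()

    def append_cdata(self, data):
        self._cdata.write(data)

    @property
    def first_cdata(self):
        # The final string is only assembled when a caller asks for it.
        return self._cdata.getvalue()


def collect(chunks):
    """Feed chunks to an Element the way a cdata callback might."""
    elem = Element()
    for chunk in chunks:
        elem.append_cdata(chunk)
    return elem.first_cdata
```

Whether this actually beats plain concatenation depends on the interpreter and the chunk sizes, so it is worth profiling before adopting it.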
(and if so, should I polish my changes and submit them?)

-- bjorn

ps: I'm running on a Pentium-II/450Mhz with 256Mb RAM (in case you thought I was swapping :-)

From larsga@garshol.priv.no Thu Jun 1 14:26:59 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 01 Jun 2000 15:26:59 +0200
Subject: [XML-SIG] SAX 2.0 documentation
Message-ID:

To make it easier for people to see what SAX 2.0 is like and comment on the interfaces I've put up auto-generated HTML documentation at:

There have also been lots of improvements on the driver and testing fronts. I will make a new release with these as soon as I can.

--Lars M.

From ke@gnu.franken.de Thu Jun 1 17:34:01 2000
From: ke@gnu.franken.de (Karl Eichwalder)
Date: 01 Jun 2000 18:34:01 +0200
Subject: [XML-SIG] Trying to install (Re: ANN: 4XPath 0.9.0 and 4XSLT 0.9.0)
In-Reply-To: uche.ogbuji@fourthought.com's message of "Wed, 24 May 2000 22:45:53 -0600"
References: <200005250445.WAA02755@localhost.localdomain>
Message-ID:

uche.ogbuji@fourthought.com writes:

> 4XSLT and 4XPath 0.9.0

Thanks for the new release! I'm trying to install your packages on SuSE Linux 6.4 (and I'd like to add them to our distribution).

SuSE Linux comes with Python XML Tools 0.5.1; are they new enough? 4DOM.html doesn't specify a special version.

Reading the documentation I've the impression I've to install 4DOM first (4DOM-0.10.0.tgz). I adjusted 4DOM.spec a little bit and I was able to build a package. Running

    python /usr/doc/packages/4DOM/demo/dom_from_html_file.py \
        /usr/doc/packages/4DOM/demo/employee_table.html

Ft.Lib was missing. After copying the Ft directory coming with 4DOM-0.10.0.tgz to the site-packages directory the error message changed:

Traceback (innermost last):
  File "/usr/doc/packages/4DOM/demo/dom_from_html_file.py", line 3, in ?
    from xml.dom.ext.reader import HtmlLib
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/__init__.py", line 18, in ?
    from xml.dom.Node import Node
  File "/usr/lib/python1.5/site-packages/xml/dom/Node.py", line 12, in ?
    from xml.dom import implementation
ImportError: cannot import name implementation

Now I need your help to solve this problem, please.

[ NB. http://Fourthought.com has several dead links. E.g., it refers to .tar.gz files but actually .tgz files are offered. ]

--
work : ke@suse.de |
     : http://www.suse.de/~ke/ | ------ ,__o
home : ke@gnu.franken.de | ------ _-\_<,
     : http://www.franken.de/users/gnu/ke/ | ------ (*)/'(*)

From fdrake@acm.org Thu Jun 1 17:58:16 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 1 Jun 2000 12:58:16 -0400 (EDT)
Subject: [XML-SIG] XML support in Python 1.6
Message-ID: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com>

There are a few open issues with XML support in Python 1.6; these need to be resolved.

One is the pyexpat support; the Modules/Setup.in file used to configure the set of modules built under Unix requires libexpat.a. There's a note that this can be done manually until James Clark adds it to the Unix build -- has *anyone* been in contact with James about this? Guido will not add the expat sources to the Python tree, since that just creates maintenance headaches. Paul, I expect you actually know James; can you ask him about this? I'd hate for all your work on the module to go to waste!

We also need to decide what sort of API we want to publicize as part of the standard library -- SAX or SAX2? Given the delay for Python 1.6, it probably makes sense to include a saxlib module that implements SAX2. Lars, does this still make sense to you?

We also need to determine how Unicode should be supported; should the parser always produce Unicode strings, or UTF-8, and provide a wrapper that converts everything?
Since it appears likely that auto-conversion between Unicode and narrow strings will likely only work for 7-bit narrow strings, it may be reasonable to create Unicode output directly from the parser (probably at the pyexpat level for efficiency).

Any other issues? Comments? Should we just drop Python 1.6 and concentrate on Python 3000? :)

-Fred

--
Fred L. Drake, Jr.
PythonLabs at BeOpen.com

From paul@prescod.net Thu Jun 1 17:23:03 2000
From: paul@prescod.net (Paul Prescod)
Date: Thu, 01 Jun 2000 11:23:03 -0500
Subject: [XML-SIG] XML support in Python 1.6
References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com>
Message-ID: <39368DE7.4F8FE40A@prescod.net>

"Fred L. Drake, Jr." wrote:
>
> ...
> One is the pyexpat support; the Modules/Setup.in file used to configure the set of modules built under Unix requires libexpat.a. There's a note that this can be done manually until James Clark adds it to the Unix build -- has *anyone* been in contact with James about this? Guido will not add the expat sources to the Python tree, since that just creates maintenance headaches. Paul, I expect you actually know James; can you ask him about this? I'd hate for all your work on the module to go to waste!

Sorry, I'm dense and forgetful and don't follow the question. What does James Clark need to do? I've talked to James about licensing issues but that doesn't seem to be the issue. Whatever it is, he's been very helpful so far so I don't think it is an impediment.

> We also need to decide what sort of API we want to publicize as part of the standard library -- SAX or SAX2? Given the delay for Python 1.6, it probably makes sense to include a saxlib module that implements SAX2. Lars, does this still make sense to you?

I agree 100%. The only question is whether to make a SAX2 subset or just put all of SAX2 in there.
> We also need to determine how Unicode should be supported; should the parser always produce Unicode strings, or UTF-8, and provide a wrapper that converts everything? Since it appears likely that auto-conversion between Unicode and narrow strings will likely only work for 7-bit narrow strings, it may be reasonable to create Unicode output directly from the parser (probably at the pyexpat level for efficiency).

I think that the logical thing is to produce real Unicode objects. If the auto-conversion thing turns out to be a hassle then we can provide some sort of alternate interface that is ASCII-only and thus more convenient for those who have no interest in Unicode.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
At the same moment that the Justice Department and the Federal Trade Commission are trying to restrict the negative consequences of monopoly, the Commerce Department and the Congress are helping to define new intellectual property rights, rights that have a significant potential to create new monopolies. This is the policy equivalent of arm-wrestling with yourself.
- http://www.salon.com/tech/feature/2000/04/07/greenspan/index.html

From akuchlin@mems-exchange.org Thu Jun 1 19:18:37 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Thu, 1 Jun 2000 14:18:37 -0400
Subject: [XML-SIG] XML support in Python 1.6
In-Reply-To: <39368DE7.4F8FE40A@prescod.net>; from paul@prescod.net on Thu, Jun 01, 2000 at 11:23:03AM -0500
References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com> <39368DE7.4F8FE40A@prescod.net>
Message-ID: <20000601141837.A10122@amarok.cnri.reston.va.us>

On Thu, Jun 01, 2000 at 11:23:03AM -0500, Paul Prescod wrote:
>Sorry, I'm dense and forgetful and don't follow the question. What does James Clark need to do? I've talked to James about licensing issues but

Expat 1.1 doesn't build a libexpat.a file, so you have to list all the object files inside expat.
Modules/Setup.in assumes that libexpat.a has already been constructed; it doesn't list all the object files. I sent JC a Makefile patch to build libexpat.a; I haven't checked the semi-public release of Expat at thaiopensource.com/whatever-it-was to see if he incorporated it or not.

--
A.M. Kuchling http://starship.python.net/crew/amk/
Molesters and foul abusers of comfy chairs and imitation Edwardian sideboards, deflowerers of teak coffee tables and meek landscape paintings and ...
-- The Interior League described, in ENIGMA #5: "Lizards and Ghosts"

From paul@prescod.net Thu Jun 1 18:35:47 2000
From: paul@prescod.net (Paul Prescod)
Date: Thu, 01 Jun 2000 12:35:47 -0500
Subject: [XML-SIG] XML support in Python 1.6
References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com> <39368DE7.4F8FE40A@prescod.net> <20000601141837.A10122@amarok.cnri.reston.va.us>
Message-ID: <39369EF3.63D31D50@prescod.net>

Yes, the latest version makes xmlparse/libexpat.a

> Guido will not add the expat sources to the Python tree, since that just creates maintenance headaches.

I would say rather that adding it would avoid version skew problems but Unix users are accustomed to fending for themselves so I won't push the issue.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself

From uogbuji@fourthought.com Thu Jun 1 20:43:29 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 01 Jun 2000 13:43:29 -0600
Subject: [XML-SIG] XML support in Python 1.6
In-Reply-To: Message from "Fred L. Drake, Jr." of "Thu, 01 Jun 2000 12:58:16 EDT." <14646.38440.220241.683691@cj42289-a.reston1.va.home.com>
Message-ID: <200006011943.NAA10072@localhost.localdomain>

> We also need to decide what sort of API we want to publicize as part of the standard library -- SAX or SAX2? Given the delay for Python 1.6, it probably makes sense to include a saxlib module that implements SAX2. Lars, does this still make sense to you?

Here's a loud vote for SAX2.
Also, depending on how long the delay is, can we get minidom in?

> We also need to determine how Unicode should be supported; should the parser always produce Unicode strings, or UTF-8, and provide a wrapper that converts everything? Since it appears likely that auto-conversion between Unicode and narrow strings will likely only work for 7-bit narrow strings, it may be reasonable to create Unicode output directly from the parser (probably at the pyexpat level for efficiency).

I'm not happy with Guido's final ex cathedra determination of Python 1.6's unicode. I'm sure I'm not alone here: the socket-programming-and-other-binary-stream-manipulating-crowd will be happy, but we have a few hurdles in our way now.

But since Paul and /F argued more intelligently than I could and left the wall unmoved, there's nothing for it. I think we should put out true unicode everywhere possible, and do whatever magic need be done within the black box to make that happen.

> Any other issues? Comments? Should we just drop Python 1.6 and concentrate on Python 3000? :)

Are you kidding? Guido's about to get married, so the "3000" bit could quickly become less of a joke than it is now...

--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +01 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From gstein@lyra.org Thu Jun 1 20:56:28 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 1 Jun 2000 12:56:28 -0700 (PDT)
Subject: [XML-SIG] PyExpat encoding (was: XML support in Python 1.6)
In-Reply-To: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com>
Message-ID:

On Thu, 1 Jun 2000, Fred L. Drake, Jr. wrote:
>...
> We also need to determine how Unicode should be supported; should the parser always produce Unicode strings, or UTF-8, and provide a wrapper that converts everything?
> Since it appears likely that auto-conversion between Unicode and narrow strings will likely only work for 7-bit narrow strings, it may be reasonable to create Unicode output directly from the parser (probably at the pyexpat level for efficiency).

Expat is typically compiled to spit out a particular encoding. By default, this is UTF-8. Presuming that the compilation flags are exposed and/or runtime-queryable, then pyexpat can compensate accordingly. This implies that it would sometimes return UTF-8 strings, or Unicode objects.

IMO, we should have a fixed output format, which is the Expat default: UTF-8.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From uogbuji@fourthought.com Thu Jun 1 21:02:21 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 01 Jun 2000 14:02:21 -0600
Subject: [XML-SIG] Trying to install (Re: ANN: 4XPath 0.9.0 and 4XSLT 0.9.0)
In-Reply-To: Message from Karl Eichwalder of "01 Jun 2000 18:34:01 +0200."
Message-ID: <200006012002.OAA10121@localhost.localdomain>

> uche.ogbuji@fourthought.com writes:
>
> > 4XSLT and 4XPath 0.9.0
>
> Thanks for the new release! I'm trying to install your packages on SuSE Linux 6.4 (and I'd like to add them to our distribution).
>
> SuSE Linux comes with Python XML Tools 0.5.1; are they new enough? 4DOM.html doesn't specify a special version.

0.5.1 should be new enough, although we only tested with 0.5.4. The main problem will be that the PyXML package will contain a dom directory to be installed as /usr/lib/python1.5/site-packages/xml/dom. 4DOM now replaces the PyDOM from PyXML and so needs to go into the same directory. This will, of course, make RPM unhappy.

We solved this by putting together a PyXML 0.5.4 RPM that excludes DOM, over which the 4DOM RPM can be installed.
You can find it at

ftp://ftp.fourthought.com/pub/mirrors/python4linux/redhat/i386/python-xml-nodom-0.5.4-1.i386.rpm
ftp://ftp.fourthought.com/pub/mirrors/python4linux/redhat/SRPMS/python-xml-nodom-0.5.4-1.src.rpm

You might want to either combine this with 4DOM (something the XML SIG will be doing soon anyway), or use a nodom version for SuSE.

> Reading the documentation I've the impression I've to install 4DOM first (4DOM-0.10.0.tgz). I adjusted 4DOM.spec a little bit and I was able to build a package. Running
>
> python /usr/doc/packages/4DOM/demo/dom_from_html_file.py \
>     /usr/doc/packages/4DOM/demo/employee_table.html
>
> Ft.Lib was missing. After copying the Ft directory coming with 4DOM-0.10.0.tgz to the site-packages directory the error message changed:

This is a brown-paper bag bug on our part. The DOM demos are completely broken in the released package. There is at least one other major bug, so we're going to work this weekend to get 4DOM 0.10.1 and 4XSLT/4XPath 0.9.1 out.

> [ NB. http://Fourthought.com has several dead links. E.g., it refers to .tar.gz files but actually .tgz files are offered. ]

Thanks.

--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +01 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From akuchlin@mems-exchange.org Thu Jun 1 21:04:16 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Thu, 1 Jun 2000 16:04:16 -0400
Subject: [XML-SIG] PyExpat encoding (was: XML support in Python 1.6)
In-Reply-To: ; from gstein@lyra.org on Thu, Jun 01, 2000 at 12:56:28PM -0700
References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com>
Message-ID: <20000601160416.B10244@amarok.cnri.reston.va.us>

On Thu, Jun 01, 2000 at 12:56:28PM -0700, Greg Stein wrote:
>IMO, we should have a fixed output format, which is the Expat default: UTF-8.
I don't know; it seems a bit odd to parse a Unicode string and then have to convert from an 8-bit encoding back to Unicode in your character data handlers, attributes, etc. The problem is that it's also odd to parse a regular Python string and get back Unicode.

OTOH, if Latin1-encoded XML has something like &unichar; in it, Unicode is the only thing it could possibly return. Maybe PyExpat could attempt to convert its Unicode output into an 8-bit string (but using what encoding?), and only return Unicode if it has to.

Hmmm... on the third hand, XML is a Unicode based standard, and sometimes returning Unicode and sometimes an 8-bit string is also strange. Maybe it's best to just always return Unicode, and leave further conversion to the caller.

I think I'd go for the third option: always returning Unicode strings.

--
A.M. Kuchling http://starship.python.net/crew/amk/
I was somebody else once. I... I... don't think I was a very good person.
-- The detective in THE MYSTERY PLAY

From andy@reportlab.com Thu Jun 1 21:24:00 2000
From: andy@reportlab.com (Andy Robinson)
Date: Thu, 1 Jun 2000 21:24:00 +0100
Subject: [XML-SIG] PyExpat encoding (was: XML support in Python 1.6)
In-Reply-To: <20000601160416.B10244@amarok.cnri.reston.va.us>
Message-ID:

Andrew M. Kuchling:
> Hmmm... on the third hand, XML is a Unicode based standard, and sometimes returning Unicode and sometimes an 8-bit string is also strange. Maybe it's best to just always return Unicode, and leave further conversion to the caller.
>
> I think I'd go for the third option: always returning Unicode strings.
>

I agree. It will make more people start using Unicode if a major library outputs it :-)

- Andy Robinson

From akuchlin@mems-exchange.org Thu Jun 1 21:38:53 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Thu, 1 Jun 2000 16:38:53 -0400 (EDT)
Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge?
Message-ID: <200006012038.QAA10397@amarok.cnri.reston.va.us>

What do people think of the idea of moving the PyXML CVS tree to SourceForge? That would make adding new people with commit privileges easier, and we wouldn't need to bug Greg Stein (who currently hosts the CVS tree) when that happens. Downsides are, we're putting more eggs in the SourceForge basket -- but then we're hardly alone in that.

Thoughts?

--
A.M. Kuchling http://starship.python.net/crew/amk/
It was a man's mind, the size of a house.
-- Robertson Davies, _What's Bred in the Bone_ (1985)

From gstein@lyra.org Thu Jun 1 22:10:46 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 1 Jun 2000 14:10:46 -0700 (PDT)
Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge?
In-Reply-To: <200006012038.QAA10397@amarok.cnri.reston.va.us>
Message-ID:

On Thu, 1 Jun 2000, Andrew M. Kuchling wrote:
> What do people think of the idea of moving the PyXML CVS tree to SourceForge? That would make adding new people with commit privileges easier, and we wouldn't need to bug Greg Stein (who currently hosts the CVS tree) when that happens. Downsides are, we're putting more eggs in the SourceForge basket -- but then we're hardly alone in that.
>
> Thoughts?

No skin off my back. I certainly don't take it personally :-)

SourceForge is a fantastic tool. While I can certainly provide a number of resources (CVS repository, mailing lists, etc), SourceForge does so *actively*.

The XML repository can stay indefinitely. But if people would like to see it moved, then I can 'tar' it up so that the SF admins can untar the repository over at SourceForge.

Somebody would have to work on the email'd checkin notices. I don't know that SourceForge does that automatically. It would simply be a matter of extracting some of the files from /CVSROOT/ and moving them to the new SF root.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From fdrake@acm.org Thu Jun 1 22:17:51 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 1 Jun 2000 17:17:51 -0400 (EDT)
Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge?
In-Reply-To:
References: <200006012038.QAA10397@amarok.cnri.reston.va.us>
Message-ID: <14646.54015.10027.382712@cj42289-a.reston1.va.home.com>

Greg Stein writes:
> Somebody would have to work on the email'd checkin notices. I don't know that SourceForge does that automatically. It would simply be a matter of extracting some of the files from /CVSROOT/ and moving them to the new SF root.

Barry has done this; look at current messages to python-checkins. You can look at CVSROOT/loginfo and CVSROOT/syncmail in the Python repository to see how he does it; you should be able to add the files into the repository just as any other addition, working with the CVSROOT module instead of the python (or whatever) module.

I still need to set up the Grail repository there, but that's how I intend to do it. ;)

-Fred

--
Fred L. Drake, Jr.
PythonLabs at BeOpen.com

From gstein@lyra.org Thu Jun 1 22:23:48 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 1 Jun 2000 14:23:48 -0700 (PDT)
Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge?
In-Reply-To: <14646.54015.10027.382712@cj42289-a.reston1.va.home.com>
Message-ID:

On Thu, 1 Jun 2000, Fred L. Drake, Jr. wrote:
> Greg Stein writes:
> > Somebody would have to work on the email'd checkin notices. I don't know that SourceForge does that automatically. It would simply be a matter of extracting some of the files from /CVSROOT/ and moving them to the new SF root.
>
> Barry has done this; look at current messages to python-checkins. You can look at CVSROOT/loginfo and CVSROOT/syncmail in the Python repository to see how he does it; you should be able to add the files into the repository just as any other addition, working with the CVSROOT module instead of the python (or whatever) module.

Yup. That system works. It isn't quite as good as the one that I got from the Apache folks, though.
The one on lyra will group changes *across* directories into a single notification.

Con: it is written in Perl :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From gstein@lyra.org Thu Jun 1 22:28:06 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 1 Jun 2000 14:28:06 -0700 (PDT)
Subject: [XML-SIG] PyExpat encoding (was: XML support in Python 1.6)
In-Reply-To: <20000601160416.B10244@amarok.cnri.reston.va.us>
Message-ID:

On Thu, 1 Jun 2000, Andrew M. Kuchling wrote:
> On Thu, Jun 01, 2000 at 12:56:28PM -0700, Greg Stein wrote:
> >IMO, we should have a fixed output format, which is the Expat default: UTF-8.
>
> I don't know; it seems a bit odd to parse a Unicode string and then have to convert from an 8-bit encoding back to Unicode in your character data handlers, attributes, etc. The problem is that it's also odd to parse a regular Python string and get back Unicode.
>
> OTOH, if Latin1-encoded XML has something like &#1972; &unichar; in it, Unicode is the only thing it could possibly return.

Yes, Unicode is the only thing it can return. BUT: it can return it as a Unicode object, or as a UTF-8 encoded string.

In other words, I think you're confusing the character set that Expat operates with (Unicode) with the encoding of that charset (UTF-8 or UTF-16; the latter is used by the Unicode object).

> Maybe PyExpat could attempt to convert its Unicode output into an 8-bit string (but using what encoding?), and only return Unicode if it has to.
>
> Hmmm... on the third hand, XML is a Unicode based standard, and sometimes returning Unicode and sometimes an 8-bit string is also strange. Maybe it's best to just always return Unicode, and leave further conversion to the caller.
>
> I think I'd go for the third option: always returning Unicode strings.

Expat is characterized by its speed. Throwing conversions in there is not going to help.

Yes, varying output is wrong. Expat's default is UTF-8. My recommendation is to use UTF-8.
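
For readers on a current Python, the distinction drawn above between the character set (Unicode) and its encoding (UTF-8, UTF-16) can be tried out with a rough sketch like the one below. It assumes Python 3, where `xml.parsers.expat` hands `str` objects to its handlers; the `collect_text` helper and the `SAMPLE` document are made up for illustration and are not part of the 2000-era pyexpat under discussion.

```python
import xml.parsers.expat


def collect_text(xml_bytes):
    """Parse xml_bytes and return all character data as a single str."""
    pieces = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat may deliver one text run in several callbacks, so the
    # pieces are gathered in a list and joined at the end.
    parser.CharacterDataHandler = pieces.append
    parser.Parse(xml_bytes, True)
    return "".join(pieces)


# A Latin-1 encoded document holding one non-ASCII byte (0xE9) and a
# character reference to a code point that Latin-1 cannot represent.
SAMPLE = b'<?xml version="1.0" encoding="iso-8859-1"?><a>caf\xe9 &#8364;</a>'

text = collect_text(SAMPLE)        # a Unicode str
utf8_bytes = text.encode("utf-8")  # explicit re-encoding, left to the caller
```

Here the parser output is a Unicode string regardless of the input encoding, and a caller who wants UTF-8 bytes asks for them explicitly.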
If somebody is adventurous, then they can add a flag to pyexpat that states what encoding to use for the callbacks: UTF-8 or UnicodeObs. But without that extra work, it "should" be UTF-8.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From akuchlin@mems-exchange.org Thu Jun 1 22:41:51 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Thu, 1 Jun 2000 17:41:51 -0400
Subject: [XML-SIG] PyExpat encoding
In-Reply-To: ; from gstein@lyra.org on Thu, Jun 01, 2000 at 02:28:06PM -0700
References: <20000601160416.B10244@amarok.cnri.reston.va.us>
Message-ID: <20000601174151.A10516@amarok.cnri.reston.va.us>

On Thu, Jun 01, 2000 at 02:28:06PM -0700, Greg Stein wrote:
>In other words, I think you're confusing the character set that Expat operates with (Unicode) with the encoding of that charset (UTF-8 or UTF-16; the latter is used by the Unicode object).

Perhaps; I'm asking what's the Python type of the Python objects passed to callbacks used by Expat.

>Expat is characterized by its speed. Throwing conversions in there is not going to help.

I thought Paul said Expat could be compiled to return 16-bit Unicode. Or... damn, does it return UCS-2 and we need UTF-16? In Expat 1.1, it looks to me that if you #define XML_UNICODE, and don't #define XML_UNICODE_WCHAR_T, Expat will return "UTF-16 encoded as unsigned shorts". Wouldn't that be just what we need to return Unicode objects?

On the other hand, that means you can't use the system's copy of Expat, since who knows what it was compiled with? Actually, this seems like a bug in Expat; if I have an Expat library, I have no way of figuring out what it'll be outputting: C 'char's containing UTF-8, unsigned short holding UTF-16, or wchar_t holding UTF-16. (Argh, my head explodes every time character encodings come up.)

--
A.M. Kuchling http://starship.python.net/crew/amk/
And if there's a moral there, I don't know what it is, save maybe that we should take our goodbyes whenever we can.
-- Barbie, in SANDMAN #37: "I Woke Up and One of Us Was Crying"

From gstein@lyra.org Thu Jun 1 22:53:32 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 1 Jun 2000 14:53:32 -0700 (PDT)
Subject: [XML-SIG] PyExpat encoding
In-Reply-To: <20000601174151.A10516@amarok.cnri.reston.va.us>
Message-ID:

On Thu, 1 Jun 2000, Andrew M. Kuchling wrote:
> On Thu, Jun 01, 2000 at 02:28:06PM -0700, Greg Stein wrote:
> >In other words, I think you're confusing the character set that Expat operates with (Unicode) with the encoding of that charset (UTF-8 or UTF-16; the latter is used by the Unicode object).
>
> Perhaps; I'm asking what's the Python type of the Python objects passed to callbacks used by Expat.

Right. I'm saying that it can be either, depending on how Expat was built.

> >Expat is characterized by its speed. Throwing conversions in there is not going to help.
>
> I thought Paul said Expat could be compiled to return 16-bit Unicode. Or... damn, does it return UCS-2 and we need UTF-16? In Expat 1.1, it looks to me that if you #define XML_UNICODE, and don't #define XML_UNICODE_WCHAR_T, Expat will return "UTF-16 encoded as unsigned shorts". Wouldn't that be just what we need to return Unicode objects?

Python is the same: UTF-16 encoded as unsigned shorts.

> On the other hand, that means you can't use the system's copy of Expat, since who knows what it was compiled with?

Bingo. My point exactly. By default, Expat is going to be built using UTF-8 for the output.

> Actually, this seems like a bug in Expat; if I have an Expat library, I have no way of figuring out what it'll be outputting: C 'char's containing UTF-8, unsigned short holding UTF-16, or wchar_t holding UTF-16. (Argh, my head explodes every time character encodings come up.)

Eek. You're right. This can be determined at compile-time, so we can Do The Right Thing when building pyexpat. But things will be hosed if somebody drops in a libexpat.a that was compiled differently.
Bleh. This says we should simply depend on it being compiled to output UTF-8, or we should include a copy of the library. The latter is already "not recommended" by the BDFL, so we can only assume that Expat will return UTF-8.

This still doesn't discount pyexpat from having a setting to do a decoding on the UTF-8 text and calling into Python with Unicode obs.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From akuchlin@mems-exchange.org Thu Jun 1 23:01:55 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Thu, 1 Jun 2000 18:01:55 -0400
Subject: [XML-SIG] PyExpat encoding
In-Reply-To: ; from gstein@lyra.org on Thu, Jun 01, 2000 at 02:53:32PM -0700
References: <20000601174151.A10516@amarok.cnri.reston.va.us>
Message-ID: <20000601180155.D10516@amarok.cnri.reston.va.us>

On Thu, Jun 01, 2000 at 02:53:32PM -0700, Greg Stein wrote:
>Eek. You're right. This can be determined at compile-time, so we can Do The Right Thing when building pyexpat. But things will be hosed if somebody drops in a libexpat.a that was compiled differently.

Is this something that should be reported to James Clark for fixing in the next version?

>This still doesn't discount pyexpat from having a setting to do a decoding on the UTF-8 text and calling into Python with Unicode obs.

Good idea! Would it be an argument to .ParserCreate(), or to the .Parse[File]() methods of parser objects?

--amk

From dieter@handshake.de Thu Jun 1 22:55:13 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Thu, 1 Jun 2000 23:55:13 +0200 (CEST)
Subject: [XML-SIG] XML support in Python 1.6
In-Reply-To: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com>
References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com>
Message-ID: <14646.56146.147683.561706@lindm.dm>

Fred L. Drake, Jr. writes:
> We also need to decide what sort of API we want to publicize as part of the standard library -- SAX or SAX2?
Given the delay for Python
> 1.6, it probably makes sense to include a saxlib module that
> implements SAX2. Lars, does this still make sense to you?

I am for SAX2.

> We also need to determine how Unicode should be supported; should
> the parser always produce Unicode strings, or UTF-8, and provide a
> wrapper that converts everything?

I prefer UTF-8.

> Any other issues? Comments? Should we just drop Python 1.6 and
> concentrate on Python 3000? :)

I do not think so.

Dieter

From jeremy.kloth@fourthought.com Thu Jun 1 23:53:08 2000
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Thu, 01 Jun 2000 16:53:08 -0600
Subject: [XML-SIG] Path for 4XSLT 0.9.0
Message-ID: <3936E954.8D66A825@fourthought.com>

In testing for the upcoming release of the 4Suite tools, we have come
across a rather bad bug in the XSLT TextWriter code. Basically it was
printing an end tag after a single < /> element tag.

What follows is the diff of the file TextWriter.py

--- removed line
+++ changed line

in function endElement()

@@ -139,9 +142,11 @@
         trace("End Element %s" % name)
         if self.__currElement:
-            self.__completeLastElement(1)
+            elementIsEmpty = self.__completeLastElement(1)
+        else:
+            elementIsEmpty = 0
         if self.__outputParams.method != 'html' or (string.upper(name) not in HTML_FORBIDDEN_END):
-            text = ''
+            text = (not elementIsEmpty) and ('') or ''
             if self.__outputParams.indent == 'yes':
                 self.__indent = self.__indent[:-2]
             if (self.__outputParams.method != 'html') or \

and in function __completeLastElement()

@@ -181,6 +186,7 @@
             else:
                 self.__result = self.__result + '>'
                 self.__nextNewLine = 0
+                elementIsEmpty = 0
         else:
             self.__result = self.__result + '>'
             self.__nextNewLine = 1
@@ -188,5 +194,5 @@
         if self.__outputParams.indent == 'yes':
             self.__indent = self.__indent + ' '
         self.__currElement = None
-        return self.__currElement
+        return elementIsEmpty

Sorry for any problems that this has caused.

--
Jeremy Kloth Consultant
jeremy.kloth@fourthought.com (303)583-9900 x 102
Fourthought, Inc.
http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Fri Jun 2 03:11:11 2000 From: tpassin@home.com (tpassin@home.com) Date: Thu, 1 Jun 2000 22:11:11 -0400 Subject: [XML-SIG] PyExpat encoding References: Message-ID: <005201bfcc37$d3180ea0$7cac1218@reston1.va.home.com> With all the talk about default encodings, compiling for different encodings, and passing "unicode" to Python objects, I'm losing track -or my grip :) -. Do these considerations affect either pyexpat or other python XML code in their ability to handle the basic required encodings, per the XML 1.0 Rec: "All XML processors must accept the UTF-8 and UTF-16 encodings of 10646" also, "In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, for an encoding declaration to occur other than at the beginning of an external entity, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration." Since there is a lot of XML out there without encoding declarations, it would seem that UTF-8 would HAVE to be used as the default. I admit, I can't find anything in the Rec that says what encoding a processor must use to send results to other pieces of code. And don't these other pieces of code also constitute XML processors, so that they should follow the same rules? I'd appreciate some enlightment in this area - if expat/pyexpat are compiled to "use" encoding X, how does this fact interact with the Rec's requirements? Or doesn't it? Tom Passin Greg Stein (and lots of others) wrote: > On Thu, 1 Jun 2000, Andrew M. 
Kuchling wrote: > > On Thu, Jun 01, 2000 at 02:28:06PM -0700, Greg Stein wrote: > > >In other words, I think you're confusing the character set that Expat > > >operates with (Unicode) with the encoding of that charset (UTF-8 or > > >UTF-16; the latter is used by the Unicode object). > > > > Perhaps; I'm asking what's the Python type of the Python objects > > passed to callbacks used by Expat. > > Right. I'm saying that it can be either, depending on how Expat was built. > > > >Expat is characterized by its speed. Throwing conversions in there is not > > >going to help. > > > > I thought Paul said Expat could be compiled to return 16-bit Unicode. > > Or... damn, does it return UCS-2 and we need UTF-16? > xmlparse.h> In Expat 1.1, it looks to me that if you #define > > XML_UNICODE, and don't #define XML_UNICODE_WCHAR_T, Expat will return > > "UTF-16 encoded as unsigned shorts". Wouldn't that be just what we > > need to return Unicode objects? > > Python is the same: UTF-16 encoded as unsigned shorts. > > > On the other hand, that means you can't use the system's copy of > > Expat, since who knows what it was compiled with? > > Bingo. My point exactly. By default, Expat is going to be built using > UTF-8 for the output. > > > Actually, this > > seems like a bug in Expat; if I have an Expat library, I have no way > > of figuring out what it'll be outputting: C 'char's containing UTF-8, > > unsigned short holding UTF-16, or wchar_t holding UTF-16. (Argh, my > > head explodes every time character encodings come up.) > > Eek. You're right. This can be determined at compile-time, so we can Do > The Right Thing when building pyexpat. But things will be hosed if > somebody drops in a libexpat.a that was compiled differently. > > Bleh. This says we should simply depend on it being compiled to output > UTF-8, or we should include a copy of the library. The latter is already > "not recommended" by the BDFL, so we can only assume that Expat will > return UTF-8. 
> > This still doesn't discount pyexpat from having a setting to do a decoding > on the UTF-8 text and calling into Python with Unicode obs. > From ke@gnu.franken.de Fri Jun 2 03:59:14 2000 From: ke@gnu.franken.de (Karl Eichwalder) Date: 02 Jun 2000 04:59:14 +0200 Subject: [XML-SIG] PyXML-0.5.4 demo/genxml/loaddata.py Message-ID: Most demos I tried are working; but `python demo/genxml/loaddata.py yyy' failed for me: Traceback (innermost last): File "/usr/doc/packages/pyxml/demo/genxml/loaddata.py", line 243, in ? main() File "/usr/doc/packages/pyxml/demo/genxml/loaddata.py", line 58, in main processor.run() File "/usr/doc/packages/pyxml/demo/genxml/loaddata.py", line 95, in run rec = self.getNextRecord() File "/usr/doc/packages/pyxml/demo/genxml/loaddata.py", line 107, in getNextRecord lname, fname, eid, mid = parts ValueError: unpack list of wrong size ke@tux:~/Projects/py_xml/BUILD/PyXML-0.5.4 > cat yyy hallo, welt, hallo, welt hallo, welt, hallo, welt Nevertheless I'll add PyXML-0.5.4 as an update to SuSE Linux. I hope this is okay with you. -- work : ke@suse.de | : http://www.suse.de/~ke/ | ------ ,__o home : ke@gnu.franken.de | ------ _-\_<, : http://www.franken.de/users/gnu/ke/ | ------ (*)/'(*) From ke@gnu.franken.de Fri Jun 2 03:56:36 2000 From: ke@gnu.franken.de (Karl Eichwalder) Date: 02 Jun 2000 04:56:36 +0200 Subject: [XML-SIG] Re: Trying to install (Re: ANN: 4XPath 0.9.0 and 4XSLT 0.9.0) In-Reply-To: Uche Ogbuji's message of "Thu, 01 Jun 2000 14:02:21 -0600" References: <200006012002.OAA10121@localhost.localdomain> Message-ID: Uche Ogbuji writes: > > SuSE Linux comes with Python XML Tools 0.5.1; are they new enough? > > 4DOM.html doesn't specify a special version. > > 0.5.1 should be new enough, although we only tested with 0.5.4. Thanks for this info. I'll update the package for SuSE Linux. I never saw 0.5.4 officially released; amk flagged it as an "release candidate": From: "Andrew M. 
Kuchling" Subject: 0.5.4 release candidate To: xml-sig@python.org Date: Fri, 21 Apr 2000 12:12:51 -0400 (EDT) Demos coming with 0.5.4 are mostly working; I'll issue a separate report. > The main problem will be that the PyXML package will contain a dom > directory to be installed as /usr/lib/python1.5/site-packages/xml/dom. > 4DOM now replaces the PyDOM from PyXML and so neds to go into the same > directory. This will, of course, make RPM unhappy. We solved this by > putting together a PyXML 0.5.4 RPM that excludes DOM, over which the > 4DOM ROM can be installed. You can find it at > ftp://ftp.fourthought.com/pub/mirrors/python4linux/redhat/i386/python-xml-nodom > -0.5.4-1.i386.rpm > ftp://ftp.fourthought.com/pub/mirrors/python4linux/redhat/SRPMS/python-xml-nodo > m-0.5.4-1.src.rpm Thanks for this service. > You might want to either combine this with 4DOM (something the XML SIG be > doing soon anyway), or use a nodom version for SuSE. Thanks for the pointer. I'll do something alogn these lines (depending on our "feature freeze" for 7.0). > [...] we're > going to work this weekeng to get 4DOM 0.10.1 and 4XSLT/4XPath 0.9.1 out. Thanks. I can wait :) -- work : ke@suse.de | : http://www.suse.de/~ke/ | ------ ,__o home : ke@gnu.franken.de | ------ _-\_<, : http://www.franken.de/users/gnu/ke/ | ------ (*)/'(*) From paul@prescod.net Fri Jun 2 03:57:48 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 01 Jun 2000 21:57:48 -0500 Subject: [XML-SIG] PyExpat encoding References: <005201bfcc37$d3180ea0$7cac1218@reston1.va.home.com> Message-ID: <393722AC.5C2EEFD0@prescod.net> tpassin@home.com wrote: > > With all the talk about default encodings, compiling for different > encodings, and passing "unicode" to Python objects, I'm losing track -or my > grip :) -. Do these considerations affect either pyexpat or other python > XML code in their ability to handle the basic required encodings, per the > XML 1.0 Rec: No, Expat is great at reading multiple encodings on input. 
The question is what you get on output. Basically, we can compile it to give us 8-bit or 16-bit output. 8-bit is expat's default (which is why it is Greg's preference). 16-bit is Python's default (which is why it is my preference). I mean if Python has a Unicode object and we choose to encode Unicode text in hacked-up 8-bit strings then I think that there is something seriously awry somewhere in the system. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/~perlis-alan/quotes.html From paul@prescod.net Fri Jun 2 04:23:38 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 01 Jun 2000 22:23:38 -0500 Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge? References: <200006012038.QAA10397@amarok.cnri.reston.va.us> Message-ID: <393728BA.9AA20BDA@prescod.net> "Andrew M. Kuchling" wrote: > > What do people think of the idea of moving the PyXML CVS tree to > SourceForge? That makes adding new people with commit privileges is > easier, and we wouldn't need to bug Greg Stein (who currently hosts > the CVS tree) when that happens. Downsides are, we're putting more > eggs in the SourceForge basket -- but then we're hardly alone in > that. I think that there is an interesting philosophical and social issue here. If we give them our data, shouldn't we be able to get it back in a manner that is easy to back up and re-locate? Perhaps people should start demanding that SourceForge make all information associated with a project available as a big tarfile of source, web pages, and XML-ized database metadata. That data could be backed up by anyone who wanted to do so -- even someone setting up a SourceForge competitor. I mean doesn't it seem odd to swap closed source for information hostage on someone else's computer system? 
Of course SourceForge doesn't stop you from getting at your source code but there is a lot of content and metadata around it (mailing lists and so forth) that is also valuable and I don't think that there is any easy way to back that all up. If there were, there would be very little danger in putting all of our eggs in the SourceForge basket. We would just install SourceForgeSource.tgz and then do an import python-xml-everything.tgz using some import feature. Okay, this doesn't solve any immediate problems, but it is worth thinking about as SourceForge users. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself At the same moment that the Justice Department and the Federal Trade Commission are trying to restrict the negative consequences of monopoly, the Commerce Department and the Congress are helping to define new intellectual property rights, rights that have a significant potential to create new monopolies. This is the policy equivalent of arm-wrestling with yourself. - http://www.salon.com/tech/feature/2000/04/07/greenspan/index.html From paul@prescod.net Fri Jun 2 04:54:59 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 01 Jun 2000 22:54:59 -0500 Subject: [XML-SIG] PyExpat encoding References: <20000601160416.B10244@amarok.cnri.reston.va.us> <20000601174151.A10516@amarok.cnri.reston.va.us> Message-ID: <39373013.55B97E41@prescod.net> "Andrew M. Kuchling" wrote: > > ... > > On the other hand, that means you can't use the system's copy of > Expat, since who knows what it was compiled with? Actually, this > seems like a bug in Expat; if I have an Expat library, I have no way > of figuring out what it'll be outputting: Adding this feature doesn't sound too tough. We should concentrate on what we want because the implementation doesn't sound too brutal. I don't see how we can in good conscience choose not to use Python's Unicode type. I am not averse, however, to a flag that returns 8-bit strings instead. 
We can use the Unicode object's features to do that easily.

So how about this: we ask Expat 1.1000000001 (our new version) what
encoding it was compiled with. We can even expose this to the Python
programmer.

    parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"

There is an independent flag that controls the encoding and type of the
returned objects. You get Unicode objects by default. If you want 8-bit
strings, you specifically ask for them.

    parser.requestUTF8( )

97% of programmers will never ask Expat what encoding it is using under
the cover nor will they change the flag to get 8-bit strings. Docs say:
"Unless you know what you are doing, leave these methods alone. They are
for performance freaks who know what they are doing only."

A performance freak would probably write code like this:

    if parser.nativeEncoding()=="UTF-8":
        parser.requestUTF8()

Now managing the internationalization of the code is their problem.

The Windows binaries should come with a 16-bit-returning Expat.

Still and all, this is getting more complex than just bundling our
favorite version of Expat with the compile flags set the way we want
them!!!

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it.
 - http://www.cs.yale.edu/~perlis-alan/quotes.html

From gstein@lyra.org Fri Jun 2 06:31:24 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 1 Jun 2000 22:31:24 -0700 (PDT)
Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge?
In-Reply-To: <393728BA.9AA20BDA@prescod.net>
Message-ID:

On Thu, 1 Jun 2000, Paul Prescod wrote:
>...
> I think that there is an interesting philosophical and social issue
> here. If we give them our data, shouldn't we be able to get it back in a
> manner that is easy to back up and re-locate?

Who says that we can't get it back?
The rest of your mail seems to have rather unsupported speculation :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Fri Jun 2 05:00:57 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 01 Jun 2000 23:00:57 -0500 Subject: [XML-SIG] XML support in Python 1.6 References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com> Message-ID: <39373179.5E9AE360@prescod.net> "Fred L. Drake, Jr." wrote: > > ... > > There's a note that this can be done manually until James Clark adds > it to the Unix build -- has *anyone* been in contact with James about > this? Guido will not add the expat sources to the Python tree, since > that just creates maintenance headaches. Let me ask more formally for opinions on this point. If we use the Expat sources *with no changes* and then in the Python build we set an environment variable and call "make" on the Expat subtree, what maintenance headaches have we caused ourselves? The PyXML distribution does this today and as far as I know it has been the least of our problems. Plus, we are going to have a dependency on a pretty new version of Expat. I just don't see what the big deal is. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/~perlis-alan/quotes.html From gstein@lyra.org Fri Jun 2 06:36:17 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 1 Jun 2000 22:36:17 -0700 (PDT) Subject: [XML-SIG] PyExpat encoding In-Reply-To: <005201bfcc37$d3180ea0$7cac1218@reston1.va.home.com> Message-ID: Expat will accept either encoding for the text that it *consumes*. The discussion point is about what kind of objects are passed to the Handlers from the Expat parser. Are those objects UTF-8 strings or Unicode objects? Cheers, -g On Thu, 1 Jun 2000 tpassin@home.com wrote: > With all the talk about default encodings, compiling for different > encodings, and passing "unicode" to Python objects, I'm losing track -or my > grip :) -. 
Do these considerations affect either pyexpat or other python > XML code in their ability to handle the basic required encodings, per the > XML 1.0 Rec: > > "All XML processors must accept the UTF-8 and UTF-16 encodings of 10646" > > also, > > "In the absence of information provided by an external transport protocol > (e.g. HTTP or MIME), it is an error for an entity including an encoding > declaration to be presented to the XML processor in an encoding other than > that named in the declaration, for an encoding declaration to occur other > than at the beginning of an external entity, or for an entity which begins > with neither a Byte Order Mark nor an encoding declaration to use an > encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, > ordinary ASCII entities do not strictly need an encoding declaration." > > Since there is a lot of XML out there without encoding declarations, it > would seem that UTF-8 would HAVE to be used as the default. I admit, I > can't find anything in the Rec that says what encoding a processor must use > to send results to other pieces of code. And don't these other pieces of > code also constitute XML processors, so that they should follow the same > rules? > > I'd appreciate some enlightment in this area - if expat/pyexpat are compiled > to "use" encoding X, how does this fact interact with the Rec's > requirements? Or doesn't it? > > Tom Passin > > Greg Stein (and lots of others) wrote: > > > On Thu, 1 Jun 2000, Andrew M. Kuchling wrote: > > > On Thu, Jun 01, 2000 at 02:28:06PM -0700, Greg Stein wrote: > > > >In other words, I think you're confusing the character set that Expat > > > >operates with (Unicode) with the encoding of that charset (UTF-8 or > > > >UTF-16; the latter is used by the Unicode object). > > > > > > Perhaps; I'm asking what's the Python type of the Python objects > > > passed to callbacks used by Expat. > > > > Right. I'm saying that it can be either, depending on how Expat was built. 
> > > > > >Expat is characterized by its speed. Throwing conversions in there is > not > > > >going to help. > > > > > > I thought Paul said Expat could be compiled to return 16-bit Unicode. > > > Or... damn, does it return UCS-2 and we need UTF-16? > > xmlparse.h> In Expat 1.1, it looks to me that if you #define > > > XML_UNICODE, and don't #define XML_UNICODE_WCHAR_T, Expat will return > > > "UTF-16 encoded as unsigned shorts". Wouldn't that be just what we > > > need to return Unicode objects? > > > > Python is the same: UTF-16 encoded as unsigned shorts. > > > > > On the other hand, that means you can't use the system's copy of > > > Expat, since who knows what it was compiled with? > > > > Bingo. My point exactly. By default, Expat is going to be built using > > UTF-8 for the output. > > > > > Actually, this > > > seems like a bug in Expat; if I have an Expat library, I have no way > > > of figuring out what it'll be outputting: C 'char's containing UTF-8, > > > unsigned short holding UTF-16, or wchar_t holding UTF-16. (Argh, my > > > head explodes every time character encodings come up.) > > > > Eek. You're right. This can be determined at compile-time, so we can Do > > The Right Thing when building pyexpat. But things will be hosed if > > somebody drops in a libexpat.a that was compiled differently. > > > > Bleh. This says we should simply depend on it being compiled to output > > UTF-8, or we should include a copy of the library. The latter is already > > "not recommended" by the BDFL, so we can only assume that Expat will > > return UTF-8. > > > > This still doesn't discount pyexpat from having a setting to do a decoding > > on the UTF-8 text and calling into Python with Unicode obs. 
> > > > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig > -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Fri Jun 2 05:07:28 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 01 Jun 2000 23:07:28 -0500 Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge? References: Message-ID: <39373300.65DB8558@prescod.net> Greg Stein wrote: > > On Thu, 1 Jun 2000, Paul Prescod wrote: > >... > > I think that there is an interesting philosophical and social issue > > here. If we give them our data, shouldn't we be able to get it back in a > > manner that is easy to back up and re-locate? > > Who says that we can't get it back? SourceForge provides the following services: * mailing list * CVS * project management * bug tracking * discussion forums * shell/ftp access Can you get back ALL of the information relating to all of these facets of project development? Even the project management, mailing list management, bug tracking and discussion forums? I mean without scraping HTML screens? If so, bravo, forget I opened my mouth. If not, I think that there is some reason to be concerned that in the middle of a project where you depend on these services they could disappear with the metadata necessary to rebuild them. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/~perlis-alan/quotes.html From larsga@garshol.priv.no Fri Jun 2 10:08:16 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 02 Jun 2000 11:08:16 +0200 Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge? In-Reply-To: <200006012038.QAA10397@amarok.cnri.reston.va.us> References: <200006012038.QAA10397@amarok.cnri.reston.va.us> Message-ID: * Andrew M. Kuchling | | What do people think of the idea of moving the PyXML CVS tree to | SourceForge? That's fine with me. --Lars M. 
From larsga@garshol.priv.no Fri Jun 2 11:22:39 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 02 Jun 2000 12:22:39 +0200 Subject: [XML-SIG] PyExpat encoding (was: XML support in Python 1.6) In-Reply-To: References: Message-ID: Here is my take on this: - the entire XML data model is based on Unicode and we should just accept that rather than try to work against it - since Python 1.6 supports Unicode directly we should exploit that; especially since mixing ordinary and Unicode strings seems to be painless (in other words: the fact that you get Unicode strings should be more or less invisible to you unless you actively care) - I can't imagine why anyone would want ordinary strings with UTF-8 encoded text in them; but if someone can come up with a convincing use case we should support that as well Conclusion: - if Python version is lower than 1.6, we should just do what we do today: return UTF-8 encoded normal strings - if not, return Unicode objects - I have no problems with adding a run-time configuration option to expat that allows users to say 'parser.set_return_unicode(0)'. - there should probably also be a 'parser.get_return_unicode()' so that applications can check what is going on The real question is of course who will do the actual work of adding this... :-) --Lars M. From larsga@garshol.priv.no Fri Jun 2 11:29:45 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 02 Jun 2000 12:29:45 +0200 Subject: [XML-SIG] XML support in Python 1.6 In-Reply-To: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com> References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com> Message-ID: * Fred L. Drake, Jr. | | We also need to decide what sort of API we want to publicize as part | of the standard library -- SAX or SAX2? Given the delay for Python | 1.6, it probably makes sense to include a saxlib module that | implements SAX2. Lars, does this still make sense to you? Yes, it very much does. 
We might have to leave out LexicalHandler and DeclHandler, but the rest should go in. If you can give me some kind of estimation as to what dates I need to take into account I can do this. --Lars M. From tpassin@home.com Fri Jun 2 13:29:29 2000 From: tpassin@home.com (tpassin@home.com) Date: Fri, 2 Jun 2000 08:29:29 -0400 Subject: [XML-SIG] PyExpat encoding References: <005201bfcc37$d3180ea0$7cac1218@reston1.va.home.com> <393722AC.5C2EEFD0@prescod.net> Message-ID: <006a01bfcc8e$3280fe00$7cac1218@reston1.va.home.com> Thank you, Paul, that clarified it for me. Tom Passin Paul Prescod wrote > No, Expat is great at reading multiple encodings on input. The question > is what you get on output. Basically, we can compile it to give us 8-bit > or 16-bit output. 8-bit is expat's default (which is why it is Greg's > preference). 16-bit is Python's default (which is why it is my > preference). > > I mean if Python has a Unicode object and we choose to encode Unicode > text in hacked-up 8-bit strings then I think that there is something > seriously awry somewhere in the system. From tpassin@home.com Fri Jun 2 13:46:58 2000 From: tpassin@home.com (tpassin@home.com) Date: Fri, 2 Jun 2000 08:46:58 -0400 Subject: [XML-SIG] PyExpat encoding References: <20000601160416.B10244@amarok.cnri.reston.va.us> <20000601174151.A10516@amarok.cnri.reston.va.us> <39373013.55B97E41@prescod.net> Message-ID: <007c01bfcc90$a3edf0a0$7cac1218@reston1.va.home.com> I think this is a good approach. In other words, the system does what you usually want if you do nothing to tell it different; you can tell it to do the other things you want if you need to; and you can find out the configuration. Perfect. Using native Python unicode objects also makes sense. The one potential downside - version skew because we might need to use a special version of expat - may not be too bad. After all, you could make sure that pyexpat always uses the copy of expat that lives in the Python library. 
And there is already a potential version issue with all the other extensions (like tkinter) because they need to be compiled for the right version of Python as well as the right version of their target c program. I've been bit by this a few times. So why would pyexpat/expat be different in this regard than any other extension? Tom Passin Paul Prescod suggests: > I don't see how we can in good conscience choose not to use Python's > Unicode type. I am not averse, however, to a flag that returns 8-bit > strings instead. We can use the Unicode object's features do that > easily. > > So how about, this: we ask Expat 1.1000000001 (our new version) what > encoding it was compiled with. We can even expose this to the Python > programmer. > > parser.nativeEncoding() -> returns "UTF-8" or "UTF-16" > > There is an independent flag that controls the encoding and type of the > returned objects. You get Unicode objects by default. If you want 8-bit > strings, you specifically ask for them. > > parser.requestUTF8( ) > > 97% of programmers will never ask Expat what encoding it is using under > the cover nor will they change the flag to get 8-bit strings. Docs say: > "Unless you know what you are doing, leave these methods alone. They are > for performance freaks who know what they are doing only." > > A performance freak would probably write code like this: > > if parser.nativeEncoding()=="UTF-8": > parser.requestUTF8() > > Now managing the internationalization of the code is their problem. > > The Windows binaries should come with a 16-bit-returing Expat. > > Still and all, this is getting more complex than just bundling our > favorite version of Expat with the compile flags set the way we want > them!!! > From akuchlin@mems-exchange.org Fri Jun 2 15:05:26 2000 From: akuchlin@mems-exchange.org (Andrew M. 
Kuchling)
Date: Fri, 2 Jun 2000 10:05:26 -0400
Subject: [XML-SIG] XML support in Python 1.6
In-Reply-To: <39373179.5E9AE360@prescod.net>; from paul@prescod.net on Thu, Jun 01, 2000 at 11:00:57PM -0500
References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com> <39373179.5E9AE360@prescod.net>
Message-ID: <20000602100526.C10960@amarok.cnri.reston.va.us>

On Thu, Jun 01, 2000 at 11:00:57PM -0500, Paul Prescod wrote:
>The PyXML distribution does this today and as far as I know it has been
>the least of our problems. Plus, we are going to have a dependency on a
>pretty new version of Expat. I just don't see what the big deal is.

GvR doesn't like including sizable chunks of outside code in the
Python distribution. He doesn't care about what gets added to the
PyXML tree, of course.

If Expat had a call to identify what it was compiled to return, we
could handle all 4 cases:

  1) Expat returns UTF-8, PyExpat user wants UTF-8 regular strings
  2) Expat returns UTF-8, PyExpat user wants Unicode strings
  3) Expat returns UTF-16, PyExpat user wants UTF-8 regular strings
  4) Expat returns UTF-16, PyExpat user wants Unicode strings

In cases 1 and 4 no extra work is needed; in cases 2 and 3 the PyExpat
module would have to perform extra work and take a performance hit if
the system Expat library was compiled with the wrong output. But if
GvR relents and allows incorporating Expat's code later, that copy
could then be compiled any way we like.

So, I propose we ask James Clark to add a C function to determine how
Expat was compiled, and then follow Paul's suggested interface:

    parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"

    parser.requestUTF8( ) causes the parser to return UTF-8-encoded
    8-bit strings; by default Unicode strings will be returned.

Three questions:

  * Can you call parser.requestUTF8() at any point, even after
    parsing has started? (I see no reason to forbid this, though it
    would be strange.)
* Do we need a .requestUTF16() or .requestUnicode() method to switch things back? Or should it be very general, with .requestOutputEncoding('iso-8859-1' or whatever) instead of just .requestUTF8? * What do we assume for old versions of Expat? I guess all we can do is assume UTF-8, and trust that the strangeness will be apparent if it was compiled for UTF-16. If this is approved, I'll implement it this weekend. -- A.M. Kuchling http://starship.python.net/crew/amk/ There are no gryphons, no wyverns, no winged horses in the waking world, raven. Not anymore. But we are here... -- The gryphon at the door, in SANDMAN #57: "The Kindly Ones:1" From akuchlin@mems-exchange.org Fri Jun 2 15:16:38 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 2 Jun 2000 10:16:38 -0400 Subject: [XML-SIG] Moving PyXML CVS tree to SourceForge? In-Reply-To: <39373300.65DB8558@prescod.net>; from paul@prescod.net on Thu, Jun 01, 2000 at 11:07:28PM -0500 References: <39373300.65DB8558@prescod.net> Message-ID: <20000602101638.D10960@amarok.cnri.reston.va.us> On Thu, Jun 01, 2000 at 11:07:28PM -0500, Paul Prescod wrote: >Can you get back ALL of the information relating to all of these facets >of project development? Even the project management, mailing list Not as far as I know. I'll bring it up somewhere at SourceForge, and see what they say. Web pages and FTP are easy to mirror; I don't know of ways to get the CVS tree or mailing list archives. Clearly the most critical item there is the CVS data. Is this issue worth postponing a move to SF, though? -- A.M. Kuchling http://starship.python.net/crew/amk/ Little one, I would like to see anyone -- prophet, king or God -- persuade a thousand cats to do anything at the same time. 
-- The cynical cat, in SANDMAN #18: "A Dream of a Thousand Cats" From larsga@garshol.priv.no Fri Jun 2 15:19:54 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 02 Jun 2000 16:19:54 +0200 Subject: [XML-SIG] XML support in Python 1.6 In-Reply-To: <20000602100526.C10960@amarok.cnri.reston.va.us> References: <14646.38440.220241.683691@cj42289-a.reston1.va.home.com> <39373179.5E9AE360@prescod.net> <20000602100526.C10960@amarok.cnri.reston.va.us> Message-ID: * Andrew M. Kuchling | | * Can you call parser.requestUTF8() at any point, even after | parsing has started? (I see no reason to forbid this, though it | would be strange.) I would be happier if it were forbidden. SAX 2.0 explicitly forbids this kind of thing and I can see no reason to allow it. If the rest of you decide otherwise I have no real problems with that, though. | * Do we need a .requestUTF16() or .requestUnicode() method to | switch things back? I assume that this means that pyexpat will return Unicode strings as default regardless of how it is compiled? If so, I don't really have any opinion on this. | Or should it be very general, with | .requestOutputEncoding('iso-8859-1' or whatever) instead of just | .requestUTF8? If there turns out to be a huge demand we can add that later. In any case, Python 1.6 should make this possible to implement for the user. | * What do we assume for old versions of Expat? I guess all we | can do is assume UTF-8, and trust that the strangeness will | be apparent if it was compiled for UTF-16. Sounds reasonable. | If this is approved, I'll implement it this weekend. Great! --Lars M. 
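
[Editorial sketch] The four combinations of Expat build and requested output discussed in this thread can be written down as one small function. This is only an illustration in present-day Python, not the PyExpat C code: the name `deliver` and its parameters are hypothetical, and the byte strings stand in for whatever a given Expat build would hand back. Which cases are cheap depends on the implementation; in this sketch only case 1 is a pure pass-through.

```python
def deliver(raw, native_encoding, want_unicode):
    """Return parser output in the form the caller asked for.

    raw             -- bytes as a (hypothetical) Expat build produced them
    native_encoding -- "utf-8" or "utf-16-le"/"utf-16-be"
    want_unicode    -- true for Unicode strings, false for UTF-8 bytes
    """
    if want_unicode:
        # Cases 2 and 4: build a Unicode string from the native bytes.
        return raw.decode(native_encoding)
    if native_encoding.lower() == "utf-8":
        # Case 1: the data is already UTF-8, so pass it through untouched.
        return raw
    # Case 3: transcode from UTF-16 to UTF-8.
    return raw.decode(native_encoding).encode("utf-8")


if __name__ == "__main__":
    sample = "caf\u00e9"
    print(deliver(sample.encode("utf-16-le"), "utf-16-le", False))
```

An explicit byte order ("utf-16-le") is used so that no byte-order mark has to be handled in the example.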
From uche.ogbuji@fourthought.com Fri Jun 2 15:38:20 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 02 Jun 2000 08:38:20 -0600
Subject: [XML-SIG] [Fwd: [4suite] Patch for 4XSLT 0.9.0]
Message-ID: <3937C6DC.1ABBAD5A@fourthought.com>

-------- Original Message --------
Subject: [4suite] Patch for 4XSLT 0.9.0
Date: Thu, 01 Jun 2000 16:53:56 -0600
From: Jeremy Kloth
Organization: Fourthought, Inc
To: 4suite@dollar.fourthought.com

In testing for the upcoming release of the 4Suite tools, we have come
across a rather bad bug in the XSLT TextWriter code. Basically it was
printing an end tag after a single < /> element tag. What follows is
the diff of the file TextWriter.py

--- removed line
+++ changed line

in function endElement()

@@ -139,9 +142,11 @@
 trace("End Element %s" % name)
 if self.__currElement:
- self.__completeLastElement(1)
+ elementIsEmpty = self.__completeLastElement(1)
+ else:
+ elementIsEmpty = 0
 if self.__outputParams.method != 'html' or (string.upper(name) not in HTML_FORBIDDEN_END):
- text = ''
+ text = (not elementIsEmpty) and ('') or ''
 if self.__outputParams.indent == 'yes':
 self.__indent = self.__indent[:-2]
 if (self.__outputParams.method != 'html') or \

and in function __completeLastElement()

@@ -181,6 +186,7 @@
 else:
 self.__result = self.__result + '>'
 self.__nextNewLine = 0
+ elementIsEmpty = 0
 else:
 self.__result = self.__result + '>'
 self.__nextNewLine = 1
@@ -188,5 +194,5 @@
 if self.__outputParams.indent == 'yes':
 self.__indent = self.__indent + ' '
 self.__currElement = None
- return self.__currElement
+ return elementIsEmpty

Sorry for any problems that this has caused.

--
Jeremy Kloth Consultant
jeremy.kloth@fourthought.com (303)583-9900 x 102
Fourthought, Inc.
http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

_______________________________________________
4suite mailing list
4suite@lists.fourthought.com
http://lists.fourthought.com/mailman/listinfo/4suite

From gstein@lyra.org Fri Jun 2 16:17:32 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 2 Jun 2000 08:17:32 -0700 (PDT)
Subject: [XML-SIG] PyExpat changes for encoding (was: XML support in Python 1.6)
In-Reply-To: <20000602100526.C10960@amarok.cnri.reston.va.us>
Message-ID:

On Fri, 2 Jun 2000, Andrew M. Kuchling wrote:
>...
> parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"

pyexpat.native_encoding as a readonly attribute. I see no particular
use in making it a function. (Note the module-level, too!)

> parser.requestUTF8( ) causes the parser to return UTF-8-encoded 8-bit
> strings; by default Unicode strings will be returned.

parser.returns_unicode as an 0/1-valued attribute (1 is the default)

Again: no need for a function, and the attribute solves both the get
and set cases. An alternate would be a .output_encoding attribute that
is string-valued.

>...
> * What do we assume for old versions of Expat? I guess all we
> can do is assume UTF-8, and trust that the strangeness will
> be apparent if it was compiled for UTF-16.

Agreed -- I think you need to assume UTF-8.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From paul@prescod.net Fri Jun 2 17:07:06 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 02 Jun 2000 11:07:06 -0500
Subject: [XML-SIG] XML support in Python 1.6
References: <200006011943.NAA10072@localhost.localdomain>
Message-ID: <3937DBAA.38E86@prescod.net>

Uche Ogbuji wrote:
>
> ...
>
> Here's a loud vote for SAX2. Also, depending on how long the delay is, can we
> get minidom in?

As usual, the problem with minidom is figuring out what to do about
cyclic trash! I was expecting cyclic garbage collection to get into 1.6
but that seems up in the air right now.

Why don't I clean up a version of minidom that doesn't have parent
pointers by default and we'll "turn on" the parent pointers if we get
cyclic trash collection into 1.6.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it.
- http://www.cs.yale.edu/~perlis-alan/quotes.html

From akuchlin@mems-exchange.org Fri Jun 2 20:05:26 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Fri, 2 Jun 2000 15:05:26 -0400
Subject: [XML-SIG] PyExpat changes for encoding
In-Reply-To: ; from gstein@lyra.org on Fri, Jun 02, 2000 at 08:17:32AM -0700
References: <20000602100526.C10960@amarok.cnri.reston.va.us>
Message-ID: <20000602150526.H11802@amarok.cnri.reston.va.us>

On Fri, Jun 02, 2000 at 08:17:32AM -0700, Greg Stein wrote:
>pyexpat.native_encoding as a readonly attribute. I see no particular use
>in making it a function. (Note the module-level, too!)
>parser.returns_unicode as an 0/1-valued attribute (1 is the default)

What's the reaction to Greg's alternative proposal? I need to know
what to implement...

--amk

From larsga@garshol.priv.no Sat Jun 3 12:37:28 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 03 Jun 2000 13:37:28 +0200
Subject: [XML-SIG] PyExpat changes for encoding
In-Reply-To: <20000602150526.H11802@amarok.cnri.reston.va.us>
References: <20000602100526.C10960@amarok.cnri.reston.va.us> <20000602150526.H11802@amarok.cnri.reston.va.us>
Message-ID:

* Greg Stein
|
| pyexpat.native_encoding as a readonly attribute. I see no particular use
| in making it a function. (Note the module-level, too!)
| parser.returns_unicode as an 0/1-valued attribute (1 is the default)

* Andrew M. Kuchling
|
| What's the reaction to Greg's alternative proposal? I need to know
| what to implement...

I prefer it to the other proposals. Attributes are simpler and also
fit better with the existing interface.

--Lars M.
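
[Editorial sketch] To make the attribute-based proposal concrete, here is a toy model of it in present-day Python. Everything in it is an assumption for illustration: `NATIVE_ENCODING` stands in for the proposed module-level `pyexpat.native_encoding`, and `SketchParser` with its `deliver` method is a hypothetical stand-in for a parser object, not the real PyExpat type.

```python
# Stand-in for the proposed read-only, module-level pyexpat.native_encoding.
NATIVE_ENCODING = "UTF-8"


class SketchParser:
    """Hypothetical parser object that models only the proposed attribute."""

    def __init__(self):
        # 0/1-valued attribute; 1 (Unicode strings) is the proposed default.
        self.returns_unicode = 1

    def deliver(self, raw):
        """Hand character data to the application in the requested form."""
        if self.returns_unicode:
            return raw.decode(NATIVE_ENCODING)
        return raw


if __name__ == "__main__":
    parser = SketchParser()
    print(parser.deliver(b"caf\xc3\xa9"))   # Unicode string by default
    parser.returns_unicode = 0              # plain attribute assignment
    print(parser.deliver(b"caf\xc3\xa9"))   # UTF-8 bytes
```

One plain attribute covers both reading and setting the mode, which is the point made in the thread in favour of attributes over a pair of request methods.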
From jeremy@beopen.com Sat Jun 3 17:14:30 2000
From: jeremy@beopen.com (Jeremy Hylton)
Date: Sat, 3 Jun 2000 12:14:30 -0400
Subject: [XML-SIG] XML support in Python 1.6
In-Reply-To: <3937DBAA.38E86@prescod.net>
Message-ID:

Paul Prescod wrote:
>As usual, the problem with minidom is figuring out what to do about
>cyclic trash! I was expecting cyclic garbage collection to get into 1.6
>but that seems up in the air right now.

I think it will go in, provided that we can get it debugged. If you
have XML code that generates cyclic garbage, it would be great if you
could test it with a version of Python that includes the GC patch.

Jeremy

From paul@prescod.net Sat Jun 3 19:50:25 2000
From: paul@prescod.net (Paul Prescod)
Date: Sat, 03 Jun 2000 13:50:25 -0500
Subject: [XML-SIG] PyExpat changes for encoding
References: <20000602100526.C10960@amarok.cnri.reston.va.us> <20000602150526.H11802@amarok.cnri.reston.va.us>
Message-ID: <39395371.99513332@prescod.net>

"Andrew M. Kuchling" wrote:
>
> ...
>
> What's the reaction to Greg's alternative proposal? I need to know
> what to implement...

Properties instead of methods sounds fine to me. Same with the
presumption of expat default.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it.
- http://www.cs.yale.edu/~perlis-alan/quotes.html

From pwolff@cox.rr.com Sat Jun 3 23:04:37 2000
From: pwolff@cox.rr.com (Greg Wolff)
Date: Sat, 03 Jun 2000 18:04:37 -0400
Subject: [XML-SIG] A usage scenario for Python and XML...
Message-ID: <393980F5.B6973F63@cox.rr.com>

Hello,

I've been lurking, I mean following, the discussion of Python and XML
implementations on the Python XML Sig list for a while now. I had one
particular email exchange with Fred Drake about the SAX versus SAX2
issue. It might be useful if I elaborate to the whole group on the
example I gave of why I want SAX2. The Expat C interface includes
features that I use that are not in SAX, but are in SAX2, if I read
the documentation correctly. I hope I don't have to hack the driver
interface to get them, but if my current understanding of SAX2 is
correct, I won't need to...

I'm intending to use Python extensions in ZOPE to build an e-pub
system. I am the architect of several large SGML/XML based web
publishing systems. Although these systems are not constructed with
ZOPE, but first on C++/NSAPI and currently on Vignette StoryServer
with C++ DLOAD modules, I think some of the relevant experience may be
of use to this discussion.

First, a brief discussion of requirements. My implementations are for
large scale online publishing with massive document sets in SGML/XML.
Full DTD document models apply. Document fragments must be pulled as
needed and formatted for display as HTML. Multiple styles of
presentation must be applicable to any particular fragment as needed
for webGUI presentation. Documents are very large, many 10s of
megabytes in some cases, with up to 15 levels of hierarchy supported.
Sophisticated SGML/XML structure aware search is used to do full text
search. Individual documents must be usable in multiple end user
published products at the same time without embedding any product
specific info in the documents.

Implementation: An inverted index of the "relevant" portion of the XML
object hierarchy is built but the document is not broken up into its
component objects. Document files are stored as an XML character
stream. When a user desires to view a particular web page, that page
is constructed on the fly and presented. Caching of the final
documents is used.

Page Generation: The XML document fragment is pulled; the inversion
tells us the byte offsets for the start and end tags of the particular
fragment of interest. The fragment is run through a SAX based XML to
HTML conversion object that takes in the fragment, the style to use
and control information.
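
[Editorial sketch] The byte-offset inversion and fragment pull described above can be illustrated with the `CurrentByteIndex` attribute of Python's `xml.parsers.expat` parser objects. This is a small sketch under stated assumptions, not the poster's production system: the helper names `build_offset_index` and `pull_fragment`, the use of an `id` attribute as the lookup key, and the sample document are all hypothetical.

```python
import xml.parsers.expat


def build_offset_index(data, element_name):
    """Map the 'id' of each matching element to its (start, end) byte offsets."""
    index = {}
    open_elements = []  # (id, start offset) for matching elements still open
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        if name == element_name:
            # CurrentByteIndex is the offset of the '<' of this start tag.
            open_elements.append((attrs.get("id"), parser.CurrentByteIndex))

    def end(name):
        if name == element_name:
            ident, begin = open_elements.pop()
            # Here CurrentByteIndex is the offset of the end tag, so the
            # fragment ends just past the next '>'. Assumption: an
            # empty-element tag has no '>' inside its attribute values.
            finish = data.index(b">", parser.CurrentByteIndex) + 1
            index[ident] = (begin, finish)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return index


def pull_fragment(data, index, ident):
    """Slice one element out of the stored byte stream without reparsing."""
    begin, finish = index[ident]
    return data[begin:finish]


if __name__ == "__main__":
    doc = b"<doc><sec id='a'><p>one</p></sec><sec id='b'><p>two</p></sec></doc>"
    offsets = build_offset_index(doc, "sec")
    print(pull_fragment(doc, offsets, "b"))
```

The index is built once per document; each page request then costs a dictionary lookup and a slice, and only the pulled fragment needs to go through the XML to HTML conversion.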
NOTE: Byte offsets are not available in SAX but are in SAX2. They are
available in Expat and the Perl Expat modules which we currently use
for this purpose. The older Java SAX api is unusable for this
application.

Performance: The conversion from XML to HTML runs at just about a
megabyte per second at this time on about 300 MHz class Linux boxes
and is faster on big Sparc machines with fast memory back planes. It
slows down as less information is thrown away in the conversion. Time
is directly proportional to the amount of data read in and put out.
Usually, much less data comes out than goes in because an individual
web page is small relative to the megabyte size input fragment.

Implications: SAX (Expat) can stream across a large document at
amazing speed. The event driven document handler approach allows a
complete style conversion to be performed and produce an output HTML
file with CSS and the works.

Major Requirement: Complete location information, including byte
offsets, is required at all relevant element start and end tag
instances.

/pgw
Greg Wolff
pwolff@cox.rr.com

From larsga@garshol.priv.no Sun Jun 4 11:17:54 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Jun 2000 12:17:54 +0200
Subject: [XML-SIG] A usage scenario for Python and XML...
In-Reply-To: <393980F5.B6973F63@cox.rr.com>
References: <393980F5.B6973F63@cox.rr.com>
Message-ID:

* Greg Wolff
|
| NOTE: Byte offsets are not available in SAX but are in SAX2. They
| are available in Expat and the Perl Expat modules which we currently
| use for this purpose. The older Java SAX api is unusable for this
| application.

I haven't yet made offsets available in SAX2, but I plan to do so in
the next release. I was thinking of two possible ways to do this:

- a SAX2 parser property that returns the current offset
- a SAX2 parser property that returns a function that returns the
  current offset

The first is the least surprising way, but I think the second is
likely to be faster and also more convenient. Opinions on this would
be welcome.

Support for this property will of course be optional, but both expat
and xmlproc will support it, since they both provide the necessary
information. If any other parsers provide it they will also support
the property.

| Major Requirement: Complete location information, including byte
| offsets, are required at all relevant element start and end tag
| instances.

With the addition of the byte offset information SAX 2 should cover
your requirements, I assume?

BTW: Thank you for a very interesting post. It's always interesting to
know what people are using this software for.

--Lars M.

From larsga@garshol.priv.no Sun Jun 4 11:22:42 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Jun 2000 12:22:42 +0200
Subject: [XML-SIG] speed question re DOM parsing
In-Reply-To: <3935D13D.F4EAD64B@roguewave.com>
References: <3935D13D.F4EAD64B@roguewave.com>
Message-ID:

* Bjorn Pettersen
|
| Question: does using StringIO (or perhaps array) and __getattr__
| sound like the right thing to do?
StringIO sounds like the right thing, at least for that particular
document. Probably it wouldn't be too bad for the other documents
either, but I have no experience with its performance.

I'm afraid I don't have the necessary context to answer the
__getattr__ questions, but: I would definitely like to see your
sources. If you could post them somewhere, I, at least, would be happy
to have a look at them.

--Lars M.

From larsga@garshol.priv.no Sun Jun 4 11:38:06 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Jun 2000 12:38:06 +0200
Subject: [XML-SIG] Zope and DOM
Message-ID:

I just noticed that Zope is distributed with something they call ZDOM,
which seems to be a straightforward DOM implementation, although with
some strange assumptions and base classes that I don't fully understand.

However, it uses getFoo, where foo is an IDL attribute, rather than
get_foo or _get_foo, which is not very good for interoperability. Are
lots of people using this? Should we try to get the Zope people to
change this?

Also, one solution to problems like those Bjorn Pettersen has been
experiencing might be a DOM implementation based on ZODB. I've looked
at this briefly and it seems as though it should be fairly easy to do,
even though some attributes may not be available as attributes, but
only as methods. (Since we shouldn't use __getattr__ with ZODB.) The
_p_changed attribute also has to be maintained.

Beyond that, however, implementing a DBDOM seems relatively
straightforward. Has anyone done this already? (If so, with what
interface?) Is there any interest in this sort of thing?

--Lars M.

From paul@prescod.net Sun Jun 4 22:34:45 2000
From: paul@prescod.net (Paul Prescod)
Date: Sun, 04 Jun 2000 16:34:45 -0500
Subject: [XML-SIG] Zope and DOM
References:
Message-ID: <393ACB75.6CA3A11F@prescod.net>

Lars Marius Garshol wrote:
>
> ...
>
> Beyond that, however, implementing a DBDOM seems relatively
> straightforward. Has anyone done this already? (If so, with what
> interface?) Is there any interest in this sort of thing?

I believe that FourThought and Digital Creations are looking at this.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it.
- http://www.cs.yale.edu/~perlis-alan/quotes.html

From dieter@handshake.de Sun Jun 4 21:40:15 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 4 Jun 2000 22:40:15 +0200 (CEST)
Subject: [XML-SIG] Zope and DOM
In-Reply-To:
References:
Message-ID: <14650.48406.666444.439901@lindm.dm>

Lars Marius Garshol writes:
> I just noticed that Zope is distributed with something they call ZDOM,
> which seems to be a straightforward DOM implementation, although with
> some strange assumptions and base classes that I don't fully understand.
>
> However, it uses getFoo, where foo is an IDL attribute, rather than
> get_foo or _get_foo, which is not very good for interoperability. Are
> lots of people using this? Should we try to get the Zope people to
> change this?

As far as I know, DC works with FourThought for XML technology.

They probably won't like "_get_foo", because names starting
with "_" are private and can be used neither from DTML nor
from HTTP.

> Also, one solution to problems like those Bjorn Pettersen has been
> experiencing might be a DOM implementation based on ZODB. I've looked
> at this briefly and it seems as though it should be fairly easy to do,
> even though some attributes may not be available as attributes, but
> only as methods. (Since we shouldn't use __getattr__ with ZODB.) The
> _p_changed attribute also has to be maintained.

ZODB objects can have (application specific) attributes that
are accessed as attributes.
Usually, the "_p_changed" attribute is maintained automatically.
Jim Fulton recently announced that soon "__getattr__" and "__setattr__"
will be usable by the application.

> Beyond that, however, implementing a DBDOM seems relatively
> straightforward. Has anyone done this already? (If so, with what
> interface?) Is there any interest in this sort of thing?

I just read that they have such a thing.
Extract from a message of "Martijn Faassen"
to "zope-dev@zope.org" and "zope-xml@egroups.com":

: Right -- XMLDocument does this. It parses the XML into a DOM-like tree,
: storing the XML nodes as objects in the database. You still get an XML
: view on it, but it's actually all objects.

Dieter

From gstein@lyra.org Mon Jun 5 14:10:44 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 5 Jun 2000 06:10:44 -0700 (PDT)
Subject: [XML-SIG] 0.5.4 distro incorrect?
Message-ID:

I just got a query about my davlib, which uses qp_xml and my new
httplib. (and pyexpat)

My page had a reference to an old pyexpat, so I corrected that to
point to the XML distro. However... I just downloaded the distro and
the qp_xml.py looks out of date. Specifically, the CVS repository has
version 1.3 in it, and the v054 tag refers to 1.3 ... but that isn't
what is in PyXML-0.5.4

Any ideas? Was the .tar.gz snapped from a not-up-to-date local directory?
Or was the tag applied after-the-fact and doesn't match the build?

Maybe an 0.5.5 can be built to correct this?

(I'd like to just refer people to the XML distro, rather than pyexpat
from it plus a separate qp_xml)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From akuchlin@mems-exchange.org Mon Jun 5 15:07:52 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Mon, 5 Jun 2000 10:07:52 -0400
Subject: [XML-SIG] 0.5.4 distro incorrect?
In-Reply-To: ; from gstein@lyra.org on Mon, Jun 05, 2000 at 06:10:44AM -0700
References:
Message-ID: <20000605100752.B18376@amarok.cnri.reston.va.us>

On Mon, Jun 05, 2000 at 06:10:44AM -0700, Greg Stein wrote:
>Any ideas? Was the .tar.gz snapped from a not-up-to-date local directory?
>Or was the tag applied after-the-fact and doesn't match the build?

Almost certainly the latter; some changes to the PyExpat module also
crept in. I'll cut a 0.5.4final release ASAP.
>Maybe an 0.5.5 can be built to correct this?

That's another option, since other changes have gone into the CVS
since 0.5.4 was pulled; javadom.py, the new version of xmlproc, &c.

--
A.M. Kuchling http://starship.python.net/crew/amk/
Imagine a world where nothing is stable. In the West, we have three
moving elements -- Air, Fire, Water -- but at least we can depend on
the fourth.
-- Philip, in Peter Greenaway's _8 1/2 Women_ (1999)

From Mike.Olson@fourthought.com Mon Jun 5 16:17:39 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 05 Jun 2000 09:17:39 -0600
Subject: [XML-SIG] Zope and DOM
References: <14650.48406.666444.439901@lindm.dm>
Message-ID: <393BC493.A64D291C@FourThought.com>

Dieter Maurer wrote:
>
> Lars Marius Garshol writes:
>
> > I just noticed that Zope is distributed with something they call ZDOM,
> > which seems to be a straightforward DOM implementation, although with
> > some strange assumptions and base classes that I don't fully understand.
> >
> > However, it uses getFoo, where foo is an IDL attribute, rather than
> > get_foo or _get_foo, which is not very good for interoperability. Are
> > lots of people using this? Should we try to get the Zope people to
> > change this?
> As far as I now, DC works with FourThought for XML technology.
>
> They, probably, won't like "_get_foo", because names starting
> with "_" are private and can be used neither from DTML nor
> from HTTP.

Yep, we just started working with them.

The ZDOM you're looking at is the old prototype. The new prototype
supports both _get_foo and get_foo, the second for the exact reason
Dieter mentions.

>
> > Also, one solution to problems like those Bjorn Pettersen has been
> > experiencing might be a DOM implementation based on ZODB. I've looked
> > at this briefly and it seems as though it should be fairly easy to do,
> > even though some attributes may not be available as attributes, but
> > only as methods. (Since we shouldn't use __getattr__ with ZODB.) The
> > _p_changed attribute also has to be maintained.
> ZODB objects can have (application specific) attributes that
> are accessed as attributes.
> Usually, the "_p_changed" attribute is maintained automatically.
> Jim Fulton recently announced that soon "__getattr__" and "__setattr__"
> will be usable by the application.

I think this was just released.

>
> > Beyond that, however, implementing a DBDOM seems relatively
> > straightforward. Has anyone done this already? (If so, with what
> > interface?) Is there any interest in this sort of thing?
> I just read, that they have such a thing.
> Extract from a message of "Martijn Faassen "
> to "zope-dev@zope.org" and "zope-xml@egroups.com":
>
> : Right -- XMLDocument does this. It parses the XML into a DOM-like tree,
> : storing the XML nodes as objects in the database. You still get an XML
> : view on it, but it's actually all objects.

This is the old XMLDocument prototype. We are adding this as a
feature to the new prototype. Unlike Amos's implementation, we hope
to give the user a very wide range of flexibility over what portions
of the document are converted to objects, and which are stored as
attributes on objects. We have a bit of hashing out to do yet but hope
to have something useful out in a week or two....

Mike

>
> Dieter
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc.
http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From larsga@garshol.priv.no Mon Jun 5 16:32:26 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Jun 2000 17:32:26 +0200
Subject: [XML-SIG] Zope and DOM
In-Reply-To: <393BC493.A64D291C@FourThought.com>
References: <14650.48406.666444.439901@lindm.dm> <393BC493.A64D291C@FourThought.com>
Message-ID:

* Mike Olson
|
| The ZDOM your looking at is the old prototype. The new prototype
| supports the _get_foo and get_foo. The second for the exact reason
| Dieter mentions.

Aha. Well, that sounds like very good news to me.

| This is the old XMLDOcument prototype. We are adding this as a
| feature to the new prototype. Unlike Amos's implementation, we hope
| to give the user a very wide range of flexibility over what portions
| of the document are converted to objects, and which are stored as
| attributes on objects. We have a bit of hashing out to yet but hope
| to have something useful out in a week or two....

Sounds good. Where will it be announced? (I'd like to be able to
take a look at it so that I can mention it in my book and also so that
I can list it on Free XML tools.)

--Lars M.

From molson@fourthought.com Mon Jun 5 17:06:10 2000
From: molson@fourthought.com (Mike Olson)
Date: Mon, 5 Jun 2000 10:06:10 -0600 (MDT)
Subject: [XML-SIG] Zope and DOM
In-Reply-To:
Message-ID:

On 5 Jun 2000, Lars Marius Garshol wrote:
>
> | This is the old XMLDOcument prototype. We are adding this as a
> | feature to the new prototype. Unlike Amos's implementation, we hope
> | to give the user a very wide range of flexibility over what portions
> | of the document are converted to objects, and which are stored as
> | attributes on objects. We have a bit of hashing out to yet but hope
> | to have something useful out in a week or two....
>
> Sounds good. Where will it be announced? (I'd like to be able to
> take a look at it so that I can mention it in my book and also so that
> I can list it on Free XML tools.)

Currently we are hashing out the issues on zope.org. See:

http://www.zope.org/Wikis/zope-xml/FrontPage

and please add comments. We have gotten a good amount of feedback from
the Zope community but could use some more from the XML community.

We will be sure to inform the list of our releases.

Mike

>
> --Lars M.
>
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig
>

From uche.ogbuji@fourthought.com Mon Jun 5 19:46:05 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 05 Jun 2000 12:46:05 -0600
Subject: [XML-SIG] Great Book
Message-ID: <393BF56D.BE504410@fourthought.com>

4XSLT users (and all XSLT users in general),

I cannot recommend highly enough Mike Kay's book:

http://www.wrox.com/Consumer/Store/Details.asp?ISBN=1861003129
http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=1861003129

Mr. Kay is author of Saxon, the most compliant XSLT implementation there
is (though 4XSLT might just be second place, and is catching up
swiftly). He has written one of the best technical books I've ever
read, which, despite its name, is as useful as a detailed XSLT intro as
well as an XSLT reference.

--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +01 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From fdrake@beopen.com Mon Jun 5 20:27:16 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Mon, 5 Jun 2000 15:27:16 -0400 (EDT)
Subject: [XML-SIG] Great Book
In-Reply-To: <393BF56D.BE504410@fourthought.com>
References: <393BF56D.BE504410@fourthought.com>
Message-ID: <14651.65300.464693.257475@cj42289-a.reston1.va.home.com>

Uche Ogbuji writes:
> I cannot recommend highly enough Mike Kay's book:
>
> http://www.wrox.com/Consumer/Store/Details.asp?ISBN=1861003129
> http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=1861003129
>
> Mr. Kay is author of Saxon, the most compliant XSLT implementation there
> is (though 4XSLT might just be second place, and is catching up
> swiftly). He has written one of the best technical books I've ever
> read, which, despite its name, is as useful as a detailed XSLT intro as
> well as an XSLT reference.

Uche,

I'm glad to hear the book is good -- I saw it last night at the
local bookstore (which is really just another cancer produced by a
national chain), but was distracted when my three-year-old spilled his
orange juice on the floor (purchased from another cancerous national
chain). I'll have to make another run over there to save this good
cell from its tepid storage location. ;-)

-Fred

--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member

From wunder@ultraseek.com Mon Jun 5 22:19:33 2000
From: wunder@ultraseek.com (Walter Underwood)
Date: Mon, 05 Jun 2000 14:19:33 -0700
Subject: [XML-SIG] PyExpat changes for encoding (was: XML support in Python 1.6)
In-Reply-To:
Message-ID: <2437066126.960214773@serrano.infoseek.com>

--On Friday, June 02, 2000 8:17 AM -0700 Greg Stein wrote:

> On Fri, 2 Jun 2000, Andrew M. Kuchling wrote:
>> ...
>> parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"
>
> pyexpat.native_encoding as a readonly attribute. I see no particular
> use in making it a function. (Note the module-level, too!)

I like it, but "unicode" is not an encoding. The proper Unicode 3.0
name for this is "UTF-16". In Unicode 2.x, it was called "UCS-2".
If there is no byte-order mark (BOM), then it should identify as
little- or big-endian, that is, UTF-16LE or UTF-16BE.

But I'm strongly in favor of Expat returning UTF-16 in native byte
order, and the Python interface returning Python unicode objects.
Relying on locally-installed copies of Expat would be a support
nightmare for us.

wunder
--
Walter R. Underwood
Senior Staff Engineer, Infoseek Software
http://software.infoseek.com/

From pwolff@cox.rr.com Mon Jun 5 22:39:53 2000
From: pwolff@cox.rr.com (Greg Wolff)
Date: Mon, 05 Jun 2000 17:39:53 -0400
Subject: [XML-SIG] A usage scenario for Python and XML...
References: <393980F5.B6973F63@cox.rr.com>
Message-ID: <393C1E29.59AA7A63@cox.rr.com>

This looks good. Once the byte offsets are in place it looks like that
will get me going on the Python implementation. Thanks! Answers
below...

Lars Marius Garshol wrote:
>
> * Greg Wolff
> |
> | NOTE: Byte offsets are not available in SAX but are in SAX2.
....
>
> I haven't yet made offsets available in SAX2, but I plan to do so in
> the next release. I was thinking of two possible ways to do this:
>
> - a SAX2 parser property that returns the current offset
> - a SAX2 parser property that returns a function that returns the
>   current offset
>
> The first is the least surprising way, but I think the second is
> likely to be faster and also more convenient. Opinions on this would
> be welcome.

I think I would like the second better than the first, but I don't see
that it makes any difference.

> | Major Requirement: Complete location information, including byte
> | offsets, are required at all relevant element start and end tag
> | instances.
>
> With the addition of the byte offset information SAX 2 should cover
> your requirements, I assume?

Yes, I think it will. The more I work with the Python XML
implementations, the better I like them. You'all have done a very nice
job with them, and of course, Python is excellent in its own right.

>
> BTW: Thank you for a very interesting post. It's always interesting to
> know what people are using this software for.
>
> --Lars M.

You're welcome.

/pgw
Greg Wolff

From ivanlan@home.com Mon Jun 5 23:06:13 2000
From: ivanlan@home.com (Ivan Van Laningham)
Date: Mon, 05 Jun 2000 16:06:13 -0600
Subject: [XML-SIG] Pyexpat
Message-ID: <393C2455.1B9685B@home.com>

Hi All--

I've "installed" the PyXml distribution (0.5.4) and downloaded pyxie
(from Sean McGrath) which uses pyexpat. Despite having the pyexpat.dll
in my winnt directory, it refuses to import:

>>> import pyexpat
Traceback (innermost last):
  File "", line 1, in ?
ImportError: No module named pyexpat
>>>

I know that when I get the answer to this, I'll be deeply embarrassed
since it is almost certainly something stupid that I'm doing/not doing
(have we left anything out?), but I am prepared.

-ly y'rs,
Ivan;-)
----------------------------------------------
Ivan Van Laningham
Axent Technologies, Inc.
http://www.pauahtun.org
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

From bjorn@roguewave.com Mon Jun 5 23:52:30 2000
From: bjorn@roguewave.com (Bjorn Pettersen)
Date: Mon, 05 Jun 2000 16:52:30 -0600
Subject: [XML-SIG] speed question re DOM parsing
References: <3935D13D.F4EAD64B@roguewave.com>
Message-ID: <393C2F2E.7AF4FA01@roguewave.com>

Lars Marius Garshol wrote:
>
> * Bjorn Pettersen
> |
> | Question: does using StringIO (or perhaps array) and __getattr__
> | sound like the right thing to do?
>
> StringIO sounds like the right thing, at least for that particular
> document. Probably it wouldn't be too bad for the other documents
> either, but I have no experience with its performance.
>
> I'm afraid I don't have the necessary context to answer the
> __getattr__ questions, but: I would definitely like to see your
> sources. If you could post them somewhere, I, at least, would be happy
> to have a look at them.

Ok, give me a couple of days and I'll put it up.
-b From bjorn@roguewave.com Tue Jun 6 01:03:55 2000 From: bjorn@roguewave.com (Bjorn Pettersen) Date: Mon, 05 Jun 2000 18:03:55 -0600 Subject: [XML-SIG] speed question re DOM parsing References: <3935D13D.F4EAD64B@roguewave.com> Message-ID: <393C3FEB.A2DAB88E@roguewave.com> This is a multi-part message in MIME format. --------------2165DD6D55CFA19313775CB7 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Lars Marius Garshol wrote: > > * Bjorn Pettersen > | > | Question: does using StringIO (or perhaps array) and __getattr__ > | sound like the right thing to do? > > StringIO sounds like the right thing, at least for that particular > document. Probably it wouldn't be too bad for the other documents > either, but I have no experience with its performance. > > I'm afraid I don't have the necessary context to answer the > __getattr__ questions, but: I would definitely like to see your > sources. If you could post them somewhere, I, at least, would be happy > to have a look at them. I've included the patched file as an attachment. My changes are confined to:

- importing (c)StringIO at the top
- changing the constructor call to _element (line 82) to pass a StringIO object rather than an empty string.
- hiding the "first_cdata" member in the __init__ method of _element
- adding a __getattr__ method to _element.

With limited performance testing I got:

File Size    Original    Patched
37K          0.14s       0.07s
968K         103.77s     1.68s

-- bjorn --------------2165DD6D55CFA19313775CB7 Content-Type: text/plain; charset=us-ascii; name="qp_xml.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="qp_xml.py" # # qp_xml: Quick Parsing for XML # # Written by Greg Stein. Public Domain. # No Copyright, no Rights Reserved, and no Warranties. # # This module is maintained by Greg and is available as part of the XML-SIG # distribution.
This module and its changelog can be fetched at: # http://www.lyra.org/cgi-bin/viewcvs.cgi/xml/xml/utils/qp_xml.py # # Additional information can be found on Greg's Python page at: # http://www.lyra.org/greg/python/ # # This module was added to the XML-SIG distribution on February 14, 2000. # As part of that distribution, it falls under the XML distribution license. # import string try: import cStringIO _StringIO = cStringIO except ImportError: import StringIO _StringIO = StringIO try: import pyexpat except ImportError: from xml.parsers import pyexpat error = __name__ + '.error' # # The parsing class. Instantiate and pass a string/file to .parse() # class Parser: def __init__(self): self.reset() def reset(self): self.root = None self.cur_elem = None self.error = None def find_prefix(self, prefix): elem = self.cur_elem while elem: if elem.ns_scope.has_key(prefix): return elem.ns_scope[prefix] elem = elem.parent if prefix == '': return '' # empty URL for "no namespace" return None def process_prefix(self, name, use_default): idx = string.find(name, ':') if idx == -1: if use_default: return self.find_prefix(''), name return '', name # no namespace if string.lower(name[:3]) == 'xml': return '', name # name is reserved by XML. don't break out a NS. 
ns = self.find_prefix(name[:idx]) if ns is None: self.error = 'namespace prefix not found' return ns, name[idx+1:] def start(self, name, attrs): if self.error: return elem = _element(name=name, lang=None, parent=None, children=[], ns_scope={}, attrs={}, first_cdata=_StringIO.StringIO(), following_cdata='') if self.cur_elem: elem.parent = self.cur_elem elem.parent.children.append(elem) self.cur_elem = elem else: self.cur_elem = self.root = elem work_attrs = [ ] # scan for namespace declarations (and xml:lang while we're at it) for i in range(0, len(attrs), 2): name = attrs[i] value = attrs[i+1] if name == 'xmlns': elem.ns_scope[''] = value elif name[:6] == 'xmlns:': elem.ns_scope[name[6:]] = value elif name == 'xml:lang': elem.lang = value else: work_attrs.append((name, value)) # inherit xml:lang from parent if elem.lang is None and elem.parent: elem.lang = elem.parent.lang # process prefix of the element name elem.ns, elem.name = self.process_prefix(elem.name, 1) # process attributes' namespace prefixes for name, value in work_attrs: elem.attrs[self.process_prefix(name, 0)] = value def end(self, name): if self.error: return parent = self.cur_elem.parent del self.cur_elem.ns_scope del self.cur_elem.parent self.cur_elem = parent def cdata(self, data): if self.error: return elem = self.cur_elem if elem.children: last = elem.children[-1] last.following_cdata = last.following_cdata + data else: # this branch taken ~3 times more than true branch elem.first_cdata.write(data) #elem.first_cdata = elem.first_cdata + data def parse(self, input): self.reset() p = pyexpat.ParserCreate() p.StartElementHandler = self.start p.EndElementHandler = self.end p.CharacterDataHandler = self.cdata exception = None try: if type(input) == type(''): try: p.Parse(input, 1) except pyexpat.error, exception: pass else: while 1: s = input.read(_BLOCKSIZE) if not s: try: p.Parse('', 1) except pyexpat.error, exception: pass break try: rv = p.Parse(s, 0) except pyexpat.error, exception: pass if 
exception or self.error: break if exception: s = pyexpat.ErrorString(p.ErrorCode) raise error, 'expat parsing error: ' + exception if self.error: raise error, self.error finally: if self.root: _clean_tree(self.root) print 'self.root', self.root return self.root # # handy function for dumping a tree that is returned by Parser # def dump(f, root): f.write('\n') namespaces = _collect_ns(root) _dump_recurse(f, root, namespaces, 1) f.write('\n') # # This function returns the element's CDATA. Note: this is not recursive -- # it only returns the CDATA immediately within the element, excluding the # CDATA in child elements. # def textof(elem): return elem.textof() ######################################################################### # # private stuff for qp_xml # _BLOCKSIZE = 1024 * 16 # chunk size for parsing input class _element: def __init__(self, **kw): self.__dict__.update(kw) # changing first_cdata to be a StringIO object and # handling it transparently in __getattr__ below. # To make it work, we need to hide it first... self.__fcd = self.__dict__['first_cdata'] del self.__dict__['first_cdata'] def textof(self): '''Return the CDATA of this element. Note: this is not recursive -- it only returns the CDATA immediately within the element, excluding the CDATA in child elements. ''' s = self.first_cdata #.getvalue() for child in self.children: s = s + child.following_cdata return s def find(self, name, ns=''): for elem in self.children: if elem.name == name and elem.ns == ns: return elem return None def __getattr__(self, attr): """first_cdata used to be a string attribute, but is now a StringIO object. Preserve the illusion that it is still a string attribute. 
""" if attr == 'first_cdata': return self.__fcd.getvalue() else: return self.__dict__[attr] def _clean_tree(elem): elem.parent = None del elem.parent map(_clean_tree, elem.children) def _collect_recurse(elem, dict): dict[elem.ns] = None for ns, name in elem.attrs.keys(): dict[ns] = None for child in elem.children: _collect_recurse(child, dict) def _collect_ns(elem): "Collect all namespaces into a NAMESPACE -> PREFIX mapping." d = { '' : None } _collect_recurse(elem, d) del d[''] # make sure we don't pick up no-namespace entries keys = d.keys() for i in range(len(keys)): d[keys[i]] = i return d def _dump_recurse(f, elem, namespaces, dump_ns=0): if elem.ns: f.write('' + elem.first_cdata) for child in elem.children: _dump_recurse(f, child, namespaces) f.write(child.following_cdata) if elem.ns: f.write('' % (namespaces[elem.ns], elem.name)) else: f.write('' % elem.name) else: f.write('/>') --------------2165DD6D55CFA19313775CB7-- From akuchlin@mems-exchange.org Tue Jun 6 01:05:01 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 5 Jun 2000 20:05:01 -0400 Subject: [XML-SIG] PyXML 0.5.5 release Message-ID: <20000605200501.A18966@newcnri.cnri.reston.va.us> I've made a quickie 0.5.5 release, containing 3 changes: 1) Updated version of qp_xdml.py, pointed out by Greg Stein 2) Patches to pyexpat.c, for correct compilation on Windows 3) Bugfix to dom/core.py, to escape ' in attribute values. --amk From tpassin@home.com Tue Jun 6 03:10:16 2000 From: tpassin@home.com (tpassin@home.com) Date: Mon, 5 Jun 2000 22:10:16 -0400 Subject: [XML-SIG] Pyexpat References: <393C2455.1B9685B@home.com> Message-ID: <00ca01bfcf5c$5b9cb240$7cac1218@reston1.va.home.com> Ivan Van Laningham asked > Hi All-- > I've "installed" the PyXml distribution (0.5.4) and downloaded pyxie > (from Sean McGrath) which uses pyexpat. 
Despite having the pyexpat.dll > in my winnt directory, it refuses to import: > I know that when I get the answer to this, I'll be deeply embarrassed > since it is almost certainly something stupid that I'm doing/not doing > (have we left anything out?), but I am prepared. > I had trouble getting pyexpat to import too, on Win95/98. Right now I have it in both C:\Program Files\python\dlls and C:\Program Files\Python\xml\parsers. One of them works, but after all these months, I forget which one did the trick. You need it in a python directory, rather than just a windows path directory, I think. Tom Passin From paul@prescod.net Tue Jun 6 03:19:21 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 05 Jun 2000 21:19:21 -0500 Subject: [XML-SIG] Pyexpat References: <393C2455.1B9685B@home.com> Message-ID: <393C5FA9.D0631A4F@prescod.net> Try renaming pyexpat.dll to pyexpat.pyd. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself When I'm gone, boxing will be nothing again. The fans with the cigars and the hats turned down'll be there, but no more housewives and little men in the street and foreign presidents. It's goin' to be back to the fighter who comes to town, smells a flower, visits a hospital, blows a horn and says he's in shape. Old hat. I was the onliest boxer in history people asked questions like a senator. -- Muhammad Ali From akuchlin@mems-exchange.org Tue Jun 6 04:07:51 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 5 Jun 2000 23:07:51 -0400 Subject: [XML-SIG] 4DOM checked into PyXML CVS Message-ID: <20000605230751.A19541@newcnri.cnri.reston.va.us> I've just checked in a snapshot of 4DOM into the XML-SIG's CVS tree. This will probably result in massive breakage of code that uses the DOM, or one of the modules in xml.dom that got deleted. Current goals are to test more of the new DOM code, and fix any breakages. I've already spotted a few minor glitches, and will send the patches back to FourThought.
--amk From rob@hooft.net Tue Jun 6 07:41:09 2000 From: rob@hooft.net (Rob W. W. Hooft) Date: Tue, 6 Jun 2000 08:41:09 +0200 (CEST) Subject: [XML-SIG] Zope and DOM In-Reply-To: <393BC493.A64D291C@FourThought.com> References: <14650.48406.666444.439901@lindm.dm> <393BC493.A64D291C@FourThought.com> Message-ID: <14652.40197.837931.957911@temoleh.chem.uu.nl> >>>>> "MO" == Mike Olson writes: >> : Right -- XMLDocument does this. It parses the XML into a >> DOM-like tree, : storing the XML nodes as objects in the >> database. You still get an XML : view on it, but it's actually all >> objects. MO> This is the old XMLDOcument prototype. We are adding this as a MO> feature to the new prototype. Unlike Amos's implementation, we MO> hope to give the user a very wide range of flexibility over what MO> portions of the document are converted to objects, and which are MO> stored as attributes on objects. We have a bit of hashing out to MO> yet but hope to have something useful out in a week or two.... I'll be waiting for this, as currently a 37kB XBEL file which I read into Zope using XMLDocument (actually a Zsubclass [or is that subZclass?] XBELDocument that I wrote) turned into a 550kB ZODB.... I guess with the power you are describing here that can be made a lot more efficient. Rob -- ===== rob@hooft.net http://www.hooft.net/people/rob/ ===== ===== R&D, Nonius BV, Delft http://www.nonius.nl/ ===== ===== PGPid 0xFA19277D ========================== Use Linux! ========= From Juergen Hermann" On Mon, 5 Jun 2000 20:05:01 -0400, Andrew Kuchling wrote: >I've made a quickie 0.5.5 release, containing 3 changes: >1) Updated version of qp_xdml.py, pointed out by Greg Stein >2) Patches to pyexpat.c, for correct compilation on Windows >3) Bugfix to dom/core.py, to escape ' in attribute values. This is the quickie fix to get it to build with VC++6 and the RELEASE version of Python 1.5.2. 
:) --- pyexpat.c.orig Mon Jun 05 23:48:06 2000 +++ pyexpat.c Tue Jun 06 08:41:13 2000 @@ -471,7 +471,7 @@ int i; xmlparseobject *self; - self = PyObject_New(xmlparseobject, &Xmlparsetype); + self = PyObject_NEW(xmlparseobject, &Xmlparsetype); if (self == NULL) return NULL; @@ -512,7 +512,11 @@ for( i=0; handler_info[i].name!=NULL; i++ ){ Py_XDECREF( self->handlers[i] ); } - PyObject_Del(self); +#ifndef PyObject_DEL + free(self); +#else + PyObject_DEL(self); +#endif } static int handlername2int( const char *name ){ From walter@livinglogic.de Tue Jun 6 09:58:14 2000 From: walter@livinglogic.de (Walter Doerwald) Date: Tue, 06 Jun 2000 10:58:14 +0200 Subject: [XML-SIG] PyXML 0.5.5 release In-Reply-To: <20000605200501.A18966@newcnri.cnri.reston.va.us> Message-ID: <4.3.1.0.20000606105744.00aee450@mail.tmt.de> At 02:05 06.06.00, you wrote: >I've made a quickie 0.5.5 release, containing 3 changes: >1) Updated version of qp_xdml.py, pointed out by Greg Stein >2) Patches to pyexpat.c, for correct compilation on Windows >3) Bugfix to dom/core.py, to escape ' in attribute values. What about a new version of sgmlop? Bye, Walter Dörwald From paul@prescod.net Tue Jun 6 11:14:31 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 06 Jun 2000 05:14:31 -0500 Subject: [XML-SIG] Zope and DOM References: <14650.48406.666444.439901@lindm.dm> <393BC493.A64D291C@FourThought.com> <14652.40197.837931.957911@temoleh.chem.uu.nl> Message-ID: <393CCF07.3E8E3A8E@prescod.net> "Rob W. W. Hooft" wrote: > > I'll be waiting for this, as currently a 37kB XBEL file which I read > into Zope using XMLDocument (actually a Zsubclass [or is that > subZclass?] XBELDocument that I wrote) turned into a 550kB ZODB.... I > guess with the power you are describing here that can be made a lot > more efficient. Unless you do some tricky stuff, a 10* blowup is par for the course in importing XML documents into object databases.
All of a sudden you've got pointers all over the place from elements to their parents and to their siblings and so forth. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". - http://www.legislature.state.al.us/ALISHome.html From rob@hooft.net Tue Jun 6 13:10:37 2000 From: rob@hooft.net (Rob W. W. Hooft) Date: Tue, 6 Jun 2000 14:10:37 +0200 (CEST) Subject: [XML-SIG] Zope and DOM In-Reply-To: <393CCF07.3E8E3A8E@prescod.net> References: <14650.48406.666444.439901@lindm.dm> <393BC493.A64D291C@FourThought.com> <14652.40197.837931.957911@temoleh.chem.uu.nl> <393CCF07.3E8E3A8E@prescod.net> Message-ID: <14652.59965.690356.688151@temoleh.chem.uu.nl> >>>>> "PP" == Paul Prescod writes: PP> "Rob W. W. Hooft" wrote: >> I'll be waiting for this, as currently a 37kB XBEL file which I >> read into Zope using XMLDocument (actually a Zsubclass [or is that >> subZclass?] XBELDocument that I wrote) turned into a 550kB >> ZODB.... I guess with the power you are describing here that can >> be made a lot more efficient. PP> Unless you do some tricky stuff, a 10* blowup is par for the PP> course in importing XML documents into object databases. All of a PP> sudden you've got pointers all over the place from elements to PP> their parents and to their siblings and so forth. Reading the original message, I thought that the "tricky stuff" might become available. If the dom-ification could stop at the "bookmark" level in an XBEL file, the total amount of objects would drop significantly. 
Rob From wb104@mole.bio.cam.ac.uk Tue Jun 6 14:49:37 2000 From: wb104@mole.bio.cam.ac.uk (Wayne Boucher) Date: Tue, 6 Jun 2000 14:49:37 +0100 (BST) Subject: [XML-SIG] problem compiling pyexpat.c with Irix 6 cc Message-ID: Hello, Hopefully a bug not reported too often. I am compiling lots of Python libraries with the Irix 6 MIPSpro Compiler Version 7.30. In pyexpat.c in the PyXML-0.5.4 distribution at line 82 there is a forward declaration static struct HandlerInfo handler_info[]; which the compiler does not like, the output to the screen is: Running command: make cc -Iexpat/xmlparse -O -OPT:Olimit=0 -I/dogmatix/wb104/python/Python-1.5.2/include/python1.5 -I/dogmatix/wb104/python/Python-1.5.2/include/python1.5 -DHAVE_CONFIG_H -c ./pyexpat.c cc-1081 cc: ERROR File = ./pyexpat.c, Line = 82 More than one storage class specifier appears in a declaration. extern static struct HandlerInfo handler_info[]; ^ 1 error detected in the compilation of "./pyexpat.c". I re-arranged the code and used some other forward declarations to eventually compile the code. Wayne Boucher From uogbuji@fourthought.com Tue Jun 6 16:59:04 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 06 Jun 2000 09:59:04 -0600 Subject: [XML-SIG] Zope and DOM In-Reply-To: Message from rob@hooft.net (Rob W. W. Hooft) of "Tue, 06 Jun 2000 14:10:37 +0200." <14652.59965.690356.688151@temoleh.chem.uu.nl> Message-ID: <200006061559.JAA03541@localhost.localdomain> > >>>>> "PP" == Paul Prescod writes: > > PP> "Rob W. W. Hooft" wrote: > >> I'll be waiting for this, as currently a 37kB XBEL file which I > >> read into Zope using XMLDocument (actually a Zsubclass [or is that > >> subZclass?] XBELDocument that I wrote) turned into a 550kB > >> ZODB.... I guess with the power you are describing here that can > >> be made a lot more efficient. > > PP> Unless you do some tricky stuff, a 10* blowup is par for the > PP> course in importing XML documents into object databases. 
All of a > PP> sudden you've got pointers all over the place from elements to > PP> their parents and to their siblings and so forth. > > Reading the original message, I thought that the "tricky stuff" might > become available. If the dom-ification could stop at the "bookmark" level > in an XBEL file, the total amount of objects would drop significantly. This is precisely the tricky stuff we have in mind. The current thinking is to use a (lightly) specialized XSLT transform to specify how elements are converted to ZObjects. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From molson@fourthought.com Tue Jun 6 17:02:42 2000 From: molson@fourthought.com (Mike Olson) Date: Tue, 6 Jun 2000 10:02:42 -0600 (MDT) Subject: [XML-SIG] Zope and DOM In-Reply-To: <14652.40197.837931.957911@temoleh.chem.uu.nl> Message-ID: On Tue, 6 Jun 2000, Rob W. W. Hooft wrote: > > I'll be waiting for this, as currently a 37kB XBEL file which I read > into Zope using XMLDocument (actually a Zsubclass [or is that > subZclass?] XBELDocument that I wrote) turned into a 550kB ZODB.... I > guess with the power you are describing here that can be made a lot > more efficient. Correct, this is one of the exact problems we encountered with the old XMLDocument. We called it zBloat :) Very high on our list of features. You'll notice that the new XMLDocument takes the other extreme and stores the entire XML in a single Object. Mike > > Rob > > From paul@prescod.net Tue Jun 6 17:07:58 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 06 Jun 2000 11:07:58 -0500 Subject: [XML-SIG] Zope and DOM References: <200006061559.JAA03541@localhost.localdomain> Message-ID: <393D21DE.ED9A576A@prescod.net> Uche Ogbuji wrote: > > ... 
> > > > Reading the original message, I thought that the "tricky stuff" might > > become available. If the dom-ification could stop at the "bookmark" level > > in an XBEL file, the total amount of objects would drop significantly. > > This is precisely the tricky stuff we have in mind. The current thinking is > to use a (lightly) specialized XSLT transform to specify how elements are > converted to ZObjects. ZObjects or Zope DOM objects? There are two different use cases. In one you want to get some objects into Zope and don't care about the XML representation. Then you can throw away a lot of information -- not much trickery involved. In the second, you want to be able to get back out your XML document after storing it in Zope as DOM objects. Now you've got conflicting goals of trying to keep the object count down while trying not to throw away information about, e.g. the order of paragraphs, the order of attributes, entity reference boundaries and so forth. That's the really tricky situation. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". 
- http://www.legislature.state.al.us/ALISHome.html From jjp@connix.com Tue Jun 6 17:10:15 2000 From: jjp@connix.com (John Posner) Date: Tue, 6 Jun 2000 12:10:15 -0400 Subject: [XML-SIG] Need an equivalent to Perl's XML::Parser "Tree" style Message-ID: <003301bfcfd1$b508cab0$6e64fea9@jake> Hi -- Back when I was a Perl hacker (more of a dabbler, really), I had good luck creating an object that represents an entire XML document, using this call: new XML::Parser(Style => Tree) Here's a description, in Python terms, of the recursive data structure created by the above call: -------------------------------------------------------------------- The E-NODE representing the root element is a 2-item list: * item 0 = string containing name of element * item 1 = another 2-item list: * item 0 = a dictionary containing element's attributes * item 1 = a list containing multiple 2-item lists: * one of these 2-item lists captures the element's character data: * item 0 = string "0" * item 1 = string containing element's character data * each other 2-item list is an E-NODE representing a subelement -------------------------------------------------------------------- QUESTION: what set of Python tools comes closest to creating a data structure similar to, or exactly like, the above? Thanks! John -- John Posner, Editor jjp@oreilly.com O'Reilly & Associates 860-663-3147 From ivanlan@home.com Tue Jun 6 17:22:12 2000 From: ivanlan@home.com (Ivan Van Laningham) Date: Tue, 06 Jun 2000 10:22:12 -0600 Subject: [XML-SIG] Pyexpat References: <393C2455.1B9685B@home.com> Message-ID: <393D2534.9EFE4763@home.com> Hi All-- [cc'd to Python-List in order to get this into FAQTS] Ivan Van Laningham wrote: > > Hi All-- > I've "installed" the PyXml distribution (0.5.4) and downloaded pyxie > (from Sean McGrath) which uses pyexpat. Despite having the pyexpat.dll > in my winnt directory, it refuses to import: > > >>> import pyexpat > Traceback (innermost last): > File "", line 1, in ? 
> ImportError: No module named pyexpat > >>> > > I know that when I get the answer to this, I'll be deeply embarassed > since it is almost certainly something stupid that I'm doing/not doing > (have we left anything out?), but I am prepared. > Tom Passin replied, yesterday: > Subject: Re: [XML-SIG] Pyexpat > Date: Mon, 5 Jun 2000 22:10:16 -0400 > From: > To: "Python xml-sig" > > I had trouble getting pyexpat to import too, on Win95/98. Right now I have > it in both C:\Program FIles\python\dlls and C:\Program > Files\Python\xml\parsers. One of them works, but after all these months, I > forget which one did the trick. You need it in a python directory, rather > than just a windows path directory, I think. There were a couple of other suggestions, but they didn't work. PyXml installed into c:\Python\Lib\xml on my system, but failed to install pyexpat.dll. I copied it into c:\Python\Lib\xml\parsers, without renaming it, and that fixed the problem. The install should copy pyexpat.dll to the correct place, especially since the install package comes with a pre-built one. It should also be in the FAQ, shouldn't it? Thanks, everyone, for your help. -ly y'rs, Ivan;-) ---------------------------------------------- Ivan Van Laningham Axent Technologies, Inc. http://www.pauahtun.org http://www.foretec.com/python/workshops/1998-11/proceedings.html Army Signal Corps: Cu Chi, Class of '70 Author: Teach Yourself Python in 24 Hours From molson@fourthought.com Tue Jun 6 17:22:55 2000 From: molson@fourthought.com (Mike Olson) Date: Tue, 6 Jun 2000 10:22:55 -0600 (MDT) Subject: [XML-SIG] Zope and DOM In-Reply-To: <393D21DE.ED9A576A@prescod.net> Message-ID: On Tue, 6 Jun 2000, Paul Prescod wrote: > Uche Ogbuji wrote: > > > > ... > > > > ZObjects or Zope DOM objects? There are two different use cases. In one > you want to get some objects into Zope and don't care about the XML > representation. Then you can throw away a lot of information -- not much > trickery involved. 
In the second, you want to be able to get back out > your XML document after storing it in Zope as DOM objects. Now you've > got conflicting goals of trying to keep the object count down while > trying not to throw away information about, e.g. the order of > paragraphs, the order of attributes, entity reference boundaries and so > forth. That's the really tricky situation. Actually we are shooting for both. Since all Zope Objects will support the DOM interface, we are hoping to mix the children of a document with some specialized Zope Objects (persistent) and some transient 4DOM objects. The user will not know the difference. The 4DOM objects will only be stored as fragments and recreated as needed. Mike > > From case@appliedtheory.com Tue Jun 6 17:38:16 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Tue, 6 Jun 2000 12:38:16 -0400 (EDT) Subject: [XML-SIG] Need an equivalent to Perl's XML::Parser "Tree" style In-Reply-To: <003301bfcfd1$b508cab0$6e64fea9@jake> Message-ID: Today, John Posner wrote: Hi -- Back when I was a Perl hacker (more of a dabbler, really), I had good luck creating an object that represents an entire XML document, using this call: new XML::Parser(Style => Tree) Here's a description, in Python terms, of the recursive data structure created by the above call: QUESTION: what set of Python tools comes closest to creating a data structure similar to, or exactly like, the above? I have something somewhat similar. Might be a good starting point if nothing else. Below is how to get this out of sourceforge's CVS where it is just a piece of a larger system. The idea is to make it behave more like a native object. Rather than lots of tuples you can use dotted notation to drill down into an object and dictionary access to get at metadata. There are test cases that show how to use it, but the quick of it is.
For a file like this ===================== somehost.com 127.0.0.1 foo.bar.com baz.bar.com ====================== import xmlConfig container = xmlConfig.xmlConfig().parse(filename) port = container.container1.listen['port'] hosts = container.container2.allow.get('host') print port # 9000 print hosts # ['127.0.0.1', 'foo.bar.com', 'baz.bar.com'] ====================== I am not sure how useful this will be, but if it looks helpful give it a try. I realize this is not a general purpose solution and doesn't allow you to address all the data with out getting intermediate handles, but it works well for simple tasks. cvs -d:pserver:anonymous@cvs.PASS.sourceforge.net:/cvsroot/PASS login cvs -z3 -d:pserver:anonymous@cvs.PASS.sourceforge.net:/cvsroot/PASS co src/xmlConfig -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... From gstein@lyra.org Tue Jun 6 19:47:00 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 6 Jun 2000 11:47:00 -0700 (PDT) Subject: [XML-SIG] Need an equivalent to Perl's XML::Parser "Tree" style In-Reply-To: <003301bfcfd1$b508cab0$6e64fea9@jake> Message-ID: Quite an easy answer, actually :-) In the PyXML distro, take a look at xml.utils.qp_xml. If you have PyXML 0.5.5, then you're fine. Otherwise, the most recent copy can be fetched from: http://www.lyra.org/greg/python/qp_xml.py It constructs very lightweight Python objects for the elements. It also does so quite quickly :-), although I've got even more speed improvements on deck from Bjorn Pettersen. 
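[A minimal sketch of the nested-list structure John describes, built directly on the expat bindings. It assumes modern Python 3 and the standard xml.parsers.expat module rather than qp_xml or the pyexpat of PyXML 0.5.5; the function name parse_tree is hypothetical, and the layout follows the description in John's post rather than the exact output of Perl's XML::Parser.]

```python
import xml.parsers.expat


def parse_tree(text):
    """Return [name, [attrs, content]] nodes as described in John's post.

    content is a list of 2-item lists: ["0", chardata] for character
    data, or a nested [name, [attrs, content]] node for a subelement.
    """
    top = []          # content list of an imaginary parent of the root
    stack = [top]     # content lists of the currently open elements

    def start(name, attrs):
        content = []
        stack[-1].append([name, [dict(attrs), content]])
        stack.append(content)

    def end(name):
        stack.pop()

    def chars(data):
        if len(stack) == 1:
            return    # ignore anything reported outside the root element
        content = stack[-1]
        # expat may deliver one run of text in several chunks; merge them
        if content and content[-1][0] == "0":
            content[-1][1] += data
        else:
            content.append(["0", data])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return top[0]


tree = parse_tree('<a x="1">hi<b/>there</a>')
print(tree)
```

The string "0" is safe as a marker for character data because an XML element name cannot begin with a digit.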
Cheers, -g On Tue, 6 Jun 2000, John Posner wrote: > Hi -- > > Back when I was a Perl hacker (more of a dabbler, really), I had good luck > creating an object that represents an entire XML document, using this call: > > new XML::Parser(Style => Tree) > > Here's a description, in Python terms, of the recursive data structure > created by the above call: > > -------------------------------------------------------------------- > The E-NODE representing the root element is a 2-item list: > > * item 0 = string containing name of element > > * item 1 = another 2-item list: > * item 0 = a dictionary containing element's attributes > * item 1 = a list containing multiple 2-item lists: > > * one of these 2-item lists captures the element's character data: > * item 0 = string "0" > * item 1 = string containing element's character data > > * each other 2-item list is an E-NODE representing a subelement > -------------------------------------------------------------------- > > QUESTION: what set of Python tools comes closest to creating a data > structure similar to, or exactly like, the above? > > Thanks! > John > > -- > John Posner, Editor jjp@oreilly.com > O'Reilly & Associates 860-663-3147 > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig > -- Greg Stein, http://www.lyra.org/ From larsga@garshol.priv.no Wed Jun 7 09:05:41 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 07 Jun 2000 10:05:41 +0200 Subject: [XML-SIG] Need an equivalent to Perl's XML::Parser "Tree" style In-Reply-To: <003301bfcfd1$b508cab0$6e64fea9@jake> References: <003301bfcfd1$b508cab0$6e64fea9@jake> Message-ID: * John Posner | | QUESTION: what set of Python tools comes closest to creating a data | structure similar to, or exactly like, the above? Probably demo/xmlproc/doctree.py in the XML-SIG package. 
The structure there is a three-tuple for every element (name, att dict, child list), which PCDATA appearing as strings. It was just written as a demo and never used for anything by me (and is also xmlproc-specific), but it's exactly what you asked for. Rewriting it for SAX or pyexpat should take all of 30 minutes. :) --Lars M. From akuchlin@mems-exchange.org Wed Jun 7 17:48:41 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Wed, 7 Jun 2000 12:48:41 -0400 (EDT) Subject: [XML-SIG] CVS tree moved to SourceForge Message-ID: <200006071648.MAA08274@amarok.cnri.reston.va.us> The XML-SIG's CVS tree has now moved to SourceForge, making it much easier to add new developers to the project. I'm going to update the relevant Web pages later today with the new checkout instructions; see http://sourceforge.net/project/?group_id=6473 For now, I'd just like to get the SourceForge IDs of people who want check-in privileges; please register at www.sourceforge.net if you don't already have an ID, and then mail me your ID. Greg Stein and Fred Drake have already been added. -- A.M. Kuchling http://starship.python.net/crew/amk/ 1, 2, 3, 4, 5, 5, 6, 7, 8, 10, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 20, 22, 23, 24, 25, 26, 28, 29, 30, 30, 31, 32, 34, 35, 37 ... -- Kito counting, in Peter Greenaway's _8 1/2 Women_ (1999) From uogbuji@fourthought.com Wed Jun 7 20:12:46 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 07 Jun 2000 13:12:46 -0600 Subject: [XML-SIG] CVS tree moved to SourceForge In-Reply-To: Message from "Andrew M. Kuchling" of "Wed, 07 Jun 2000 12:48:41 EDT." <200006071648.MAA08274@amarok.cnri.reston.va.us> Message-ID: <200006071912.NAA02278@localhost.localdomain> > The XML-SIG's CVS tree has now moved to SourceForge, making it much > easier to add new developers to the project. 
I'm going to update the > relevant Web pages later today with the new checkout instructions; see > http://sourceforge.net/project/?group_id=6473 > > For now, I'd just like to get the SourceForge IDs of people who want > check-in privileges; please register at www.sourceforge.net if you > don't already have an ID, and then mail me your ID. Greg Stein and > Fred Drake have already been added. If you add ids uche and jkloth, we can keep 4DOM synced (after you do the initial replacement). Also, question for all, should we add xml/xpath and xml/xslt to the Sourceforge CVS, synced to 4XSLT and 4XPath? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From akuchlin@mems-exchange.org Wed Jun 7 21:00:06 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Wed, 7 Jun 2000 16:00:06 -0400 Subject: [XML-SIG] CVS tree moved to SourceForge In-Reply-To: <200006071912.NAA02278@localhost.localdomain>; from uogbuji@fourthought.com on Wed, Jun 07, 2000 at 01:12:46PM -0600 References: <200006071912.NAA02278@localhost.localdomain> Message-ID: <20000607160006.A8416@amarok.cnri.reston.va.us> On Wed, Jun 07, 2000 at 01:12:46PM -0600, Uche Ogbuji wrote: >If you add ids uche and jkloth, we can keep 4DOM synced (after you do the >initial replacement). Done. I'll try to get the latest version checked in tonight. >Also, question for all, should we add xml/xpath and xml/xslt to the >Sourceforge CVS, synced to 4XSLT and 4XPath? I have no problem with that, if other SIG people agree with it. -- A.M. Kuchling http://starship.python.net/crew/amk/ If every man is supposed to think of sex once every nine minutes, what on earth does he think of in the other eight? 
-- Peter Greenaway, introductory quotation from _8 1/2 Women_ (1999) From gstein@lyra.org Wed Jun 7 23:30:21 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 7 Jun 2000 15:30:21 -0700 Subject: [XML-SIG] CVS tree moved to SourceForge In-Reply-To: <20000607160006.A8416@amarok.cnri.reston.va.us>; from akuchlin@cnri.reston.va.us on Wed, Jun 07, 2000 at 04:00:06PM -0400 References: <200006071912.NAA02278@localhost.localdomain> <20000607160006.A8416@amarok.cnri.reston.va.us> Message-ID: <20000607153021.D3348@lyra.org> On Wed, Jun 07, 2000 at 04:00:06PM -0400, Andrew M. Kuchling wrote: > On Wed, Jun 07, 2000 at 01:12:46PM -0600, Uche Ogbuji wrote: >... > >Also, question for all, should we add xml/xpath and xml/xslt to the > >Sourceforge CVS, synced to 4XSLT and 4XPath? > > I have no problem with that, if other SIG people agree with it. I'd be fine with that, and might encourage you to use that as your center of development (i.e. avoid the sync hassle). Cheers, -g -- Greg Stein, http://www.lyra.org/ From akuchlin@mems-exchange.org Thu Jun 8 01:56:40 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 7 Jun 2000 20:56:40 -0400 Subject: [XML-SIG] PyXML 0.5.5 release In-Reply-To: <20000606084514553.AAA599.581@hermes.cinetic.de>; from jhe@webde-ag.de on Tue, Jun 06, 2000 at 10:45:02AM +0100 References: <20000606084514553.AAA599.581@hermes.cinetic.de> Message-ID: <20000607205640.A7959@newcnri.cnri.reston.va.us> >This is the quickie fix to get it to build with VC++6 and the RELEASE >version of Python 1.5.2. :) Argh... I've modified your patch to work with either 1.5.2 or the 1.6 CVS tree, and added the new version of sgmlop.c. Can people please try out the 0.5.5.1 version at http://www.python.org/sigs/xml-sig/files/ and let me know if it fixes all the compilation problems? 
--amk From gstein@lyra.org Thu Jun 8 02:22:55 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 7 Jun 2000 18:22:55 -0700 Subject: [XML-SIG] PyXML 0.5.5 release In-Reply-To: <20000607205640.A7959@newcnri.cnri.reston.va.us>; from akuchlin@cnri.reston.va.us on Wed, Jun 07, 2000 at 08:56:40PM -0400 References: <20000606084514553.AAA599.581@hermes.cinetic.de> <20000607205640.A7959@newcnri.cnri.reston.va.us> Message-ID: <20000607182255.M3348@lyra.org> On Wed, Jun 07, 2000 at 08:56:40PM -0400, Andrew Kuchling wrote: > >This is the quickie fix to get it to build with VC++6 and the RELEASE > >version of Python 1.5.2. :) > > Argh... I've modified your patch to work with either 1.5.2 or the 1.6 > CVS tree, and added the new version of sgmlop.c. Can people please > try out the 0.5.5.1 version at > http://www.python.org/sigs/xml-sig/files/ and let me know if it fixes > all the compilation problems? hehe... let me step in with my regular mantra here. Why go thru the noise of calling it 0.5.5.1 instead of simply 0.5.6? x.x.6 is already "way down" on the iteration scale. Kinda weird to go further. And it isn't like somebody has deployed 0.5.5 and would be scared to go to 0.5.6 because "wow. that's a big jump... it might destabilize us." :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From uogbuji@fourthought.com Thu Jun 8 02:25:17 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 07 Jun 2000 19:25:17 -0600 Subject: [XML-SIG] CVS tree moved to SourceForge In-Reply-To: Message from Greg Stein of "Wed, 07 Jun 2000 15:30:21 PDT." <20000607153021.D3348@lyra.org> Message-ID: <200006080125.TAA03524@localhost.localdomain> > On Wed, Jun 07, 2000 at 04:00:06PM -0400, Andrew M. Kuchling wrote: > > On Wed, Jun 07, 2000 at 01:12:46PM -0600, Uche Ogbuji wrote: > >... > > >Also, question for all, should we add xml/xpath and xml/xslt to the > > >Sourceforge CVS, synced to 4XSLT and 4XPath? > > > > I have no problem with that, if other SIG people agree with it. 
> > I'd be fine with that, and might encourage you to use that as your center of > development (i.e. avoid the sync hassle). Worth considering. We'll go ahead just syncing up for now and see how it goes, then discuss it again in a few months. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Thu Jun 8 04:48:53 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 7 Jun 2000 23:48:53 -0400 (EDT) Subject: [XML-SIG] CVS tree moved to SourceForge In-Reply-To: <200006071912.NAA02278@localhost.localdomain> References: <200006071648.MAA08274@amarok.cnri.reston.va.us> <200006071912.NAA02278@localhost.localdomain> Message-ID: <14655.6053.515341.324962@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > Also, question for all, should we add xml/xpath and xml/xslt to the > Sourceforge CVS, synced to 4XSLT and 4XPath? I'd love to see the xml package as the definitive, one-stop shop for all standard XMLish things, so I'd say yes. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From pieter@nagel.co.za Thu Jun 8 11:28:10 2000 From: pieter@nagel.co.za (Pieter Nagel) Date: Thu, 08 Jun 2000 12:28:10 +0200 Subject: [XML-SIG] character enitity references in ESIS DOM builder Message-ID: <3bsujs0n7vv26b85jloj0094c36pmegsim@4ax.com> Currently, xml/dom/esis_builder.py has only a small hardcoded map to transform only 11 of the ISOlat1 entity references to their ISO8859-1 representation. 
I have a patch to do the following: 1) make the SDATA map user extensible 2) make unknown entity references an error, instead of silently injecting "unknown" into the line * 3) move the global methods in the file into the class (needed for 1) 4) Very minor speedup (currently a regular expression is being recompiled twice for each line of ESIS data, instead of once per builder instantiation). 5) Code cleanup (meaningfull identifiers, remove commented dead code) Who should I send this to? * This has nasty effects on foreign text. As the sage said: "My ounknown wil dit nie hunkown en ek sunkown ook nunkownnunknown vir julle stop dit!" -- Pieter Nagel From larsga@garshol.priv.no Thu Jun 8 11:44:05 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 08 Jun 2000 12:44:05 +0200 Subject: [XML-SIG] character enitity references in ESIS DOM builder In-Reply-To: <3bsujs0n7vv26b85jloj0094c36pmegsim@4ax.com> References: <3bsujs0n7vv26b85jloj0094c36pmegsim@4ax.com> Message-ID: * Pieter Nagel | | Who should I send this to? This sounds like a good patch. You can post it here. --Lars M. From pedretti@roguewave.com Thu Jun 8 17:17:03 2000 From: pedretti@roguewave.com (John A. Pedretti) Date: Thu, 08 Jun 2000 10:17:03 -0600 Subject: [XML-SIG] Problems building python XML module on Linux Redhat 5.1 Message-ID: <393FC6FF.DE42995F@roguewave.com> I ran into the following problems when trying to build the XML package (release candidate PyXML-0.5.5.tar.gz, dated June 5, 2000, downloaded from http://www.python.org/sigs/xml-sig/status.html) on a Linux Redhat 5.1 box, with Python 1.52 installed in /usr/local: 1. Needed to fix an error in the source distribution PyXML-0.5.5/extensions/pyexpat.c: at line 474, replaced 'PyObject_New' with 'PyObject_NEW' (see include file /usr/local/include/python1.5/objimpl.h); 2. 
The directions in the README say installation of dist-utils is recommended - in fact, it is required or the install doesn't work (complains it can't find the directory /usr/local/lib/python1.5/site-packages/xml; the /usr/local/lib/python1.5/site-packages directory is created during the installation of dist-utils). From shao_lo@eudoramail.com Fri Jun 9 20:28:46 2000 From: shao_lo@eudoramail.com (shao lo) Date: Fri, 09 Jun 2000 12:28:46 -0700 Subject: [XML-SIG] windowsNT install problems Message-ID: After running "python setup.py build" and "python setup.py install", I wind up with a copy of the distribution in the "build\lib.win32" directory. There are no pyc files there and there is no XML directory in my "program files\python" tree. When I try running the samples the sax lib can not be found. I am kind of new to this means of setup, so any help you can offer would be much appreciated! Join 18 million Eudora users by signing up for a free Eudora Web-Mail account at http://www.eudoramail.com From ivanlan@home.com Fri Jun 9 20:40:58 2000 From: ivanlan@home.com (Ivan Van Laningham) Date: Fri, 09 Jun 2000 13:40:58 -0600 Subject: [XML-SIG] windowsNT install problems References: Message-ID: <3941484A.C8F74A62@home.com> Shao, I had a problem similar to this just a few days ago. I copied my findings into this message, below. It's possible that getting the very latest xml distribution might fix this, but I dunno. Metta, Ivan shao lo wrote: > > After running "python setup.py build" > and "python setup.py install", I wind up with a copy of the distribution in the "build\lib.win32" directory. There are no pyc files there and there is no XML directory in my "program files\python" tree. > > When I try running the samples the sax lib can not be found. > > I am kind of new to this means of setup, so any help you can offer would be much appreciated! 
> -------------------see below-------------------- Hi All-- [cc'd to Python-List in order to get this into FAQTS] Ivan Van Laningham wrote: > > Hi All-- > I've "installed" the PyXml distribution (0.5.4) and downloaded pyxie > (from Sean McGrath) which uses pyexpat. Despite having the pyexpat.dll > in my winnt directory, it refuses to import: > > >>> import pyexpat > Traceback (innermost last): > File "", line 1, in ? > ImportError: No module named pyexpat > >>> > > I know that when I get the answer to this, I'll be deeply embarassed > since it is almost certainly something stupid that I'm doing/not doing > (have we left anything out?), but I am prepared. > Tom Passin replied, yesterday: > Subject: Re: [XML-SIG] Pyexpat > Date: Mon, 5 Jun 2000 22:10:16 -0400 > From: > To: "Python xml-sig" > > I had trouble getting pyexpat to import too, on Win95/98. Right now I have > it in both C:\Program FIles\python\dlls and C:\Program > Files\Python\xml\parsers. One of them works, but after all these months, I > forget which one did the trick. You need it in a python directory, rather > than just a windows path directory, I think. There were a couple of other suggestions, but they didn't work. PyXml installed into c:\Python\Lib\xml on my system, but failed to install pyexpat.dll. I copied it into c:\Python\Lib\xml\parsers, without renaming it, and that fixed the problem. The install should copy pyexpat.dll to the correct place, especially since the install package comes with a pre-built one. It should also be in the FAQ, shouldn't it? Thanks, everyone, for your help. -ly y'rs, Ivan;-) ---------------------------------------------- Ivan Van Laningham Axent Technologies, Inc. 
http://www.pauahtun.org http://www.foretec.com/python/workshops/1998-11/proceedings.html Army Signal Corps: Cu Chi, Class of '70 Author: Teach Yourself Python in 24 Hours From arhodes@psionic.com Mon Jun 12 17:24:20 2000 From: arhodes@psionic.com (Aaron Rhodes) Date: Mon, 12 Jun 2000 11:24:20 -0500 Subject: [XML-SIG] Thread safety of parsers? Message-ID: <39450EB4.9B55A384@psionic.com> Hola! Does anyone know if the parser modules in the PyXML package are safe to use in threads? I briefly grepped through the source to see if the Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS macros were in the c extensions...but didn't find them. Thanks in advance... Aaron arhodes@psionic.com From m.favas@per.dem.csiro.au Tue Jun 13 22:38:41 2000 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Wed, 14 Jun 2000 05:38:41 +0800 Subject: [XML-SIG] PyXML 0.5.5.1 pyexpat compilation errors Message-ID: <3946A9E1.3480681A@per.dem.csiro.au> I mailed a message to this group about similar issues with 0.5.4 last month, but I guess it got buried... 
;) The issues are (details below): 1) pyexpat.c fails to compile without changes, and still has warnings when changed so that compilation does succeed 2) the linking step has a wildcard ("*") quoting problem 3) build_ext complains about old-style usage 4) parsers/xmlproc is an older version than that available from Lars's website Platform: DEC Alpha, Tru64 Unix V4.0F, Compaq C V6.1-110, Python 1.6a2 (#111, Jun 13 2000, 03:39:26) [C] on osf1V4 from CVS 13th June "python setup.py build" produces the following errors: running build_ext warning: build_ext: old-style (ext_name, build_info) tuple found in ext_modules for extension 'xml.parsers.pyexpat'-- please convert to Extension instance building 'xml.parsers.pyexpat' extension creating build/temp.osf1V-alpha creating build/temp.osf1V-alpha/extensions cc -c -Iextensions/expat/xmltok -Iextensions/expat/xmlparse -I/usr/local/include/python1.6 -O -Olimit 1500 extensions/pyexpat.c -o build/temp.osf1V-alpha/extensions/pyexpat.o cc: Error: extensions/pyexpat.c, line 82: The static declaration of "handler_info" is a tentative definition and specifies an incomplete type. (incompstat) staticforward struct HandlerInfo handler_info[]; ---------------------------------^ error: command 'cc' failed with exit status 1 Specifying an actual size for handler_info[] by replacing line 82 with staticforward struct HandlerInfo handler_info[64]; allows the compilation to proceed, with the following warnings: cc: Warning: extensions/pyexpat.c, line 834: In the initializer for handler_info [0].handler, the referenced type of the pointer value "my_StartElementHandler" i s "function (pointer to void, pointer to const char, pointer to pointer to const char) returning void", which is not compatible with "void". 
(ptrmismatch) my_StartElementHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 837: In the initializer for handler_info [1].handler, the referenced type of the pointer value "my_EndElementHandler" is "function (pointer to void, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch) my_EndElementHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 840: In the initializer for handler_info [2].handler, the referenced type of the pointer value "my_ProcessingInstructionH andler" is "function (pointer to void, pointer to const char, pointer to const c har) returning void", which is not compatible with "void". (ptrmismatch) my_ProcessingInstructionHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 843: In the initializer for handler_info [3].handler, the referenced type of the pointer value "my_CharacterDataHandler" is "function (pointer to void, pointer to const char, int) returning void", whic h is not compatible with "void". (ptrmismatch) my_CharacterDataHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 846: In the initializer for handler_info [4].handler, the referenced type of the pointer value "my_UnparsedEntityDeclHand ler" is "function (pointer to void, pointer to const char, pointer to const char , pointer to const char, pointer to const char, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch) my_UnparsedEntityDeclHandler }, --------^ cc: Warning: extensions/pyexpat.c, line 849: In the initializer for handler_info [5].handler, the referenced type of the pointer value "my_NotationDeclHandler" i s "function (pointer to void, pointer to const char, pointer to const char, poin ter to const char, pointer to const char) returning void", which is not compatib le with "void". 
(ptrmismatch) my_NotationDeclHandler }, --------^ cc: Warning: extensions/pyexpat.c, line 852: In the initializer for handler_info [6].handler, the referenced type of the pointer value "my_StartNamespaceDeclHand ler" is "function (pointer to void, pointer to const char, pointer to const char ) returning void", which is not compatible with "void". (ptrmismatch) my_StartNamespaceDeclHandler }, --------^ cc: Warning: extensions/pyexpat.c, line 855: In the initializer for handler_info [7].handler, the referenced type of the pointer value "my_EndNamespaceDeclHandle r" is "function (pointer to void, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch) my_EndNamespaceDeclHandler }, --------^ cc: Warning: extensions/pyexpat.c, line 858: In the initializer for handler_info [8].handler, the referenced type of the pointer value "my_CommentHandler" is "fu nction (pointer to void, pointer to const char) returning void", which is not co mpatible with "void". (ptrmismatch) my_CommentHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 861: In the initializer for handler_info [9].handler, the referenced type of the pointer value "my_StartCdataSectionHandl er" is "function (pointer to void) returning void", which is not compatible with "void". (ptrmismatch) my_StartCdataSectionHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 864: In the initializer for handler_info [10].handler, the referenced type of the pointer value "my_EndCdataSectionHandle r" is "function (pointer to void) returning void", which is not compatible with "void". (ptrmismatch) my_EndCdataSectionHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 867: In the initializer for handler_info [11].handler, the referenced type of the pointer value "my_DefaultHandler" is "f unction (pointer to void, pointer to const char, int) returning void", which is not compatible with "void". 
(ptrmismatch) my_DefaultHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 870: In the initializer for handler_info [12].handler, the referenced type of the pointer value "my_DefaultHandlerExpandH andler" is "function (pointer to void, pointer to const char, int) returning voi d", which is not compatible with "void". (ptrmismatch) my_DefaultHandlerExpandHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 873: In the initializer for handler_info [13].handler, the referenced type of the pointer value "my_NotStandaloneHandler" is "function (pointer to void) returning int", which is not compatible with "vo id". (ptrmismatch) my_NotStandaloneHandler}, --------^ cc: Warning: extensions/pyexpat.c, line 876: In the initializer for handler_info [14].handler, the referenced type of the pointer value "my_ExternalEntityRefHand ler" is "function (pointer to void, pointer to const char, pointer to const char , pointer to const char, pointer to const char) returning int", which is not com patible with "void". (ptrmismatch) my_ExternalEntityRefHandler }, --------^ The link step also appears to have a wildcard quoting problem. The ld command used is: ld -shared -expect_unresolved "*" build/temp.osf1V-alpha/extensions/pyexpat.o build/temp.osf1V-alpha/extensions/expat/xmltok/xmltok.o build/temp.osf1V-alpha/extensions/expat/xmltok/xmlrole.o build/temp.osf1V-alpha/extensions/expat/xmlwf/xmlfile.o build/temp.osf1V-alpha/extensions/expat/xmlwf/xmlwf.o build/temp.osf1V-alpha/extensions/expat/xmlwf/codepage.o build/temp.osf1V-alpha/extensions/expat/xmlparse/xmlparse.o build/temp.osf1V-alpha/extensions/expat/xmlparse/hashtable.o build/temp.osf1V-alpha/extensions/expat/xmlwf/unixfilemap.o -o build/lib.osf1V-alpha/xml/parsers/pyexpat.so which works correctly if put into a /bin/sh script produces pyexpat.so without warnings of unresolved externals (the -expect_unresolved "*" pattern matches all). 
However, when run by Python via the "python setup.py build" command, ld complains about all the unresolved externals: ld: Warning: Unresolved: fread strlen strncpy strcmp free malloc PyType_Type PyObject_GetAttrString _Py_NoneStruct PyObject_Init as if the pattern that ld is trying to match is literally "*" (double-quote-*-double-quote) instead of * -- Email - m.favas@per.dem.csiro.au Mark C Favas Phone - +61 8 9333 6268, 0418 926 074 CSIRO Exploration & Mining Fax - +61 8 9383 9891 Private Bag No 5, Wembley WGS84 - 31.95 S, 115.80 E Western Australia 6913 From davecosta@netscape.net Thu Jun 15 21:05:34 2000 From: davecosta@netscape.net (Dave Costa) Date: 15 Jun 00 13:05:34 PDT Subject: [XML-SIG] Problem installing Python XML Message-ID: <20000615200534.29377.qmail@www0v.netaddress.usa.net> I am attempting to install the Python XML package (0.5.2) on a Windows 95 machine, with distutils installed. I do have a C compiler already installed (Borland), but I can't figure out how to give the PyXML setup script the information it needs to use it. The specific error I am getting is:

running build_ext
building 'sgmlop' extension
cl.exe /c /nologo /Ox /MD /W3 "-IC:\PROGRAM FILES\PYTHON\Include" /Tcextensions/sgmlop.c /Fobuild\temp.win32\Release\extensions/sgmlop.obj
error: command 'cl.exe' failed: No such file or directory

If you have any advice to get this to work, I would appreciate it. Alternatively, is there a way to bypass the compilation of the C extensions so that I can at least get the Python components installed? Thanks, Dave Costa From akuchlin@mems-exchange.org Fri Jun 16 03:27:52 2000 From: akuchlin@mems-exchange.org (A.M.
Kuchling) Date: Thu, 15 Jun 2000 22:27:52 -0400 Subject: [XML-SIG] PyExpat changes Message-ID: I've just checked in the changes to make the Expat module return Unicode or 8-bit strings, depending on the setting of the returns_unicode attribute. Adding this proved to be messy; if anyone can suggest a neater way to do this, please let me know. Basically, the relevant calls to Py_BuildValue("s") become calls with "O&", returns_unicode ? : . Worse, creating the dictionary containing attributes requires two very similar parallel functions for each case. Also, I'm starting to find the capitalized method and attribute names (.Parse, .StartElementHandler) annoying; the methods are more annoying than the attribute names. Think this is worth fixing? -- A.M. Kuchling http://starship.python.net/crew/amk/ That's the world as Sutekh would leave it: a desolate planet circling a dead sun. -- The Doctor, in "The Pyramids of Mars" From pete.black@metering.co.nz Fri Jun 16 06:29:37 2000 From: pete.black@metering.co.nz (Pete Black) Date: Fri, 16 Jun 2000 17:29:37 +1200 Subject: [XML-SIG] PyXML wierdness Message-ID: <000801bfd753$dde39440$0600a8c0@angela> Hi there, i'm trying to use PyXML to parse some XML files, and i get some odd behaviour. using this code:

    from xml.dom import core,utils

    reader = utils.FileReader('c:\\TMLIntranet\\headlines\\headlines.xml')
    doc = reader.document
    Storage = ""
    print "."
    for n in doc.documentElement.childNodes:
        if n.nodeType==core.TEXT:
            Storage=Storage+ n.nodeValue

    #print Storage

i get the following output:

    [] [] [] [] [] [] [] [] [] .

What is going on here? It seems that the utils.FileReader function is outputting these '[]'s to the screen when i run a file through it, but surely this is not intended?
Anyone know why this might happen and how i might fix it? (I am using Python 1.52 (the version shipped with Zope 2.16) on Win 2000. I installed the PyXML module from the binary installer that was posted to this list a while back. Could it be that this build has debugging code in it or something? Regards -Pete (please respond directly, as i only read the archives of this list)
From larsga@garshol.priv.no Fri Jun 16 09:31:21 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Jun 2000 10:31:21 +0200 Subject: [XML-SIG] PyExpat changes In-Reply-To: References: Message-ID: * A. M. Kuchling | | Also, I'm starting to find the capitalized method and attribute | names (.Parse, .StartElementHandler) annoying; the methods are more | annoying than the attribute names. Think this is worth fixing? I would be happy to see this change. However, I assume that this will break a lot of code. Perhaps we should make a backwards-compatible Python wrapper to avoid that? Then we might at the same time get rid of the ParserCreate function and replace it with an ExpatParser class constructor or something like it (ie: ParserCreate would return the backwards-compatible Python wrapper class). --Lars M. From akuchlin@mems-exchange.org Fri Jun 16 15:44:55 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 16 Jun 2000 10:44:55 -0400 Subject: [XML-SIG] PyExpat changes In-Reply-To: ; from larsga@garshol.priv.no on Fri, Jun 16, 2000 at 10:31:21AM +0200 References: Message-ID: <20000616104455.C15577@amarok.cnri.reston.va.us> On Fri, Jun 16, 2000 at 10:31:21AM +0200, Lars Marius Garshol wrote: >I would be happy to see this change. However, I assume that this will >break a lot of code. Perhaps we should make a backwards-compatible I was thinking of just leaving the old function & method names in place, and only documenting the new names. I can leave the attribute names alone. >Python wrapper to avoid that? Then we might at the same time get rid >of the ParserCreate function and replace it with an ExpatParser class >constructor or something like it (ie: ParserCreate would return the >backwards-compatible Python wrapper class). I really don't see the need of a Python wrapper class. Plus, that would require creating a pyexpat.py module and renaming pyexpat.c to _pyexpat.c.
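To make the wrapper idea in this exchange concrete, here is a rough sketch of what a thin Python class over the C parser could look like. Nothing in it is part of PyXML: the class name ExpatParser is taken from Lars's suggestion, but the method names set_handler and parse and the lowercase handler keys are invented for illustration, and the code assumes a modern Python where the extension is importable as xml.parsers.expat.

```python
import xml.parsers.expat


class ExpatParser:
    """Hypothetical lowercase-named front end to an expat parser object."""

    # invented lowercase key -> existing capitalized pyexpat attribute
    _HANDLERS = {
        "start_element": "StartElementHandler",
        "end_element": "EndElementHandler",
        "character_data": "CharacterDataHandler",
    }

    def __init__(self, encoding=None):
        # The old factory function stays in place and does the real work.
        self._parser = xml.parsers.expat.ParserCreate(encoding)

    def set_handler(self, name, callback):
        """Install a callback under one of the lowercase handler keys."""
        setattr(self._parser, self._HANDLERS[name], callback)

    def parse(self, data, is_final=True):
        """Feed data to the underlying parser's capitalized Parse method."""
        return self._parser.Parse(data, is_final)
```

Because the class only delegates, code written against ParserCreate and the capitalized names keeps working unchanged, which is roughly the same effect as leaving the old names in place and documenting only the new ones.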
(Oh, maybe we could just create expat.py; I'm not enthralled by having "py" in the module name, since I know perfectly well that I'm using Python.) -- A.M. Kuchling http://starship.python.net/crew/amk/ There are two kinds of large software systems: those that evolved from small systems and those that don't work. -- Seen on slashdot.org From akuchlin@mems-exchange.org Fri Jun 16 15:53:41 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 16 Jun 2000 10:53:41 -0400 Subject: [XML-SIG] PyXML wierdness In-Reply-To: <000801bfd753$dde39440$0600a8c0@angela>; from pete.black@metering.co.nz on Fri, Jun 16, 2000 at 05:29:37PM +1200 References: <000801bfd753$dde39440$0600a8c0@angela> Message-ID: <20000616105341.D15577@amarok.cnri.reston.va.us> On Fri, Jun 16, 2000 at 05:29:37PM +1200, Pete Black wrote: >What is going on here? It seems that the utils.FileReader function is >outputting these '[]'s to the screen when i run a file through it, but >surely this is not intended? You're probably correct in that this is a debugging print that was forgotten. I don't remember what version was used in that binary distribution, but it shouldn't be difficult to track down the errant statement. -- A.M. Kuchling http://starship.python.net/crew/amk/ He found himself able to see each falling grain, distinct and unique; and he knew then he was dreaming. -- From SANDMAN #39: "Soft Places" From larsga@garshol.priv.no Fri Jun 16 16:08:38 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Jun 2000 17:08:38 +0200 Subject: [XML-SIG] PyExpat changes In-Reply-To: <20000616104455.C15577@amarok.cnri.reston.va.us> References: <20000616104455.C15577@amarok.cnri.reston.va.us> Message-ID: * Lars Marius Garshol | | I would be happy to see this change. However, I assume that this will | break a lot of code. * Andrew M. Kuchling | | I was thinking of just leaving the old function & method names in | place, and only documenting the new names. 
I can leave the attribute | names alone. Aha. That works fine for me, although it might be better to document the old names as being deprecated. Some people learn from other people's code, while others learn from documentation. | I really don't see the need of a Python wrapper class. With your solution I agree that there is no need. But I still think it would be nice to call ParserCreate something else to give the illusion that it is a class. I think the entire API as it now stands is simply importing the ugliness of the expat C API into Python and I would be happier if we could make it look more like a normal Python class. I have no idea how much work is required, but this is what I would like to see. --Lars M. From wunder@ultraseek.com Fri Jun 16 16:53:16 2000 From: wunder@ultraseek.com (Walter Underwood) Date: Fri, 16 Jun 2000 08:53:16 -0700 Subject: [XML-SIG] PyExpat changes In-Reply-To: <20000616104455.C15577@amarok.cnri.reston.va.us> Message-ID: <202839.3170134396@[192.168.8.114]> --On Friday, June 16, 2000 10:44 AM -0400 "Andrew M. Kuchling" wrote: >... (Oh, maybe we could just create expat.py; I'm not > enthralled by having "py" in the module name, since I know perfectly > well that I'm using Python.) Do it. I changed that two years ago in the version integrated into Ultraseek Server, partly because I made some other compatibility changes and wanted to avoid conflicts. We're using a near-stock version of 0.5.5 with Python 1.6 in the current development code. Changing to dicts for the attributes was a bigger client code impact than changing attribute/method names, so I don't object to that. Last week, I also made the changes to use UTF-16 and Python Unicode objects. I'll send you what I did. There are some tricky bits, especially on Solaris, where wchar_t is int (32-bits) and Python uses unsigned short. But it basically works on NT, Solaris, Linux, and HP-UX. We've only exercised the most common handlers. 
It also needed one of the infernal dllexport declarations on the init function to be properly loaded from a DLL in 1.6a2. I'll send it as soon as I can go to it (I'm on the Mac at home, the bits are in CVS at work). Oh, and the pyexpat README in the dist seems to be the wrong version. wunder -- Walter Underwood Senior Staff Engineer, Ultraseek Corp. http://www.ultraseek.com/ From gvwilson@nevex.com Mon Jun 19 14:21:01 2000 From: gvwilson@nevex.com (Greg Wilson) Date: Mon, 19 Jun 2000 09:21:01 -0400 (EDT) Subject: [XML-SIG] Python XML docs / questions Message-ID: Hi, everyone. I'm playing with the Python SAX library, and have a couple of questions. I'd be happy to turn their answers into contributions to the docs, if you think that it's worth adding to the SAX-1 docs at this point. (Alternatively, I'd be happy to help with SAX-2 docs if that would be more useful.) First, is there a standard 'EntityResolver' in the library that will handle or define all of the basics HTML entities, such as < and (or is there an example of how to create such a beast)? Second, is there a way to access the current document location (line and column number) from within the handler, for tracing/debugging purposes? The home page for SAX talks about a 'Locator' interface, but I can't find hooks for this in the Python version. Thanks very much, Greg
From larsga@garshol.priv.no Mon Jun 19 17:09:48 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 19 Jun 2000 18:09:48 +0200 Subject: [XML-SIG] Python XML docs / questions In-Reply-To: References: Message-ID: Hi Greg, * Greg Wilson | | Hi, everyone. I'm playing with the Python SAX library, and have a | couple of questions. I'd be happy to turn their answers into | contributions to the docs, if you think that it's worth adding to | the SAX-1 docs at this point. (Alternatively, I'd be happy to help | with SAX-2 docs if that would be more useful.) At this stage contributions to the SAX 2 docs would definitely be the most useful thing to have. | First, is there a standard 'EntityResolver' in the library that will | handle or define all of the basics HTML entities, such as < and | (or is there an example of how to create such a beast)? This can't be done, because the EntityResolver is only for external entities, not for internal ones like the HTML character entities. And in any case, the XML parser should take care of those for you when it reads the DTD. In SAX 2.0 you can deal with entities skipped by the SAX parser by overriding the skippedEntity callback which should be fired by XML parsers that do not read the external DTD subset and thus haven't seen the definitions for the internal entities. Unfortunately, I can't see any way to implement it with pyexpat. (It is implemented for xmllib in my private CVS tree, and not needed for xmlproc.) | Second, is there a way to access the current document location (line | and column number) from within the handler, for tracing/debugging | purposes?
The home page for SAX talks about a 'Locator' interface, | but I can't find hooks for this in the Python version. The ContentHandler has a setDocumentLocator callback that the parser calls to give the application a Locator. This exists in the Python version as well and is implemented by all the drivers. Using it should be straightforward. --Lars M. From paul@prescod.net Tue Jun 20 12:21:03 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 20 Jun 2000 13:21:03 +0200 Subject: [XML-SIG] Feed and startDocument Message-ID: <394F539F.5DE9069F@prescod.net> If an application uses the "feed" interface of our extended SAX rather than the parse interface, is it the application's job to call its own startDocument handler or should the parser/driver recognize the first call to feed and do the right thing? If the latter, Lars, could you check whether your Sax2 driver for PyExpat does the right thing? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Music is the stuff between the notes." - Claude Debussy From paul@prescod.net Tue Jun 20 12:22:24 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 20 Jun 2000 13:22:24 +0200 Subject: [XML-SIG] PyExpat changes References: Message-ID: <394F53F0.96141C51@prescod.net> "A.M. Kuchling" wrote: > > ... > Basically, > the relevant calls to Py_BuildValue("s") become calls with "O&", > returns_unicode ? : that makes an 8-bit string>. Worse, creating the dictionary > containing attributes requires two very similar parallel functions for > each case. Can't think of anything prettier off the top of my head. > Also, I'm starting to find the capitalized method and attribute names > .Parse, .StartElementHandler) annoying; the methods are more annoying > than the attribute names. Think this is worth fixing? No. We should not document the PyExpat API. I do not believe that there is any performance loss in using the SAX API instead. 
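For readers following the Locator answer above, here is a minimal sketch of the setDocumentLocator hook at the SAX level. It assumes the xml.sax package found in present-day Python standard libraries, which postdates this thread; the class name LineTracer and the sample document are made up for illustration, and the drivers discussed here may have behaved differently.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class LineTracer(ContentHandler):
    """Hypothetical handler that records the line on which each element starts."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.seen = []

    def setDocumentLocator(self, locator):
        # The parser calls this once, before any other event is delivered.
        self.locator = locator

    def startElement(self, name, attrs):
        # Ask the Locator where the parser currently is in the document.
        self.seen.append((name, self.locator.getLineNumber()))


handler = LineTracer()
sample = b"<doc>\n  <item/>\n  <item/>\n</doc>"  # made-up document
xml.sax.parse(io.BytesIO(sample), handler)

for name, line in handler.seen:
    print(name, "starts on line", line)
```

Nothing in the handler touches the capitalized expat method names, which is the point made in the surrounding discussion about staying at the SAX level.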
Typing the upper case letters is a reminder that you shouldn't be typing those method names at all (except in creating the SAX driver). Yes, this means that we need to finish SAX 2 for Python 1.6. I'm not clear on where we are with that but I should have some time to help really soon. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Music is the stuff between the notes." - Claude Debussy From fdrake@beopen.com Tue Jun 20 14:02:11 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 20 Jun 2000 09:02:11 -0400 (EDT) Subject: [XML-SIG] PyExpat changes In-Reply-To: <394F53F0.96141C51@prescod.net> References: <394F53F0.96141C51@prescod.net> Message-ID: <14671.27475.885443.148910@cj42289-a.reston1.va.home.com> Paul Prescod writes: > Yes, this means that we need to finish SAX 2 for Python 1.6. I'm not > clear on where we are with that but I should have some time to help > really soon. pyexpat is in the core distribution, but not built by default for all the usual reasons. I think everything we need to do for the module has been done (Andrew?) -- the remaining "issues" have everything to do with expat build/configuration/API issues and not the Python bindings. I think Lars is still working on SAX2 support; I presume he intends his module to become the saxlib module distributed with Python (Lars?). We still need documentation. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From larsga@garshol.priv.no Tue Jun 20 14:09:39 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Jun 2000 15:09:39 +0200 Subject: [XML-SIG] Feed and startDocument In-Reply-To: <394F539F.5DE9069F@prescod.net> References: <394F539F.5DE9069F@prescod.net> Message-ID: * Paul Prescod | | If an application uses the "feed" interface of our extended SAX | rather than the parse interface, is it the application's job to call | its own startDocument handler or should the parser/driver recognize | the first call to feed and do the right thing? 
It is the responsibility of the parser/driver. | If the latter, Lars, could you check whether your Sax2 driver for | PyExpat does the right thing? It does not. Me fix. Thanks for reporting it! --Lars M. From larsga@garshol.priv.no Tue Jun 20 14:13:21 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Jun 2000 15:13:21 +0200 Subject: [XML-SIG] PyExpat changes In-Reply-To: <394F53F0.96141C51@prescod.net> References: <394F53F0.96141C51@prescod.net> Message-ID: * Paul Prescod | | Yes, this means that we need to finish SAX 2 for Python 1.6. I'm not | clear on where we are with that Pretty far along. There are some weak spots that need firming up in the definition and the drivers need to be taken those last 5 % and then tested. For my own part I am bound up with the book, but expect to be done with that within ~3 weeks. | but I should have some time to help really soon. Great! Let me know and I will synchronize the CVS so you get access to the latest stuff. (Developing at home forces this arrangement, alas.) --Lars M. From larsga@garshol.priv.no Tue Jun 20 14:16:20 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Jun 2000 15:16:20 +0200 Subject: [XML-SIG] PyExpat changes In-Reply-To: <14671.27475.885443.148910@cj42289-a.reston1.va.home.com> References: <394F53F0.96141C51@prescod.net> <14671.27475.885443.148910@cj42289-a.reston1.va.home.com> Message-ID: * Fred L. Drake, Jr. | | I think Lars is still working on SAX2 support; Yes. | I presume he intends his module to become the saxlib module | distributed with Python (Lars?). That was the idea, yes. --Lars M. From paul@prescod.net Tue Jun 20 14:20:06 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 20 Jun 2000 15:20:06 +0200 Subject: [XML-SIG] Python 1.6 XML APIs Message-ID: <394F6F86.CF603637@prescod.net> We need a little more concentrated coordination on XML in Python 1.6. I'll do what I can over the next two weeks. 
I did some thinking about what I consider a coherent strategy while I was on a plane recently. Here is what I'm thinking:

====

We know that no single Python processing toolkit can be everything to everyone. Each must make performance/ease of use trade-offs that will not always be applicable. Therefore we need more than one XML parsing API. I think that there are two main axes where performance and ease of use are traded off.

The one axis is labelled "full tree versus streaming." DOM and qp_xml are "full tree". SAX is streaming. There is a strong consensus that we need both full tree and streaming APIs in Python 1.6. For various quasi-technical reasons, I think that most people expect those APIs to be SAX 2 (or some subset) and some sort of miniature DOM. Therefore I am in the process of cleaning up minidom. I am almost done -- the last step is SAX 2 integration. I think Lars is almost done with SAX 2 so we are doing pretty well.

The other axis is labelled "friendly XML-specific objects" versus "primitive Python objects". The DOM uses "friendly XML-specific objects" whereas qp uses primitive objects. I think that both options are important so I favor putting qp into Python 1.6 if it can be made SAX 2 compatible and properly documented in time. (I am willing to work on this)

    tree/primitive objs = qp
    tree/XML objs = minidom
    streaming/primitive objs = SAX
    streaming/XML objs = ???

In the fourth quadrant are libraries like my EventDOM which are streaming but use friendly objects. Right now, EventDOM is way too heavyweight for the standard distribution because its dispatcher is so sophisticated (and slow!!) Nevertheless, in only 150 lines I have implemented a streaming API that uses friendly DOM objects. I call it PullDOM.
It has the following characteristics:

* it builds heavily on minidom, which is why it is so small
* minidom itself is only 600 lines (it might grow by a third once we add convenience functions and other such junk)
* it uses a "pull" methodology which is a little more flexible and easy to learn than the traditional "push". In the documentation we can describe how to build a ten-line dispatch engine. (see below)
* the API is brain-dead simple (see below)
* every node knows its parent nodes so context-based checking is easy
* any node can easily be expanded into a "subtree" -- you get some of the benefits of a tree-API with much less overhead
* processes Hamlet in 2 seconds on P3/450
* simple! simple! simple! convenient! convenient! convenient!

In general, I think that it is a really nice simplicity/performance middle ground. Much, much, much easier to use than straight SAX and much, much, much more performant (esp. for large documents) than DOM.

Right now the API consists of basically two functions and one class with one method and one protocol.

Functions:

    parse( stream_or_filename_or_url )
    parseXML( string )

Each of these returns a DOMEventStream object. It can be used in one of two ways:

1.

    for (token_type, node) in pulldom.parse( "hamlet.xml" ):
        print token_type, node

2.

    events=pulldom.parse( "hamlet.xml" )
    while 1:
        token=events.getEvent( )
        if not token: break
        (token_type, node)=token
        print token_type, node

token_types are: ("START_ELEMENT", "END_ELEMENT", "COMMENT", "START_DOCUMENT", "END_DOCUMENT", "PROCESSING_INSTRUCTION", "IGNORABLE_WHITESPACE", "CHARACTERS")

At any point you can build a subtree:

    if token_type=="START_ELEMENT" and node.tagName=="TABLE" \
       and node.namespaceURI=="http://www.w3.org/...":
        events.expandNode( node )
        print node.child_nodes

Now the node has children. The next call to getEvent (or __getitem__) returns the node that follows this one, not a child node.

====

Why didn't I put in a dispatcher? I've been down this path many times before.
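The pull style described above can be tried with the xml.dom.pulldom module in present-day Python standard libraries. The sketch below assumes that module, which is not necessarily identical to the draft in this message: it offers parseString rather than parseXML, and exposes the event names as module constants. The sample document and the function name expand_acts are made up for illustration.

```python
from xml.dom import pulldom

# Made-up stand-in for "hamlet.xml".
DOC = "<play><title>Hamlet</title><act n='1'><scene>Elsinore</scene></act></play>"


def expand_acts(xml_text):
    """Pull events one at a time and expand only <act> elements into subtrees."""
    found = []
    events = pulldom.parseString(xml_text)
    for token_type, node in events:
        if token_type == pulldom.START_ELEMENT and node.tagName == "act":
            # Build the subtree below this node. The stream then resumes
            # with whatever follows </act>, not with the node's children.
            events.expandNode(node)
            found.append((node.getAttribute("n"), len(node.childNodes)))
    return found


result = expand_acts(DOC)
print(result)
```

Only the elements that are explicitly expanded become trees; everything else passes by as a stream of events, which is the trade-off between a full DOM and plain SAX that the message argues for.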
First you want to dispatch on node types. Then element types. Then namespace-qualified element types. Then namespaces with no element type and element types with no namespaces. Then context. Then attribute values. Then context AND attribute values. Eventually you end up reinventing XSLT in Python syntax. I am totally in favor of reinventing XSLT in Python but *not* as part of the standard distribution (at least not yet).

Therefore, I will write documentation that *demonstrates* a few of these dispatching strategies and let the user use their imagination. Using simple DOM commands you can get from children to parents, check attributes, check namespaces, etc. You don't need to learn some kind of addressing "sublanguage" -- just use the same old DOM properties.

===

I would like the same two parse methods to be available in minidom, qp, and pulldom. So

    minidom.parse( "hamlet.xml" ) gives you a DOM
    qp.parse( "hamlet.xml" ) gives you a qp data structure
    pulldom.parse( "hamlet.xml" ) gives you a DOM event stream
    sax.parse( "hamlet.xml" ) doesn't really return anything, but it processes your document with your document handler.

Under the covers, all APIs use SAX, so behavior should be extremely consistent between all modules.

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Music is the stuff between the notes." - Claude Debussy

From paul@prescod.net Tue Jun 20 15:04:24 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 20 Jun 2000 16:04:24 +0200 Subject: [XML-SIG] PyExpat changes References: <394F53F0.96141C51@prescod.net> <14671.27475.885443.148910@cj42289-a.reston1.va.home.com> <14671.29667.226155.251237@cj42289-a.reston1.va.home.com> Message-ID: <394F79E8.24F04519@prescod.net>

If there is a positive response (or no response :) from the SIG, I can have minidom and pulldom ready for Python 1.6 beta testing tomorrow. minidom and pulldom are both pretty small modules (less than 400 lines put together!)
so I don't expect a large number of subtle bugs to arise during testing. qp would take another few days to integrate with SAX 2. I'm not as committed to qp but I know people are using it productively and it is also pretty tiny (300 lines). I'd like some more opinions on the wisdom of trying to put that into Python 1.6. -- Paul Prescod "Music is the stuff between the notes." - Claude Debussy From fdrake@beopen.com Tue Jun 20 16:12:17 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 20 Jun 2000 11:12:17 -0400 (EDT) Subject: [XML-SIG] Talk at Washington Area SGML/XML Users Group Message-ID: <14671.35281.252785.155569@cj42289-a.reston1.va.home.com> I'll be giving a talk at the Washington Area SGML/XML Users Group (http://www.eccnet.com/sgmlug/) tomorrow night. If you're in the area, feel free to drop in! Even if you can't make it, you might want to check back at the group's Web page occasionally to see who will be speaking; there are some interesting talks! (And the cookies are excellent!) I'll be talking about what Python 1.6 will offer the XML community, and what the formation of PythonLabs at BeOpen means to the Python community. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Tue Jun 20 16:29:16 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 20 Jun 2000 11:29:16 -0400 (EDT) Subject: [XML-SIG] Python 1.6 XML APIs In-Reply-To: <394F6F86.CF603637@prescod.net> References: <394F6F86.CF603637@prescod.net> Message-ID: <14671.36300.815139.963941@cj42289-a.reston1.va.home.com> Paul, This all sounds really interesting! I am hesitant to say we'll accept several new modules for 1.6 at this point, however. Perhaps it makes sense to pick a couple that offer the flexibility (saxlib?) and ease-of-use (pulldom? minidom?). If we can narrow it down quickly, I can talk to Guido about what should go in, but we're essentially at feature-freeze now.
We can be a little more flexible for library modules, but each module that gets added is essentially a promise that it'll be maintained for at least half of all eternity. More, if anyone uses it. ;) Does it make sense to make the XML support in the core a package? It sounds like we'll have as many as three new modules: pyexpat, saxlib, and ????dom. There's also the legacy xmllib to think about. We don't want to clobber the "xml" package namespace, either. Perhaps xmllib should become a package which imports its current contents from a sub-module, and also contains the new modules (except pyexpat)? -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Tue Jun 20 16:56:15 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 20 Jun 2000 17:56:15 +0200 Subject: [XML-SIG] Python 1.6 XML APIs References: <394F6F86.CF603637@prescod.net> <14671.36300.815139.963941@cj42289-a.reston1.va.home.com> Message-ID: <394F941F.670DF33E@prescod.net> "Fred L. Drake, Jr." wrote: > > Paul, > This all sounds really interesting! > I am hesitant to say we'll accept several new modules for 1.6 at > this point, however. Perhaps it makes sense to pick a couple that > offer the flexibility (saxlib?) and ease-of-use (pulldom? minidom?). Pulldom depends on minidom so you get two for the price of one. Actually, if I were smart, I would have just concatenated the code and called it one module...is it too late for me to pretend that they were never two modules? :) :) Once you have minidom, pulldom gives you so much bang per buck that I would hate to lose it. I really think that it is a lot easier to use than SAX because it isn't as generalized and optimized. I am willing to lose qp considering our timelines. We can try again for 1.7. I am actually not that tied to SAX 2 either. PyExpat needs to expose a SAX 2-compatible interface but Python doesn't have first class "interface modules" so that doesn't imply much more than a little bit of wrapper code.
If we DO want to put SAX in, then we would still probably lose the drivers. Drivers could be distributed with parsers. That takes us down to basically 4 files:

    saxlib - the core library -- probably necessary
    saxexts - most of this is not useful until you have more than one parser
    saxmisc - this is basically interface documentation not code
    saxutils - useful, but maybe overkill for the builtin library

I could go either way on including any of them. You *can use SAX* with only saxlib. In fact, the only thing I can think of that you really, really need is SAXException and SAXParseException. Most of the rest is interface documentation (empty base classes) and methods to help you find parsers (not relevant until you have multiple of them). Of course Lars has thought about this a lot more than I have. I think his opinion would be valuable.

So I'm willing to push for:

    mini/pulldom
    "minisax"
    pyexpat

and leave the rest to the XML-sig distribution.

> If we can narrow it down quickly, I can talk to Guido about what > should go in, but we're essentially at feature-freeze now. We can be > a little more flexible for library modules, but each module that gets > added is essentially a promise that it'll be maintained for at least > half of all eternity. More, if anyone uses it. ;)

The *dom modules are really so tiny and easy to maintain...no C code, few external dependencies...few published methods...I wish I had thought about this 80/20 point a year ago! I'll send them to you tonight and you'll see what I mean. I just need a tiny bit more testing. Too bad about your talk!!!

> Does it make sense to make the XML support in the core a package? > It sounds like we'll have as many as three new modules: pyexpat, > saxlib, and ????dom. There's also the legacy xmllib to think about. > We don't want to clobber the "xml" package namespace, either.
Perhaps > xmllib should become a package which imports its current contents from > a sub-module, and also contains the new modules (except pyexpat)? I think that the last time we got into this discussion it became a rathole of "shouldn't we packagize the whole Python library?" and "what does this mean for the Python xml distribution". I don't want to "go there" unless these questions have been answered. The path of least resistance is to packagize later with everything else. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Music is the stuff between the notes." - Claude Debussy From fdrake@beopen.com Tue Jun 20 20:28:09 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 20 Jun 2000 15:28:09 -0400 (EDT) Subject: [XML-SIG] Extension modules in PyXML CVS Message-ID: <14671.50633.42518.626021@cj42289-a.reston1.va.home.com> The CVS repository still contains the intl.c, pyexpat.c, sgmlop.c, and wstrop.c modules. Are these still needed? Are we still trying to support Python 1.5.X? If we're only interested in 1.6 support, we can drop the entire extensions/ directory and simplify installation. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From akuchlin@mems-exchange.org Tue Jun 20 20:31:20 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 20 Jun 2000 15:31:20 -0400 Subject: [XML-SIG] Extension modules in PyXML CVS In-Reply-To: <14671.50633.42518.626021@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Tue, Jun 20, 2000 at 03:28:09PM -0400 References: <14671.50633.42518.626021@cj42289-a.reston1.va.home.com> Message-ID: <20000620153120.J3142@amarok.cnri.reston.va.us> On Tue, Jun 20, 2000 at 03:28:09PM -0400, Fred L. Drake, Jr. wrote: > The CVS repository still contains the intl.c, pyexpat.c, sgmlop.c, >and wstrop.c modules. Are these still needed? Are we still trying to >support Python 1.5.X? 
Since 1.6 is still in alpha, yes, 1.5.2 should still be supported, though I've been negligent in actually testing that. Once 1.6b1 is released, then I would make a final 1.5.2-compatible tarball, and then start simplifying the CVS tree by using 1.6's new features. --amk From larsga@garshol.priv.no Tue Jun 20 21:16:55 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Jun 2000 22:16:55 +0200 Subject: [XML-SIG] Python 1.6 XML APIs In-Reply-To: <394F941F.670DF33E@prescod.net> References: <394F6F86.CF603637@prescod.net> <14671.36300.815139.963941@cj42289-a.reston1.va.home.com> <394F941F.670DF33E@prescod.net> Message-ID: * Paul Prescod | | If we DO want to put SAX in, then we would still probably lose the | drivers. Agreed, except for the parser that come with Python. | That takes us down to basically 4 files: | | saxlib - the core library -- probably necessary It is. | saxexts - most of this is not useful until you have more than one | parser Actually, this is essential, because it allows you to write code that is independent of a specific installation. If this is in the core distribution you can always use it without worrying about what the user may or may not have installed. If it's not in the core it's almost useless. Also, it's really sax2exts we need. | saxmisc - this is basically interface documentation not code This is obsolete. | saxutils - useful, but maybe overkill for the builtin library Some of the stuff could probably be moved to another module, although that might give us installation headaches. I'm very tired and stressed with the book right now, so I'll think more about this and get back to it later. Once I get some feedback on how slim we want the saxlib in the core to be it is easier to get an idea of what to do. * Fred L. Drake, jr. | | Does it make sense to make the XML support in the core a package? Well, the stuff is already a package in the XML-SIG distro, so I'm in favour of keeping that as it is. --Lars M. 
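To make the saxexts argument concrete: the idea is that application code asks a factory for "some SAX parser" instead of naming a driver. A minimal sketch follows, assuming the xml.sax.make_parser factory found in present-day Python standard libraries (a later arrangement than the saxexts/sax2exts modules discussed in this thread); the handler class TagCounter and the sample input are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TagCounter(ContentHandler):
    """Hypothetical handler that counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_bytes):
    # No driver is named here: the factory picks whatever parser is installed.
    parser = xml.sax.make_parser()
    handler = TagCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.counts


counts = count_tags(b"<doc><p>one</p><p>two</p></doc>")
print(counts)
```

Because count_tags never mentions expat or any other parser, the same code keeps working if a different driver is installed, which is the installation independence argued for above.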
From fdrake@beopen.com Tue Jun 20 21:20:29 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 20 Jun 2000 16:20:29 -0400 (EDT) Subject: [XML-SIG] Extension modules in PyXML CVS In-Reply-To: <20000620153120.J3142@amarok.cnri.reston.va.us> References: <14671.50633.42518.626021@cj42289-a.reston1.va.home.com> <20000620153120.J3142@amarok.cnri.reston.va.us> Message-ID: <14671.53773.211130.566306@cj42289-a.reston1.va.home.com> Andrew M. Kuchling writes: > Since 1.6 is still in alpha, yes, 1.5.2 should still be supported, > though I've been negligent in actually testing that. Once 1.6b1 is > released, then I would make a final 1.5.2-compatible tarball, and then > start simplifying the CVS tree by using 1.6's new features. Go ahead and cut the package; as soon as SRE passes test_re, we're beta. See Tim's note to python-dev. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Tue Jun 20 21:45:44 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 20 Jun 2000 15:45:44 -0500 Subject: [XML-SIG] Python 1.6 XML APIs References: <394F6F86.CF603637@prescod.net> <14671.36300.815139.963941@cj42289-a.reston1.va.home.com> <394F941F.670DF33E@prescod.net> Message-ID: <394FD7F8.D84A27BF@prescod.net> Lars Marius Garshol wrote: > > ... > > | saxexts - most of this is not useful until you have more than one > | parser > > Actually, this is essential, because it allows you to write code that > is independent of a specific installation. If this is in the core > distribution you can always use it without worrying about what the > user may or may not have installed. If they have Python then we know they have PyExpat. We can use that by default. If they want to be able to move magically (as opposed to "easily") between multiple parsers then they can install the XML distribution and get all of the magic with choosing parsers based on features automatically etc. 
> Once I get some feedback on how slim we want the saxlib in the core to > be it is easier to get an idea of what to do. I don't know either. My target would be "one module and one driver". We don't have a benevolent dictator in this area so we're on our own. Breaking the xml distribution is to be avoided but it is a relatively minor issue compared to adding modules to Python core "the right way". The XML distro version should install "around" the core libraries and build on them. Of all this stuff, the only thing I consider essential is that PyExpat be able to expose a SAX2 API. Helper functions are gravy. According to the version I have, drv_pyexpat has one minor dependency on saxutils and major ones on saxlib. saxlib has no dependencies. Therefore I propose that we include drv_pyexpat and saxlib. We can break the minor dependency by moving a class. We should rename drv_pyexpat to "saxparser" and I would like a brain-dead simple "import saxparser; saxparser.parse( filename, handler, otherargs ) to do the right thing. I could do this work tonight but I may hurt more than help in terms of multiple versions of files floating around. It's up to you. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Music is the stuff between the notes." - Claude Debussy From btusdin@mulberrytech.com Tue Jun 20 22:48:37 2000 From: btusdin@mulberrytech.com (B. Tommie Usdin) Date: Tue, 20 Jun 2000 17:48:37 -0400 Subject: [XML-SIG] Extreme Markup Conference - Call For Late Breaking News Message-ID: If you would like to give a Late Breaking News presentation at Extreme Markup Languages send in your submission by June 30th! The organizers of the Extreme Markup Languages Conference have reserved a few of the speaking slots in order to provide the freshest possible salad again this year. 
These slots will be awarded entirely at the discretion of the conference chairs and co-chairs only a couple of weeks before the conference actually occurs, just as the final programs go to press. Preference will be given to the most technically newsworthly submissions. --------------------------------------------------------- ************* Call for Participation ************** ************ Late Breaking News ************* *********** Extreme Markup Languages 2000 ************ --------------------------------------------------------- WHAT: Call for Late Breaking News WHEN: Late Breaking Proposals due: June 30, 2000 Conference: August 15-18, 2000 Tutorials August 13-14, 2000 WHERE: Montreal, Canada SPONSOR: Graphic Communications Association (GCA) Chairs: Steven R. Newcomb, TechnoTeacher, Inc. B. Tommie Usdin, Mulberry Technologies, Inc. Co-Chairs: Deborah A. Lapeyre, Mulberry Technologies, Inc. C. M. Sperberg-McQueen, World Wide Web Consortium/MIT Laboratory for Computer Sciences HOW: Submit the following information to: Extreme@mulberrytech.com - Name - Affiliation - Email address - Presentation Title - Abstract: 100 words suitable for distribution - Where, and when, this information been presented - Additional Information: any further information you wish to provide to the conference committee to help us in our selections and deliberations. QUESTIONS: Email to Extreme@mulberrytech.com or call Tommie Usdin +1 301/315-9631 MORE INFORMATION: For updated information on the program and plans for the conference, see http://www.extrememarkup.net Extreme is a new, highly technical conference concentrating on the evolving abstractions that underlie modern information management solutions, how those abstractions enhance human productivity, and how they are being applied. 
Abstract and concrete information models, systems built on them, software to exploit them, SGML, XML, XSL, XLink, schemas, Topic Maps, query languages, and other markup-related topics are in scope for this conference. Speakers will have 30 minutes for their prepared remarks followed by fifteen minutes during which the audience will pose questions. ====================================================================== Extreme Markup Languages 2000 mailto:extreme@mulberrytech.com August 13-18, 2000 details: http:www.gca.org Montreal, Canada author info: http://www.mulberrytech.com/Extreme ====================================================================== ====================================================================== B. Tommie Usdin mailto:btusdin@mulberrytech.com Mulberry Technologies, Inc. http://www.mulberrytech.com 17 West Jefferson Street Phone: 301/315-9631 Suite 207 Direct Line: 301/315-9634 Rockville, MD 20850 Fax: 301/315-8285 ---------------------------------------------------------------------- Mulberry Technologies: A Consultancy Specializing in SGML and XML ====================================================================== From tpassin@home.com Wed Jun 21 05:35:33 2000 From: tpassin@home.com (tpassin@home.com) Date: Wed, 21 Jun 2000 00:35:33 -0400 Subject: [XML-SIG] Python 1.6 XML APIs References: <394F6F86.CF603637@prescod.net> Message-ID: <003e01bfdb3a$24029b80$7cac1218@reston1.va.home.com> Paul Prescod wrote about qp_xml and his minidom API. I'm basically in favor both of his proposals and his reasoning behnd them. They sound like the kind of thing I'd want to use myself. But I'm a little concerned that this might be rushing a bit too much, to get two pretty new systems into a new distribution in such a short time. Especially the new API. How could they get enough exposure and testing in time? If the test/exposure aspect can be handled, I'd say yes, do it! And many thanks to Paul for his effort and persistence. 
Regards, Tom Passin From tpassin@home.com Wed Jun 21 05:42:09 2000 From: tpassin@home.com (tpassin@home.com) Date: Wed, 21 Jun 2000 00:42:09 -0400 Subject: [XML-SIG] Extension modules in PyXML CVS References: <14671.50633.42518.626021@cj42289-a.reston1.va.home.com> Message-ID: <005501bfdb3b$123e7620$7cac1218@reston1.va.home.com> Fred L. Drake, Jr, asks > > The CVS repository still contains the intl.c, pyexpat.c, sgmlop.c, > and wstrop.c modules. Are these still needed? Are we still trying to > support Python 1.5.X? > If we're only interested in 1.6 support, we can drop the entire > extensions/ directory and simplify installation. > > Fred, we're not all going to suddenly drop 1.5.2 and rush to 1.6, I don't think. Not with Unicode and all the other changes I gather are in there, including the packaging system. I'm certainly not too clear on backward compatibility of some of these things. Not to mention likely buginess of major new releases. Please take people's caution into account. Tom Passin From fdrake@beopen.com Wed Jun 21 06:58:50 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 21 Jun 2000 01:58:50 -0400 (EDT) Subject: [XML-SIG] Python 1.6 XML APIs In-Reply-To: <003e01bfdb3a$24029b80$7cac1218@reston1.va.home.com> References: <394F6F86.CF603637@prescod.net> <003e01bfdb3a$24029b80$7cac1218@reston1.va.home.com> Message-ID: <14672.22938.254950.324465@cj42289-a.reston1.va.home.com> tpassin@home.com writes: > I'm basically in favor both of his proposals and his reasoning behnd them. > They sound like the kind of thing I'd want to use myself. But I'm a little > concerned that this might be rushing a bit too much, to get two pretty new > systems into a new distribution in such a short time. Especially the new > API. How could they get enough exposure and testing in time? 
My biggest concern lies in the potential bugginess of the implementation; I expect Paul knows more about XML & actually applying it than most of us (though possibly not all), and has sufficient experience that he can craft a solid API. If the code he's talking about is stuff that he's been using a while, I expect its well tested in practice. On the other hand, getting anything new documented in time will be difficult! ;) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From jack@oratrix.nl Wed Jun 21 09:21:50 2000 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 21 Jun 2000 10:21:50 +0200 Subject: [XML-SIG] Python 1.6 XML APIs In-Reply-To: Message by Paul Prescod , Tue, 20 Jun 2000 15:20:06 +0200 , <394F6F86.CF603637@prescod.net> Message-ID: <20000621082151.0A2DE37186A@snelboot.oratrix.nl> Should we rationalize the names of the XML packages for Python? Paul's breakdown probably gives a good starting point for names that could actually mean something to the end user in stead of having to be memorized... -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From wunder@ultraseek.com Wed Jun 21 15:38:46 2000 From: wunder@ultraseek.com (Walter Underwood) Date: Wed, 21 Jun 2000 07:38:46 -0700 Subject: [XML-SIG] Extension modules in PyXML CVS In-Reply-To: <005501bfdb3b$123e7620$7cac1218@reston1.va.home.com> Message-ID: <112175.3170561926@[192.168.8.114]> --On Wednesday, June 21, 2000 12:42 AM -0400 tpassin@home.com wrote: > > Fred, we're not all going to suddenly drop 1.5.2 and rush to 1.6, I don't > think. Not with Unicode and all the other changes I gather are in there, > including the packaging system. Except that processing XML requires Unicode support. 
Technically, you can be legal if you parse UTF-8 and UTF-16 encodings, then throw away any characters outside of Latin-1 in later processing, but that is only useful in really specific applications. Anything vaguely general should handle Unicode.

So, it makes sense to focus Python XML development on 1.6.

And this doesn't address your argument (conservatism), but 1.6a2 has been very stable for us in the past month of development. We run on multiple platforms, test on multi-CPU machines, and have a multi-threaded search engine with big native modules. I don't think we've seen a Python bug. And yes, we did find some in 1.4 and 1.5.

wunder
--
Walter Underwood
Senior Staff Engineer, Ultraseek Corp.
http://www.ultraseek.com/

From paul@prescod.net Wed Jun 21 12:03:23 2000
From: paul@prescod.net (Paul Prescod)
Date: Wed, 21 Jun 2000 06:03:23 -0500
Subject: [XML-SIG] DOM Extension Proposal
Message-ID: <3950A0FB.D44DC52F@prescod.net>

Insofar as the DOM does not address Python's syntax overloading, it does not say what we must do in our overloading.

I propose that we extend the DOM with a new type AttributeList that is a subclass of NamedNodeMap:

It would override __getitem__ to return the *value* of the reference attribute node instead of a (often useless and annoying) attribute node object.

The behavior of all DOM-defined methods (item, namedItem, namedItemNS) would be unchanged.

The practical implication is that Python users could write code like this:

url=img[src]

rather than:

url=img[src].value

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
"Music is the stuff between the notes."
- Claude Debussy From Mike.Olson@fourthought.com Wed Jun 21 18:57:57 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 21 Jun 2000 11:57:57 -0600 Subject: [XML-SIG] DOM Extension Proposal References: <3950A0FB.D44DC52F@prescod.net> Message-ID: <39510225.F1620210@FourThought.com> Paul Prescod wrote: > > Insofar as the DOM does not address Python's syntax overloading, it does > not say what we must do in our overloading. > > I propose that we extend the DOM with a new type AttributeList that is a > subclass of NamedNodeMap: > > It would override __getitem__ to return the *value* of the reference > attribute node instead of a (often useless and annoying) attribute node > object. We inherit NamedNodeMap from UserDict now so we are not too far off. However, we do return the Attribute node. I suppose we could override this to return just the value of the node. Anyone else's thoughts? Mike > -- > Paul Prescod - ISOGEN Consulting Engineer speaking for himself > "Music is the stuff between the notes." - Claude Debussy > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paul@prescod.net Wed Jun 21 18:54:14 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 21 Jun 2000 12:54:14 -0500 Subject: [XML-SIG] Pulldom example Message-ID: <39510146.87ECD2C4@prescod.net> """This code shows how to use pulldom with a very simple, "hand-coded" dispatcher and a few helper functions to do an X->Y translation. It is deliberately coded in a manner that is not as intelligent as it could be because I want to emphasize that there is no rocket science. 
It's simple enough to use in a simple manner and can easily be ramped up to something more advanced with sophisticated dispatchers (which any sophisticated Python programmer could write in half an hour or so).""" import pulldom paper= \ """ From Markup To Object Model The XML Abstraction Problem and XML Property Objects PaulPrescod Consulting Engineer

ISOGEN/DataChannel 2200 North Lamar DallasTexasUSA75202 214 953 0004214 953 3152 paul@isogen.com www.isogen.com

Paul Prescod - Paul Prescod is a leading researcher and implementor of markup technologies. His formal education was in mathematics and computer science at the University of Waterloo. His research interests include formalisms for document modeling, queries and schemata. As a consulting engineer at ISOGEN, he helps organizations apply ISO and W3C standards to large-scale documentation problems. Mechanisms for building abstractions over XML documents tend to be more complex and less flexible than techniques available in domains such as relational databases and object models. This paper reviews several existing strategies and suggests a new one. XML Property Objects allow a flexible, user-defined mapping from complex XML attributed element tree structures to directed labeled graph structures.

Overview Software engineering is dominated by two tasks. The first is the design of algorithms (and necessary data structures) required to automate the solutions to particular problems. The second is the design of abstractions. Abstractions allow us to reuse software code and thus make software solutions that can grow and be maintained over time. In a world where only implementation and algorithms mattered, everything could be programmed in assembly language and every project would be approached as if from scratch. There would be no operating systems, no programming languages, no code libraries and no relational databases. A programmer's job would be analogous to that of a carpenter. The fact that a carpenter has hammered a nail a thousand times before does not remove the requirement to do it again. Reuse is at the level of ideas and skills, not implementation. To some extent, creators of tiny "embedded systems" live in this world. Thankfully, the rest of us can use the ever-expanding RAM in our computers to build abstractions on top of abstractions on top of abstractions: programs on top of programming languages on top of interpreters on top of other programming languages on top of operating systems. Each level can itself be decomposed into many abstractions. The popular UML diagramming standard exists precisely to help manage these levels of abstraction. ...

"""

events=pulldom.parseString( paper )

def doit():
    for token, node in events:
        if matchStart( "gcapaper", token, node ):
            print ""
        elif matchEnd( "gcapaper", token, node ):
            print ""
        elif matchTextIn( ("title", "gcapaper"), token, node ):
            print "%s" % node.data
        elif matchEnd( "author", token, node ):
            print "By: %s %s" % ( firstname, lastname )
        elif matchTextIn( "fname", token, node ):
            firstname=node.data +" "
        elif matchTextIn( "surname", token, node ):
            lastname=node.data
        elif matchTextIn( "jobtitle", token, node ):
            print "Job Title: %s" % node.data
        elif matchTextIn( "affil", token, node ):
            affil=node.data
        elif matchTextIn( "aline", token, node ):
            aline=node.data
        elif matchTextIn( "city", token, node ):
            city=node.data
        elif matchTextIn( "state", token, node ):
            state=node.data
        elif matchTextIn( "cntry", token, node ):
            cntry=node.data
        elif matchTextIn( "postcode", token, node ):
            postcode=node.data
        elif matchTextIn( "phone", token, node ):
            phone=node.data
        elif matchTextIn( "fax", token, node ):
            fax=node.data
        elif matchTextIn( "email", token, node ):
            email=node.data
        elif matchTextIn( "web", token, node ):
            web=node.data
        elif matchEnd( "address", token, node ):
            print ""
            print "%s"%affil
            print "%s"%aline
            print "%s, %s"% (city, state)
            print "%s"%postcode
            print "Phone: %s"%phone
            print "Fax: %s"%fax
            print "Email: %s"%email
            print "Web: %s"%web
            print ""
        elif matchStart( "para", token, node ):
            print ""
        elif matchEnd( "para", token, node ):
            print ""
        elif matchStart( "highlight", token, node ):
            print ""
        elif matchEnd( "highlight", token, node ):
            print ""
        elif matchStart( "bio", token, node ):
            print ""
        elif matchEnd( "bio", token, node ):
            print ""
        elif matchStart( "abstract", token, node ):
            print ""
        elif matchEnd( "abstract", token, node ):
            print ""
        # I could have counted on the way down,
        # but I want to show code that walks up the tree
        elif matchTextIn( ("title", "section" ), token, node ):
            level=0
            titleNode=node.parentNode
            # start from the element that contains the title
            sectionNode=titleNode.parentNode
            while sectionNode.tagName=="section":
                level=level+1
                sectionNode=sectionNode.parentNode
            outtag="h"+`level`
            print "<%s>%s</%s>" % (outtag, node.data, outtag )
        elif token==pulldom.CHARACTERS:
            if( node.data.strip()):
                print node.data
        else:
            pass

# a few simple helper functions
def matchStart( tagName, token, node ):
    return token==pulldom.START_ELEMENT and node.tagName==tagName

def matchEnd( tagName, token, node ):
    return token==pulldom.END_ELEMENT and node.tagName==tagName

def matchTextIn( tagName, token, node ):
    if type( tagName )==type( "" ):
        return token==pulldom.CHARACTERS and node.parentNode.tagName==tagName
    elif type( tagName ) == type((1,)):
        return token==pulldom.CHARACTERS and \
               matchTagContext( tagName, node.parentNode )

def matchTagContext( tagNames, node ):
    this,rest=tagNames[0], tagNames[1:]
    imatch = node.tagName==this
    if not rest:
        return imatch
    else:
        return imatch and matchTagContext( rest, node.parentNode )

doit()

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
"Music is the stuff between the notes." - Claude Debussy

From paul@prescod.net Wed Jun 21 16:40:29 2000
From: paul@prescod.net (Paul Prescod)
Date: Wed, 21 Jun 2000 10:40:29 -0500
Subject: [XML-SIG] Python 1.6 XML APIs
References: <394F6F86.CF603637@prescod.net> <003e01bfdb3a$24029b80$7cac1218@reston1.va.home.com> <14672.22938.254950.324465@cj42289-a.reston1.va.home.com>
Message-ID: <3950E1ED.1E3052F3@prescod.net>

"Fred L. Drake, Jr." wrote:
>
> My biggest concern lies in the potential bugginess of the
> implementation; I expect Paul knows more about XML & actually applying
> it than most of us (though possibly not all), and has sufficient
> experience that he can craft a solid API.

The DOM stuff is almost entirely "by the book." The "pulldom" stuff is only about three methods: 1.
Parse my file please 2. Give me a node please 3. Fill in this node's children and descendants please. It's simpler than XMLLib, SAX or anything else I've ever seen. No magical method names, no special base classes, nothing. Alot of the credit goes to /Fredrik. His trick about turning push parsers into pull parsers allowed me to write the last two methods. > If the code he's talking > about is stuff that he's been using a while, I expect its well tested > in practice. Well, Fred, I like you too much to lie to you. Adding namespace support required a lot of fiddly changes here and there that I have not tested. On the other hand, I promise, promise, promise to test the hell out of it over the next few days and especially once the beta comes out. I'll post some sample programs here so that everyone can get a feeling for it. My last day at my current job is Friday and I chose that date in part to have time to do this testing and documenting. I'm pushing this because for the first time I think I've got an API that I can teach a person new to Python in 10 minutes -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Music is the stuff between the notes." - Claude Debussy
From paul@prescod.net Wed Jun 21 21:55:55 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 21 Jun 2000 15:55:55 -0500 Subject: [XML-SIG] Pulldom example References: <39510146.87ECD2C4@prescod.net> Message-ID: <39512BDB.5B06FD81@prescod.net> We need feedback on pulldom as soon as possible. Please don't hesitate to tell me you don't like it. Just do keep in mind that it is intended as a simple, convenient replacement for xmllib, not for something sophisticated like XSLT nor ultra-efficient like SAX. Please take a look at the example code and give an opinion....negative ones are fine. If you think SAX or xmllib is easier or otherwise better, you should say so now... -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Floggings will continue until morale improves.
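For readers following the thread, the three operations Paul lists in the message above (parse, pull the next node, fill in a node's subtree) can be sketched against the xml.dom.pulldom module that ships with current Python. This is only an illustration under that assumption, not the June 2000 code under discussion, whose details may have differed. The sample document and the collect_titles helper are hypothetical names invented for the sketch.

```python
from xml.dom import pulldom

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<paper>
  <title>From Markup To Object Model</title>
  <section><title>Overview</title><para>Some text.</para></section>
</paper>"""


def collect_titles(xml_text):
    """Return the text of every <title> element, in document order."""
    titles = []
    # 1. "Parse my file please": parseString returns a stream of events.
    events = pulldom.parseString(xml_text)
    # 2. "Give me a node please": iterating pulls (event, node) pairs.
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            # 3. "Fill in this node's children and descendants please":
            # expandNode consumes events up to the matching end tag and
            # attaches the subtree to the node.
            events.expandNode(node)
            titles.append("".join(
                child.data for child in node.childNodes
                if child.nodeType == child.TEXT_NODE))
    return titles


print(collect_titles(SAMPLE))
```

The loop is an ordinary for statement with no handler class and no callback names, which seems to be the simplicity argument being made in the thread. Only the elements the caller asks about are ever expanded into DOM subtrees.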
From paul@prescod.net Wed Jun 21 22:29:33 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 21 Jun 2000 16:29:33 -0500 Subject: [XML-SIG] Python 1.6 XML APIs References: <20000621082151.0A2DE37186A@snelboot.oratrix.nl> Message-ID: <395133BD.4EC207E2@prescod.net> Well the first priority would be the four proposed for Python 1.6. PyExpat is pretty obvious and it is supposed to be used through a SAX interface so I won't worry about it. The SAX interface needs a name and drv_pyexpat isn't going to cut it. I propose just "saxparser". Today it uses Pyexpat under the covers. Maybe a year from now it would user Xerces. minidom is a miniature DOM: I can't get much better than that. pulldom could be pulldom or eventdom or streamdom or streamobjects or xmlobjects or xmlstreams or streamxml or xmltokens or tokenstreams or tokenxml ... And then there is "saxlib", a sax library. All in all the only name I am not 100% happy with is pulldom (and probably drv_pyexpat). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Floggings will continue until morale improves. From paul@prescod.net Thu Jun 22 02:07:47 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 21 Jun 2000 20:07:47 -0500 Subject: [XML-SIG] Pulldom opinions? Message-ID: <395166E3.D765E39@prescod.net> I sent an email asking for pulldom opinions but I didn't see it come back. This is actually a cleverly disguised test message. If I don't see it come through I'll know there is a problem with the xml-sig. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Floggings will continue until morale improves. 
From tpassin@home.com Thu Jun 22 02:24:42 2000 From: tpassin@home.com (tpassin@home.com) Date: Wed, 21 Jun 2000 21:24:42 -0400 Subject: [XML-SIG] Extension modules in PyXML CVS References: <112175.3170561926@[192.168.8.114]> Message-ID: <002901bfdbe8$a805a0e0$7cac1218@reston1.va.home.com> Walter Underwood replied to my post - > > > > Fred, we're not all going to suddenly drop 1.5.2 and rush to 1.6, I don't > > think. Not with Unicode and all the other changes I gather are in there, > > including the packaging system. > > Except that processing XML requires Unicode support. Technically, you > can be legal if you parse UTF-8 and UTF-16 encodings, then throw away > any characters outside of Latin-1 in later processing, but that is only > useful in really specific applications. Anything vaguely general should > handle Unicode. > > So, it makes sense to focus Python XML development on 1.6. > Yes, of course we want to be able to use unicode. And yes, I did express conservatism. It's just that I'm not too clear on what code is going to keep working and what is not because of unicode and other changes. So I don't want to switch everything over until I get a chance to try it all. On the other side of this matter, I decided not to try anything with COM and the Windows additions until 1.6 is out and the Windows stuff is working again. > And this doesn't address your argument (conservatism), but 1.6a2 has > been very stable for us in the past month of development. We run on > multiple platforms, test on multi-CPU machines, and have a multi-threaded > search engine with big native modules. I don't think we've seen a Python > bug. And yes, we did find some in 1.4 and 1.5. > Stability - that's good to hear. My thanks to everyone who is working so hard on this! 
Regards, Tom Passin From larsga@garshol.priv.no Thu Jun 22 09:23:11 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 22 Jun 2000 10:23:11 +0200 Subject: [XML-SIG] DOM Extension Proposal In-Reply-To: <39510225.F1620210@FourThought.com> References: <3950A0FB.D44DC52F@prescod.net> <39510225.F1620210@FourThought.com> Message-ID: * Mike Olson | | We inherit NamedNodeMap from UserDict now so we are not too far off. | However, we do return the Attribute node. I suppose we could override | this to return just the value of the node. | | Anyone else's thoughts? I have been wondering why we even have Attribute nodes in the DOM tree at all. They are mainly useful for representing entity references in attribute values, something that is very rarely useful. :-) So I think there would be definite performance benefits (in terms of both speed and memory use) in keeping a dictionary of names -> string values instead of names -> nodes. Attribute nodes could be lazily instantiated when someone calls getAttributeNode. If we want to support entity references inside attributes we can do this by using lists in the dictionary for those cases instead of strings. Most likely this would be used in much less than 1% of the cases and I wouldn't complain if we decided not to support this stuff at all. --Lars M. From gstein@lyra.org Thu Jun 22 09:21:55 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 22 Jun 2000 01:21:55 -0700 Subject: [XML-SIG] PyExpat changes In-Reply-To: <14671.27475.885443.148910@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Tue, Jun 20, 2000 at 09:02:11AM -0400 References: <394F53F0.96141C51@prescod.net> <14671.27475.885443.148910@cj42289-a.reston1.va.home.com> Message-ID: <20000622012155.A29590@lyra.org> On Tue, Jun 20, 2000 at 09:02:11AM -0400, Fred L. Drake, Jr. wrote: > > Paul Prescod writes: > > Yes, this means that we need to finish SAX 2 for Python 1.6. 
I'm not > > clear on where we are with that but I should have some time to help > > really soon. > > pyexpat is in the core distribution, but not built by default for > all the usual reasons. I think everything we need to do for the > module has been done (Andrew?) -- the remaining "issues" have > everything to do with expat build/configuration/API issues and not the > Python bindings. IOW, pyexpat should be fully documented. It would also be great if it had a "Pythonic" API, much as AMK suggested. +1 on renaming those attributes. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Jun 22 14:24:25 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 22 Jun 2000 06:24:25 -0700 Subject: [XML-SIG] Python 1.6 XML APIs In-Reply-To: <14672.22938.254950.324465@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Wed, Jun 21, 2000 at 01:58:50AM -0400 References: <394F6F86.CF603637@prescod.net> <003e01bfdb3a$24029b80$7cac1218@reston1.va.home.com> <14672.22938.254950.324465@cj42289-a.reston1.va.home.com> Message-ID: <20000622062425.L29590@lyra.org> On Wed, Jun 21, 2000 at 01:58:50AM -0400, Fred L. Drake, Jr. wrote: > > tpassin@home.com writes: > > I'm basically in favor both of his proposals and his reasoning behnd them. > > They sound like the kind of thing I'd want to use myself. But I'm a little > > concerned that this might be rushing a bit too much, to get two pretty new > > systems into a new distribution in such a short time. Especially the new > > API. How could they get enough exposure and testing in time? > > My biggest concern lies in the potential bugginess of the > implementation; I expect Paul knows more about XML & actually applying > it than most of us (though possibly not all), and has sufficient > experience that he can craft a solid API. If the code he's talking > about is stuff that he's been using a while, I expect its well tested > in practice. > On the other hand, getting anything new documented in time will be > difficult! 
;) qp_xml is solid (it is used by my davlib module and a number of other people have been using it for stuff), but there are some outstanding optimizations on deck (from Bjorn Pettersen). I'm firefighting in Apache-land right now :-), so I haven't applied and tested the optimizations yet. Bjorn used a StringIO object to speed up the text concatenations. I want to try a similar trick with string.join and compare the two. I like the API and consider it solid/done, but there have been a number of suggestions for changes. Particularly a number of them from Laurent Szyster. It would behoove us to consider some of the low-hanging fruit (but only those which don't alter the basic character/purpose of the module/API!) before inclusion into Python. There is no doc yet (and I have an item in the xml package's TODO). While people's interest in the module is great and flattering, and I'd like to see it in Python... I'm also not too sure on whether this is the right time. On the opposing side: it is a pretty brain-dead simple module. Not much can go wrong, and any "extra" stuff can just build on top of it. [ Paul does give a great breakdown on the "four quadrants" and how qp_xml fits in there... good stuff ] Oh: and I might suggest a name change before inclusion into Python proper :-) Regarding the larger question: punt on including SAX in 1.6. Why? The only parser is pyexpat (presuming xmllib is deprecated). The utility of SAX is therefore quite minor... what else will you swap in there? Point people at the XML distro if they want it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Thu Jun 22 15:29:52 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 22 Jun 2000 09:29:52 -0500 Subject: [XML-SIG] PyExpat changes References: <394F53F0.96141C51@prescod.net> <14671.27475.885443.148910@cj42289-a.reston1.va.home.com> <20000622012155.A29590@lyra.org> Message-ID: <395222E0.D624A97E@prescod.net> Greg Stein wrote: > > ... 
> > I think everything we need to do for the > > module has been done (Andrew?) -- the remaining "issues" have > > everything to do with expat build/configuration/API issues and not the > > Python bindings. > > IOW, pyexpat should be fully documented. I don't see your point. I especially don't see the benefit in adding yet another XML API to the documentation. > It would also be great if it had a "Pythonic" API, much as AMK suggested. > > +1 on renaming those attributes. If we are going to rename attributes and break code, we might as well use the SAX names and conventions. There is no performance cost. The only reason I didn't do this from the start was because I didn't want to break code. In particular, I didn't know how much code you had based on the old API because I think that you (and maybe the Zope guys) are the primary user(s). A big virtue of SAX (beyond being performant) is that it is forward-compatible with namespaces stuff (although I'm not predicting that we will use Expat's namespace support today...it is probably better to do namespaces in Python). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Floggings will continue until morale improves. From paul@prescod.net Thu Jun 22 15:40:28 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 22 Jun 2000 09:40:28 -0500 Subject: [XML-SIG] Python 1.6 XML APIs References: <394F6F86.CF603637@prescod.net> <003e01bfdb3a$24029b80$7cac1218@reston1.va.home.com> <14672.22938.254950.324465@cj42289-a.reston1.va.home.com> <20000622062425.L29590@lyra.org> Message-ID: <3952255C.A780AB8A@prescod.net> Greg Stein wrote: > > ... > > qp_xml is solid (it is used by my davlib module and a number of other people > have been using it for stuff), but there are some outstanding optimizations > on deck (from Bjorn Pettersen). I'm firefighting in Apache-land right now > :-), so I haven't applied and tested the optimizations yet. 
My main issue is the cascading dependencies: PyExpat name changes, namespace handling should be moved into the core SAX library, we should standardize the parse APIs and so forth. > Oh: and I might suggest a name change before inclusion into Python proper :-) Agreed. Unless it stood for "quick Python objects" or "quick primitive objects". > Regarding the larger question: punt on including SAX in 1.6. Why? The only > parser is pyexpat (presuming xmllib is deprecated). The utility of SAX is > therefore quite minor... what else will you swap in there? Point people at > the XML distro if they want it. What you'll swap in is a validating parser (*from* the XML distro), or the Python 1.7 or 1.8 parser, which might be Xerces, or a more namespace-friendly version of Expat. Namespace-forward compatibility is an important issue. There is basically no cost to supporting SAX other than breaking existing PyExpat code which you seem not to be worried about. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Floggings will continue until morale improves. From paul@prescod.net Thu Jun 22 16:02:45 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 22 Jun 2000 10:02:45 -0500 Subject: [XML-SIG] DOM Extension Proposal References: <3950A0FB.D44DC52F@prescod.net> <39510225.F1620210@FourThought.com> Message-ID: <39522A95.F6D392C2@prescod.net> Lars Marius Garshol wrote: > > ... > > I have been wondering why we even have Attribute nodes in the DOM tree > at all. They are mainly useful for representing entity references in > attribute values, something that is very rarely useful. :-) Attributes can also be independently addressed as objects in XPath, XSLT, Schematron, etc. > So I think there would be definite performance benefits (in terms of > both speed and memory use) in keeping a dictionary of names -> string > values instead of names -> nodes. I did that in older versions of minidom. Then I implemented namespaces and it started to get really hairy. 
You need to be able to look things up by tagname and localname/URI prefix. Then you need to be able to get from tagnames to localname/URI and back in case you need to delete or update an attribute. The right hand side become some kind of a tuple that you need to split and you are almost back to objects again...ugh. In general, namespaces make XML a lot harder to work with. A lot harder. Really. So I eventually backed it out on the argument that getting it right but inefficient for Python 1.6 was more important than being tricky but efficient. The idea is still good but it requires more thought. > Attribute nodes could be lazily instantiated when someone calls > getAttributeNode. If we want to support entity references inside > attributes we can do this by using lists in the dictionary for those > cases instead of strings. Most likely this would be used in much less > than 1% of the cases and I wouldn't complain if we decided not to > support this stuff at all. Few of the parsers generate entity reference events in attributes...we certainly don't support this now. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Floggings will continue until morale improves. From wunder@ultraseek.com Thu Jun 22 17:37:56 2000 From: wunder@ultraseek.com (Walter Underwood) Date: Thu, 22 Jun 2000 09:37:56 -0700 Subject: [XML-SIG] speed question re DOM parsing In-Reply-To: <3935D13D.F4EAD64B@roguewave.com> Message-ID: <3888969936.961666676@serrano.infoseek.com> I know this is a reply to a really old post, but the talk about speedup reminded me that I hadn't answered it. This is worth posting generally, because it is applicable to lots of SAX handlers (add it to the documentation?). --On Wednesday, May 31, 2000 8:58 PM -0600 Bjorn Pettersen wrote: > > After some profiling, I found that most of the time was going into the > else branch in the cdata method. 
This branch is growing a string
> character by character by saying:
>
>     elem.first_cdata = elem.first_cdata + data

I had one of those in my character data handler too. Parsing the Old Testament took about 45 min, as I remember. The copies and reallocs in concatenation are O(n**2). Save all the strings in a list, then use string.join at the end. This is linear.

Here are the relevant fragments of the class with the handlers:

    class XMLToText:
        def __init__(self):
            self.text = []

        def cdata(self, data):
            self.text.append(data)

        def finish(self):
            self.text = string.join(self.text,u'')

Note the Unicode string constant -- remove that for Python 1.5, and add code to handle the UTF-8, if necessary.

wunder
--
Walter R. Underwood
Senior Staff Engineer, Ultraseek Corp.
http://www.ultraseek.com/

From gvwilson@nevex.com Thu Jun 22 18:31:49 2000
From: gvwilson@nevex.com (Greg Wilson)
Date: Thu, 22 Jun 2000 13:31:49 -0400 (EDT)
Subject: [XML-SIG] re: events for DTDs?
Message-ID:

I'm preparing material based on SAX-1 to use in a course I'll be teaching later this summer. The first exercise is to build something that echoes the contents of an XML file, and in order to do this correctly, I need to be able to intercept the first couple of lines in my input file:

I can't find any mention in the docs or tutorial of a method in HandlerBase, its parents, or the parser classes that'll let me capture this information. I figure I must have overlooked something, and I'd be grateful for pointers to the answer.

Thanks,
Greg

p.s. please reply directly to 'gvwilson@nevex.com', as I'm not a regular reader of this group (yet).

From paul@prescod.net Thu Jun 22 19:13:42 2000
From: paul@prescod.net (Paul Prescod)
Date: Thu, 22 Jun 2000 13:13:42 -0500
Subject: [XML-SIG] re: events for DTDs?
References:
Message-ID: <39525756.C2CBB97F@prescod.net>

Greg Wilson wrote:
>
> I'm preparing material based on SAX-1 to use in a course I'll be teaching
> later this summer.

SAX 1 doesn't have support for that stuff.
SAX 2, yes. > The first exercise is to build something that echoes > the contents of an XML file, and in order to do this correctly, Believe it or not, exactly echoing the contents of any old XML file is one of the hardest things to do with XML parsers. Part of an XML parser's job is to *simplify* the data and only present you with the parts that are "important" according to the parser programmer's definition of important. Even SAX 2 will not be quite sufficient. If all you want to echo is the basic logical structure, you are good to go, but if you want the whole kit and kaboodle, you'll have to use a proprietary API (actually, the open groves API can do it for you, but I think that the only sufficiently rich implementation (for Python or any language) is commercial!). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "So there you have it folks, the Millenium: warring and whoring but never boring." - Dennis Miller From paul@prescod.net Thu Jun 22 20:00:15 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 22 Jun 2000 14:00:15 -0500 Subject: [XML-SIG] Pulldom code avail Message-ID: <3952623F.BBD52885@prescod.net> http://www.prescod.net/python/pulldom.html -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "So there you have it folks, the Millenium: warring and whoring but never boring." - Dennis Miller From ken@bitsko.slc.ut.us Thu Jun 22 22:50:15 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 22 Jun 2000 16:50:15 -0500 Subject: [XML-SIG] Pulldom example In-Reply-To: Paul Prescod's message of "Wed, 21 Jun 2000 15:55:55 -0500" References: <39510146.87ECD2C4@prescod.net> <39512BDB.5B06FD81@prescod.net> Message-ID: Paul Prescod writes: > We need feedback on pulldom as soon as possible. Please don't hesitate > to tell me you don't like it. Just do keep in mind that it is intended > as a simple, convenient replacement for xmllib, not for something > sophisticated like XSLT nor ultra-efficient like SAX. 
> > Please take a look at the example code and give an opinion....negative > ones are fine. If you think SAX or xmllib is easier or otherwise better, > you should say so now... If I read right in another article, pulldom is based on SAX (possibly the pull-parser modified version of SAX -- not relevant to my question tho): is pulldom just a SAX handler or is it something else? -- Ken From paul@prescod.net Thu Jun 22 23:01:46 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 22 Jun 2000 17:01:46 -0500 Subject: [XML-SIG] Pulldom example References: <39510146.87ECD2C4@prescod.net> <39512BDB.5B06FD81@prescod.net> Message-ID: <39528CCA.7EA7BC2B@prescod.net> Ken MacLeod wrote: > > ... > > If I read right in another article, pulldom is based on SAX (possibly > the pull-parser modified version of SAX -- not relevant to my question > tho): is pulldom just a SAX handler or is it something else? Pulldom is a SAX handler but it depends on an incremental parser which is, strictly speaking, an extension to SAX. The client does not see Pulldom as either a SAX parser or filter. It isn't a SAX parser because you don't have to register a handler or anything. I think you Perl guys have always used pull-parsers, probably for much the same reason's that I decided to do pulldom. I've put up preliminary info and code here: http://www.prescod.net/python/pulldom.html There is no rocket science involved in either the API nor the implementation. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. 
- Immanuel Kant From gstein@lyra.org Fri Jun 23 03:41:53 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 22 Jun 2000 19:41:53 -0700 Subject: [XML-SIG] speed question re DOM parsing In-Reply-To: <3888969936.961666676@serrano.infoseek.com>; from wunder@ultraseek.com on Thu, Jun 22, 2000 at 09:37:56AM -0700 References: <3935D13D.F4EAD64B@roguewave.com> <3888969936.961666676@serrano.infoseek.com> Message-ID: <20000622194153.C29590@lyra.org> On Thu, Jun 22, 2000 at 09:37:56AM -0700, Walter Underwood wrote: >... > --On Wednesday, May 31, 2000 8:58 PM -0600 Bjorn Pettersen > wrote: > > > > After some profiling, I found that most of the time was going into the > > else branch in the cdata method. This branch is growing a string > > character by character by saying: > > > > elem.first_cdata = elem.first_cdata + data > > I had one of those in my character data handler too. Parsing the > Old Testament took about 45 min, as I remember. The copies and > reallocs in concatenation are O(n**2). Save all the strings in > a list, then use string.join at the end. This is linear. Exactly. Bjorn solved this with StringIO. A timing comparison against string.join is an important test before using either approach. I haven't had the time (unfortunately) to test these out myself. But that doesn't preclude somebody from running the two tests, listing the values, and providing the right patch. Another XML committer might be able to get to it before me. When could I get to it? eek. I *will*, but dunno when. It is amazing just how much stuff can fall on a person's plate despite having no job :-). I've got some layered I/O in Apache, mod_dav integration, a new httplib, imputil issues, these qp_xml upgrades, ViewCVS stuff, edna releases, free threading changes, Python/Apache integration, and coding for Subversion. Fuggin frightening. 
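For anyone reading this thread in the archive, here is a minimal sketch of the two accumulation patterns being compared, written for current Python 3 rather than the 1.5/1.6 interpreters under discussion. It assumes `io.StringIO` as a stand-in for `cStringIO` and `str.join` as a stand-in for `string.join`; the class and method names are made up for illustration and do not come from qp_xml or any other package mentioned here.

```python
import io


class JoinCollector:
    """Collect character data in a list and join once at the end."""

    def __init__(self):
        self._parts = []

    def characters(self, data):
        # Appending to a list is cheap; no string is copied here.
        self._parts.append(data)

    def finish(self):
        # A single join builds the final string in one pass.
        return "".join(self._parts)


class BufferCollector:
    """Collect character data by writing into an in-memory buffer."""

    def __init__(self):
        self._buf = io.StringIO()

    def characters(self, data):
        self._buf.write(data)

    def finish(self):
        return self._buf.getvalue()


# Hypothetical input: the pieces a parser might hand to a handler.
chunks = ["In the beginning ", "was the ", "markup."]

collectors = [JoinCollector(), BufferCollector()]
for collector in collectors:
    for chunk in chunks:
        collector.characters(chunk)

results = [collector.finish() for collector in collectors]
print(results)
```

Both collectors avoid rebuilding the whole string on every callback, which is the point made above; which one is faster probably still depends on chunk size and interpreter version, so it is worth timing on real input.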
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From Juergen Hermann" Message-ID: <200006231012.MAA02846@statistik.cinetic.de>

On Thu, 22 Jun 2000 19:41:53 -0700, Greg Stein wrote:

>Exactly. Bjorn solved this with StringIO. A timing comparison against
>string.join is an important test before using either approach.

The two runs I gave it (on Win/NT)...

Length of testtext is 1292
    adding      39.687
    format      189.71
    join        47.034
    chararray   67.323
    stringio    33.011

Length of testtext is 1292
    adding      40.573
    format      191.327
    join        47.09
    chararray   65.256
    stringio    32.65

The result is obvious, and also what I expected.

---%<----------------------------------
# Timings on char-wise string growing

import time, string, sys, cStringIO, array

testtext = open(sys.argv[0], "rt").read()

def timing(f, n, a):
    print "%20s" % (f.__name__,),
    r = range(n)
    t1 = time.clock()
    for i in r:
        f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
    t2 = time.clock()
    print "\t", round(t2-t1, 3)

def adding(x):
    result = ""
    for ch in testtext:
        result = result + ch
    #print "adding(): len =", len(result)

def format(x):
    result = ""
    for ch in testtext:
        result = "%s%s" % (result, ch)
    #print "format(): len =", len(result)

def join(x):
    chars = []
    for ch in testtext:
        chars.append(ch)
    result = string.join(chars, '')
    #print "format(): len =", len(result)

def chararray(x):
    chars = array.array("c")
    for ch in testtext:
        chars.append(ch)
    result = chars.tostring()
    #print "format(): len =", len(result)

def stringio(x):
    chars = cStringIO.StringIO()
    for ch in testtext:
        chars.write(ch)
    result = chars.getvalue()
    #print "stringio(): len =", len()

print "Length of testtext is", len(testtext)
n=1000
timing(adding, n, None)
timing(format, n, None)
timing(join, n, None)
timing(chararray, n, None)
timing(stringio, n, None)

From Juergen Hermann" Message-ID: <200006231018.MAA02895@statistik.cinetic.de>

On Fri, 23 Jun 2000 12:12:08 +0200, Juergen Hermann wrote:

>The result is obvious, and also what I expected.
Only I forgot about name lookup rules, so...

Length of testtext is 967
    adding      27.935
    stringio    24.717
    stringio2   19.703

Length of testtext is 967
    adding      27.583
    stringio    24.43
    stringio2   19.544

---%<----------------------------------
# Timings on char-wise string growing

import time, string, sys, cStringIO, array

testtext = open(sys.argv[0], "rt").read()

def timing(f, n, a):
    print "%20s" % (f.__name__,),
    r = range(n)
    t1 = time.clock()
    for i in r:
        f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
    t2 = time.clock()
    print "\t", round(t2-t1, 3)

def adding(x):
    result = ""
    for ch in testtext:
        result = result + ch
    #print "adding(): len =", len(result)

def stringio(x):
    chars = cStringIO.StringIO()
    for ch in testtext:
        chars.write(ch)
    result = chars.getvalue()
    #print "stringio(): len =", len()

def stringio2(x):
    chars = cStringIO.StringIO()
    push = chars.write
    for ch in testtext:
        push(ch)
    result = chars.getvalue()
    #print "stringio(): len =", len()

print "Length of testtext is", len(testtext)
n=1000
timing(adding, n, None)
timing(stringio, n, None)
timing(stringio2, n, None)

From gstein@lyra.org Fri Jun 23 12:14:11 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 23 Jun 2000 04:14:11 -0700
Subject: [XML-SIG] speed question re DOM parsing
In-Reply-To: <200006231012.MAA02846@statistik.cinetic.de>; from jhe@webde-ag.de on Fri, Jun 23, 2000 at 12:12:08PM +0200
References: <20000622194153.C29590@lyra.org> <200006231012.MAA02846@statistik.cinetic.de>
Message-ID: <20000623041410.I29590@lyra.org>

On Fri, Jun 23, 2000 at 12:12:08PM +0200, Juergen Hermann wrote:
> On Thu, 22 Jun 2000 19:41:53 -0700, Greg Stein wrote:
>
> >Exactly. Bjorn solved this with StringIO. A timing comparison against
> >string.join is an important test before using either approach.
>
> The two runs I gave it (on Win/NT)...
> > Length of testtext is 1292 > adding 39.687 > format 189.71 > join 47.034 > chararray 67.323 > stringio 33.011 > > Length of testtext is 1292 > adding 40.573 > format 191.327 > join 47.09 > chararray 65.256 > stringio 32.65 > > The result is obvious, and also what I expected. well... not so obvious. You're appending characters. I commented out all but the join and stringio tests, cut the iterations down some, and changed testtext to read: testtext = ['x'*1000] * 100 That produced the following numbers: join 3.42 stringio 4.67 Changing testtext to "testtext = ['x'*100] * 1000" produced: join 12.52 stringio 10.35 In other words, the fastest mechanism depends on the length of the input pieces. The balance seems to occur right around 500 characters in my off-the-cuff tests. I think that I'd choose cStringIO when present; otherwise choose .join(). Unfortunately, the code would get ugly for that, so it really means going with one pattern. Assuming that cStringIO is always present is probably best (it is enabled by default). The plain StringIO package uses .join, so that is a nice fallback. oh... and regarding the patch: adding a __getattr__ to the element seems wrong. I'd recommend instantiating a StringIO in start() and placing it into the elem instance as _buf. On a call to end(), do a getvalue(), store the value into first_cdata, and toss the object. (have to toss since there isn't a common way to "reset and truncate" a StringIO) Cheers, -g -- Greg Stein, http://www.lyra.org/ From walter@livinglogic.de Fri Jun 23 12:49:46 2000 From: walter@livinglogic.de (Walter Doerwald) Date: Fri, 23 Jun 2000 13:49:46 +0200 Subject: [XML-SIG] DOM Extension Proposal In-Reply-To: <39522A95.F6D392C2@prescod.net> References: <3950A0FB.D44DC52F@prescod.net> <39510225.F1620210@FourThought.com> Message-ID: <4.3.1.0.20000623134115.00b1c3c0@mail.tmt.de> At 17:02 22.06.00, Paul Prescod wrote: >[...] > > Attribute nodes could be lazily instantiated when someone calls > > getAttributeNode. 
If we want to support entity references inside
> > attributes we can do this by using lists in the dictionary for those
> > cases instead of strings. Most likely this would be used in much less
> > than 1% of the cases and I wouldn't complain if we decided not to
> > support this stuff at all.
>
>Few of the parsers generate entity reference events in attributes...we
>certainly don't support this now.

But this could be easily built on top of a parser that delivers the attribute value as a plain string, simply call the parser again with the attribute value string. The problem is keeping the correct namespace context.

Bye,
Walter Dörwald

--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de

From Juergen Hermann" Message-ID: <200006231150.NAA03720@statistik.cinetic.de>

On Fri, 23 Jun 2000 04:14:11 -0700, Greg Stein wrote:

>well... not so obvious. You're appending characters.

Well, Bjorn's original message contained the words "growing a string character by character", so I assumed... :)

Given this input

    n=100
    testtext = ['x'*1000] * 100
    testtext = ['x'*100] * 1000
    testtext = ['x'*50] * 2000

I get

Length of testtext is 100
    join        2.573
    stringio2   3.969

Length of testtext is 1000
    join        5.729
    stringio2   6.043

Length of testtext is 2000
    join        9.137
    stringio2   7.823

So on my machine, the break-even is later than on yours.

Ciao, Jürgen

--
Jürgen Hermann (jhe@webde-ag.de)
WEB.DE AG, Amalienbadstr.41, D-76227 Karlsruhe
Tel.: 0721/94329-0, Fax: 0721/94329-22

From walter@livinglogic.de Fri Jun 23 13:00:26 2000
From: walter@livinglogic.de (Walter Doerwald)
Date: Fri, 23 Jun 2000 14:00:26 +0200
Subject: [XML-SIG] Bug in sgmlop? (_ in names)
Message-ID: <4.3.1.0.20000623135801.00b0ae10@mail.tmt.de>

Hello all!

I think I found a bug in sgmlop (from PyXML 0.5.5.1). It doesn't recognize _ in element names.
The following code:

    import sgmlop

    class Handler:
        def finish_starttag(self,name,attrs):
            print name

    p = sgmlop.SGMLParser()
    p.register(Handler())
    p.parse("")

only prints "foo".

Bye,
Walter Dörwald

--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de

From GSMiros@netscape.net Fri Jun 23 15:39:31 2000
From: GSMiros@netscape.net (Rosalie Dieteman)
Date: 23 Jun 00 07:39:31 PDT
Subject: [XML-SIG] Paul Prescod's pulldom
Message-ID: <20000623143931.14879.qmail@www0l.netaddress.usa.net>

I'm very new to XML, so I think I'd be a perfect test case for Paul's idea of pulldom being a fast and easy way to teach someone XML. I volunteer!

My application: Writing a form and filling it with the values from an XML file.

My reasons for using XML: Hierarchical, human readable. I intend to keep the datafile size small, but have a multitude of them (which means I'll probably have a HUGE data file which contains all their file names!).

A couple ideas, probably half baked: I was thinking of using attributes in the XML file to indicate the fields/values which the users should not change and elements for those that will provide text fields or other controls. I'd also like to use a schema file to indicate what sort of client-side validation (probably JScript) to perform, such as all characters numeric, certain formats, etc. Do you think I could just parse the schema file as if it were an XML file to get the validation rules?

I'd half decided to use PHP for ease of use, but from Paul's example, pulldom looks approximately as easy to use.

____________________________________________________________________
Get your own FREE, personal Netscape WebMail account today at http://webmail.netscape.com.
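As a rough illustration of the form-filling use case described in the message above, here is a sketch using `xml.dom.pulldom` as it ships in the current Python 3 standard library. That module descends from the pulldom discussed in this thread, but its API is not necessarily what the June 2000 preview offered. The sample document, the `field` and `entry` element names, and the `read_values` helper are all invented for the example: attributes carry values the user should not change, and element content carries the editable ones.

```python
from xml.dom import pulldom

# Hypothetical data file: read-only values as attributes,
# editable values as element content.
SAMPLE = """<form>
  <field name="id" value="42"/>
  <entry name="comment">hello</entry>
</form>"""


def read_values(xml_text):
    """Return a dict of name -> value pulled from the sample layout."""
    values = {}
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event != pulldom.START_ELEMENT:
            continue
        if node.tagName == "field":
            # Attributes are already available on the start event.
            values[node.getAttribute("name")] = node.getAttribute("value")
        elif node.tagName == "entry":
            # Pull the rest of this element into a small DOM subtree.
            events.expandNode(node)
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            values[node.getAttribute("name")] = text
    return values


values = read_values(SAMPLE)
print(values)
```

The loop only builds DOM nodes for the elements it chooses to expand, which is the convenience pulldom was pitched on: event-style scanning with DOM-style access where it is wanted.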
From faassen@vet.uu.nl Fri Jun 23 16:09:13 2000 From: faassen@vet.uu.nl (Martijn Faassen) Date: Fri, 23 Jun 2000 17:09:13 +0200 Subject: [XML-SIG] repost: getElementsByTagName interpretation Message-ID: <20000623170913.A26630@vet.uu.nl> [I tried posting this message yesterday but somehow it didn't appear on the list -- did anyone receive it? Here I'll try again.] Hi there, Is there any official Python DOM policy on implementing getElementsByTagName()? According to the official DOM spec (and the level 2 proposed recommendation), getElementsByTagName() is supposed to return a live NodeList. Studying the 4DOM however, it looks like the NodeList returned by getElementsByTagName is *not* live. Looking at the archives I can only find a message by Andrew Kuchling in october '98 complaining about the DOM spec in this regard. I hearthily agree with him. Has since then any decision been reached on how to approach this in Python DOMs? I'd like to know what I ought to do. On the one hand I desperately want to implement the simple non-live behavior, as doing the other is a major pain. On the other hand, I'd risk introducing an incompatibility with other DOMs. Oh, by the way, I'm implementing a DOM on top of MetaKit, for no real particular purpose. Regards, Martijn From paul@prescod.net Fri Jun 23 16:28:21 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 23 Jun 2000 10:28:21 -0500 Subject: [XML-SIG] repost: getElementsByTagName interpretation References: <20000623170913.A26630@vet.uu.nl> Message-ID: <39538215.D9E32589@prescod.net> Minidom does not do live nodelists. Are you implementing a DOM object document storage for MetaKit or a mapping from MetaKit's "object model" to the DOM. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant From akuchlin@mems-exchange.org Fri Jun 23 16:30:04 2000 From: akuchlin@mems-exchange.org (Andrew M. 
Kuchling) Date: Fri, 23 Jun 2000 11:30:04 -0400 Subject: [XML-SIG] repost: getElementsByTagName interpretation In-Reply-To: <20000623170913.A26630@vet.uu.nl>; from faassen@vet.uu.nl on Fri, Jun 23, 2000 at 05:09:13PM +0200 References: <20000623170913.A26630@vet.uu.nl> Message-ID: <20000623113004.A4805@amarok.cnri.reston.va.us> On Fri, Jun 23, 2000 at 05:09:13PM +0200, Martijn Faassen wrote: >On the one hand I desperately want to implement the simple non-live behavior, >as doing the other is a major pain. On the other hand, I'd risk introducing >an incompatibility with other DOMs. I'd be interested in knowing if there's any DOM that *does* implement the live NodeList behaviour for getElementsByTagName(); when I looked at IBM's and Sun's around that time, they certainly didn't seem to. Mozilla's doesn't seem to, either. Can anyone point to a DOM that actually does provide this liveness feature? I suspect there are none... Frankly, the DOM spec is broken in this respect; supporting a live NodeList from getElementsByTagName() would add a lot of complexity, a lot of bookkeeping, and more bugs. I don't think it's worth the pain. --amk From paul@prescod.net Fri Jun 23 16:30:46 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 23 Jun 2000 10:30:46 -0500 Subject: [XML-SIG] DOM Extension Proposal References: <3950A0FB.D44DC52F@prescod.net> <39510225.F1620210@FourThought.com> <4.3.1.0.20000623134115.00b1c3c0@mail.tmt.de> Message-ID: <395382A6.5BFA1B0A@prescod.net> Walter Doerwald wrote: > > ... > > But this could be easily built on top of a parser that delivers the > attribute value as a plain string, simply call the parser again with > the attribute value string. The problem is keeping the correct namespace > context. Very few parsers return attribute values as plain strings and arguably they would be in violation of the XML spec....unless it were an option you could turn off it would be really annoying. 
Attempting to get the exact string representation of an XML document is swimming against the tide defined in the XML spec. That could be interpreted as a flaw in XML but it doesn't really matter. If you try it, it's going to be really, really painful...even with sgmlop/xmllib. (try getting the attributes in the right order...) -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant From akuchlin@mems-exchange.org Fri Jun 23 17:10:37 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 23 Jun 2000 12:10:37 -0400 Subject: [XML-SIG] Bug in sgmlop? (_ in names) In-Reply-To: <4.3.1.0.20000623135801.00b0ae10@mail.tmt.de>; from walter@livinglogic.de on Fri, Jun 23, 2000 at 02:00:26PM +0200 References: <4.3.1.0.20000623135801.00b0ae10@mail.tmt.de> Message-ID: <20000623121037.C4805@amarok.cnri.reston.va.us> On Fri, Jun 23, 2000 at 02:00:26PM +0200, Walter Doerwald wrote: >I think I found a bug in sgmlop (from PyXML 0.5.5.1). It doesn't >recognize _ in element names. The following code: I believe this bug was fixed in the 2000/05/28 update of sgmlop, which is what's currently in the CVS tree. (And recent releases of 0.5.x should have the updated version, therefore.) --amk From Mike.Olson@fourthought.com Fri Jun 23 18:29:29 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Fri, 23 Jun 2000 11:29:29 -0600 Subject: [XML-SIG] repost: getElementsByTagName interpretation References: <20000623170913.A26630@vet.uu.nl> Message-ID: <39539E79.C9667D3F@FourThought.com> Martijn Faassen wrote: > > [I tried posting this message yesterday but somehow it didn't appear on > the list -- did anyone receive it? Here I'll try again.] > > Hi there, > > Is there any official Python DOM policy on implementing getElementsByTagName()? 
> According to the official DOM spec (and the level 2 proposed recommendation), > getElementsByTagName() is supposed to return a live NodeList. > > Studying the 4DOM however, it looks like the NodeList returned by > getElementsByTagName is *not* live. > Hi Martijn, We are compliant in some cases, but not all. We should probably document this better. Ex, we are compliant with childNodes, but not getElementsByTagName. To implement this on getElementsByTagName we are waiting until we have DOM Level II Events. This should make the task easier. > Looking at the archives I can only find a message by Andrew Kuchling in > october '98 complaining about the DOM spec in this regard. I hearthily > agree with him. Has since then any decision been reached on how to > approach this in Python DOMs? I'd like to know what I ought to do. I don't think there is a concensus, or even a question outstanding. The focus of 4DOM is to implement the spec as pythonic as possible. Other implementations, qp_xml, minidom, pulldom take more liberties. > > On the one hand I desperately want to implement the simple non-live behavior, > as doing the other is a major pain. On the other hand, I'd risk introducing > an incompatibility with other DOMs. The non living behaviour is pretty simple. Your right, it is a major pain to implement all node lists as live....It comes down to a trade off. Mike > > Regards, > > Martijn > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Fri Jun 23 19:06:15 2000 From: jim@digicool.com (Jim Fulton) Date: Fri, 23 Jun 2000 14:06:15 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
Message-ID: <3953A717.5289DCC8@digicool.com> Traditionally, Python attributes (including methods) with names starting with '_' were treated as private. Why oh why then does the Python DOM implementation use method names beginning with '_'s in the public API (for getting attributes), as in '_get_nodeType'? Why not 'get_nodeType' or 'getNodeType'? Is the intent that these functions shouldn't be called by Python code? Is there are description somewhere of the Python DOM mapping, other than the DOM sources? Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From fdrake@acm.org Fri Jun 23 19:08:06 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 23 Jun 2000 11:08:06 -0700 (PDT) Subject: [XML-SIG] out of touch Message-ID: <14675.42886.423185.310220@mailhost.beopen.com> My laptop has died, so I'm catching up with a couple of days of mail. I've not had time to look at Paul's proposed DOM-like additions to the standard library. Hopefully Andrew can (or has) summarized the relevant portions of our discussion from yesterday; if I don't see it, I'll try to get another message out later. (But I don't have my saved mail or files, so there's still not a lot I can do. ;( ) -Fred -- Fred L. Drake, Jr. 
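For readers following the getElementsByTagName thread, a small sketch of what "live" versus "non-live" means in practice. It assumes the `xml.dom.minidom` module from the current Python 3 standard library, not the 2000-era minidom or 4DOM code discussed here, and the `book`/`chapter` element names are made up.

```python
from xml.dom import minidom

doc = minidom.parseString("<book><chapter/><chapter/></book>")
book = doc.documentElement

# childNodes is the list the element itself maintains, so it is live.
children = book.childNodes

# getElementsByTagName builds a new list at call time (a snapshot).
snapshot = book.getElementsByTagName("chapter")

# Modify the tree after both lists were obtained.
book.appendChild(doc.createElement("chapter"))

live_count = len(children)
snapshot_count = len(snapshot)
fresh_count = len(book.getElementsByTagName("chapter"))

print(live_count, snapshot_count, fresh_count)
```

The previously fetched `childNodes` reflects the new child, while the earlier `getElementsByTagName` result does not; calling the method again picks it up. This matches the statement earlier in the thread that minidom does not do live node lists for this method.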
From just@letterror.com Fri Jun 23 20:34:00 2000 From: just@letterror.com (Just van Rossum) Date: Fri, 23 Jun 2000 20:34:00 +0100 Subject: [XML-SIG] repost: getElementsByTagName interpretation In-Reply-To: <39539E79.C9667D3F@FourThought.com> References: <20000623170913.A26630@vet.uu.nl> Message-ID: At 11:29 AM -0600 23-06-2000, Mike Olson wrote: >The non living behaviour is pretty simple. Your right, it is a major >pain to implement all node lists as live....It comes down to a trade >off. Dumb newbie question: What does "live" mean in this context? Ie. what makes a node "live" or not? Just From Mike.Olson@fourthought.com Fri Jun 23 19:52:51 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Fri, 23 Jun 2000 12:52:51 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> Message-ID: <3953B203.7108A95D@FourThought.com> Jim Fulton wrote: > > Traditionally, Python attributes (including methods) with > names starting with '_' were treated as private. > > Why oh why then does the Python DOM implementation use > method names beginning with '_'s in the public API (for > getting attributes), as in '_get_nodeType'? Why not > 'get_nodeType' or 'getNodeType'? Is the intent that these > functions shouldn't be called by Python code? The methods really are not part of the DOM API. Example: the API defines the attribute nodeType. the python mapping (based off the python CORBA mapping) translates this into _get_nodeType for accessors. This interface is not published, but people do use it. It is encouraged that you directly access the attributes. > Is there are description somewhere of the Python DOM mapping, > other than the DOM sources? I'm not sure if anyone has put together a formal document. We've based most of it off the CORBA mapping at http://www.python.org/sigs/do-sig/corbamap.html Mike > > Jim > > -- > Jim Fulton mailto:jim@digicool.com Python Powered! 
> Technical Director (888) 344-4332 http://www.python.org > Digital Creations http://www.digicool.com http://www.zope.org > > Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email > address may not be added to any commercial mail list with out my > permission. Violation of my privacy with advertising or SPAM will > result in a suit for a MINIMUM of $500 damages/incident, $1500 for > repeats. > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From akuchlin@mems-exchange.org Fri Jun 23 20:27:31 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 23 Jun 2000 15:27:31 -0400 Subject: [XML-SIG] repost: getElementsByTagName interpretation In-Reply-To: ; from just@letterror.com on Fri, Jun 23, 2000 at 08:34:00PM +0100 References: <20000623170913.A26630@vet.uu.nl> <39539E79.C9667D3F@FourThought.com> Message-ID: <20000623152731.D4805@amarok.cnri.reston.va.us> On Fri, Jun 23, 2000 at 08:34:00PM +0100, Just van Rossum wrote: >Dumb newbie question: What does "live" mean in this context? Ie. what makes >a node "live" or not? NodeLists are live, not nodes. It means that when you access element.childNodes and get a list of children, if you then add a child to element, the list you retrieved will also be updated to include the new child. This is easy in Python for childNodes, since childNodes is probably just a Python list anyway, so you just return the list and get the liveness property for free. Where this falls down is .getElementByTagName('X'), which returns a NodeList containing all 'X' elements in the tree. 
If this is live, then every time you modify the DOM tree by adding, deleting, or moving an element, you have to ask "Are there any .getElementByTagName() NodeLists out there that would change as a result of this?" If you consider a change that moves or deletes many elements, such as deleting a chapter from a book, this seems quite expensive and time-consuming. -- A.M. Kuchling http://starship.python.net/crew/amk/ Of course I'm here. I've always been here, there, and everywhere. -- Envelope Girl sounds rather Kosh-like, in ENIGMA #5: "Lizards and Ghosts" From jim@digicool.com Fri Jun 23 20:51:00 2000 From: jim@digicool.com (Jim Fulton) Date: Fri, 23 Jun 2000 15:51:00 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <3953B203.7108A95D@FourThought.com> Message-ID: <3953BFA4.916D454D@digicool.com> Mike Olson wrote: > > Jim Fulton wrote: > > > > Traditionally, Python attributes (including methods) with > > names starting with '_' were treated as private. > > > > Why oh why then does the Python DOM implementation use > > method names beginning with '_'s in the public API (for > > getting attributes), as in '_get_nodeType'? Why not > > 'get_nodeType' or 'getNodeType'? Is the intent that these > > functions shouldn't be called by Python code? > > The methods really are not part of the DOM API. The DOM level 2 recommendation says: "1.Attributes defined in the IDL do not imply concrete objects which must have specific data members - in the language bindings, they are translated to a pair of get()/set() functions, not to a data member. Read-only attributes have only a get() function in the language bindings." This says to me that the DOM API specifies use of methods for interface attributes. > Example: the API defines the attribute nodeType. the python mapping > (based off the python CORBA mapping) translates this into _get_nodeType > for accessors. 
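To make the mapping described in the quoted paragraph concrete, here is a minimal sketch of how one IDL attribute can be reachable both as `_get_nodeType()` and as plain `nodeType`. The `SketchNode` class and its `__getattr__` dispatch are assumptions made for illustration in current Python 3; this is not the 4DOM source and not text from the CORBA mapping document.

```python
class SketchNode:
    """Hypothetical node exposing an IDL-style read-only attribute."""

    ELEMENT_NODE = 1

    def __init__(self, node_type):
        self._node_type = node_type

    # Accessor named the way the CORBA-derived mapping spells it.
    def _get_nodeType(self):
        return self._node_type

    def __getattr__(self, name):
        # Only called when normal lookup fails. Route 'nodeType'
        # to '_get_nodeType'; never recurse on underscore names.
        if name.startswith("_"):
            raise AttributeError(name)
        getter = getattr(type(self), "_get_" + name, None)
        if getter is None:
            raise AttributeError(name)
        return getter(self)


node = SketchNode(SketchNode.ELEMENT_NODE)
via_attribute = node.nodeType
via_accessor = node._get_nodeType()
print(via_attribute, via_accessor)
```

With a scheme like this, direct attribute access is the everyday spelling and the `_get_` method is the mechanism behind it, which is one way to read the advice above to access the attributes directly.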
I looked at the Python CORBA mapping, but it didn't say anything about mapping interface-defined attributes. Structure members are treated as simple Python attributes. > This interface is not published, but people do use it. I think that the benefit of a Python DOM is greatly reduced if there isn't a well-known (ie published) Python mapping for it. > It is encouraged > that you directly > access the attributes. This needs to be specified, not encouraged. If the '_get_' methods are part of the interface, then this needs to be spelled out. If they are *not* part of the interface, then this should be known too. I personally, would like to have both method and attribute access, but I'm not happy with public method ames that start with '_'. > > Is there are description somewhere of the Python DOM mapping, > > other than the DOM sources? > > I'm not sure if anyone has put together a formal document. We've based > most of it off the CORBA mapping at > http://www.python.org/sigs/do-sig/corbamap.html I've looked at this several times and don't see a discussion of attribute mapping. What section of this are you refering to? I've also looked at http://www.omg.org/cgi-bin/doc?ptc/00-01-12, which is similar and also leaves attribute mapping unspecified. :( Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From fdrake@beopen.com Fri Jun 23 20:58:52 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Fri, 23 Jun 2000 12:58:52 -0700 (PDT) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
In-Reply-To: <3953A717.5289DCC8@digicool.com> References: <3953A717.5289DCC8@digicool.com> Message-ID: <14675.49532.375238.979659@mailhost.beopen.com> Jim Fulton writes: > Traditionally, Python attributes (including methods) with > names starting with '_' were treated as private. Yes, and this works well. > Why oh why then does the Python DOM implementation use > method names beginning with '_'s in the public API (for > getting attributes), as in '_get_nodeType'? Why not > 'get_nodeType' or 'getNodeType'? Is the intent that these > functions shouldn't be called by Python code? This is a function of the CORBA IDL mapping; DOM is specified in IDL. I've looked at this and, however unfortunate, there are very good reasons for using the underscore in this way with the mapping. The names of the get and set methods must not map onto normal IDL identifiers, which can't start with an underscore. > Is there are description somewhere of the Python DOM mapping, > other than the DOM sources? The W3C documentation gives the IDL mapping, which requires the Python specific mapping. -Fred -- Fred L. Drake, Jr. From jim@digicool.com Fri Jun 23 21:20:54 2000 From: jim@digicool.com (Jim Fulton) Date: Fri, 23 Jun 2000 16:20:54 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <3953B203.7108A95D@FourThought.com> <3953BFA4.916D454D@digicool.com> Message-ID: <3953C6A6.36650A9B@digicool.com> Jim Fulton wrote: > (snip) > > I looked at the Python CORBA mapping, but it didn't say anything > about mapping interface-defined attributes. Structure members > are treated as simple Python attributes. My mistake. The Python laguage binding does specify '_get_' and '_set_' methods. This is a mistake IMO, but I agree that the Python DOM is correct to follow this lead. :( Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jim@digicool.com Fri Jun 23 21:35:47 2000 From: jim@digicool.com (Jim Fulton) Date: Fri, 23 Jun 2000 16:35:47 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> Message-ID: <3953CA23.590C2962@digicool.com> "Fred L. Drake, Jr." wrote: > > Jim Fulton writes: > > Traditionally, Python attributes (including methods) with > > names starting with '_' were treated as private. > > Yes, and this works well. > > > Why oh why then does the Python DOM implementation use > > method names beginning with '_'s in the public API (for > > getting attributes), as in '_get_nodeType'? Why not > > 'get_nodeType' or 'getNodeType'? Is the intent that these > > functions shouldn't be called by Python code? > > This is a function of the CORBA IDL mapping; DOM is specified in > IDL. I've looked at this and, however unfortunate, there are very > good reasons for using the underscore in this way with the mapping. > The names of the get and set methods must not map onto normal IDL > identifiers, which can't start with an underscore. Are you saying that there is a danger that there might be an interface with an attribute 'foo' and a method 'get_foo' (or 'getFoo' or whatever)? I believe that other language mappings don't worry about this. For example, the C++ mapping uses 'get_foo'. > > Is there are description somewhere of the Python DOM mapping, > > other than the DOM sources? > > The W3C documentation gives the IDL mapping, which requires the > Python specific mapping. 
I agree. IMO, the Python IDL mapping is broken. :( Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jim@digicool.com Fri Jun 23 22:19:15 2000 From: jim@digicool.com (Jim Fulton) Date: Fri, 23 Jun 2000 17:19:15 -0400 Subject: [XML-SIG] Unimportant, CORBA C++ attribute mapping References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953CA23.590C2962@digicool.com> Message-ID: <3953D453.494F533E@digicool.com> Jim Fulton wrote: > (snip) > For example, the C++ mapping uses 'get_foo'. Nope, wrong again. :[ The C++ mapping uses foo() and foo(v) to get and set an attribute named foo. Not that it matters. :) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From paul@prescod.net Fri Jun 23 23:17:48 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 23 Jun 2000 15:17:48 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> Message-ID: <3953E20C.6239D946@prescod.net> "Fred L. Drake, Jr." wrote: > > ... > > > Is there are description somewhere of the Python DOM mapping, > > other than the DOM sources? 
> > The W3C documentation gives the IDL mapping, which requires the > Python specific mapping. Actually, the DOM can be mapped into a language in a manner that does not follow directly from the IDL and CORBA specs. That's why there is a formally defined java binding rather than just a reference to the IDL specs. Historically, though, 4DOM was really a CORBA tool so it really needed to follow the specs. I would vote for losing the leading underscore. > This says to me that the DOM API specifies use of methods > for interface attributes. I think it is safe to say that a binding should not require a particular underlying data structure but Python allows the use of a.b syntax even when the surface structure is wildly different than the underlying data structure. I'm preaching to the choir as you are the world leader in abuse of the dot notation. :) Visual Basic and ECMAScript (the latter is actually specified) also use dot notation for what could conceptually be a method invocation. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant From fig@oreilly.com Fri Jun 23 23:34:57 2000 From: fig@oreilly.com (Stephen R. Figgins) Date: Fri, 23 Jun 2000 15:34:57 -0700 Subject: [XML-SIG] PyObject_NEW or PyObject_New? Message-ID: <200006232234.PAA01613@rock.west.ora.com> Sorry if this has been mentioned before... I didn't see this in a quick review of the archives: With PyXML-0.5.5.tar.gz I couldn't get the extensions/pyexpat.c file to compile on my Linux system. It complained about a parse error on line 474 before xmlparseobject. Should PyObject_New be PyObject_NEW? (That is what it looks like in the header file.) When I make this change, the program compiles okay, but I haven't tried using it with anything yet. Hope this is helpful. Stephen Figgins fig@oreilly.com From fig@oreilly.com Fri Jun 23 23:51:07 2000 From: fig@oreilly.com (Stephen R. 
Figgins) Date: Fri, 23 Jun 2000 15:51:07 -0700 Subject: [XML-SIG] PyXML Windows install Message-ID: <200006232251.PAA03502@rock.west.ora.com> I picked up the PyXML.exe install program pointed to by the Vaults to install on my Win98 system. It looks like it installed okay, but when I run sax, it prints out a bunch of lines with brackets. For example, querying the example quotation file I end up with lines like [] QUOTATION: [] [] [] [] [] This is not a technical issue so much as a human issue; we are limited and so is our time. (Is this a bug or a feature of time? Careful; trick question!) {Fred Drake on the Documentation SIG, 9 Sep 1998}... (208 bytes) When running the same code on my Linux box I do not get the brackets. Any idea what is up with that? Stephen Figgins fig@oreilly.com From dieter@handshake.de Fri Jun 23 19:11:31 2000 From: dieter@handshake.de (Dieter Maurer) Date: Fri, 23 Jun 2000 20:11:31 +0200 (CEST) Subject: [XML-SIG] DOM Extension Proposal In-Reply-To: <39510225.F1620210@FourThought.com> References: <3950A0FB.D44DC52F@prescod.net> <39510225.F1620210@FourThought.com> Message-ID: <200006231811.UAA00462@lindm.dm> Mike Olson writes: > Paul Prescod wrote: > > > > Insofar as the DOM does not address Python's syntax overloading, it does > > not say what we must do in our overloading. > > > > I propose that we extend the DOM with a new type AttributeList that is a > > subclass of NamedNodeMap: > > > > It would override __getitem__ to return the *value* of the reference > > attribute node instead of a (often useless and annoying) attribute node > > object. > > We inherit NamedNodeMap from UserDict now so we are not too far off. > However, we do return the Attribute node. I suppose we could override > this to return just the value of the node. > > Anyone else's thoughts? It will break some code. However, Unicode handling will in any case require some changes to existing code. Therefore, this is a good time to make any changes that seems to be worth. 
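A rough sketch of the AttributeList idea quoted above, assuming a hypothetical wrapper around the NamedNodeMap of today's xml.dom.minidom (the class name and the wrapping approach are illustrative, not the proposal's actual implementation). Indexing returns the attribute's string value, while the DOM-specified accessors still return nodes:

```python
from xml.dom import Node, minidom


class AttributeList:
    """Hypothetical wrapper: attrs[name] yields the value, not the Attr node."""

    def __init__(self, node_map):
        self._map = node_map

    def __getitem__(self, name):
        node = self._map.getNamedItem(name)
        if node is None:
            raise KeyError(name)
        return node.value  # the string value, as in the proposal

    # DOM-specified functions keep their DOM-specified behaviour.
    def getNamedItem(self, name):
        return self._map.getNamedItem(name)

    def item(self, index):
        return self._map.item(index)

    def __len__(self):
        return len(self._map)


element = minidom.parseString('<quote author="Kant" year="1784"/>').documentElement
attrs = AttributeList(element.attributes)
print(attrs["author"])                 # the plain string value
print(attrs.getNamedItem("author"))    # still an Attr node
```

Keeping item() and getNamedItem() untouched means code written against the DOM specification would not notice the wrapper; only the Python-specific indexing syntax changes.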
I would like to stress, however, that all DOM-specified functions and attributes should behave as specified in DOM. In our case: while "attributes[name]" may well return the attribute's value, the DOM functions "item" and "getNamedItem" should of course return "Node". Dieter From fig@oreilly.com Sat Jun 24 00:08:49 2000 From: fig@oreilly.com (Stephen R. Figgins) Date: Fri, 23 Jun 2000 16:08:49 -0700 Subject: [XML-SIG] PyObject_NEW or PyObject_New? Message-ID: <200006232308.QAA05514@rock.west.ora.com> > >Sorry if this has been mentioned before... I didn't see this in a >quick review of the archives: Following up my own post... I checked out the latest CVS and see that this is a compatibility problem between Python 1.5 and Python 1.6, and that it has been addressed. For 1.5 it should have been NEW. Never mind.... Stephen R. Figgins fig@oreilly.com From paul@prescod.net Sat Jun 24 20:23:20 2000 From: paul@prescod.net (Paul Prescod) Date: Sat, 24 Jun 2000 12:23:20 -0700 Subject: [XML-SIG] PyExpat changes References: Message-ID: <39550AA8.8CBA8352@prescod.net> "A.M. Kuchling" wrote: > > I've just checked in the changes to make the Expat module return > Unicode or 8-bit strings, depending on the setting of the > returns_unicode attribute. How do I get this version? According to SourceForge, that module hasn't changed for 7 weeks: http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/python/dist/src/Modules/pyexpat.c?cvsroot=python I would like to add SAX2 support. I've figured out that it doesn't take many changes and I don't need to break old code. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built.
- Immanuel Kant From GSMiros@netscape.net Sun Jun 25 03:10:08 2000 From: GSMiros@netscape.net (Rosalie Dieteman) Date: 24 Jun 00 19:10:08 PDT Subject: [XML-SIG] Re: getElementsByTagName interpretation Message-ID: <20000625021008.1001.qmail@www0d.netaddress.usa.net> Where this falls down is .getElementByTagName('X'), which returns a NodeList containing all 'X' elements in the tree. If this is live, then every time you modify the DOM tree by adding, deleting, or moving an element, you have to ask "Are there any .getElementByTagName() NodeLists out there that would change as a result of this?" If you consider a change that moves or deletes many elements, such as deleting a chapter from a book, this seems quite expensive and time-consuming. Another nasty situation I can think of is stepping through a NodeList deleting the nodes as you step through... in a "live" list, the list you're stepping through would be changing as you step... The obvious answer is to step last to first, but it's one more argument against "live"ness. ____________________________________________________________________ Get your own FREE, personal Netscape WebMail account today at http://webmail.netscape.com. From uogbuji@fourthought.com Sun Jun 25 15:55:04 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 08:55:04 -0600 Subject: [XML-SIG] Python 1.6 XML APIs In-Reply-To: Message from "Fred L. Drake, Jr." of "Tue, 20 Jun 2000 11:29:16 EDT." <14671.36300.815139.963941@cj42289-a.reston1.va.home.com> Message-ID: <200006251455.IAA13427@localhost.localdomain> Catching up... > I am hesitant to say we'll accept several new modules for 1.6 at > this point, however. Perhaps it makes sense to pick a couple that > offer the flexibility (saxlib?) and ease-of-use (pulldom? minidom?). > If we can narrow it down quickly, I can talk to Guido about what > should go in, but we're essentially at feature-freeze now.
We can be > a little more flexible for library modules, but each module that gets > added is essentially a promise that it'll be maintained for at least > half of all eternity. More, if anyone uses it. ;) While I was one of those pulling for minidom in Python 1.6, Fred's comments do give me pause. Maybe we should first meld it into the XML distro and see if users like it. What if we want interface changes (for instance, in Paul's pulldom example it doesn't use the Python/DOM binding the xml-sig agreed on)? Then, all of a sudden we have legacy to consider. It would be nice to fill that feature-list check-box for packaged Python, but maybe it's worth going cautiously and waiting for Python 1.7. If minidom is incorporated quickly into PyXML, it will be easy enough for any interested users to give it a spin. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:11:41 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:11:41 -0600 Subject: [XML-SIG] Python 1.6 XML APIs In-Reply-To: Message from Paul Prescod of "Wed, 21 Jun 2000 10:40:29 CDT." <3950E1ED.1E3052F3@prescod.net> Message-ID: <200006251511.JAA13468@localhost.localdomain> > "Fred L. Drake, Jr." wrote: > > > > My biggest concern lies in the potential bugginess of the > > implementation; I expect Paul knows more about XML & actually applying > > it than most of us (though possibly not all), and has sufficient > > experience that he can craft a solid API. > > The DOM stuff is almost entirely "by the book." I do feel a bit better about the API now that I've seen Paul's longer pulldom example, which does seem to conform to the Python/DOM mapping (I'll assume the "child_nodes" in the earlier example was a typo).
However, there are still the inevitable extansions (ParseString, AttributeList, etc.), which, while they seem innocuous to me so far, may be worth scrutinizing over time. > The "pulldom" stuff is > only about three methods: > > 1. Parse my file please > 2. Give me a node please > 3. Fill in this node's children and descendants please. > > It's simpler than XMLLib, SAX or anything else I've ever seen. No > magical method names, no special base classes, nothing. I'll admit to its simplicity, and the opportunity for lazy instantiation is attractive, but I don't think I'd tend to use it a great deal, which, I imagine, is OK. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:16:10 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:16:10 -0600 Subject: [XML-SIG] Pulldom example In-Reply-To: Message from Paul Prescod of "Wed, 21 Jun 2000 15:55:55 CDT." <39512BDB.5B06FD81@prescod.net> Message-ID: <200006251516.JAA13479@localhost.localdomain> > We need feedback on pulldom as soon as possible. Please don't hesitate > to tell me you don't like it. Just do keep in mind that it is intended > as a simple, convenient replacement for xmllib, not for something > sophisticated like XSLT nor ultra-efficient like SAX. > > Please take a look at the example code and give an opinion....negative > ones are fine. If you think SAX or xmllib is easier or otherwise better, > you should say so now... It seems rather tedious to me, but then again, I think I'm biased. As 4XSLT has matured and improved performance, and as we've improved support for extensions, I've found fewer and fewer tasks to be done otherwise. 
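The three pulldom steps quoted above (parse, get a node, fill in its children) can be sketched with the xml.dom.pulldom module as found in the present-day Python standard library. This is only an illustration under that assumption; the interface under discussion in the thread may have differed, and the sample document and the function name collect_quotations are made up for the example:

```python
from xml.dom import pulldom

SAMPLE = (
    "<quotations>"
    "<quotation id='q1'>First</quotation>"
    "<quotation id='q2'>Second</quotation>"
    "</quotations>"
)


def collect_quotations(xml_text):
    """Return (id, text) pairs for every <quotation> element."""
    results = []
    events = pulldom.parseString(xml_text)          # 1. parse my document
    for event, node in events:                      # 2. give me a node
        if event == pulldom.START_ELEMENT and node.tagName == "quotation":
            events.expandNode(node)                 # 3. fill in its subtree
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            results.append((node.getAttribute("id"), text))
    return results


print(collect_quotations(SAMPLE))
```

Only the elements that are explicitly expanded get a DOM subtree; everything else streams past as events, which is where the opportunity for lazy instantiation comes from.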
But as a consultant, my usage patterns tend to change sharply from time to time, so I imagine XSLT might someday soon just get in my way, and pulldom might be attractive (plain SAX/SAX2 is rather annoying because it is a pain to genericize). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:17:19 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:17:19 -0600 Subject: [XML-SIG] Pulldom example In-Reply-To: Message from Paul Prescod of "Wed, 21 Jun 2000 15:55:55 CDT." <39512BDB.5B06FD81@prescod.net> Message-ID: <200006251517.JAA13490@localhost.localdomain> I propose that Paul incorporate minidom/pulldom into the PyXML CVS soonest, so there is every opportunity to shake it out thoroughly whether or not we try to get it into Python 1.6. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:19:36 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:19:36 -0600 Subject: [XML-SIG] DOM Extension Proposal In-Reply-To: Message from Lars Marius Garshol of "22 Jun 2000 10:23:11 +0200." Message-ID: <200006251519.JAA13508@localhost.localdomain> > > * Mike Olson > | > | We inherit NamedNodeMap from UserDict now so we are not too far off. > | However, we do return the Attribute node. I suppose we could override > | this to return just the value of the node. > | > | Anyone else's thoughts? > > I have been wondering why we even have Attribute nodes in the DOM tree > at all. 
They are mainly useful for representing entity references in > attribute values, something that is very rarely useful. :-) > > So I think there would be definite performance benefits (in terms of > both speed and memory use) in keeping a dictionary of names -> string > values instead of names -> nodes. > > Attribute nodes could be lazily instantiated when someone calls > getAttributeNode. If we want to support entity references inside > attributes we can do this by using lists in the dictionary for those > cases instead of strings. Most likely this would be used in much less > than 1% of the cases and I wouldn't complain if we decided not to > support this stuff at all. This is exactly an optimization we have in mind for 4DOM 1.0, inspired by a similar enhancement in Xalan's DOM. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:31:02 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:31:02 -0600 Subject: [XML-SIG] DOM Extension Proposal In-Reply-To: Message from Paul Prescod of "Thu, 22 Jun 2000 10:02:45 CDT." <39522A95.F6D392C2@prescod.net> Message-ID: <200006251531.JAA13534@localhost.localdomain> Lars Marius Garshol wrote: > > I have been wondering why we even have Attribute nodes in the DOM tree > > at all. They are mainly useful for representing entity references in > > attribute values, something that is very rarely useful. :-) Paul Prescod: > Attributes can also be independently addressed as objects in XPath, > XSLT, Schematron, etc. For non-XPath purposes, we're planning to put in this optimization because lazy instanciation would cover cases where nodes are needed. 
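A minimal sketch of the optimization described above, assuming hypothetical classes SketchElement and SketchAttr (this is not 4DOM's or Xalan's code). Attribute values live in a plain dictionary of names to strings, and an attribute node is only created when getAttributeNode is called:

```python
class SketchAttr:
    """Hypothetical stand-in for a DOM Attr node."""

    def __init__(self, owner, name):
        self.ownerElement = owner
        self.name = name

    @property
    def value(self):
        # Read through to the element's dictionary so the node never goes stale.
        return self.ownerElement._attr_values[self.name]


class SketchElement:
    """Hypothetical element storing names -> string values."""

    def __init__(self, tag_name):
        self.tagName = tag_name
        self._attr_values = {}   # always populated
        self._attr_nodes = {}    # filled lazily, on demand

    def setAttribute(self, name, value):
        self._attr_values[name] = value

    def getAttribute(self, name):
        # The DOM returns an empty string for a missing attribute.
        return self._attr_values.get(name, "")

    def getAttributeNode(self, name):
        if name not in self._attr_values:
            return None
        node = self._attr_nodes.get(name)
        if node is None:
            node = self._attr_nodes[name] = SketchAttr(self, name)
        return node


chapter = SketchElement("chapter")
chapter.setAttribute("id", "c1")
print(chapter.getAttribute("id"))   # no Attr object has been created yet
print(len(chapter._attr_nodes))     # still empty at this point
```

Code that only reads values never pays for node objects; the sketch ignores entity references inside attribute values, which the quoted message suggests handling separately or not at all.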
For most XPath purposes, this optimization is useless, since we keep a node document-order index that speeds selection up several times, but requires that every node, including attributes, be evaluated. So, for 4XPath, we're planning to go even further: provide a small, read-only DOM with _many_ optimizations and an a special doc-order index that checks element/attribute value rather than attribute node ID (this is not an option for a read/write DOM). If you pass 4XPath an XML string, it will generate this "domlette", as we're calling it. If you pass 4XPath a straight DOM, it works as now, and you miss out on the performance benefits of domlette, but at least the legacy is intact. Does anyone have any comments on this? We're about to begin work on it. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:40:59 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:40:59 -0600 Subject: [XML-SIG] speed question re DOM parsing In-Reply-To: Message from Greg Stein of "Thu, 22 Jun 2000 19:41:53 PDT." <20000622194153.C29590@lyra.org> Message-ID: <200006251540.JAA13556@localhost.localdomain> Greg Stein: > When could I get to it? eek. I *will*, but dunno when. It is amazing just > how much stuff can fall on a person's plate despite having no job :-). I've > got some layered I/O in Apache, mod_dav integration, a new httplib, imputil > issues, these qp_xml upgrades, ViewCVS stuff, edna releases, free threading > changes, Python/Apache integration, and coding for Subversion. Fuggin > frightening. Ooh! Ooh! Can you tell us more about the Python/Apache integration item? We've been discussing distributing a 4Suite kit bundled with PyApache, but that package is quite complex and I'm not sure how strongly maintained. 
Are you talking about improvements to PyApache? Another approach entirely (PyApache still has much more overhead than, say mod_perl)? Do you have an approximate time-line? A project URL? Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:48:57 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:48:57 -0600 Subject: [XML-SIG] repost: getElementsByTagName interpretation In-Reply-To: Message from Martijn Faassen of "Fri, 23 Jun 2000 17:09:13 +0200." <20000623170913.A26630@vet.uu.nl> Message-ID: <200006251548.JAA13567@localhost.localdomain> > [I tried posting this message yesterday but somehow it didn't appear on > the list -- did anyone receive it? Here I'll try again.] > > Hi there, > > Is there any official Python DOM policy on implementing getElementsByTagName()? > According to the official DOM spec (and the level 2 proposed recommendation), > getElementsByTagName() is supposed to return a live NodeList. > > Studying the 4DOM however, it looks like the NodeList returned by > getElementsByTagName is *not* live. We are considering adding liveness once we finish implementing Level 2 mutation events, but I'm concerned about performance. We'll probably do so anyway for compliance. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Sun Jun 25 16:47:37 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 25 Jun 2000 09:47:37 -0600 Subject: [XML-SIG] Re: getElementsByTagName interpretation References: <20000625021008.1001.qmail@www0d.netaddress.usa.net> Message-ID: <39562999.781DCD15@FourThought.com> Rosalie Dieteman wrote: > > Where this falls down is .getElementByTagName('X'), which returns a > NodeList containing all 'X' elements in the tree. If this is live, > then every time you modify the DOM tree by adding, deleting, or moving > an element, you have to ask "Are there any .getElementByTagName() > NodeLists out there that would change as a result of this?" If you > consider a change that moves or deletes many elements, such as > deleting a chapter from a book, this seems quite expensive and > time-consuming. > > Another nasty situation I can think of is stepping through a NodeList deleting > the nodes as you step through... in a "live" list, the list you're stepping > through would be changing as you step... The obvious answer is to step last to > first, but it's one more argument against "live"ness. I'd thought of this one too, but luckily there is no delete interface defined on a node list. Only accessor functions. so you can't change the list with only a reference to the list. Mike > > ____________________________________________________________________ > Get your own FREE, personal Netscape WebMail account today at http://webmail.netscape.com. > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. 
http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:50:53 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:50:53 -0600 Subject: [XML-SIG] repost: getElementsByTagName interpretation In-Reply-To: Message from "Andrew M. Kuchling" of "Fri, 23 Jun 2000 11:30:04 EDT." <20000623113004.A4805@amarok.cnri.reston.va.us> Message-ID: <200006251550.JAA13589@localhost.localdomain> > On Fri, Jun 23, 2000 at 05:09:13PM +0200, Martijn Faassen wrote: > >On the one hand I desperately want to implement the simple non-live behavior, > >as doing the other is a major pain. On the other hand, I'd risk introducing > >an incompatibility with other DOMs. > > I'd be interested in knowing if there's any DOM that *does* implement > the live NodeList behaviour for getElementsByTagName(); when I looked > at IBM's and Sun's around that time, they certainly didn't seem to. > Mozilla's doesn't seem to, either. Can anyone point to a DOM that > actually does provide this liveness feature? I suspect there are > none... I think most do now, after long debate about it on www-dom. > Frankly, the DOM spec is broken in this respect; supporting a live > NodeList from getElementsByTagName() would add a lot of complexity, a > lot of bookkeeping, and more bugs. I don't think it's worth the pain. Actually, using Level 2 mutation events would lessen the pain, but the key is doing so without too much performance penalty on thise who (sensibly) don't want to mutate the DOM during traversal. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 16:53:53 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 09:53:53 -0600 Subject: [XML-SIG] Ugh! 
Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Jim Fulton of "Fri, 23 Jun 2000 14:06:15 EDT." <3953A717.5289DCC8@digicool.com> Message-ID: <200006251553.JAA13600@localhost.localdomain> > Traditionally, Python attributes (including methods) with > names starting with '_' were treated as private. This is an informal tradition, not universal, and hardly normative. > Why oh why then does the Python DOM implementation use > method names beginning with '_'s in the public API (for > getting attributes), as in '_get_nodeType'? Why not > 'get_nodeType' or 'getNodeType'? Is the intent that these > functions shouldn't be called by Python code? We have it this way in order to follow the Python/CORBA mapping. > Is there are description somewhere of the Python DOM mapping, > other than the DOM sources? Just the xml-sig mailing list archives, I'm afraid. We should indeed work on this. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 17:08:38 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 10:08:38 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Jim Fulton of "Fri, 23 Jun 2000 15:51:00 EDT." <3953BFA4.916D454D@digicool.com> Message-ID: <200006251608.KAA13668@localhost.localdomain> > > Jim Fulton wrote: > I've also looked at http://www.omg.org/cgi-bin/doc?ptc/00-01-12, > which is similar and also leaves attribute mapping unspecified. :( Just as a note to all, on the do-sig, Jim was already pointed to the right clause in the Python/CORBA mapping, and Martin von Lowis also explained why it is so (avoiding IDL name-clashes). The remaining debate is whether to follow the Python/CORBA mapping. I say, no reason not to do so. 
There is nothing normative about leading "_" being private that I know of. It ends up being a question of style versus spec unification, and I'm always wont to go for the latter. Note that 4DOM originally used "getChildNodes", etc, but I was happy to dump that because of its mismatch with Python attribute access, and in light of the Python/CORBA mapping. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Sun Jun 25 17:14:19 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 10:14:19 -0600 Subject: [XML-SIG] Re: getElementsByTagName interpretation In-Reply-To: Message from Rosalie Dieteman of "24 Jun 2000 19:10:08 PDT." <20000625021008.1001.qmail@www0d.netaddress.usa.net> Message-ID: <200006251614.KAA13694@localhost.localdomain> > Where this falls down is .getElementByTagName('X'), which returns a > NodeList containing all 'X' elements in the tree. If this is live, > then every time you modify the DOM tree by adding, deleting, or moving > an element, you have to ask "Are there any .getElementByTagName() > NodeLists out there that would change as a result of this?" If you > consider a change that moves or deletes many elements, such as > deleting a chapter from a book, this seems quite expensive and > time-consuming. > > Another nasty situation I can think of is stepping through a NodeList deleting > the nodes as you step through... in a "live" list, the list you're stepping > through would be changing as you step... The obvious answer is to step last to > first, but it's one more argument against "live"ness. The DOM WG's answer to this is "use NodeIterator or TreeWalker", which is required to handle this situation gracefully. 
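The deletion problem quoted above can be reproduced with a small simulation. The class LiveTagList below is hypothetical: it fakes liveness by re-running the query on every access, on top of xml.dom.minidom, and is not how any real DOM implements it. Stepping forward skips nodes, while stepping last to first removes them all:

```python
from xml.dom import minidom


class LiveTagList:
    """Hypothetical live view: re-queries the tree on every access."""

    def __init__(self, root, tag):
        self._root = root
        self._tag = tag

    def _current(self):
        return self._root.getElementsByTagName(self._tag)

    def __len__(self):
        return len(self._current())

    def item(self, index):
        nodes = self._current()
        return nodes[index] if 0 <= index < len(nodes) else None


def delete_forward(live):
    """Naive first-to-last deletion; the list shrinks under the index."""
    removed = 0
    index = 0
    while index < len(live):
        node = live.item(index)
        node.parentNode.removeChild(node)
        removed += 1
        index += 1
    return removed


def delete_backward(live):
    """Last-to-first deletion; indices not yet visited stay valid."""
    removed = 0
    for index in range(len(live) - 1, -1, -1):
        node = live.item(index)
        node.parentNode.removeChild(node)
        removed += 1
    return removed


SOURCE = "<book><ch/><ch/><ch/><ch/></book>"
print(delete_forward(LiveTagList(minidom.parseString(SOURCE), "ch")))   # 2
print(delete_backward(LiveTagList(minidom.parseString(SOURCE), "ch")))  # 4
```

With four sibling elements the forward loop removes only two of them, because each removal shifts the remaining nodes down by one position.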
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From dieter@handshake.de Sun Jun 25 20:38:57 2000 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 25 Jun 2000 21:38:57 +0200 (CEST) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <3953E20C.6239D946@prescod.net> References: <3953A717.5289DCC8@digicool.com> <3953E20C.6239D946@prescod.net> Message-ID: <14678.24248.681373.592096@lindm.dm> Paul Prescod writes: > > The W3C documentation gives the IDL mapping, which requires the > > Python specific mapping. > > Actually, the DOM can be mapped into a language in a manner that does > not follow directly from the IDL and CORBA specs. That's why there is a > formally defined java binding rather than just a reference to the IDL > specs. Historically, though, 4DOM was really a CORBA tool so it really > needed to follow the specs. > > I would vote for losing the leading underscore. I would vote against. DOM is specified in terms of IDL. Python has an IDL -> Python mapping. Deviating from this mapping for DOM only would require special knowledge -- a thing I do not like. I would not object, though, when the Python IDL mapping would use the Java approach: prepend '_' only if doing otherwise would introduce name clashes. Dieter From faassen@vet.uu.nl Mon Jun 26 00:04:30 2000 From: faassen@vet.uu.nl (Martijn Faassen) Date: Mon, 26 Jun 2000 01:04:30 +0200 Subject: [XML-SIG] repost: getElementsByTagName interpretation In-Reply-To: <39538215.D9E32589@prescod.net> References: <20000623170913.A26630@vet.uu.nl> <39538215.D9E32589@prescod.net> Message-ID: <20000626010430.A29091@vet.uu.nl> Paul Prescod wrote: > Minidom does not do live nodelists. Is minidom now the 'official Python DOM'? I.e. what's the status of 4DOM at the moment? 
> Are you implementing a DOM object document storage for MetaKit or a > mapping from MetaKit's "object model" to the DOM. The former; I'm building a DOM implementation on top of MetaKit, for no particular good reason (except as a learning experience, as who knows it might even turn out useful :) Regards, Martijn From faassen@vet.uu.nl Mon Jun 26 00:10:08 2000 From: faassen@vet.uu.nl (Martijn Faassen) Date: Mon, 26 Jun 2000 01:10:08 +0200 Subject: [XML-SIG] repost: getElementsByTagName interpretation In-Reply-To: <20000623113004.A4805@amarok.cnri.reston.va.us> References: <20000623170913.A26630@vet.uu.nl> <20000623113004.A4805@amarok.cnri.reston.va.us> Message-ID: <20000626011008.B29091@vet.uu.nl> Andrew M. Kuchling wrote: > On Fri, Jun 23, 2000 at 05:09:13PM +0200, Martijn Faassen wrote: > >On the one hand I desperately want to implement the simple non-live behavior, > >as doing the other is a major pain. On the other hand, I'd risk introducing > >an incompatibility with other DOMs. > > I'd be interested in knowing if there's any DOM that *does* implement > the live NodeList behaviour for getElementsByTagName(); when I looked > at IBM's and Sun's around that time, they certainly didn't seem to. That's interesting -- the new DOM level 2 proposal and the requirements proposal don't seem to be talking about getting rid of this 'feature' of the DOM; in fact the DOM level 2 clarifies it, I think. > Mozilla's doesn't seem to, either. Can anyone point to a DOM that > actually does provide this liveness feature? I suspect there are > none... That's interesting. :) (presumably the other DOMs _do_ implement live NodeLists for getChildNodes() and such?) > Frankly, the DOM spec is broken in this respect; supporting a live > NodeList from getElementsByTagName() would add a lot of complexity, a > lot of bookkeeping, and more bugs. I don't think it's worth the pain. Agreed, definitely. The more I think about possible implementation strategies, the more I agree. 
:) There just has to be somekind of reasoning behind making this list live, right? I wonder what.. Regards, Martijn From gstein@lyra.org Mon Jun 26 00:20:43 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 25 Jun 2000 16:20:43 -0700 Subject: Python/Apache stuff (was: Re: [XML-SIG] speed question re DOM parsing) In-Reply-To: <200006251540.JAA13556@localhost.localdomain>; from uogbuji@fourthought.com on Sun, Jun 25, 2000 at 09:40:59AM -0600 References: <200006251540.JAA13556@localhost.localdomain> Message-ID: <20000625162043.J29590@lyra.org> On Sun, Jun 25, 2000 at 09:40:59AM -0600, Uche Ogbuji wrote: > Greg Stein: > > When could I get to it? eek. I *will*, but dunno when. It is amazing just > > how much stuff can fall on a person's plate despite having no job :-). I've > > got some layered I/O in Apache, mod_dav integration, a new httplib, imputil > > issues, these qp_xml upgrades, ViewCVS stuff, edna releases, free threading > > changes, Python/Apache integration, and coding for Subversion. Fuggin > > frightening. > > Ooh! Ooh! Can you tell us more about the Python/Apache integration item? > We've been discussing distributing a 4Suite kit bundled with PyApache, but > that package is quite complex and I'm not sure how strongly maintained. Are > you talking about improvements to PyApache? Another approach entirely > (PyApache still has much more overhead than, say mod_perl)? Do you have an > approximate time-line? A project URL? There are several Python/Apache efforts (where Python is embedded into the Apache process): *) PyApache: essentially this is just a CGI accelerator. Take a standard CGI script and it will "run faster." (URL? dunno) *) mod_python: similar to mod_perl. Built for Apache 1.3. Despite its version 2.4, it is still a bit rough. I've been working with the author to improve the code. I did a code review a while back with a lot of suggestions. It is reasonable, but not as mature as mod_perl. 
http://www.modpython.org/ *) mod_snake: obvious misnomer :-). This is a module built for Apache 2.0, with the intent of making use of Apache 2.0's threadedness (plus a few of A2's other internal features). The code is very good looking. - available at SourceForge *) mod_slimpy: my name for an Apache 2.0 module which will be even lighter weight than mod_snake. There will be only the slimmest layer of C code to interface Apache and Python. Most/all operational logic will be deferred to the Python side. I've used this design to good effect in some of the Python/COM work and its univgw package in particular. Obviously, the latter two will compete, but oh well :-). The first two fit their problem environment without particular complications. When I start the mod_slimpy work, I'm also going to push on setting up python.apache.org. Essentially, it will host (under the ASF umbrella) the mod_slimpy work plus any other efforts that may want to operate there. I'm going to ask the authors of the other packages whether they would like to be hosted there, too. Of course, python.apache.org can host any Python project. It doesn't have to be related to the web server, or any other ASF project for that matter. Consider all the non-web stuff that operates under xml.apache.org, java.apache.org, and jakarta.apache.org. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Mon Jun 26 00:24:07 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 25 Jun 2000 16:24:07 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
In-Reply-To: <200006251608.KAA13668@localhost.localdomain>; from uogbuji@fourthought.com on Sun, Jun 25, 2000 at 10:08:38AM -0600 References: <200006251608.KAA13668@localhost.localdomain> Message-ID: <20000625162407.K29590@lyra.org> On Sun, Jun 25, 2000 at 10:08:38AM -0600, Uche Ogbuji wrote: > > > Jim Fulton wrote: > > > I've also looked at http://www.omg.org/cgi-bin/doc?ptc/00-01-12, > > which is similar and also leaves attribute mapping unspecified. :( > > Just as a note to all, on the do-sig, Jim was already pointed to the right > clause in the Python/CORBA mapping, and Martin von Lowis also explained why it > is so (avoiding IDL name-clashes). > > The remaining debate is whether to follow the Python/CORBA mapping. I say, no > reason not to do so. There is nothing normative about leading "_" being > private that I know of. It ends up being a question of style versus spec > unification, and I'm always wont to go for the latter. Nothing normative? Come on. That is simply an antagonistic position. "from foo import *" is bad taste, but it codifies the notion of "_" being private. (symbols starting with "_" are not imported) Years and years of "_" usage meaning "private" also codify it. And you say "normative" ... what? Do we need to write an RFC to satisfy you? And do you also recognize that many things that occur in the RFC are there simply because they *ARE* de facto standards? I believe it is an entirely untenable position to state that "_" does not mean "private". IMO, the Python/CORBA mapping is simply stupid if it demands leading underscores for any public item. Way stupid, and totally ignorant of Python's de facto standards. Cheers, -g -- Greg Stein, http://www.lyra.org/ From uogbuji@fourthought.com Mon Jun 26 01:13:14 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 18:13:14 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Greg Stein of "Sun, 25 Jun 2000 16:24:07 PDT." 
<20000625162407.K29590@lyra.org> Message-ID: <200006260013.SAA17378@localhost.localdomain> > On Sun, Jun 25, 2000 at 10:08:38AM -0600, Uche Ogbuji wrote: > > > > Jim Fulton wrote: > > > > > I've also looked at http://www.omg.org/cgi-bin/doc?ptc/00-01-12, > > > which is similar and also leaves attribute mapping unspecified. :( > > > > Just as a note to all, on the do-sig, Jim was already pointed to the right > > clause in the Python/CORBA mapping, and Martin von Lowis also explained why it > > is so (avoiding IDL name-clashes). > > > > The remaining debate is whether to follow the Python/CORBA mapping. I say, no > > reason not to do so. There is nothing normative about leading "_" being > > private that I know of. It ends up being a question of style versus spec > > unification, and I'm always wont to go for the latter. > > Nothing normative? Come on. That is simply an antagonistic position. That is simply unfair. > "from foo import *" is bad taste, but it codifies the notion of "_" being > private. (symbols starting with "_" are not imported) I was not aware of this (I never use "from foo import *"). It is definitely a point for the leading-underscore-haters camp. > Years and years of "_" usage meaning "private" also codify it. And you say > "normative" ... what? Do we need to write an RFC to satisfy you? And do you > also recognize that many things that occur in the RFC are there simply > because they *ARE* de facto standards? The use of leading "__" to indicate private is not in any RFC as far as I know, yet I understand it to be normative. So it's not so hard to convince me. > I believe it is an entirely untenable position to state that "_" does not > mean "private". So you believe. Maybe you can make me believe the same. > IMO, the Python/CORBA mapping is simply stupid if it demands leading > underscores for any public item. Way stupid, and totally ignorant of > Python's de facto standards. Strongly stated, but hardly conclusive, I think.
The "from foo import *" is the closest you came to actually justifying why the Python/CORBA mapping is "stupid". Even that, IMO, is not conclusive because it has about the strength of its own obverse: "never use 'from foo import *'". -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Mon Jun 26 01:37:59 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 25 Jun 2000 18:37:59 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Greg Stein of "Sun, 25 Jun 2000 16:24:07 PDT." <20000625162407.K29590@lyra.org> Message-ID: <200006260037.SAA17412@localhost.localdomain> I should note that I quite hope that leading-underscore-means-private is indeed not normative, and never will be so. The leading underscore is the most readable way to escape symbol names that would clash with an applicable naming convention. It's far from a conclusive argument (though Jim Fulton tried to make a similar argument to excoriate the Python/CORBA binding), but most languages, such as C++ allow exactly this approach as put to good use by the C++/CORBA binding. The leading-underscore-is-private idea has the annoying effect that if I want to call a variable "class", I must instead use the silly "klass", rather than "_class", which is far more readable and self-explanatory. And if I want to call variables "def", "type" and "else", what then? "deph", "tipe" and "els"? (I suppose one could use trailing underscore). Most likely, Guido is already in his time machine writing "Thou shalt not use leading underscore except for private variables" on a stone tablet somewhere in the past to end the whole argument. 
But people have been saying nasty things about the Python/CORBA binding which wouldn't be as nasty as the things I'd say about such a restriction in Python. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Mon Jun 26 14:07:10 2000 From: jim@digicool.com (Jim Fulton) Date: Mon, 26 Jun 2000 09:07:10 -0400 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft References: <20000623200845.AA7F41CD0C@dinsdale.python.org> <3953C729.C17E75AF@digicool.com> <3953C9D1.410D04C9@digicool.com> <200006241707.TAA01594@loewis.home.cs.tu-berlin.de> Message-ID: <3957557E.4F843F0@digicool.com> Let me start by explaining to the do-sig why I all of a sudden care about mapping of IDL attributes in the Python CORBA mapping. I'm doing alot of work with XML these days and with the XML Document Object Model (DOM), in particular. Now, these DOM standard, http://www.w3.org/DOM/, is specified as an IDL interface. Further, key parts of this interface use attributes. :( Waaaaaa. The end result is that an important Python and Zope interface, DOM, is substantially affected by the choice of a language mapping for CORBA IDL attributes! This is ironic since many (most) applications of DOM in Python will have nothing to do with CORBA! "Martin v. Loewis" wrote: > > > OK, so, given that leading '_'s traditionally indicate private > > attributes/members in Python, *why* does the Python language > > mapping use leading '_'s in methods generated from IDL? > > People might have two problems with that mapping, one is, why map > attributes to methods at all (instead of mapping them to attributes); > it seems that you are not questioning that decision - if you do, I'll > happily elaborate on that. I have sympathies for both points of view on that. 
I'm not opposed to attributes with comptation behind them. We do this alot in Zope, however, without alot of infrastructure, they are a royal pain in the butt to implement, especially if you want to implement them efficiently. :) I'd be extremely interested in hearing your view on that. > As to the specific choice of names: get_ and set_ as prefixes are also > not debated, right? Now, consider mapping > > readonly attribute long foo; > > to > > def get_foo(self) > > This gives a (admittedly pedantic) opportunity to write: > > readonly attribute long foo; > void get_foo(in string bar); > > Now, which of those should be mapped to get_foo? I agree that this is a problem. However, if given the choice of having to deal with this problem or creating APIs with leading '_'s. I'd really rather just deal with this problem, especially given the advice not to use attributes in the first place. Note that I really don't care what spelling is used as long as it doesn't start with '_'s. The C++ mapping seems to address the problem above by using access functions of the same name as the attribute. > > This seems like a really bad choice to me. > > Given the alternatives (i.e. non-trivial conflict resolution > mechanisms), this seems the best choice to me. OK, we disagree. I think creating API's with leading underscores is too problematic. It breaks a standard Python idiom. (And, I admit, it goes against a Zope security rule that doesn't allow access to names with leading '_'s. :]) > I'd rather draw another conclusion: Using attributes in IDL is a bad > design choice. Languages need to support them, but I really consider > them bad design. For example, you can't give errors on attribute > access (except for system exceptions); instead of deprecating them, > the CCM people proposed to add exception support to attributes for > CORBA 3... I agree, however, there is a *very* important API, that I have to deal with, which is DOM. 
Many critical features in DOM are specified as IDL attributes. With the current Python mapping, this leads to a rather badly spelled API. Of course, another solution would be to ignore the fact that DOM is specifed in IDL and provide a CORBA-independent DOM API. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From ken@bitsko.slc.ut.us Mon Jun 26 14:42:18 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 26 Jun 2000 08:42:18 -0500 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: Jim Fulton's message of "Mon, 26 Jun 2000 09:07:10 -0400" References: <20000623200845.AA7F41CD0C@dinsdale.python.org> <3953C729.C17E75AF@digicool.com> <3953C9D1.410D04C9@digicool.com> <200006241707.TAA01594@loewis.home.cs.tu-berlin.de> <3957557E.4F843F0@digicool.com> Message-ID: Jim Fulton writes: > Let me start by explaining to the do-sig why I all of a sudden care > about mapping of IDL attributes in the Python CORBA mapping. > > I'm doing alot of work with XML these days and with the XML Document > Object Model (DOM), in particular. Now, these DOM standard, > http://www.w3.org/DOM/, is specified as an IDL interface. Further, > key parts of this interface use attributes. :( Waaaaaa. > > The end result is that an important Python and Zope interface, DOM, > is substantially affected by the choice of a language mapping for > CORBA IDL attributes! This is ironic since many (most) applications > of DOM in Python will have nothing to do with CORBA! There's a long thread on this from November, "foo.bar vs. 
foo.get_bar()" and "4DOM future": IIRC, the conclusion, that you'll probably be very happy with, is that all the Python DOMs support direct attribute access for attribute members in the DOM IDL, _in_addition_to_ using procedure call access. -- Ken From jim@digicool.com Mon Jun 26 15:09:57 2000 From: jim@digicool.com (Jim Fulton) Date: Mon, 26 Jun 2000 10:09:57 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> Message-ID: <39576435.36751D54@digicool.com> Paul Prescod wrote: > > "Fred L. Drake, Jr." wrote: > > > > ... > > > > > Is there are description somewhere of the Python DOM mapping, > > > other than the DOM sources? > > > > The W3C documentation gives the IDL mapping, which requires the > > Python specific mapping. > > Actually, the DOM can be mapped into a language in a manner that does > not follow directly from the IDL and CORBA specs. That's why there is a > formally defined java binding rather than just a reference to the IDL > specs. Historically, though, 4DOM was really a CORBA tool so it really > needed to follow the specs. Whatever we do, there needs to be a document somewhere that says what the Python DOM mapping is, even if it is not much more than a reference to the DOM IDL and the Python binding. > I would vote for losing the leading underscore. :) > > This says to me that the DOM API specifies use of methods > > for interface attributes. > > I think it is safe to say that a binding should not require a particular > underlying data structure but Python allows the use of a.b syntax even > when the surface structure is wildly different than the underlying data > structure. I'm preaching to the choir as you are the world leader in > abuse of the dot notation. :) Yup, however __getattr__ is a pain to utilize unless you have alot of infrustructure. 
Zope has support for computed attributes, which makes this pretty sane, especially for read-only attributes. I'm working on a new version of StructuredText, StructuredText NG, http://www.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG, which, among other things, creates objects with a DOM interface. I want this to be independent of Zope, so I can't use any of the standard Zope getattr tricks. In general, I think it would be cool of lots of objects supported the DOM interface and I hate to make people jump over the getattr barrier to do that. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From dgrisby@uk.research.att.com Mon Jun 26 15:35:06 2000 From: dgrisby@uk.research.att.com (Duncan Grisby) Date: Mon, 26 Jun 2000 15:35:06 +0100 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: Message from Ken MacLeod of "26 Jun 2000 08:42:18 CDT." Message-ID: <200006261435.PAA19126@pineapple.uk.research.att.com> On Monday 26 June, Ken MacLeod wrote: > There's a long thread on this from November, "foo.bar > vs. foo.get_bar()" and "4DOM future": > > > IIRC, the conclusion, that you'll probably be very happy with, is that > all the Python DOMs support direct attribute access for attribute > members in the DOM IDL, _in_addition_to_ using procedure call access. Is DOM intended to ever be used in a full distributed environment? If so, supporting direct attribute access is surely a bad idea. Any code which uses direct attribute access will have to be changed to use the _get and _set operations expected by a CORBA ORB. 
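As a rough sketch of how one object could offer both spellings debated in this thread, the following shows plain attribute access being routed through `_get_`/`_set_` accessor methods. AccessorMixin and Node are hypothetical names used only for illustration; this is not the code of 4DOM, minidom, Zope, or any CORBA ORB, just one way the idea might be expressed in present-day Python.

```python
class AccessorMixin:
    """Hypothetical helper: expose _get_X/_set_X methods as attribute X."""

    def __getattr__(self, name):
        # Called only when normal lookup fails, so real attributes
        # and methods are unaffected.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            getter = getattr(type(self), "_get_" + name)
        except AttributeError:
            raise AttributeError(name) from None
        return getter(self)

    def __setattr__(self, name, value):
        # Route public names through a setter when one exists;
        # everything else is stored normally.
        setter = None
        if not name.startswith("_"):
            setter = getattr(type(self), "_set_" + name, None)
        if setter is not None:
            setter(self, value)
        else:
            object.__setattr__(self, name, value)


class Node(AccessorMixin):
    """Hypothetical node with CORBA-style accessors for nodeValue."""

    def __init__(self, value):
        self._value = value

    def _get_nodeValue(self):
        return self._value

    def _set_nodeValue(self, value):
        self._value = value


node = Node("first")
via_attribute = node.nodeValue        # attribute spelling
node.nodeValue = "second"             # goes through _set_nodeValue
via_method = node._get_nodeValue()    # accessor spelling still works
print(via_attribute, via_method)
```

Code written against the accessor methods keeps working unchanged, while callers who prefer the attribute spelling get the same values through the same methods.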
Looking at the IDL used by DOM, it looks like the W3C don't intend it to be used with CORBA. IDL like attribute DOMString nodeValue; // raises(DOMException) on setting // raises(DOMException) on retrieval shows a clear disregard for the semantics of CORBA IDL. That isn't a Python issue, of course. Even ignoring things like that, the IDL isn't CORBA 2.3 compliant. Just for the amusement value, here's a list of the errors in it. dom.idl:118: Identifier `supports' clashes with keyword `supports' html.idl:191: Identifier `readOnly' clashes with keyword `readonly' html.idl:211: Identifier `readOnly' clashes with keyword `readonly' html.idl:383: Identifier `valueType' clashes with keyword `valuetype' html.idl:395: Identifier `object' clashes with keyword `Object' css.idl:143: Identifier `valueType' clashes with keyword `valuetype' range.idl:38: Declaration of interface `Range' clashes with name of enclosing scope `range' range.idl:21: (`range' declared here) Aren't standards great! Cheers, Duncan. -- -- Duncan Grisby \ Research Engineer -- -- AT&T Laboratories Cambridge -- -- http://www.uk.research.att.com/~dpg1 -- From tpassin@home.com Mon Jun 26 15:43:59 2000 From: tpassin@home.com (tpassin@home.com) Date: Mon, 26 Jun 2000 10:43:59 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> Message-ID: <003001bfdf7c$f6c35b40$7cac1218@reston1.va.home.com> Jim Fulton wrote - > Paul Prescod wrote: > > > > "Fred L. Drake, Jr." wrote: > > > > > > ... > > > > > > > Is there are description somewhere of the Python DOM mapping, > > > > other than the DOM sources? > > > > > > The W3C documentation gives the IDL mapping, which requires the > > > Python specific mapping. > > > > Actually, the DOM can be mapped into a language in a manner that does > > not follow directly from the IDL and CORBA specs. 
That's why there is a > > formally defined java binding rather than just a reference to the IDL > > specs. Historically, though, 4DOM was really a CORBA tool so it really > > needed to follow the specs. > > Whatever we do, there needs to be a document somewhere that > says what the Python DOM mapping is, even if it is not > much more than a reference to the DOM IDL and the Python > binding. > Actually, the W3C DOM standard said that all the bindings were derived from an underlying XML original: "As stated earlier, all object definitions are specified in XML. The Java bindings, OMG IDL bindings, and ECMA Script bindings are all generated automatically from the XML source code. This is possible because the information specified in XML is a superset of what these other syntax need. This is a general observation, and the same kind of technique can be applied to many other areas: given rich structure, rich processing and conversion are possible. For Java and OMG IDL, it is basically just a matter of renaming syntactic keywords; for ECMA Script, the process is somewhat more involved." Perhaps if we were starting from scratch again, emphasizing the XML base instead of CORBA mappings (I assume these were worked out before the XML work was really off the ground), we'd get a different solution. It's probably too late for that since things are pretty far along. But since the W3C DOM was not developed in IDL, there would seem to be no strong reason to be limited by the some particular IDL mapping techinique instead of the underlying XML. As I said in my post on this from last November, consistency in naming is important, which is part of what Jim is getting at too. Tom Passin From fdrake@beopen.com Mon Jun 26 15:43:36 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) 
Date: Mon, 26 Jun 2000 07:43:36 -0700 (PDT) Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: <200006261435.PAA19126@pineapple.uk.research.att.com> References: <200006261435.PAA19126@pineapple.uk.research.att.com> Message-ID: <14679.27672.177367.5672@mailhost.beopen.com> Duncan Grisby writes: > Is DOM intended to ever be used in a full distributed environment? If > so, supporting direct attribute access is surely a bad idea. Any code > which uses direct attribute access will have to be changed to use the > _get and _set operations expected by a CORBA ORB. In a distributed environment, client-side attribute access, even if made to look "direct" (foo.bar), would have to map to the distributed _get_ and _set_ interfaces. This is actually very reasonable for Python, but is also an extension of the current mapping. > Looking at the IDL used by DOM, it looks like the W3C don't intend it > to be used with CORBA. IDL like > > attribute DOMString nodeValue; > // raises(DOMException) on setting > // raises(DOMException) on retrieval Ouch! So why even have the attribute? This appears useless; perhaps subclasses are expected to implement it in ways appropriate to the specific type? > shows a clear disregard for the semantics of CORBA IDL. That isn't a > Python issue, of course. Even ignoring things like that, the IDL isn't > CORBA 2.3 compliant. Just for the amusement value, here's a list of > the errors in it. Why oh why does anyone listen to the W3C anymore? Haven't they pretty much done themselves in for things like this? -sigh- -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Mon Jun 26 15:48:19 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 07:48:19 -0700 Subject: [XML-SIG] Python 1.6 XML APIs References: <200006251511.JAA13468@localhost.localdomain> Message-ID: <39576D33.3505EA93@prescod.net> Uche Ogbuji wrote: > >... 
> > I'll admit to its simplicity, and the opportunity for lazy instantiation is > attractive, but I don't think I'd tend to use it a great deal, which, I > imagine, is OK. I wouldn't expect XML power users to use it any more than they would xmllib. Perhaps when you want to do something quick and don't want it to be dependent on large software packages...(as you might use xmllib in the same situation). My primary concern is that we promised to maintain the simplicity of xmllib in Python 1.6. A simple SAX (if we keep it simple) is only a little bit more complicated than a xmllib. But full SAX (all handlers, features, etc. ) is a lot more complicated. I think that pulldom is simpler than either of them. -- Paul Prescod - Not encumbered by corporate consensus "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From fdrake@beopen.com Mon Jun 26 15:57:48 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Mon, 26 Jun 2000 07:57:48 -0700 (PDT) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <003001bfdf7c$f6c35b40$7cac1218@reston1.va.home.com> References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <003001bfdf7c$f6c35b40$7cac1218@reston1.va.home.com> Message-ID: <14679.28524.752264.164183@mailhost.beopen.com> tpassin@home.com writes: > Actually, the W3C DOM standard said that all the bindings were derived from > an underlying XML original: I don't recall the quoted text from the version of the recommendation I read, but it probably wasn't a recommendation at the time either! (And I may have just missed it.) 
> Perhaps if we were starting from scratch again, emphasizing the XML base > instead of CORBA mappings (I assume these were worked out before the XML > work was really off the ground), we'd get a different solution. It's > probably too late for that since things are pretty far along. But since the > W3C DOM was not developed in IDL, there would seem to be no strong reason to > be limited by the some particular IDL mapping techinique instead of the > underlying XML. We might end up with a different result, but would it be as (potentially) useful? I don't see how. If the IDL is normative (and I don't see anything saying otherwise as I look at the document), then it seems it must be supported to be compliant. Is the IDL non-normative, and I just missed the notation in the recommendation? > As I said in my post on this from last November, consistency in naming is > important, which is part of what Jim is getting at too. Agreed, which is why I don't like seeing several mappings in the W3C recommendation. There should be only one, and the IDL is the right one for that. If any of the languages don't have IDL mappings, that should be dealt with by either creating an interim mapping that does just enough, or by writing a DOM-specific binding in a separate document. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Mon Jun 26 16:06:29 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 08:06:29 -0700 Subject: [XML-SIG] Python 1.6 XML APIs References: <200006251511.JAA13468@localhost.localdomain> Message-ID: <39577175.82217DB9@prescod.net> Uche Ogbuji wrote: > > ... > > I do feel a bit better about the API now that I've seent Paul's longer pulldom > example, which does seem to conform to the Python/DOM mapping (I'll assume the > "child_nodes" in the earlier example was a typo). 
However, there are still > the inevitable extensions (ParseString, AttributeList, etc.), which, while > they seem innocuous to me so far, may be worth scrutinizing over time. Let's be careful not to hold the XML library to a higher standard than other Python libraries. If you have access to the CVS repositories, take a look at the documentation of winreg. Or even the ancient "time()" module with its 15-item tuples...or at the various deprecations scattered throughout the documentation. Anyhow, there are these different issues: 1. Is the implementation of minidom up to snuff? I can demonstrate that the implementation is up to snuff by integrating it with our various existing test codes for DOM data, including 4thought stuff (like 4XPath and 4XSLT). Now would be a good time to tell me if that stuff relies on any 4DOM-specific features. I'll roll back the AttributeList change to improve compatibility and increase comfort. People wanting strings can use getAttribute( ... ). 2. Are the new minidom parsing functions good? There are only two parsing functions and they are explicitly designed to be extensible. If with our combined experience we can't figure that out in a few hours, something is wrong! 3. Is pulldom any good? Once again, we're talking about only two or three methods. There isn't much to get wrong there! 4. Jim's new _get... problem (thanks Jim!) Any sane Python programmer is going to use the attribute versions. Any Python DOM implementation can trivially support both the CORBA versions and the attribute versions. I don't see a big problem here. -- Paul Prescod - Not encumbered by corporate consensus "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity."
- http://www.bespoke.org/viridian/print.asp?t=140 From walter@livinglogic.de Mon Jun 26 16:15:53 2000 From: walter@livinglogic.de (Walter Doerwald) Date: Mon, 26 Jun 2000 17:15:53 +0200 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <200006260037.SAA17412@localhost.localdomain> References: <20000625162407.K29590@lyra.org> Message-ID: <4.3.1.0.20000626171543.00b28860@mail.tmt.de> At 02:37 26.06.00, you wrote: I should note that I quite hope that leading-underscore-means-private is indeed not normative, and never will be so. The leading underscore is the most readable way to escape symbol names that would clash with an applicable naming convention. It's far from a conclusive argument (though Jim Fulton tried to make a similar argument to excoriate the Python/CORBA binding), but most languages, such as C++ allow exactly this approach as put to good use by the C++/CORBA binding. The leading-underscore-is-private idea has the annoying effect that if I want to call a variable "class", I must instead use the silly "klass", rather than "_class", which is far more readable and self-explanatory. And if I want to call variables "def", "type" and "else", what then? "deph", "tipe" and "els"? (I suppose one could use trailing underscore). Most likely, Guido is already in his time machine writing "Thou shalt not use leading underscore except for private variables" on a stone tablet somewhere in the past to end the whole argument. But people have been saying nasty things about the Python/CORBA binding which wouldn't be as nasty as the things I'd say about such a restriction in Python. Well the Python style guide (http://www.python.org/doc/essays/styleguide.html) says: * _single_leading_underscore: weak "internal use" indicator (e.g. "from M import *" does not import objects whose name starts with an underscore). * single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g.
Tkinter.Toplevel(master, class_="ClassName"). Bye, Walter Dörwald -- Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de From paul@prescod.net Mon Jun 26 16:18:46 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 08:18:46 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <3953E20C.6239D946@prescod.net> <14678.24248.681373.592096@lindm.dm> Message-ID: <39577456.F36E9123@prescod.net> Dieter Maurer wrote: > > ... > I would vote against. > > DOM is specified in terms of IDL. > Python has an IDL -> Python mapping. > Deviating from this mapping for DOM only would require special > knowledge -- a thing I do not like. Less than one in a hundred DOM users (especially minidom users!) will know or care about the original IDL. -- Paul Prescod - Not encumbered by corporate consensus "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From paul@prescod.net Mon Jun 26 16:21:08 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 08:21:08 -0700 Subject: [XML-SIG] repost: getElementsByTagName interpretation References: <20000623170913.A26630@vet.uu.nl> <39538215.D9E32589@prescod.net> <20000626010430.A29091@vet.uu.nl> Message-ID: <395774E4.8C7A48D4@prescod.net> Martijn Faassen wrote: > > Paul Prescod wrote: > > Minidom does not do live nodelists. > > Is minidom now the 'official Python DOM'? I.e. what's the status of > 4DOM at the moment? Official according to whom? :) Obviously if minidom got into Python 1.6 that would be a certain kind of officialdom. As long as 4DOM remains the primary DOM distributed with the xml package that's another kind of officialdom. There is no suggestion of phasing out the merged 4/PyDOM package. It is minidom's grown up big brother.
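For readers unsure what "does not do live nodelists" means in practice, here is a minimal sketch. It assumes the xml.dom.minidom module shipped with current Python, which may differ in detail from the minidom being discussed in this thread; the sample document and variable names are made up for illustration.

```python
from xml.dom import minidom

# Parse a tiny document containing two <item> elements.
doc = minidom.parseString("<root><item/><item/></root>")

# Ask for the matching elements once and remember the result.
items = doc.getElementsByTagName("item")
count_before = len(items)

# Add a third <item> to the tree after the list was obtained.
doc.documentElement.appendChild(doc.createElement("item"))

# The list fetched earlier is a snapshot; a fresh query sees the new node.
count_same_list = len(items)
count_fresh_query = len(doc.getElementsByTagName("item"))

print(count_before, count_same_list, count_fresh_query)
```

With a live NodeList, as the DOM recommendation describes it, the list obtained earlier would also grow when the tree changes. Here only the fresh query reflects the added element.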
-- Paul Prescod - Not encumbered by corporate consensus "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From paul@prescod.net Mon Jun 26 16:28:02 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 08:28:02 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> Message-ID: <39577682.5269D601@prescod.net> Jim Fulton wrote: > > ... > > Yup, however __getattr__ is a pain to utilize unless you have a lot of > infrastructure. Zope has support for computed attributes, which makes > this pretty sane, especially for read-only attributes. a) I think all that you need is a base class. Minidom uses one and it seems to work. Anyhow, inheriting from "node" is good practice in any DOM extension framework. b) I have been pushing for computed attributes in standard Python for about three or four years. If this gives someone the impetus to implement it, I won't be overly distraught. :) - Python-1.7-is-just-around-the-corner-ly 'yrs -- Paul Prescod - Not encumbered by corporate consensus "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From mclay@nist.gov Mon Jun 26 18:38:44 2000 From: mclay@nist.gov (Michael McLay) Date: Mon, 26 Jun 2000 13:38:44 -0400 (EDT) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'?
In-Reply-To: <200006260037.SAA17412@localhost.localdomain> References: <20000625162407.K29590@lyra.org> <200006260037.SAA17412@localhost.localdomain> Message-ID: <14679.38180.290118.619833@fermi.eeel.nist.gov> Uche Ogbuji writes: > I should note that I quite hope that leading-underscore-means-private > is indeed not normative, and never will be so. The leading underscore > is the most readable way to escape symbol names that would clash with > an applicable naming convention. The use of a single "_" at the front of a name in Python has always (at least since about Python 1.2) been used to filter out names that should not be used outside a module. It is more than an idiom. The concept is enforced in statements of the form "from foo import *". It isn't absolutely enforced. You can still say "from foo import _thing". Arguing that other languages use "_" to eliminate name conflicts is a red-herring when talking about Python. Python namespaces make this unnecessary. It is annoying to some Python users to see an idiom from another language carried over to Python. In this case the carryover directly conflicts with a Python language feature. > It's far from a conclusive argument > (though Jim Fulton tried to make a similar argument to excoriate the > Python/CORBA binding), but most languages, such as C++ allow exactly > this approach as put to good use by the C++/CORBA binding. > The leading-underscore-is-private idea has the annoying effect that if > I want to call a variable "class", I must instead use the silly > "klass", rather than "_class", which is far more readable and > self-explanatory. And if I want to call variables "def", "type" and > "else", what then? "deph", "tipe" and "els"? (I suppose one could > use trailing underscore). Why not use append a "_" to the name to distinguish it from a keyword. The name "class_" would be as readable and it wouldn't conflict with the longstanding Python rule for leading "_" characters. 
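The two underscore conventions being argued over here can be tried out directly. The following is only a sketch in current Python syntax; the module name demo_mod and the values stored in it are hypothetical, and the module is built in memory so that the example is self-contained.

```python
import keyword
import sys
import types

# Build a throwaway module in memory. "demo_mod" is a made-up name.
demo = types.ModuleType("demo_mod")
demo.class_ = "spam"   # trailing underscore: sidesteps the keyword "class"
demo._class = "eggs"   # leading underscore: treated as internal to the module
sys.modules["demo_mod"] = demo

# "from M import *" skips names that start with an underscore
# (when the module defines no __all__).
star_namespace = {}
exec("from demo_mod import *", star_namespace)
star_names = sorted(n for n in star_namespace if not n.startswith("__"))

# The rule is not absolute: an explicit import still works.
explicit_namespace = {}
exec("from demo_mod import _class", explicit_namespace)

print(star_names)                     # names pulled in by the star import
print(explicit_namespace["_class"])   # value fetched by the explicit import
print(keyword.iskeyword("class"), keyword.iskeyword("class_"))
```

The star import brings in class_ but not _class, while naming _class explicitly still succeeds, which matches the "weak internal use indicator" description.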
> Most likely, Guido is already in his time machine writing "Thou shalt > not use leading underscore except for private variables" on a stone > tablet somewhere in the past to end the whole argument. But people > have been saying nasty things about the Python/CORBA binding which > wouldn't be as nasty as the things I'd say about such a restriction in > Python. The special meaning of _* is defined here: http://www.python.org/doc/current/ref/id-classes.html The rule is also discussed in section 6.1 of the Tutorial. From paul@prescod.net Mon Jun 26 16:38:49 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 08:38:49 -0700 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft References: <200006261435.PAA19126@pineapple.uk.research.att.com> <14679.27672.177367.5672@mailhost.beopen.com> Message-ID: <39577908.AE7495AA@prescod.net> "Fred L. Drake, Jr." wrote: > > Looking at the IDL used by DOM, it looks like the W3C don't intend it > to be used with CORBA. They wanted to use a formalism. They didn't want to invent a formalism merely because invention is more work than stealing. They did not intend the formalism to depend on CORBA implementation or semantics. The DOM is inherently "flexible" in ways that make blind inter-language interoperability unlikely or impossible to start with. It's a template. A set of ideas. A portable pattern. Standards are important but they are only important insofar as the buy interoperability. Slavish conformance to the IDL or to the CORBA mapping does not (as far as I know) buy interoperability because, as far as I know, hardly anyone is sending DOM methods over CORBA. Let's not even think too hard about the performance problems involved there. So let's design for the market we know we have (Python programmers who want an easy API) and not the market that I don't think we have (people who want to use Python DOMs from other languages and other language DOMs from Python). 
Interoperability among Python DOMs is enough. Bridges to Java and Microsoft COM DOMs would also be useful (and easy to write). -- Paul Prescod - Not encumbered by corporate consensus "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From paul@prescod.net Mon Jun 26 16:55:21 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 08:55:21 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? Message-ID: <39577CE9.E391E8E0@prescod.net> [this is a retry...mailman error at the bottom] "Fred L. Drake, Jr." wrote: > >.... > > I don't recall the quoted text from the version of the > recommendation I read, but it probably wasn't a recommendation at the > time either! (And I may have just missed it.) I don't recall it either. I think that the fact that the DOM IDL is generated from XML is merely an implementation choice. > We might end up with a different result, but would it be as > (potentially) useful? I don't see how. If the IDL is normative (and > I don't see anything saying otherwise as I look at the document), then > it seems it must be supported to be compliant. Is the IDL > non-normative, and I just missed the notation in the recommendation? Let's be blunt: there is nothing about the DOM that is normative. Extensions are encouraged, subsets are encouraged. Langauge-friendly, non-CORBA language bindings are encouraged (and provided). Here's the answer I got when I asked about the Java binding (3!) years ago: > This is a good question, that we have wrestled with in the WG. 
Basically, > we want to "hand code" the Java mapping because the IDL mapping is sure to > be more obscure than we want, and add arguments that are relevant to CORBA > RPC calls, but would be "overkill" for the DOM's primary purpose, i.e., a > platform-independent API for dynamic scripting in HTML/XML browsers and > editors. > > In short, Java is such an important target, we want to "hand-tune" the > binding to be maximally understandable and useable by our target audience. and > We are using the IDL as an abstract way of specifying interfaces, not as a > way of defining distributed systems. Looking at the output of Sun's > idltojava, it is very complex, and most of this complexity has nothing to > do with what we are actually trying to accomplish. I think the best way to > see this may be to download the idltojava program from Sun and compare its > output with the code we created. http://lists.w3.org/Archives/Public/www-dom/1997OctDec/0054.html > Agreed, which is why I don't like seeing several mappings in the W3C > recommendation. There should be only one, and the IDL is the right > one for that. If any of the languages don't have IDL mappings, that > should be dealt with by either creating an interim mapping that does > just enough, or by writing a DOM-specific binding in a separate > document. The DOM working group says that for Java and Javascript, usability is more important than CORBA compliance. I think that the same goes for Python. That's why I use and advocate attribute syntax. -- Paul Prescod - Not encumbered by corporate consensus "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 This is the Postfix program at host dinsdale.python.org. I'm sorry to have to inform you that the message returned below could not be delivered to one or more destinations. 
For further assistance, please contact If you do so, please include this problem report. You can delete your own text from the message returned below. The Postfix program : Command died with status 1: "/home/mailman/mail/wrapper post xml-sig". Command output: Traceback (innermost last): File "/home/mailman/scripts/post", line 33, in ? from Mailman import MailList File "/home/mailman/Mailman/MailList.py", line 36, in ? from Mailman import LockFile MemoryError From gstein@lyra.org Mon Jun 26 20:23:20 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 26 Jun 2000 12:23:20 -0700 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: <39577908.AE7495AA@prescod.net>; from paul@prescod.net on Mon, Jun 26, 2000 at 08:38:49AM -0700 References: <200006261435.PAA19126@pineapple.uk.research.att.com> <14679.27672.177367.5672@mailhost.beopen.com> <39577908.AE7495AA@prescod.net> Message-ID: <20000626122319.K29590@lyra.org> On Mon, Jun 26, 2000 at 08:38:49AM -0700, Paul Prescod wrote: >... > So let's design for the market we know we have (Python programmers who > want an easy API) and not the market that I don't think we have (people > who want to use Python DOMs from other languages and other language DOMs > from Python). Interoperability among Python DOMs is enough. Bridges to > Java and Microsoft COM DOMs would also be useful (and easy to write). Well said! I "violently agree" :-) with this position. Who the heck is going to expect their Python code to be compiled by a C++ compiler? The code simply is not going to port. And when a Python programmer writes his code by looking at similar C++ code, he is certainly going to be aware of the semantics of the foo() and foo(val) methods. He'll map those straight into attribute accesses. Adding complexity to the APIs to adhere to some non-Python language design is a bit wonky. Keep it simple, and keep it focused on the Python programmer.
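To make the "accessor methods versus attribute access" trade-off in this thread concrete, here is a minimal sketch of a base class that offers both spellings. The class names Node and Element and the _get_<name> fallback are assumptions chosen to mirror the spelling debated above; this is not the actual minidom or 4DOM code, and it uses current Python syntax.

```python
class Node:
    """Hypothetical base class: attribute reads fall back to _get_<name>()."""

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup has failed.
        if name.startswith("_get_"):
            # No such accessor exists; stop here to avoid endless recursion.
            raise AttributeError(name)
        try:
            getter = getattr(self, "_get_" + name)
        except AttributeError:
            raise AttributeError(name) from None
        return getter()


class Element(Node):
    def __init__(self, tag_name):
        self._tag_name = tag_name

    # IDL-style accessor method for the "nodeName" attribute.
    def _get_nodeName(self):
        return self._tag_name


element = Element("para")
via_method = element._get_nodeName()   # accessor-method spelling
via_attribute = element.nodeName       # attribute spelling, same value
print(via_method, via_attribute)
```

The extra lookup in __getattr__ on every attribute read is the kind of overhead that the posts about caching values in plain attributes are concerned with.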
Cheers, -g -- Greg Stein, http://www.lyra.org/ From Mike.Olson@fourthought.com Mon Jun 26 17:45:26 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 26 Jun 2000 10:45:26 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> Message-ID: <395788A6.8C815B57@FourThought.com> Jim Fulton wrote: > > > > > Actually, the DOM can be mapped into a language in a manner that does > > not follow directly from the IDL and CORBA specs. That's why there is a > > formally defined java binding rather than just a reference to the IDL > > specs. Historically, though, 4DOM was really a CORBA tool so it really > > needed to follow the specs. > > Whatever we do, there needs to be a document somewhere that > says what the Python DOM mapping is, even if it is not > much more than a reference to the DOM IDL and the Python > binding. In 4DOM, we are actually moving away from __getattr__ (for speed). We've found that we can keep all of the data cached in attributes for when it is needed. The drawback is that 4DOM will break horribly if people access it outside of the DOM interface (or the 4DOM-supported Pythonic interface), and there is a bit of runtime performance loss; however, that is minor compared to the performance hit from __getattr__. The only reason we still support the '_'* is for legacy. Mike > > > I would vote for losing the leading underscore. > > :) > > > > This says to me that the DOM API specifies use of methods > > > for interface attributes. > > > > I think it is safe to say that a binding should not require a particular > > underlying data structure but Python allows the use of a.b syntax even > > when the surface structure is wildly different than the underlying data > > structure. I'm preaching to the choir as you are the world leader in > > abuse of the dot notation.
:) > > Yup, however __getattr__ is a pain to utilize unless you have a lot of > infrastructure. Zope has support for computed attributes, which makes > this pretty sane, especially for read-only attributes. I'm working on > a new version of StructuredText, StructuredText NG, > http://www.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG, > which, among other things, creates objects with a DOM interface. > I want this to be independent of Zope, so I can't use any of the standard > Zope getattr tricks. > > In general, I think it would be cool if lots of > objects supported the DOM interface and I hate to make people > jump over the getattr barrier to do that. > > Jim > > -- > Jim Fulton mailto:jim@digicool.com Python Powered! > Technical Director (888) 344-4332 http://www.python.org > Digital Creations http://www.digicool.com http://www.zope.org > > Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email > address may not be added to any commercial mail list with out my > permission. Violation of my privacy with advertising or SPAM will > result in a suit for a MINIMUM of $500 damages/incident, $1500 for > repeats. > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paul@prescod.net Mon Jun 26 17:52:39 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 09:52:39 -0700 Subject: [XML-SIG] Paul Prescod's pulldom References: <20000623143931.14879.qmail@www0l.netaddress.usa.net> Message-ID: <39578A57.FBBCBD05@prescod.net> Rosalie Dieteman wrote: > > I'm very new to XML, so I think I'd be a perfect test case for Paul's idea of > pulldom being a fast and easy way to teach someone XML. I volunteer!
I guess you've found it already but if not: http://www.prescod.net/python/pulldom.html > My application: Writing a form and filling it with the values from an XML > file. I'm not following you here so I can't follow the rest of your message. Forms are usually user interfaces for people. Why would you write code to fill a form and then write code to fill it with values from an XML file? -- Paul Prescod - Not encumbered by corporate consensus The "war on drugs" began as a rhetorical flourish used by Richard Nixon... But as the Reagan, Bush and Clinton administrations poured billions of dollars into fighting drugs, the slogan slipped the reins of metaphor to become just a plain old war - with an army (DEA), an enemy (profiled minorities, the poor, the cities), a budget ($17.8 billion), and a shibboleth (the children). - "This is your bill of rights...on drugs", Harper's, Dec 1999 From Mike.Olson@fourthought.com Mon Jun 26 19:55:58 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 26 Jun 2000 12:55:58 -0600 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft References: <20000623200845.AA7F41CD0C@dinsdale.python.org> <3953C729.C17E75AF@digicool.com> <3953C9D1.410D04C9@digicool.com> <200006241707.TAA01594@loewis.home.cs.tu-berlin.de> <3957557E.4F843F0@digicool.com> <395783D6.5200DCE@FourThought.com> <395797BB.E71AB751@digicool.com> Message-ID: <3957A73E.6D1E7D36@FourThought.com> Jim Fulton wrote: > > Mike Olson wrote: > > > > Jim Fulton wrote: > > > > > Then you advocate abandoning the CORBA IDL mapping for Python. > The Python mapping does *not* provide for using IDL attributes > as Python attributes. In fact, some folks in the DO-SIG seem to > feel strongly that doing so would be a really bad idea. From what I read, they feel strongly that attribute access over an ORB is not a good idea, for reasons of exceptions and the like. I don't remember seeing any complaints about attribute access of python objects. 
> > > Lets leave it up to implementors to determine > > how they want to spell the functions. After all, these functions really > > are "private". > > No, they aren't private, or at least, they aren't private if you > follow the IDL mapping. I've never acutally tried (so I may be wrong) but I don't think you can call _get_foo accross an ORB for an attribute called foo. In distributed world then thay are private. This was (as my original understanding goes) the intention of the _get* in DOM. Not ment to be accessed from the outside. Mike > > Jim > > -- > Jim Fulton mailto:jim@digicool.com Python Powered! > Technical Director (888) 344-4332 http://www.python.org > Digital Creations http://www.digicool.com http://www.zope.org > > Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email > address may not be added to any commercial mail list with out my > permission. Violation of my privacy with advertising or SPAM will > result in a suit for a MINIMUM of $500 damages/incident, $1500 for > repeats. -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paul@prescod.net Mon Jun 26 16:40:39 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 08:40:39 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <003001bfdf7c$f6c35b40$7cac1218@reston1.va.home.com> <14679.28524.752264.164183@mailhost.beopen.com> Message-ID: <39577977.1D3CC12@prescod.net> It might be an interesting publicity move to try and get a Python binding published in future DOM specs. :) -- Paul Prescod - Not encumbered by corporate consensus "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important."
- http://www.bespoke.org/viridian/ From Mike.Olson@fourthought.com Mon Jun 26 20:04:00 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 26 Jun 2000 13:04:00 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> Message-ID: <3957A920.C8A6CB76@FourThought.com> Jim Fulton wrote: > > Mike Olson wrote: > > > > Jim Fulton wrote: > > > > > > > > > > > In 4DOM, we are actually moving away from __getattr__ (for speed). > > IMO, this is strong evidence that the Python DOM should > *not* use attributes for implementing the DOM/IDL attributes. Huh? I see exactly the opposite picture. Functions were only needed until all of the issues with current state could be worked out. Most of them arose in DOM L1, but have been clarified in DOM L2. DOM L3???? If we don't need the overhead of a __getattr__ override, then why should we have it? Are you proposing all access through functions? > > > I'd like it to be as easy as possible for various objects to implement > the DOM. (See for example StructuredTextNG.) I'd hate to make implementers > go through the pain and performance hit of getattr or dictate an implementation > (like caching attributes or otherwise directly storing them, creating > memory leaks). To me, difficulty is defined more by the API than by how we map it. We don't add any complexity to implementing the DOM interface by saying that functions should start with a '_' or end with a '_'. It's just a naming convention. Most of the difficulties in implementing the DOM are implementation-specific. For an in-memory version in Python, you don't need computed attributes. If you want to write a pulldom, you probably will. If you are writing DOM in Zope you will. DOM in C++ maybe.
I think it should be left up to the implementator as to whether they need computed attributes. Mike > > Jim > > > -- > Jim Fulton mailto:jim@digicool.com Python Powered! > Technical Director (888) 344-4332 http://www.python.org > Digital Creations http://www.digicool.com http://www.zope.org > > Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email > address may not be added to any commercial mail list with out my > permission. Violation of my privacy with advertising or SPAM will > result in a suit for a MINIMUM of $500 damages/incident, $1500 for > repeats. -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Mon Jun 26 18:49:47 2000 From: jim@digicool.com (Jim Fulton) Date: Mon, 26 Jun 2000 13:49:47 -0400 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft References: <20000623200845.AA7F41CD0C@dinsdale.python.org> <3953C729.C17E75AF@digicool.com> <3953C9D1.410D04C9@digicool.com> <200006241707.TAA01594@loewis.home.cs.tu-berlin.de> <3957557E.4F843F0@digicool.com> <395783D6.5200DCE@FourThought.com> Message-ID: <395797BB.E71AB751@digicool.com> Mike Olson wrote: > > Jim Fulton wrote: > > > > > > > > Of course, another solution would be to ignore the fact that > > DOM is specifed in IDL and provide a CORBA-independent DOM API. > > Why not just discourage the use of the accessor and mutator functions > for the python mapping and say that attribute access should be done > through the attribute. Then you advocate abandoning the CORBA IDL mapping for Python. The Python mapping does *not* provide for using IDL attributes as Python attributes. In fact, some folks in the DO-SIG seem to feel strongly that doing so would be a really bad idea. > Lets leave it up to implementors to determine > how they want to spell the functions. After all, these functions really > are "private". 
No, they aren't private, or at least, they aren't private if you follow the IDL mapping. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jim@digicool.com Mon Jun 26 18:54:13 2000 From: jim@digicool.com (Jim Fulton) Date: Mon, 26 Jun 2000 13:54:13 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> Message-ID: <395798C5.682588A@digicool.com> Mike Olson wrote: > > Jim Fulton wrote: > > > > > > > > Actually, the DOM can be mapped into a language in a manner that does > > > not follow directly from the IDL and CORBA specs. That's why there is a > > > formally defined java binding rather than just a reference to the IDL > > > specs. Historically, though, 4DOM was really a CORBA tool so it really > > > needed to follow the specs. > > > > Whatever we do, there needs to be a document somewhere that > > says what the Python DOM mapping is, even if it is not > > much more than a reference to the DOM IDL and the Python > > binding. > > In 4DOM, we are actually moving away from __getattr__ (for speed). IMO, this is strong evidence that the Python DOM should *not* use attributes for implementing the DOM/IDL attributes. > We've found that we can keep all of the data cached in attributes for > when it is needed. 
The draw back is that 4DOM will break horribly if > people access out side of the DOM interface (or 4DOM supported pythonic > interface) and there is a bit of runtime performance lose, however that > is minor compared to the perfomace hit from __getattr__. The only > reason we still support the '_'* is for legacy. I'd like it to be as easy as possible for various objects to implement the DOM. (See for example StructuredTextNG.) I'd hate to make implementers go through the pain and performance hit of getattr or dictate an implementation (like caching attributes or otherwise directly storing them, creating memory leaks). Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From Mike.Olson@fourthought.com Mon Jun 26 18:07:06 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 26 Jun 2000 11:07:06 -0600 Subject: [XML-SIG] The '_' thingy Message-ID: <39578DBA.8FB64449@FourThought.com> So, I think I see this as a general consensus: 1. DOM will never (in the foreseeable future) be used over an ORB, so the IDL should be used as a guide. We should focus more on usability than on CORBA compliance. 2. Most people will access the DOM via attributes. 3. We need a DOM language mapping document. 4. Computed attribute callback function names should be left up to the implementer (or do we want to define this?). If we do define this, then they should be private, and start with an '_' or two. If all are good with this, then we should start down this path. A language mapping is something we can put into the next release of 4DOM (something we've been meaning to do anyway).
The rest of the changes are actually in place (unless we define a different callback naming convention). We will be slowly deprecating _get_* soon as well. However we will still need __setattr__ callbacks in some cases.... Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Mon Jun 26 18:12:47 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 26 Jun 2000 11:12:47 -0600 Subject: [XML-SIG] Python 1.6 XML APIs In-Reply-To: Message from Paul Prescod of "Mon, 26 Jun 2000 08:06:29 PDT." <39577175.82217DB9@prescod.net> Message-ID: <200006261712.LAA01728@localhost.localdomain> > Uche Ogbuji wrote: > Anyhow, there are these different issues: > > 1. is the implementation of minidom up to snuff > > I can demonstrate that the implementation is up to snuff by integrating > it with our various existing test codes for DOM data: including 4thought > stuff (like 4XPath and 4XSLT). Now would be a good time to tell me if > that stuff relies on any 4DOM-specific features. Nope. It's all pure DOM. > I'll roll back the AttributeList change to improve compatibility and > increase comfort. People wanting strings can use getAttribute( ... ) Not so fast. Let's not lose a good idea. We could at least have an explicit NamedNodeMap -> AttributeList conversion and see how it plays. > 2. are the new minidom parsing functions good. > > There are only two parsing functions and they are explicitly designed to > be extensible. If with our combined experience we can't figure that out > in a few hours, something is wrong! > > 3. is pulldom any good > > Once again, we're talking about only two or three methods. There isn't > much to get wrong there! > > 4. Jim's new _get... problem (thanks Jim!) > > Any sane Python programmer is going to use the attribute versions.
Any > Python DOM implementation can trivially support both the CORBA versions > and the attribute versions. I don't see a big problem here. I tend to think that the attribute version should be required and the CORBA version optional. I think that would make most people happy. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Mon Jun 26 17:05:13 2000 From: tpassin@home.com (tpassin@home.com) Date: Mon, 26 Jun 2000 12:05:13 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <003001bfdf7c$f6c35b40$7cac1218@reston1.va.home.com> Message-ID: <006b01bfdf88$4ff277e0$7cac1218@reston1.va.home.com> I wrote - > Actually, the W3C DOM standard said that all the bindings were derived from > an underlying XML original: > > "As stated earlier, all object definitions are specified in XML. The Java > bindings, OMG IDL bindings, and ECMA Script bindings are all generated > automatically from the XML source code. > > This is possible because the information specified in XML is a superset of > what these other syntax need. This is a general observation, and the same > kind of technique can be applied to many other areas: given rich structure, > rich processing and conversion are possible. For Java and OMG IDL, it is > basically just a matter of renaming syntactic keywords; for ECMA Script, the > process is somewhat more involved." > Just to clarify, this quote came from the non-normative appendix entitled Production Notes in the level-1 DOM rec. 
The Introduction in the rec does say "In order to provide a precise, language-independent specification of the DOM interfaces, we have chosen to define the specifications in OMG IDL, as defined in the CORBA 2.2 specification." For those who have posted being unhappy wth "attributes", the rec says "Attributes defined in the IDL do not imply concrete objects which must have specific data members - in the language bindings, they are translated to a pair of get()/set() functions, not to a data member. (Read-only functions have only a get() function in the language bindings). DOM applications may provide additional interfaces and objects not found in this specification and still be considered DOM compliant." I have always thought that this requires the Python binding to use functions like get...(). But I suppose it's not really clear. For example, the IDL in the rec for Node says that nodeValue is an "attribute" of Node. The ECMAScript binding does not show a Node.setnodeValue() method. Presumably, though, it is implied that there should be one, for there is no other way specified to set the value. So I would think that the Python method should have the same name. But it seems that wiser heads than mine have already settled this. Tom Passin From jim@digicool.com Mon Jun 26 21:03:44 2000 From: jim@digicool.com (Jim Fulton) Date: Mon, 26 Jun 2000 16:03:44 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> <3957A920.C8A6CB76@FourThought.com> Message-ID: <3957B720.9C6768D6@digicool.com> Mike Olson wrote: > > Jim Fulton wrote: > > > > Mike Olson wrote: > > > > > > Jim Fulton wrote: > > > > > > > > > > > > > > > > In 4DOM, we are actually moving away from __getattr__ (for speed). 
> > > > IMO, this is strong evidence that the Python DOM should > > *not* use attributes for implementing the DOM/IDL attributes. > > Huh? I see exactly the opposit picture. Functions were only needed > until all of the issues with current state could be worked out. What current state? > Most of > them arose in DOM L1, but have been clarified in DL2. Dom L3???? If we > don't need the over head of __getattr__ override then why should we have > it? DOM defines many useful attributes like parentNode and previousSibling. If these are exposed as attributes in the Python DOM mapping then either: - Implementors of the DOM must *store* these attributes or - implement getattr to provide access to the attributes via computation. I don't want to restrict DOM implementations to store these attributes and I don't want to burden DOM implementations with the hassle or overhead of implementing getattr. > Are you proposing all access through functions? Yes. > > I'd like it to be as easy as possible for various objects to implement > > the DOM. (See for example StructuredTextNG.) I'd hate to make implementers > > go through the pain and performance hit of getattr or dictate an implementation > > (like caching attributes or otherwise directly storing them, creating > > memory leaks). > > To me, difficulty is more defined by the API then how we map it. We > don't add any complexity to implementing the DOM interface to say that > finctions should start with a '_' or end with a '_'. Its just a naming > convention. We weem to be arguing two issues: - Whether to expose DOM attributes as Python attributes or accessor functions, and - How to spell the accessor functions. If we go with accessor functions, which I think would be a good idea, then the accessor functions should be names in a way that is consistent with Python practice. 
(Those of you who are familiar with Zope may appreciate that I have an extra reason to avoid '_'s because Zope incorporates the use of leading '_'s to indicate private attributes into it's security policies. Accessor functions with names beginning with '_'s are currently inaccessible to through-the web code and RPC.) > Most of the difficulties in implementing the DOM are > impementaiton specific. For an in memory version in python, you don't > need computed atributes. You do if you want to avoid circular references. > If you want to write a pulldom, you probably > will. I have no idea what this is. I don;t think I need to, :) > If you are writing DOM in zope you will. DOM in C++ maybe. > > I think it should be left up to the implementator as to whether they > need computed attributes. We could and should avoid the issue altogether by using access methods. If someone wants to store the attributes, then the access methods can simply return them. If computation is needed, then that's easy enough. By using attribute syntax, then you force people to deal with getattr unless they want circular references, at least for attributes that deal with parents, siblings and such. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jim@digicool.com Mon Jun 26 17:50:42 2000 From: jim@digicool.com (Jim Fulton) Date: Mon, 26 Jun 2000 12:50:42 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
References: <200006251553.JAA13600@localhost.localdomain> Message-ID: <395789E2.A5A6D789@digicool.com> Uche Ogbuji wrote: > > > Traditionally, Python attributes (including methods) with > > names starting with '_' were treated as private. > > This is an informal tradition, not universal, and hardly normative. I disagree on two points. - It is not entirely informal: o import * from foo imports only names that don't start with '_'. o Private attributes are based on a leading '_' spelling - Normative is hard to judge, but I think that this is a pretty widely used practice. > > Why oh why then does the Python DOM implementation use > > method names beginning with '_'s in the public API (for > > getting attributes), as in '_get_nodeType'? Why not > > 'get_nodeType' or 'getNodeType'? Is the intent that these > > functions shouldn't be called by Python code? > > We have it this way in order to follow the Python/CORBA mapping. OK, that's why I've taken this discussion to the do-sig and the OMG. :) I think it's worth questioning, however, whether the Python IDL bining *must* dictate the Python DOM API. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From dieter@handshake.de Mon Jun 26 18:54:26 2000 From: dieter@handshake.de (Dieter Maurer) Date: Mon, 26 Jun 2000 19:54:26 +0200 (CEST) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
In-Reply-To: <39577456.F36E9123@prescod.net> References: <3953A717.5289DCC8@digicool.com> <39577456.F36E9123@prescod.net> Message-ID: <14679.38962.304031.299572@lindm.dm> Paul Prescod writes: > Dieter Maurer wrote: > > > > ... > > I would vote against. > > > > DOM is specified in terms of IDL. > > Python has an IDL -> Python mapping. > > Deviating from this mapping for DOM only would require special > > knowledge -- a thing I do not like. > > Less than one in a hundred DOM users (especially minidom users!) will > know or care about the original IDL. Anyone who reads the recommendation will see the IDL. Deviating from standards and creating their own variant is something I blame MS for. The Python community should not follow this bad habit, though the effect would not be as drastic. Dieter From tpassin@home.com Tue Jun 27 00:37:32 2000 From: tpassin@home.com (tpassin@home.com) Date: Mon, 26 Jun 2000 19:37:32 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> <3957A920.C8A6CB76@FourThought.com> <3957B720.9C6768D6@digicool.com> Message-ID: <00a701bfdfc7$7fcb9e80$7cac1218@reston1.va.home.com> Jim Fulton continued the attributes thread - I still don't see why anyone is still arguing about whether the DOM rec makes Python use attributes. It doesn't. In fact, it says that what are called "attributes" in the IDL definitions are NOT supposed to be attributes in implementations, and that the get/set accessor functions don't have to store/retrieve from actual objects, let alone attributes of objects. So can we at least lay this part of it to rest? Now if most people think it is more 'Pythonic' to use attributes, or if there are clear-cut performance benefits, then we have a basis for discussion.
But let's quit talking about whether the DOM rec makes us do attributes. ... > > Are you proposing all access through functions? > > Yes. > I second this. > .... > > We weem to be arguing two issues: > > - Whether to expose DOM attributes as Python attributes or > accessor functions, and > > - How to spell the accessor functions. > > If we go with accessor functions, which I think would be > a good idea, then the accessor functions should be > names in a way that is consistent with Python practice. > >.... > > Most of the difficulties in implementing the DOM are > > impementaiton specific. For an in memory version in python, you don't > > need computed atributes. > > You do if you want to avoid circular references. > > > If you want to write a pulldom, you probably > > will. > > We could and should avoid the issue altogether by using access methods. > If someone wants to store the attributes, then the access methods > can simply return them. If computation is needed, then that's easy > enough. > > By using attribute syntax, then you force people to deal with > getattr unless they want circular references, at least for > attributes that deal with parents, siblings and such. Attributing-ly yours, Tom Passin From paul@prescod.net Mon Jun 26 18:24:53 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 10:24:53 -0700 Subject: [XML-SIG] SAX Support Message-ID: <395791E5.50D73B92@prescod.net> Let's be clear on what we need for SAX support in Python 1.6. Here's the formal documentation for Python SAX: http://www.garshol.priv.no/download/software/saxlib/sax2/saxlib.html It looks solid to me. This makes sense because Lars has a lot of experience and was also building on the Java API. I think that our SAX support will be a single file/module called "saxparser". It will contain a driver for PyExpat, exception handling and default classes. 
The following classes are deprecated and thus will be ignored: AttributeList, Parser, DocumentHandler The following classes address features more complex/esoteric than we should undertake to code, test and document: DTDHandler, DeclHandler, EntityResolver, LexicalHandler, Locator These two classes are more useful in a statically typed environment: XMLFilter, InputSource That leaves: #1. Attributes: This is implemented as a wrapper on two dictionaries: Qname->value (URL, Localname)->value #2. ContentHandler: PyExpat will have a SAX 2 mode that uses ContentHandler calling conventions. A no-op base content handler will be provided #3. ErrorHandler: A default error handler will be provided. #4. various exception classes: provided #5. XMLReader: A PyExpat driver will implement this interface. Most of this is just packaging of code we already have. I plan to get what I can from Lars, the xml-sig distribution and elsewhere and integrate it tomorrow. I'd like to try for a checkin on Wednesday or Thursday. Does that plan make sense? Does this SAX subset make sense? -- Paul Prescod - Not encumbered by corporate consensus The "war on drugs" began as a rhetorical flourish used by Richard Nixon... But as the Reagan, Bush and Clinton administrations poured billions of dollars into fighting drugs, the slogan slipped the reins of metaphor to become just a plain old war - with an army (DEA), an enemy (profiled minorities, the poor, the cities), a budget ($17.8 billion), and a shibboleth (the children). - "This is your bill of rights...on drugs", Harper's, Dec 1999 From fdrake@beopen.com Mon Jun 26 23:19:57 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Mon, 26 Jun 2000 15:19:57 -0700 (PDT) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
In-Reply-To: <39577CE9.E391E8E0@prescod.net> References: <39577CE9.E391E8E0@prescod.net> Message-ID: <14679.55053.6188.621688@mailhost.beopen.com> Paul Prescod writes: > The DOM working group says that for Java and Javascript, usability is > more important than CORBA compliance. I think that the same goes for > Python. That's why I use and advocate attribute syntax. Paul, From the IDL errors pointed out earlier and these comments, I'd have to conclude that the IDL definition should be removed from the recommendation (not present in the next rev., or whatever), and we should put together our own Python mapping that completely ignores all the naming conventions of the IDL and Java mappings and does the Python thing. The big issue there is the legacy code. So, are people using the _get_/_set_ methods or the attribute names? Why are these questions being brought back up so late in the game, anyway? -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From jim@digicool.com Mon Jun 26 23:40:31 2000 From: jim@digicool.com (Jim Fulton) Date: Mon, 26 Jun 2000 18:40:31 -0400 Subject: [XML-SIG] The '_' thingy References: <39578DBA.8FB64449@FourThought.com> Message-ID: <3957DBDF.40792117@digicool.com> Mike Olson wrote: > > So, I think I see this as a general concensius: Are you kidding? > 1. DOM will never (in forseeable future) be used over an ORB, so the > IDL should be used as a guide. Uh, this doesn't make sense. > We should focus more on useability then > CORBA compliance. > > 2. Most people will access the DOM via attributes. Who says? What do you have to support this? Most people will access the DOM through whatever interface we define. > 3. We need a DOM language mapping document. Yes, we have consensus on this. :) > 4. Computed attribute callback function names should be left up to the > implementator (or do we want to define this). If we do define this, > then they should be private, and start with an '_' or two. 
If accessor functions are part of the API, then they should not begin with '_'. > If all are good with this, I'm not. I would prefer to see accessor functions for DOM attributes that are a part of the API and whose names don't begin with '_'s. > then we should start down this path. Whatever path we start down, it should begin with a draft that documents the DOM mapping for Python. > A > langauge mapping is something we can put into the next release of 4DOM > (something we've been meaning to do any ways). The rest of the cahnges > are actually in place (unless we define a different callback naming > convention). We will be slowly depricating _get_* soon as well. > However we will still need __setattr__ callbacks in some cases.... Not if you go to accessor functions instead of attribute-based access. In summary, I think using attribute-based access for the Python DOM API would be a mistake because it will make efficient DOM implementations harder than necessary to create. I'd prefer to see accessor functions used to provide access to DOM attributes. There has, however, been relatively little discussion on this. I'm curious what opinions others have. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From martin@loewis.home.cs.tu-berlin.de Mon Jun 26 22:06:22 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 26 Jun 2000 23:06:22 +0200 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: <3957557E.4F843F0@digicool.com> (message from Jim Fulton on Mon, 26 Jun 2000 09:07:10 -0400) References: <20000623200845.AA7F41CD0C@dinsdale.python.org> <3953C729.C17E75AF@digicool.com> <3953C9D1.410D04C9@digicool.com> <200006241707.TAA01594@loewis.home.cs.tu-berlin.de> <3957557E.4F843F0@digicool.com> Message-ID: <200006262106.XAA01096@loewis.home.cs.tu-berlin.de> > I'm doing alot of work with XML these days and with the > XML Document Object Model (DOM), in particular. Now, > these DOM standard, http://www.w3.org/DOM/, is specified > as an IDL interface. Further, key parts of this interface > use attributes. :( Waaaaaa. > > The end result is that an important Python and Zope interface, > DOM, is substantially affected by the choice of a language mapping > for CORBA IDL attributes! This is ironic since many (most) > applications of DOM in Python will have nothing to do with > CORBA! I fully appreciate the problem; there are a number of ways of viewing it that make it look better: 1. While W3C uses OMG IDL to express the DOM, they explicitly don't use the OMG language mappings, as the bindings may appear simpler without CORBA. For example, in the C++ mapping, you'd need to care about CORBA memory management and _narrow calls - which you can ignore, to a degree, if you know you have a local implementation. So the DOM IDL is merely meant as a guideline for defining the API, without prescribing the API for all languages. It would be perfectly ok if there is a DOM mapping to Python, and if this is different from the CORBA IDL mapping. It would even be "allowed" if each implementation was using its own mapping - since the API to the parser is non-standard (*), applications have to pick a specific parser, anyway. However, I'd agree that having consistent mappings is desirable. 2.
The CORBA IDL language mapping only specifies the minimal available API; implementations are certainly free to provide extensions. For example, DOM implementations could readily offer mapping IDL attributes to Python attributes. They would still be mapping-compliant if the offered the accessor methods *in addition*. It would then be user's choice to pick one of these options. Regards, Martin (*) OMG currently considers the XML/valuetypes RFP, which aims at defining IDL values (as opposed to interfaces) for representing XML documents in a structured way. The current proposal is to use CORBA 2.3 valuetypes in a DOM-like API; they also define an API to a DOM parser. With that, there would be a standard way for CORBA applications to get a DOM tree given a linear XML document. From martin@loewis.home.cs.tu-berlin.de Mon Jun 26 22:23:52 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 26 Jun 2000 23:23:52 +0200 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: <200006261435.PAA19126@pineapple.uk.research.att.com> (message from Duncan Grisby on Mon, 26 Jun 2000 15:35:06 +0100) References: <200006261435.PAA19126@pineapple.uk.research.att.com> Message-ID: <200006262123.XAA01196@loewis.home.cs.tu-berlin.de> > Is DOM intended to ever be used in a full distributed environment? I believe that's not explicitly excluded, and 4DOM, to my knowledge, also supports that mode of operation. The intent in the DOM clearly is that IDL was used as a convenient notation for an OO API, not as a CORBA interface. > Looking at the IDL used by DOM, it looks like the W3C don't intend it > to be used with CORBA. IDL like > > attribute DOMString nodeValue; > // raises(DOMException) on setting > // raises(DOMException) on retrieval > > shows a clear disregard for the semantics of CORBA IDL. That isn't a > Python issue, of course. Even ignoring things like that, the IDL isn't > CORBA 2.3 compliant. 
Well, the 'attribute raises' is part of the CCM submission, and thus potentially part of CORBA 3. In the light of this, mapping attributes to operations is a more logical choice - even though it would be possible to raise arbitrary exceptions from attribute access in Python. > Just for the amusement value, here's a list of > the errors in it. > > dom.idl:118: Identifier `supports' clashes with keyword `supports' > > html.idl:191: Identifier `readOnly' clashes with keyword `readonly' > html.idl:211: Identifier `readOnly' clashes with keyword `readonly' > html.idl:383: Identifier `valueType' clashes with keyword `valuetype' > html.idl:395: Identifier `object' clashes with keyword `Object' > > css.idl:143: Identifier `valueType' clashes with keyword `valuetype' > > range.idl:38: Declaration of interface `Range' clashes with name of > enclosing scope `range' > range.idl:21: (`range' declared here) > > Aren't standards great! I wouldn't put the DOM IDL down too much. The rule about immediately-nested scope was added in CORBA 2.3 only, as was the supports keyword, and the rules about identifiers clashing with keywords in case was clarified just recently - I read CORBA 2.1 as not having such a rule, although others read it differently. Anyway, this can easily be fixed using _ escapes (i.e. _supports, _readOnly). Regards, Martin From paul@prescod.net Mon Jun 26 22:30:57 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 14:30:57 -0700 Subject: [XML-SIG] SAX Support Message-ID: <3957CB91.15DD9855@prescod.net> Weird email problems today...here we go again -------- Original Message -------- Subject: SAX Support Date: Mon, 26 Jun 2000 10:24:53 -0700 From: Paul Prescod To: "xml-sig@python.org" Let's be clear on what we need for SAX support in Python 1.6. Here's the formal documentation for Python SAX: http://www.garshol.priv.no/download/software/saxlib/sax2/saxlib.html It looks solid to me. 
This makes sense because Lars has a lot of experience and was also building on the Java API. I think that our SAX support will be a single file/module called "saxparser". It will contain a driver for PyExpat, exception handling and default classes.

The following classes are deprecated and thus will be ignored: AttributeList, Parser, DocumentHandler

The following classes address features more complex/esoteric than we should undertake to code, test and document: DTDHandler, DeclHandler, EntityResolver, LexicalHandler, Locator

These two classes are more useful in a statically typed environment: XMLFilter, InputSource

That leaves:

#1. Attributes: This is implemented as a wrapper on two dictionaries:
    Qname->value
    (URL, Localname)->value
#2. ContentHandler: PyExpat will have a SAX 2 mode that uses ContentHandler calling conventions. A no-op base content handler will be provided
#3. ErrorHandler: A default error handler will be provided.
#4. various exception classes: provided
#5. XMLReader: A PyExpat driver will implement this interface.

Most of this is just packaging of code we already have. I plan to get what I can from Lars, the xml-sig distribution and elsewhere and integrate it tomorrow. I'd like to try for a checkin on Wednesday or Thursday. Does that plan make sense? Does this SAX subset make sense? -- Paul Prescod - Not encumbered by corporate consensus The "war on drugs" began as a rhetorical flourish used by Richard Nixon... But as the Reagan, Bush and Clinton administrations poured billions of dollars into fighting drugs, the slogan slipped the reins of metaphor to become just a plain old war - with an army (DEA), an enemy (profiled minorities, the poor, the cities), a budget ($17.8 billion), and a shibboleth (the children). - "This is your bill of rights...on drugs", Harper's, Dec 1999 From martin@loewis.home.cs.tu-berlin.de Mon Jun 26 22:47:13 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 26 Jun 2000 23:47:13 +0200 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: <3957A73E.6D1E7D36@FourThought.com> (message from Mike Olson on Mon, 26 Jun 2000 12:55:58 -0600) References: <20000623200845.AA7F41CD0C@dinsdale.python.org> <3953C729.C17E75AF@digicool.com> <3953C9D1.410D04C9@digicool.com> <200006241707.TAA01594@loewis.home.cs.tu-berlin.de> <3957557E.4F843F0@digicool.com> <395783D6.5200DCE@FourThought.com> <395797BB.E71AB751@digicool.com> <3957A73E.6D1E7D36@FourThought.com> Message-ID: <200006262147.XAA01296@loewis.home.cs.tu-berlin.de> > From what I read, they feel strongly that attribute access over an ORB > is not a good idea, for reasons of exceptions and the like. I don't > remember seeing any complaints about attribute access of python objects. From the point of view of a Python Language Mapping user, there is no difference between these two: You access a Python object, but it really is a stub object only, so a call goes over the wire. > I've never actually tried (so I may be wrong) but I don't think you > can call _get_foo across an ORB for an attribute called foo. Sure you can. In fact, _get_foo, in the IIOP protocol, is marshalled as an operation invocation of "_get_foo". This gives the additional advantage that ORBs can dispatch incoming operation invocations as-is into method calls - no matter whether these are attribute accesses or other operation invocations. > In distributed world then they are private. See, this is a common misconception about OMG IDL attributes. They are *not* private; instead, they are a short-hand for two public operations, which are as good as any other operation. Of course, it may be that the DOM designers had a different understanding of attributes in mind.
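The point that an IDL attribute is shorthand for two ordinary public operations can be sketched roughly as below. This is only an illustration under stated assumptions: FooServant, Stub, invoke and wire_log are hypothetical names, and no real ORB or IIOP marshalling is involved. The stub merely records the operation name that would travel in a request and dispatches it unchanged to a method of the same name.

```python
# Hypothetical illustration only: no real ORB or IIOP code is involved.

class FooServant:
    """Server-side object for an IDL interface declaring `attribute long foo;`."""

    def __init__(self):
        self._foo = 0

    # An IDL attribute is shorthand for these two public operations.
    def _get_foo(self):
        return self._foo

    def _set_foo(self, value):
        self._foo = value


class Stub:
    """Client-side stand-in that forwards operations by name."""

    def __init__(self, servant):
        self._servant = servant
        self.wire_log = []  # operation names as they would appear in requests

    def invoke(self, operation, *args):
        self.wire_log.append(operation)
        # The receiving side dispatches the operation name as-is to a method,
        # whether it is an attribute access or an ordinary operation.
        return getattr(self._servant, operation)(*args)


stub = Stub(FooServant())
stub.invoke("_set_foo", 42)
print(stub.invoke("_get_foo"))   # 42
print(stub.wire_log)             # ['_set_foo', '_get_foo']
```

Nothing in the dispatch step distinguishes "_get_foo" from any other operation name, which is the sense in which the attribute accessors are as public as everything else in the interface.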
Regards, Martin From Mike.Olson@fourthought.com Tue Jun 27 01:21:09 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 26 Jun 2000 18:21:09 -0600 Subject: [XML-SIG] The '_' thingy References: <39578DBA.8FB64449@FourThought.com> <3957DBDF.40792117@digicool.com> Message-ID: <3957F375.B8A79E@FourThought.com> Jim Fulton wrote: > > Mike Olson wrote: > > > > So, I think I see this as a general consensus: > > Are you kidding? No. '_' issues aside I think most people want attribute access. I didn't tally a vote or anything, but that was the sense I got. Am I wrong? > > > 1. DOM will never (in the foreseeable future) be used over an ORB, so the > > IDL should be used as a guide. > > Uh, this doesn't make sense. We don't need to stick strictly to IDL (I don't think that was the original intention), because we won't be doing distributed DOM for a while. > > > > 2. Most people will access the DOM via attributes. > > Who says? What do you have to support this? Most people > will access the DOM through whatever interface we define. Again, just the sense I got. So where are we at on the attribute vs. accessor debate? I throw in my hat for attributes. > > > > Whatever path we start down, it should begin with a draft > that documents the DOM mapping for Python. Agreed, but I think we can work out some of the larger issues on the list. > > > A > > language mapping is something we can put into the next release of 4DOM > > (something we've been meaning to do anyway). The rest of the changes > > are actually in place (unless we define a different callback naming > > convention). We will be slowly deprecating _get_* soon as well. > > However we will still need __setattr__ callbacks in some cases.... > > > In summary, I think using attribute-based access for the Python DOM > API would be a mistake because it will make efficient DOM implementations > harder than necessary to create. I'd prefer to see accessor functions used > to provide access to DOM attributes.
> > There has, however, been relatively little discussion on this. > I'm curious what opinions others have. Jim, I don't see your arguments. How is n.firstChild less efficient than n.get_firstChild() ? In the first, you modify appendChild, et al. and at the end put in

    if self.childNodes[0] == newNode:
        self.firstChild = newNode

In the second you do a "return self.childNodes[0]". I don't see a major memory or speed difference. You can do the same for all other attributes. I don't see how accessors can get around circular references either. Believe me we have tried with this one. We have come up with a few schemes in our time, proxied nodes and such, but nothing that made it worth the overhead. It's much simpler and more efficient to have a utility function to clean up a tree if you need to. Mike > > Jim > > -- > Jim Fulton mailto:jim@digicool.com Python Powered! > Technical Director (888) 344-4332 http://www.python.org > Digital Creations http://www.digicool.com http://www.zope.org > > Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email > address may not be added to any commercial mail list with out my > permission. Violation of my privacy with advertising or SPAM will > result in a suit for a MINIMUM of $500 damages/incident, $1500 for > repeats. -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Tue Jun 27 01:28:18 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 26 Jun 2000 18:28:18 -0600 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: Message from Jim Fulton of "Mon, 26 Jun 2000 13:49:47 EDT." <395797BB.E71AB751@digicool.com> Message-ID: <200006270028.SAA02708@localhost.localdomain> Jim Fulton: > Then you advocate abandoning the CORBA IDL mapping for Python.
> The Python mapping does *not* provide for using IDL attributes > as Python attributes. In fact, some folks in the DO-SIG seem to > feel strongly that doing so would be a really bad idea. I think that since the end aim is for maximum Pythonicity independent of CORBA, that we should, as Mike suggests, just do away with accessor/mutators entirely. So that would be my vote: The Python/DOM binding for DOM is to access DOM attributes through Python attributes. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Tue Jun 27 01:36:58 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 26 Jun 2000 18:36:58 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Jim Fulton of "Mon, 26 Jun 2000 13:54:13 EDT." <395798C5.682588A@digicool.com> Message-ID: <200006270036.SAA02732@localhost.localdomain> > Mike Olson wrote: > > > > Jim Fulton wrote: > > > > > > > > > > > Actually, the DOM can be mapped into a language in a manner that does > > > > not follow directly from the IDL and CORBA specs. That's why there is a > > > > formally defined java binding rather than just a reference to the IDL > > > > specs. Historically, though, 4DOM was really a CORBA tool so it really > > > > needed to follow the specs. > > > > > > Whatever we do, there needs to be a document somewhere that > > > says what the Python DOM mapping is, even if it is not > > > much more than a reference to the DOM IDL and the Python > > > binding. > > > > In 4DOM, we are actually moving away from __getattr__ (for speed). > > IMO, this is strong evidence that the Python DOM should > *not* use attributes for implementing the DOM/IDL attributes. Not so fast. We've mostly solved the speed problem. 
And we could solve a good deal more of it by getting rid of accessor/mutator functions. This whole argument actually makes mandating only attributes more attractive to me. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Tue Jun 27 01:47:50 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 26 Jun 2000 18:47:50 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from of "Mon, 26 Jun 2000 19:37:32 EDT." <00a701bfdfc7$7fcb9e80$7cac1218@reston1.va.home.com> Message-ID: <200006270047.SAA02755@localhost.localdomain> Tom Passim: > Jim Fulton continued the attributes thread - > > I still don't see why anyone is still arguing about whether the DOM rec > makes Python use attributes. I doesn't. In fact, it says that what are > called "attributes" in the IDL definitions are NOT supposed to be attributes > in implementations, and that the get/set accessor functions don't have to > store/retrieve from actual objects, let alone attributes of objects. > > So can we at least lay this part of it to rest? Now if most people think it > is more 'Pythonic' to use attributes, or if there are clearcut performance > benefits, then we have a basis for discussion. But let's quit talking about > whether the DOM rec makes us do attributes. Now I have no idea what you lot are arguing. The first argument was against leading underscore because it's "not Python idiom". The point was made that we should simply cock a snook at the Python/CORBA binding. Once that point was allowed, the same lot are arguing against using attributes, which are indisputable Python idiom on the grounds that it goes against the spirit of the W3C spec. 
I hope I can be blunt without antagonism, but it seems as if a particular goal is in mind: i.e. DOM attribute access through accessor/mutators only, and any available argument is being thrown at that goal. I'll note that I claim to have no agenda except to do what's sensible for Python and DOM (we've already put a great deal of work into making 4DOM conform to the earlier list consensus, and we could put in more work if it made sense.) The course that does make sense is to allow attribute access only because it's most Pythonic. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Mon Jun 26 18:21:10 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 26 Jun 2000 11:21:10 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Michael McLay of "Mon, 26 Jun 2000 13:38:44 EDT." <14679.38180.290118.619833@fermi.eeel.nist.gov> Message-ID: <200006261721.LAA01760@localhost.localdomain> > Uche Ogbuji writes: > > I should note that I quite hope that leading-underscore-means-private > > is indeed not normative, and never will be so. The leading underscore > > is the most readable way to escape symbol names that would clash with > > an applicable naming convention. > > The use of a single "_" at the front of a name in Python has always > (at least since about Python 1.2) been used to filter out names that > should not be used outside a module. It is more than an idiom. The > concept is enforced in statements of the form "from foo import *". It > isn't absolutely enforced. You can still say "from foo import _thing". Everything I hear is that it's a weak restriction. Maybe that doesn't matter. 
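How weak the restriction is can be checked directly. The sketch below is a minimal illustration, assuming a throwaway module named "foo" built in memory (the names foo, thing and _thing are hypothetical and only echo the quoted example). It shows that a star-import skips the underscore-prefixed name while an explicit import still reaches it.

```python
import sys
import types

# Build a throwaway module named "foo" (hypothetical) with one public and
# one underscore-prefixed name, and register it so it can be imported.
foo = types.ModuleType("foo")
foo.thing = "public"
foo._thing = "underscore"
sys.modules["foo"] = foo

# A star-import copies only names that do not start with an underscore
# (the module defines no __all__ here).
star_ns = {}
exec("from foo import *", star_ns)
print("thing" in star_ns)    # True
print("_thing" in star_ns)   # False

# Naming the underscore-prefixed name explicitly still works.
explicit_ns = {}
exec("from foo import _thing", explicit_ns)
print(explicit_ns["_thing"])  # underscore
```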
> Arguing that other languages use "_" to eliminate name conflicts is a > red-herring when talking about Python. Python namespaces make this > unnecessary. It is annoying to some Python users to see an idiom from > another language carried over to Python. In this case the carryover > directly conflicts with a Python language feature. All true, but we all use other languages to buttress our arguments when it suits us, and protest against bleeding in idioms from other languages when it doesn't. All's fair in flame and dialectic, no? > > It's far from a conclusive argument > > (though Jim Fulton tried to make a similar argument to excoriate the > > Python/CORBA binding), but most languages, such as C++ allow exactly > > this approach as put to good use by the C++/CORBA binding. > > > The leading-underscore-is-private idea has the annoying effect that if > > I want to call a variable "class", I must instead use the silly > > "klass", rather than "_class", which is far more readable and > > self-explanatory. And if I want to call variables "def", "type" and > > "else", what then? "deph", "tipe" and "els"? (I suppose one could > > use trailing underscore). > > Why not use append a "_" to the name to distinguish it from a keyword. > The name "class_" would be as readable and it wouldn't conflict > with the longstanding Python rule for leading "_" characters. That was what I said in my last parenthesis. It's a bit less natural to me, but that's just viscera and there's no need for you to pay attention to it. > > Most likely, Guido is already in his time machine writing "Thou shalt > > not use leading underscore except for private variables" on a stone > > tablet somewhere in the past to end the whole argument. But people > > have been saying nasty things about the Python/CORBA binding which > > wouldn't be as nasty as the things I'd say about such a restriction in > > Python. 
> > The special meaning of _* is defined here: > http://www.python.org/doc/current/ref/id-classes.html > > The rule is also discussed in section 6.1 of the Tutorial. Still only a weak restriction, but in the interests of moving on to productive work, I think it's time for me to concede this particular argument. So what do we do about the underscores in the DOM binding? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From case@appliedtheory.com Tue Jun 27 04:30:58 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Mon, 26 Jun 2000 23:30:58 -0400 (EDT) Subject: [XML-SIG] Interested in feedback Message-ID: This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. --2144283543-236838159-962076658=:25151 Content-Type: TEXT/PLAIN; charset=US-ASCII I am not really sure how to prefix this. I have something that is while not general purpose does seem simple to use in the 80% case. If people disagree, or can think of things that might make this better I would like some feedback. This module has a simple interface and is just over 200 lines. import xmlObjects p = xmlObjects.Parser() xml = p.parse("filename") # where maybe this should be a string or fp xml.getValue("container1.allow.host") # == 'loki' xml.getValue("container2.allow.host[1]") # == 'foo.bar.com' xml.getValue("listen[port]", convert=int) # == 9000 xml.getValues("container2.allow.host") # == ['loki...', 'foo...' 
'baz...'] xml.getXML() # does a decent job of reproducing the source for the data set loki.appliedtheory.com 127.0.0.1 foo.bar.com baz.bar.com 100.4123 8 16 32 -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... --2144283543-236838159-962076658=:25151 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="xmlObjects.py" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="xmlObjects.py" IyEvdXNyL2Jpbi9weXRob24NCiMNCiMgIENvcHlyaWdodCAoQykgMTk5OS0y MDAwICBCZW5qYW1pbiBTYWxsZXIgYW5kIEFwcGxpZWRUaGVvcnkNCiMgICAg ICAgICAgICAgICAgICAgICAgICAgICBhbmQgdGhlIHJlc3BlY3RpdmUgYXV0 aG9ycyAoc2VlIGJlbG93KQ0KIw0KIyAgQXV0aG9yKHMpIGFuZCBDb250cmli dXRvcihzKToNCiMgICAgICAgICAgICBCZW5qYW1pbiBTYWxsZXINCg0KZnJv bSAgIHR5cGVzICAgIGltcG9ydCAqDQpmcm9tICAgeG1sLnNheCAgaW1wb3J0 IHNheGV4dHMNCmZyb20gICB4bWwuc2F4ICBpbXBvcnQgc2F4bGliDQppbXBv cnQgcmUNCmltcG9ydCBzdHJpbmcNCmltcG9ydCBzeXMNCg0KaW5kZXhSZSAg ICAgID0gcmUuY29tcGlsZSgiIiINCig/UDxlbGVtZW50Pltcd1xkX10rPykg ICAgICAgICAjIHJlcXVpcmVkIGVsZW1lbnQgbmFtZQ0KKChbW10pKD9QPGlu ZGV4PlxkKykoW11dKSk/ICAgICMgb3B0aW9uYWwgaW5kZXgNCigoW1tdKSg/ UDxhdHRyPlx3KykoW11dKSk/JCAgICAjIG9wdGlvbmFsIGF0dHINCiIiIiwg cmUuWCkNCg0Kd2hpdGVzcGFjZVJlID0gcmUuY29tcGlsZSgiXHMrIikNCg0K Y2xhc3MgeG1sT2JqZWN0V3JpdGVyTWl4SW46DQogICAgZGVmIF93cml0ZVN0 YXJ0KHNlbGYsIGVuZD1Ob25lKToNCiAgICAgICAgaWYgZW5kOg0KICAgICAg ICAgICAgZW5kID0gJy8nDQogICAgICAgIGVsc2U6DQogICAgICAgICAgICBl bmQgPSAnJw0KDQogICAgICAgIHJldHVybiAiPCVzJXMlcz4iICUoIHNlbGYu X25hbWUsIHNlbGYuX3dyaXRlQXR0cnMoKSwgZW5kKQ0KDQogICAgZGVmIF93 cml0ZUF0dHJzKHNlbGYpOg0KICAgICAgICBhdHRycyA9IFtdDQogICAgICAg IGZvciBrLHYgaW4gc2VsZi5fYXR0cnMuaXRlbXMoKToNCiAgICAgICAgICAg IGF0dHJzLmFwcGVuZCgiJXM9JyVzJyIgJSAoayx2KSkNCiAgICAgICAgaWYg bGVuKGF0dHJzKToNCiAgICAgICAgICAgIHJldHVybiAiICVzIiAlIChzdHJp bmcuam9pbihhdHRycywgJyAnKSkNCiAgICAgICAgZWxzZToNCiAgICAgICAg ICAgIHJldHVybiAnJw0KDQogICAgZGVmIF93cml0ZUVuZChzZWxmKToNCiAg 
ICAgICAgcmV0dXJuICI8LyVzPiIgJSBzZWxmLl9uYW1lDQoNCiAgICBkZWYg X3dyaXRlVGV4dChzZWxmKToNCiAgICAgICAgcmV0dXJuIHNlbGYuX2dldERh dGEoKQ0KDQogICAgZGVmIHhtbChzZWxmLCBpbmRlbnQ9MCk6DQogICAgICAg IHRhYnMgPSAnICAgICcgKiBpbmRlbnQNCiAgICAgICAgDQogICAgICAgIGlm IGxlbihzZWxmLl9jaGlsZHJlbik6DQogICAgICAgICAgICByZXN1bHRzID0g WyIlcyVzJXMiICUgKHRhYnMsIHNlbGYuX3dyaXRlU3RhcnQoKSwNCiAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgc2VsZi5fd3JpdGVUZXh0 KCkpXQ0KICAgICAgICAgICAgZm9yIGssdiBpbiBzZWxmLl9jaGlsZHJlbi5p dGVtcygpOg0KICAgICAgICAgICAgICAgIGZvciBkYXR1bSBpbiB2Og0KICAg ICAgICAgICAgICAgICAgICByZXN1bHRzLmFwcGVuZChkYXR1bS54bWwoaW5k ZW50ICsgMSkpDQogICAgICAgICAgICAgICAgICAgIA0KICAgICAgICAgICAg cmVzdWx0cy5hcHBlbmQoIiVzJXMiICUgKHRhYnMsIHNlbGYuX3dyaXRlRW5k KCkpKQ0KICAgICAgICAgICAgICAgICAgICAgICAgICAgDQogICAgICAgIGVs c2U6DQogICAgICAgICAgICAjIE5vIGNoaWxkcmVuDQogICAgICAgICAgICB0 ZXh0ID0gc2VsZi5fd3JpdGVUZXh0KCkNCiAgICAgICAgICAgIGlmIGxlbih0 ZXh0KToNCiAgICAgICAgICAgICAgICByZXN1bHRzID0gWyIlcyVzJXMiICUg KHRhYnMsIHNlbGYuX3dyaXRlU3RhcnQoKSwNCiAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgIHNlbGYuX3dyaXRlVGV4dCgpKV0NCiAg ICAgICAgICAgIGVsc2U6DQogICAgICAgICAgICAgICAgcmVzdWx0cyA9IFsi JXMlcyIgICAlICh0YWJzLCBzZWxmLl93cml0ZVN0YXJ0KDEpKV0NCg0KICAg ICAgICByZXR1cm4gc3RyaW5nLmpvaW4ocmVzdWx0cywgJ1xuJykNCiAgICAg ICAgICAgICAgICAgICAgICAgDQogICAgZGVmIGdldFhNTChzZWxmKToNCiAg ICAgICAgcmV0dXJuIHNlbGYueG1sKCkNCiAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgDQogICAgICAgIA0KICAgICAgICANCmNsYXNzIHht bE9iamVjdCh4bWxPYmplY3RXcml0ZXJNaXhJbik6DQogICAgZGVmIF9faW5p dF9fKHNlbGYsIG5hbWUsIGF0dHJzPU5vbmUpOg0KICAgICAgICBzZWxmLl9j aGlsZHJlbiA9IHt9DQogICAgICAgIHNlbGYuX2RhdGEgICAgID0gW10NCiAg ICAgICAgc2VsZi5fbmFtZSAgICAgPSBuYW1lDQogICAgICAgIA0KICAgICAg ICBzZWxmLl9hdHRycyAgICA9IHt9DQogICAgICAgIGlmIGF0dHJzOiAgICAg ICAgc2VsZi5fYXR0cnMudXBkYXRlKGF0dHJzKQ0KDQogICAgZGVmIF9hZGRD aGlsZChzZWxmLCBvYmplY3QpOg0KICAgICAgICB0cnk6DQogICAgICAgICAg ICBzZWxmLl9jaGlsZHJlbltvYmplY3QuX25hbWVdLmFwcGVuZChvYmplY3Qp 
DQogICAgICAgIGV4Y2VwdCBLZXlFcnJvcjoNCiAgICAgICAgICAgIHNlbGYu X2NoaWxkcmVuW29iamVjdC5fbmFtZV0gPSBbb2JqZWN0XQ0KDQogICAgZGVm IF9hZGRUZXh0KHNlbGYsIHRleHQpOg0KICAgICAgICBzZWxmLl9kYXRhLmFw cGVuZCh0ZXh0KQ0KDQogICAgZGVmIF9nZXREYXRhKHNlbGYpOg0KICAgICAg ICB0ZXh0ID0gc3RyaW5nLmpvaW4oc2VsZi5fZGF0YSwgJycpDQogICAgICAg IHJldHVybiB3aGl0ZXNwYWNlUmUuc3ViKCcgJywgdGV4dCkNCg0KICAgIGRl ZiBfZ2V0QXR0cihzZWxmLCBhdHRyKToNCiAgICAgICAgcmV0dXJuIHNlbGYu X2F0dHJzW2F0dHJdDQogICAgDQogICAgZGVmIF9wYXJzZUF0dHIoc2VsZiwg ZWxlbWVudCk6DQogICAgICAgIGF0dHIgPSBpbmRleFJlLnNlYXJjaChlbGVt ZW50KQ0KICAgICAgICBpZiBhdHRyIGlzIE5vbmU6DQogICAgICAgICAgICBy YWlzZSAiSW52YWxpZCBwYXRoIGVsZW1lbnQ6ICVzXG4iICUgZWxlbWVudA0K ICAgICAgICBlbHNlOg0KICAgICAgICAgICAgZGljdCA9IGF0dHIuZ3JvdXBk aWN0KCkNCiAgICAgICAgICAgIHJldHVybiAoZGljdFsnZWxlbWVudCddLCBk aWN0WydpbmRleCddLCBkaWN0WydhdHRyJ10pDQogICAgDQoNCiAgICBkZWYg X2dldFBhcnQoc2VsZiwgcGFydCk6DQogICAgICAgIHBhcnQsIGluZGV4LCBh dHRyID0gc2VsZi5fcGFyc2VBdHRyKHBhcnQpDQogICAgICAgIG9iaiAgPSBz ZWxmLl9jaGlsZHJlbltwYXJ0XQ0KICAgICAgICBpZiBpbmRleDoNCiAgICAg ICAgICAgIGluZGV4ID0gaW50KGluZGV4KQ0KICAgICAgICAgICAgb2JqID0g b2JqW2luZGV4XQ0KDQogICAgICAgIHJldHVybiBvYmosIGF0dHINCiAgICAN CiAgICBkZWYgX2dldFBhdGgoc2VsZiwgcGF0aCk6DQogICAgICAgIHBhcnRz ICA9IHN0cmluZy5zcGxpdChwYXRoLCAnLicpDQogICAgICAgIG9iamVjdCA9 IHNlbGYNCg0KICAgICAgICBsYXN0ICAgPSBsZW4ocGFydHMpDQogICAgICAg IGkgICAgICA9IDANCg0KICAgICAgICBmb3IgcGFydCBpbiBwYXJ0czoNCiAg ICAgICAgICAgIGkgPSBpICsgMQ0KICAgICAgICAgICAgaXNMYXN0ID0gaSA9 PSBsYXN0DQogICAgICAgICAgICANCiAgICAgICAgICAgIG9iamVjdCwgYXR0 ciA9IG9iamVjdC5fZ2V0UGFydChwYXJ0KQ0KICAgICAgICAgICAgaWYgbm90 IG9iamVjdDoNCiAgICAgICAgICAgICAgICByZXR1cm4gTm9uZQ0KDQogICAg ICAgICAgICBpZiBub3QgaXNMYXN0IGFuZCB0eXBlKG9iamVjdCkgPT0gTGlz dFR5cGU6DQogICAgICAgICAgICAgICAgb2JqZWN0ID0gb2JqZWN0WzBdDQoN CiAgICAgICAgaWYgdHlwZShvYmplY3QpICE9IExpc3RUeXBlOg0KICAgICAg ICAgICAgb2JqZWN0ID0gW29iamVjdF0NCiAgICAgICAgDQogICAgICAgIHJl dHVybiBvYmplY3QsIGF0dHINCiAgICANCiAgICAgICAgICAgIA0KICAgIGRl 
ZiBnZXRPYmplY3RzKHNlbGYsIHBhdGgpOg0KICAgICAgICBvYmplY3RzLCBh dHRyID0gc2VsZi5fZ2V0UGF0aChwYXRoKQ0KICAgICAgICAjcHJpbnQgImdl dE9iamVjdHMiLCBwYXRoLCBvYmplY3RzLCBhdHRyDQogICAgICAgIHJldHVy biBvYmplY3RzLCBhdHRyDQoNCg0KICAgIGRlZiBnZXRPYmplY3Qoc2VsZiwg cGF0aCk6DQogICAgICAgIG9iamVjdHMsIGF0dHIgPSBzZWxmLmdldE9iamVj dHMocGF0aCkNCiAgICAgICAgcmV0dXJuIG9iamVjdHNbMF0sIGF0dHINCiAg ICANCiAgICBkZWYgZ2V0VmFsdWUoc2VsZiwgcGF0aCwgZmFpbD1Ob25lLCBj b252ZXJ0PU5vbmUpOg0KICAgICAgICBvYmplY3QgPSBzZWxmLmdldFZhbHVl cyhwYXRoLCBmYWlsLCBjb252ZXJ0KQ0KICAgICAgICBpZiB0eXBlKG9iamVj dCkgPT0gTGlzdFR5cGU6DQogICAgICAgICAgICByZXR1cm4gb2JqZWN0WzBd DQogICAgICAgIGVsc2U6DQogICAgICAgICAgICByZXR1cm4gb2JqZWN0ICNm YWlsdmFsDQogICAgICAgIA0KICAgIGRlZiBnZXRWYWx1ZXMoc2VsZiwgcGF0 aCwgZmFpbD1Ob25lLCBjb252ZXJ0PU5vbmUpOg0KICAgICAgICByZXN1bHRz ID0gW10NCiAgICAgICAgb2JqZWN0LCBhdHRyID0gc2VsZi5nZXRPYmplY3Rz KHBhdGgpDQoNCiAgICAgICAgaWYgbm90IG9iamVjdDoNCiAgICAgICAgICAg IHJldHVybiBmYWlsDQoNCiAgICAgICAgaWYgYXR0cjoNCiAgICAgICAgICAg IGZvciBvIGluIG9iamVjdDoNCiAgICAgICAgICAgICAgICByZXN1bHRzLmFw cGVuZChvLl9nZXRBdHRyKGF0dHIpKQ0KICAgICAgICBlbHNlOg0KICAgICAg ICAgICAgI3ByaW50ICJBYm91dCB0byBpdG9yIG9uIiwgb2JqZWN0LCAiZm9y IiwgcGF0aA0KICAgICAgICAgICAgZm9yIG8gaW4gb2JqZWN0Og0KICAgICAg ICAgICAgICAgIHJlc3VsdHMuYXBwZW5kKG8uX2dldERhdGEoKSkNCg0KICAg ICAgICBpZiBjb252ZXJ0Og0KICAgICAgICAgICAgcmVzdWx0cyA9IG1hcChj b252ZXJ0LCByZXN1bHRzKQ0KDQogICAgICAgIHJldHVybiByZXN1bHRzDQog ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgDQogICAgDQpjbGFzcyB4 bWxIYW5kbGVyKHNheGxpYi5Eb2N1bWVudEhhbmRsZXIpOg0KICAgIGRlZiBf X2luaXRfXyhzZWxmKToNCiAgICAgICAgc2VsZi5fc3RhY2sgPSBbeG1sT2Jq ZWN0KCdfdG9wJyldDQoNCiAgICBkZWYgb2JqZWN0KHNlbGYpOg0KICAgICAg ICByZXR1cm4gc2VsZi5fc3RhY2tbMF0uX2NoaWxkcmVuLnZhbHVlcygpWzBd WzBdDQoNCiAgICAgICAgDQogICAgZGVmIHN0YXJ0RWxlbWVudChzZWxmLCBu YW1lLCBhdHRyKToNCiAgICAgICAgYXR0ciA9IGF0dHIubWFwDQogICAgICAg IG5ldyAgPSB4bWxPYmplY3QobmFtZSwgYXR0cikNCiAgICAgICAgDQogICAg ICAgIHRvcCAgPSBzZWxmLl9zdGFja1stMV0NCiAgICAgICAgdG9wLl9hZGRD 
aGlsZChuZXcpDQogICAgICAgIA0KICAgICAgICBzZWxmLl9zdGFjay5hcHBl bmQobmV3KQ0KDQogICAgZGVmIGVuZEVsZW1lbnQoc2VsZiwgbmFtZSk6DQog ICAgICAgIHNlbGYuX3N0YWNrLnBvcCgpDQoNCiAgICBkZWYgY2hhcmFjdGVy cyhzZWxmLCBjaCwgc3RhcnQsIGxlbmdodCk6DQogICAgICAgIHRvcCA9IHNl bGYuX3N0YWNrWy0xXQ0KICAgICAgICB0b3AuX2FkZFRleHQoY2hbc3RhcnQ6 c3RhcnQrbGVuZ2h0XSkNCg0KY2xhc3MgUGFyc2VyOg0KICAgIGRlZiBfX2lu aXRfXyhzZWxmKToNCiAgICAgICAgc2VsZi5fcGFyc2VyICA9IHNheGV4dHMu bWFrZV9wYXJzZXIoKQ0KICAgICAgICBzZWxmLl9oYW5kbGVyID0geG1sSGFu ZGxlcigpDQogICAgICAgIA0KICAgICAgICBzZWxmLl9wYXJzZXIuc2V0RG9j dW1lbnRIYW5kbGVyKHNlbGYuX2hhbmRsZXIpDQoNCg0KICAgIGRlZiBwYXJz ZShzZWxmLCBpbnB1dCk6DQogICAgICAgIGZwID0gb3BlbihpbnB1dCkNCiAg ICAgICAgc2VsZi5fcGFyc2VyLnBhcnNlRmlsZShmcCkNCiAgICAgICAgZnAu Y2xvc2UoKQ0KICAgICAgICByZXR1cm4gc2VsZi5faGFuZGxlci5vYmplY3Qo KQ0KICAgIA0KICAgICAgICANCg== --2144283543-236838159-962076658=:25151-- From paul@prescod.net Mon Jun 26 19:35:33 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 26 Jun 2000 11:35:33 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> Message-ID: <3957A275.FF220A1C@prescod.net> Jim Fulton wrote: > > > > In 4DOM, we are actually moving away from __getattr__ (for speed). > > IMO, this is strong evidence that the Python DOM should > *not* use attributes for implementing the DOM/IDL attributes Direct attribute access is MUCH, MUCH faster than a method call, whethr through getattr or not. Under the current implementation we have the option of caching attributes for performance. An all method design would take that away. Minidom uses direct attributes for 95% of what it does. I think I used one getter method to lazily evaluate attributes. > I'd like it to be as easy as possible for various objects to implement > the DOM. 
(See for example StructuredTextNG.) I'd hate to make implementers > go through the pain and performance hit of getattr or dictate an implementation > (like caching attributes or otherwise directly storing them, creating > memory leaks). I guess the question is who do we cater to? Heretofore it has been DOM users first, DOM implementors second. I don't think that we should turn that around based on the argument that all Python objects will have a DOM interface soon. To me, this looks like Python:

    a=b.childNodes[0].attributes["abc"]

and this looks like Java:

    a=b.getChildNodes()[0].getAttributes()["abc"]

The second grates on me as having an interface enforced because of implementation limitations. (which is what all of this griping about getattr being slow boils down to...isn't it better to fix that problem once, for everyone than to work around it a hundred times?) It also drives me crazy that the latter always invokes a method call even when it is stored underneath as a simple attribute. Surely there is some imaginative way to make life easier for your implementors using base classes. For instance, wouldn't it be nice for you to automatically set up the attributes list based on Python attributes? Something like:

    def __getattr__( self, name ):
        if name=="attributes":
            keys=self.__dict__.keys()
            values=map( str, self.__dict__.values() )
            return JimsAttributeList( keys, values )
        # any other missing name is still an error
        raise AttributeError( name )

Encourage them to subclass from you but add some value that they wouldn't get otherwise. -- Paul Prescod - Not encumbered by corporate consensus The "war on drugs" began as a rhetorical flourish used by Richard Nixon... But as the Reagan, Bush and Clinton administrations poured billions of dollars into fighting drugs, the slogan slipped the reins of metaphor to become just a plain old war - with an army (DEA), an enemy (profiled minorities, the poor, the cities), a budget ($17.8 billion), and a shibboleth (the children).
- "This is your bill of rights...on drugs", Harper's, Dec 1999 From hemangee@pspl.co.in Tue Jun 27 05:55:07 2000 From: hemangee@pspl.co.in (Hemangee) Date: Tue, 27 Jun 2000 11:55:07 +0700 Subject: [XML-SIG] XML Python package installation Message-ID: <000501bfdff3$ddb050f0$6102a8c0@intranet.pspl.co.in> Hello, I tried installing the XML package v0.5.2 which is the Python XML package on my Windows NT workstation. It gives following errors when I run the command : python setup.py build File "E:\PyXML-0.5.4\setup.py", line 185, in ? func() File "E:\PyXML-0.5.4\setup.py", line 134, in build_win32 create_build_dir() File "E:\PyXML-0.5.4\setup.py", line 131, in create_build_dir copytree('xml', 'build/xml') File "E:\PyXML-0.5.4\setup.py", line 103, in copytree names = os.listdir(src) OSError: [Errno 3] No such process Please help. Thanks in advance, Hemangee. From robin@jessikat.co.uk Tue Jun 27 11:21:58 2000 From: robin@jessikat.co.uk (Robin Becker) Date: Tue, 27 Jun 2000 11:21:58 +0100 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: <200006270028.SAA02708@localhost.localdomain> References: <200006270028.SAA02708@localhost.localdomain> Message-ID: In article <200006270028.SAA02708@localhost.localdomain>, Uche Ogbuji writes >Jim Fulton: > >> Then you advocate abandoning the CORBA IDL mapping for Python. >> The Python mapping does *not* provide for using IDL attributes >> as Python attributes. In fact, some folks in the DO-SIG seem to >> feel strongly that doing so would be a really bad idea. > >I think that since the end aim is for maximum Pythonicity independent of >CORBA, that we should, as Mike suggests, just do away with accessor/mutators >entirely. > >So that would be my vote: The Python/DOM binding for DOM is to access DOM >attributes through Python attributes. 
> > moi aussi -- Robin Becker From robin@jessikat.co.uk Tue Jun 27 11:41:09 2000 From: robin@jessikat.co.uk (Robin Becker) Date: Tue, 27 Jun 2000 11:41:09 +0100 Subject: [XML-SIG] Interested in feedback In-Reply-To: References: Message-ID: In article , Benjamin Saller writes > >I am not really sure how to prefix this. I have something that is while >not general purpose does seem simple to use in the 80% case. If people >disagree, or can think of things that might make this better I would like >some feedback. .... I like this a lot. I changed the parse method to

    def parse(self, fn):
        if type(fn) is StringType:
            self._parser.parseFile(open(fn))
        else:
            self._parser.parseFile(fn)
        return self._handler.object()

and added this to the bottom to make the module self testing. I would prefer it if the name were all lower case as that makes life slightly more robust with win32.

    if __name__=='__main__':
        dataset=''' loki.appliedtheory.com 127.0.0.1 foo.bar.com baz.bar.com 100.4123 8 16 32 '''
        import xmlObjects, StringIO
        fp = StringIO.StringIO(dataset)
        p = xmlObjects.Parser()
        xml = p.parse(fp) # where maybe this should be a string or fp
        print xml.getValue("container1.allow.host") # == 'loki'
        print xml.getValue("container2.allow.host[1]") # == 'foo.bar.com'
        print xml.getValue("listen[port]", convert=int) # == 9000
        print xml.getValues("container2.allow.host") # == ['loki...', 'foo...' 'baz...']
        print xml.getXML() # does a decent job of reproducing the source

-- Robin Becker From paul@prescod.net Tue Jun 27 13:21:19 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 27 Jun 2000 05:21:19 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'?
References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> <3957A920.C8A6CB76@FourThought.com> <3957B720.9C6768D6@digicool.com> <00a701bfdfc7$7fcb9e80$7cac1218@reston1.va.home.com> Message-ID: <39589C3F.79599D64@prescod.net> tpassin@home.com wrote: > > Jim Fulton continued the attributes thread - > > I still don't see why anyone is still arguing about whether the DOM rec > makes Python use attributes. It doesn't. Nobody is arguing that. Some people *were* arguing that the DOM rec mandates the use of methods (or, more precisely, that DOM ID + Python IDL mapping = methods). But the DOM IDL is clearly not normative because it doesn't even parse as IDL. So we can put that argument to bed. We need to make the decision on technical and aesthetic merits. Attributes: * arguably more Pythonic (=easier to use) * faster for non-computed attributes * slower for computed attributes * more like Javascript, VB and COM-like languages (C# :) ) Methods: * slower for non-computed attributes * faster for computed attributes * harder to implement * more like Java There are no killer arguments here, just different weights applied to the various features. I don't think that we are going to agree to break code today. Maybe later we'll see that there are more DOM implementors than clients and their ease of implementation will take precedence. -- Paul Prescod - Not encumbered by corporate consensus When George Bush entered office, a Washington Post-ABC News poll found that 62 percent of Americans "would be willing to give up a few of the freedoms we have" for the war effort. They have gotten their wish. - "This is your bill of rights...on drugs", Harpers, Dec. 
1999 From jerome.marant@free.fr Tue Jun 27 13:58:56 2000 From: jerome.marant@free.fr (Jérôme Marant) Date: 27 Jun 2000 14:58:56 +0200 Subject: [XML-SIG] New subscription Message-ID: <64n1k7i8kv.fsf@amboise.ird.idealx.com> Hi, Does the subscription page work at python.org/mailman/listinfo/xml-sig ? I used to subscribe hours ago and I got no reply. Thanks. -- Jérôme Marant ----------------------------------------------------------- | IDEALX - Open Source Engineering / Ingénierie Open Source | | http://IDEALX.com | ----------------------------------------------------------- From Juergen Hermann" Message-ID: <200006271316.PAA21138@statistik.cinetic.de> On Mon, 26 Jun 2000 10:24:53 -0700, Paul Prescod wrote: >The following classes address features more complex/esoteric than we >should undertake to code, test and document: DTDHandler, DeclHandler, >EntityResolver, LexicalHandler, Locator I would take Locator out of that list. For reporting errors in the application domain (i.e. not catched by the parser), it is quite important to provide some hint to the user WHERE the error is. Ciao, Jürgen -- Jürgen Hermann (jhe@webde-ag.de) WEB.DE AG, Amalienbadstr.41, D-76227 Karlsruhe Tel.: 0721/94329-0, Fax: 0721/94329-22 From Juergen Hermann" Message-ID: <200006271316.PAA21142@statistik.cinetic.de> On Mon, 26 Jun 2000 18:40:31 -0400, Jim Fulton wrote: >In summary, I think using attribute-based access for the Python DOM >API would be a mistake because it will make efficient DOM implementations >harder than necessary to create. I'd prefer to see accessor functions used >to provide access to DOM attributes. I'd prefer to have both. Those people that need, know about and care for the speedier accessor functions can use those. Those that simply want an easy interface can use attribute style. If we don't want both, I prefer the accessor functions, but without any ugly underscores, i.e. get/setAttribute(). 
>There has, however, been relatively lettle discussion on this. You have to have a nitpicking soul to do so. ;)) >I'm curious what opinions others have. See above. Ciao, Jürgen -- Jürgen Hermann (jhe@webde-ag.de) WEB.DE AG, Amalienbadstr.41, D-76227 Karlsruhe Tel.: 0721/94329-0, Fax: 0721/94329-22 From jim@digicool.com Tue Jun 27 15:06:09 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 10:06:09 -0400 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft References: <200006270028.SAA02708@localhost.localdomain> Message-ID: <3958B4D1.4EA25505@digicool.com> Uche Ogbuji wrote: > > Jim Fulton: > > > Then you advocate abandoning the CORBA IDL mapping for Python. > > The Python mapping does *not* provide for using IDL attributes > > as Python attributes. In fact, some folks in the DO-SIG seem to > > feel strongly that doing so would be a really bad idea. > > I think that since the end aim is for maximum Pythonicity independent of > CORBA, that we should, as Mike suggests, just do away with accessor/mutators > entirely. > > So that would be my vote: The Python/DOM binding for DOM is to access DOM > attributes through Python attributes. It is pretty clear that the Python DOM API should not be bound to the Python CORBA bining, so I think we can excuse the do-sig from further discussions. ;) I don't think that there is agreement on whether attribute access, accessor functions, or both, should be used for the Python DOM API. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. 
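The three options named in the message above (attribute access, accessor functions, or both) can be illustrated with a minimal sketch. It is not taken from 4DOM, minidom, or any code posted to the list: the class `Node` and the accessor name `getFirstChild` are hypothetical, chosen only to show how one object might offer both styles, and the sketch is written for a current Python interpreter rather than the Python of this thread.

```python
class Node:
    """Hypothetical node that offers both access styles (illustration only)."""

    def __init__(self, name):
        self.nodeName = name      # stored attribute: found by normal lookup
        self.childNodes = []      # stored attribute

    def appendChild(self, child):
        self.childNodes.append(child)
        return child

    # Accessor style: an ordinary method call, no special hooks needed.
    def getFirstChild(self):
        if self.childNodes:
            return self.childNodes[0]
        return None

    # Attribute style for a computed value. __getattr__ is consulted only
    # after normal attribute lookup has failed, so stored attributes such
    # as nodeName never pass through it.
    def __getattr__(self, name):
        if name == "firstChild":
            return self.getFirstChild()
        raise AttributeError(name)


root = Node("root")
kid = root.appendChild(Node("kid"))
print(root.firstChild is kid)        # attribute style
print(root.getFirstChild() is kid)   # accessor style
```

In this arrangement the stored attributes cost a plain lookup, while the computed `firstChild` pays for a failed lookup plus a method call, which is the trade-off the thread keeps returning to.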
From uogbuji@fourthought.com Tue Jun 27 15:11:06 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 08:11:06 -0600 Subject: [XML-SIG] The '_' thingy In-Reply-To: Message from Jim Fulton of "Mon, 26 Jun 2000 18:40:31 EDT." <3957DBDF.40792117@digicool.com> Message-ID: <200006271411.IAA04994@localhost.localdomain> > Mike Olson wrote: > > > > So, I think I see this as a general concensius: > > Are you kidding? > > > 1. DOM will never (in forseeable future) be used over an ORB, so the > > IDL should be used as a guide. > > Uh, this doesn't make sense. Please elaborate. > > We should focus more on useability then > > CORBA compliance. > > > > 2. Most people will access the DOM via attributes. > > Who says? What do you have to support this? Most people > will access the DOM through whatever interface we define. Mike's support is that back when this SIG agreed upon attribute access as well as _get/_set ops, most people said they'd prefer to just use plain attribute access anyway, and the _get/_set was only needed for completeness. Do you have any support to contradict his assertion? > I'm not. I would prefer to see accessor functions for > DOM attributes that are a part of the API and whos names > don't begin with '_'s. > > > then we should start down this path. > > Whatever path we start down, it should begin with a draft > that documements the DOM mapping for Python. I'm working on it. > > A > > langauge mapping is something we can put into the next release of 4DOM > > (something we've been meaning to do any ways). The rest of the cahnges > > are actually in place (unless we define a different callback naming > > convention). We will be slowly depricating _get_* soon as well. > > However we will still need __setattr__ callbacks in some cases.... > > Not if you go to accessor functions instead of attribute-based access. Not a problem for our uses. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From akuchlin@mems-exchange.org Tue Jun 27 15:11:00 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 27 Jun 2000 10:11:00 -0400 Subject: [XML-SIG] SAX Support In-Reply-To: <395791E5.50D73B92@prescod.net> References: <395791E5.50D73B92@prescod.net> Message-ID: <20000627101100.A19033@kronos.cnri.reston.va.us> On Mon, Jun 26, 2000 at 10:24:53AM -0700, Paul Prescod wrote: >Most of this is just packaging of code we already have. I plan to get >what I can from Lars, the xml-sig distribution and elsewhere and >integrate it tomorrow. I'd like to try for a checkin on Wednesday or >Thursday. Does that plan make sense? Does this SAX subset make sense? I've been meaning to post about the 1.6 strategy. The plan is to create a package for XML-related code in the core distribution; "xmlcore" was suggested and seems a reasonable choice. The XML-SIG distribution, once it becomes inextricably tied to 1.6, can then just assume xmlcore is there and import any bits from xmlcore that are needed. The SAX2 plans look good. What else goes into xmlcore? pulldom? (The mailing list problems stem from dinsdale.python.org being overloaded for some reason; I don't know why, but that's Barry's problem. Mail does go through, just very slowly; I made a bunch of checkins last night, and didn't receive any of the python-checkins messages until this morning.) --amk From jim@digicool.com Tue Jun 27 15:10:41 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 10:10:41 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <39577CE9.E391E8E0@prescod.net> <14679.55053.6188.621688@mailhost.beopen.com> Message-ID: <3958B5E1.F82BC081@digicool.com> "Fred L. Drake, Jr." 
wrote: > > Paul Prescod writes: > > The DOM working group says that for Java and Javascript, usability is > > more important than CORBA compliance. I think that the same goes for > > Python. That's why I use and advocate attribute syntax. > > Paul, > From the IDL errors pointed out earlier and these comments, I'd have > to conclude that the IDL definition should be removed from the > recommendation (not present in the next rev., or whatever), and we > should put together our own Python mapping that completely ignores all > the naming conventions of the IDL and Java mappings and does the > Python thing. I suspect that there is agreement on this. > The big issue there is the legacy code. Is there much? > So, are people using the _get_/_set_ methods or the attribute names? > Why are these questions being brought back up so late in the game, > anyway? Is it late in the game? From the evidence on the XML-SIG pages and the discussion here, it appears to me that there is not a defined Python DOM mapping. Some people think that it provides direct attribute, others seem to thing it provides access based on both. I started this because I'm working on a DOM implementation for the next generation of StructuredText and I couldn't tell what the heck I was supposed to implement. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From uogbuji@fourthought.com Tue Jun 27 15:23:48 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 08:23:48 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
In-Reply-To: Message from Paul Prescod of "Mon, 26 Jun 2000 11:35:33 PDT." <3957A275.FF220A1C@prescod.net> Message-ID: <200006271423.IAA05032@localhost.localdomain> > Jim Fulton wrote: > > > > In 4DOM, we are actually moving away from __getattr__ (for speed). > > > > IMO, this is strong evidence that the Python DOM should > > *not* use attributes for implementing the DOM/IDL attributes > > Direct attribute access is MUCH, MUCH faster than a method call, whethr > through getattr or not. Under the current implementation we have the > option of caching attributes for performance. An all method design would > take that away. Minidom uses direct attributes for 95% of what it does. > I think I used one getter method to lazily evaluate attributes. These are the optimizations we've been using, and 4DOM has gained tremendously in speed. Funnily enough, the biggest remaining speed barrier is the fact that we also have to support method access. This makes it very attractive to simply use attribute access, simplify our code and then get a spped boost. There are other structural optimizations we're looking at (and one deserialization optimization Lars suggested), but all else has nothing to do with attribute access. > > I'd like it to be as easy as possible for various objects to implement > > the DOM. (See for example StructuredTextNG.) I'd hate to make implementers > > go through the pain and performance hit of getattr or dictate an implementation > > (like caching attributes or otherwise directly storing them, creating > > memory leaks). > > I guess the question is who do we cater to? Heretofore it has been DOM > users first, DOM implementors second. I don't think that we should turn > that around based on the argument that all Python objects will have a > DOM interface soon. That was my feeling. Yes, it's easier to code as methods, but we should be considering making things easier for users, even if implementors have to jump through hoops. 
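The caching optimization discussed above (compute an attribute on first access, then serve it as a plain attribute) might look roughly like the following. This is a sketch under stated assumptions, not the code used in minidom or 4DOM: `LazyNode`, the choice of `lastChild` as the lazily evaluated attribute, and the `compute_count` counter are all hypothetical and exist only to make the behaviour visible.

```python
class LazyNode:
    """Hypothetical node that computes lastChild once and then caches it."""

    def __init__(self, children):
        self.childNodes = list(children)
        self.compute_count = 0    # illustration only: counts slow-path hits

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. before the value
        # has been cached in the instance dictionary.
        if name == "lastChild":
            self.compute_count += 1
            value = self.childNodes[-1] if self.childNodes else None
            self.__dict__[name] = value   # later reads bypass __getattr__
            return value
        raise AttributeError(name)


node = LazyNode(["a", "b", "c"])
first_read = node.lastChild    # slow path: computed and cached
second_read = node.lastChild   # fast path: ordinary attribute lookup
print(first_read, second_read, node.compute_count)
```

The cost of this design is that the cached value goes stale if `childNodes` changes afterwards, so a real implementation would need to invalidate or update the cache in its mutating methods.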
> To me, this looks like Python: > > a=b.childNodes[0].attributes["abc"] > > and this looks like Java: > > a=b.getChildNodes()[0].getAttributes()["abc"] > > The second grates on me as having interface enforced because of > implementation limitations. (which is what all of this griping about > getattr being slow boils down to...isn't it better to fix that problem > once, for everyone than to work around it a hundred times?) It also > drives me crazy that the latter always invokes a method call even when > it is stored underneath as a simple attribute. > > Surely there is some imaginative way to make life easier for your > implementors using base classes. For instance, wouldn't it be nice for > you to automatically set up the attributes list based on Python > attributes? Something like: > def __getattr__( self, name ): > if name=="attributes": > keys=self.__dict__.keys() > values=map( str, self.__dict__.values() > return JimsAttributeList( keys, values ) > We've done exactly this by making Node a sort of "attribute manager". It does make implementation much easier, and has led to a good deal of flexibility, for instance, when we released a ZDOM version, we had to change little more than Node. This works for us even through the long inheritance line from, say Node -> HTMLTableElement We also ourselves pretty much exclusively use attribute access on 4DOM because it's cleaner and faster. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Tue Jun 27 15:29:00 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 10:29:00 -0400 Subject: [XML-SIG] The '_' thingy References: <39578DBA.8FB64449@FourThought.com> <3957DBDF.40792117@digicool.com> <3957F375.B8A79E@FourThought.com> Message-ID: <3958BA2C.92E85930@digicool.com> Mike Olson wrote: > > Jim Fulton wrote: > > > > Mike Olson wrote: > > > > > > So, I think I see this as a general concensius: > > > > Are you kidding? > > No. '_' issues aside I think most people want attribute access. I > didn't tally a vote or anything, but that was the sense I got. Am I > wrong? I certainly don't think that there is concensus. Unfortunately, I have no idea if there is a process for making these decisions in a SIG where there is disagreement. Unless, of course, the Benevolent Dictator decides to get involved. ;) > > > > > 1. DOM will never (in forseeable future) be used over an ORB, so the > > > IDL should be used as a guide. > > > > Uh, this doesn't make sense. > > We don't need to stick strickly to IDL (I don't think that was the > original intention), because we won't be doing distributed DOM for a > while. Then perhaps you meant to say that "IDL shouldn't be uses as a guide". I think we agree not to base the Python DOM API on the Python IDL mapping. > > > > > > 2. Most people will access the DOM via attributes. > > > > Who says? What do you have to support this? Most people > > will access the DOM through whatever interface we define. > > Again, just the sense I got. > > So where are we at on the attribute vs. accessor debate? I dunno. > I throw in my hat for attribute I disagree. :) > > > > > > > > Whatever path we start down, it should begin with a draft > > that documements the DOM mapping for Python. > > Agreed, but I think we can work out some of the larger issues on the > list. 
> > > > > > A > > > langauge mapping is something we can put into the next release of 4DOM > > > (something we've been meaning to do any ways). The rest of the cahnges > > > are actually in place (unless we define a different callback naming > > > convention). We will be slowly depricating _get_* soon as well. > > > However we will still need __setattr__ callbacks in some cases.... > > > > > > > In summary, I think using attribute-based access for the Python DOM > > API would be a mistake because it will make efficient DOM implementations > > harder than necessary to create. I'd prefer to see accessor functions used > > to provide access to DOM attributes. > > > > There has, however, been relatively lettle discussion on this. > > I'm curious what opinions others have. > > Jim, I don't see your arguements. > > How is n.firstChild less efficent the n.get_firstChild() ? It's not if you constrain the implementation to store the first child. If an implemantaion chooses not to store the first child indepenent of the chidren, then the implementatin must implement __getattr__. It's worse for (the few) settable attributes, because the implementation *must* implement the attributes as stored attributes or implement __setattr__. > In the first, you modfy appendChild, et al and at the end put in if > self.childNodes[0] == newNode: self.firstChild = newNode > > In the second you do a "return self.childNodes[0]" > > I don't see a major memory or speed difference? You can do the same for > all other attributes. Oh? What about parentNode and previousSibling? If you store these directly, then you introduce a circular reference. > I don't see how accessors call get around circular references either. > Believe me we have tried with this one. We have come up with a few > schemes in our time, proxied nodes and such, but nothing that made it > worth the overhead. Its much simplier/efficient to have a utility > function to clean up a tree if you need it too. 
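The "utility function to clean up a tree" mentioned in the quoted text can be sketched as below. The names `TreeNode` and `release_tree` are hypothetical and this is not 4DOM's implementation; the sketch only assumes that `parentNode` is stored directly on each child, which is what creates the parent/child reference cycle under discussion. An interpreter that relies on reference counting alone cannot reclaim such a cycle until something breaks it explicitly.

```python
class TreeNode:
    """Hypothetical node that stores parentNode directly."""

    def __init__(self, name):
        self.nodeName = name
        self.parentNode = None    # back-reference: parent <-> child cycle
        self.childNodes = []

    def appendChild(self, child):
        child.parentNode = self
        self.childNodes.append(child)
        return child


def release_tree(node):
    """Break every parent/child cycle at and below `node` (hypothetical helper)."""
    for child in node.childNodes:
        release_tree(child)        # clean the subtree first
        child.parentNode = None    # drop the back-reference
    node.childNodes = []           # drop the forward references


root = TreeNode("root")
branch = root.appendChild(TreeNode("branch"))
leaf = branch.appendChild(TreeNode("leaf"))

release_tree(root)
print(leaf.parentNode, branch.parentNode, root.childNodes)
```

The trade-off is that callers must remember to invoke the helper when they are finished with a tree, whereas a wrapper-based design avoids creating the cycle in the first place.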
You may think that this is the right design tradeoff, but I don't agree with you. You can get around the circular references with Acquisition and with a similar wrapper-based technique. My DOM implementation for StructuredTextNG uses a wrapper-based approach that avoids circular references without using Acquisition. It is straightforward to implement accessor functions regardless of the underlying design, while implementation of attribute access is complicated and inefficient when designs that avoid circular references are used. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jim@digicool.com Tue Jun 27 15:34:00 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 10:34:00 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <200006270036.SAA02732@localhost.localdomain> Message-ID: <3958BB58.36DFD589@digicool.com> Uche Ogbuji wrote: > > > Mike Olson wrote: > > > > > > Jim Fulton wrote: > > > > > > > > > > > > > > Actually, the DOM can be mapped into a language in a manner that does > > > > > not follow directly from the IDL and CORBA specs. That's why there is a > > > > > formally defined java binding rather than just a reference to the IDL > > > > > specs. Historically, though, 4DOM was really a CORBA tool so it really > > > > > needed to follow the specs. > > > > > > > > Whatever we do, there needs to be a document somewhere that > > > > says what the Python DOM mapping is, even if it is not > > > > much more than a reference to the DOM IDL and the Python > > > > binding. 
> > > > > > In 4DOM, we are actually moving away from __getattr__ (for speed). > > > > IMO, this is strong evidence that the Python DOM should > > *not* use attributes for implementing the DOM/IDL attributes. > > Not so fast. We've mostly solved the speed problem. And we could solve a > good deal more of it by getting rid of accessor/mutator functions. Could you elaborate on this? Who is we? I'm implementing a DOM for StructuredTextNG and I can't see a way to avoid using getattr if I want to avoid circular references. How have you solved the speed problem for me? In general, I'd like it to be relatively easy to implement DOM on non-xml-specific objects that I want to process with DOM-aware tools, like 4XSLT. Requiring that these objects implement the DOM attributes as actual attributes or that the objects implement getattr seems pretty burdonsome to me. > This whole argument actually makes mandating only attributes more attractive > to me. I can live with whatever we decide (although I have no idea how we go about coming to a decision ;), however, the discussion has convinced me that the DOM API should map DOM attributes to Python attributes. Sigh. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jim@digicool.com Tue Jun 27 16:03:07 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 11:03:07 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <39577682.5269D601@prescod.net> Message-ID: <3958C22B.22C65ABF@digicool.com> Paul Prescod wrote: > > Jim Fulton wrote: > > > > ... > > > > Yup, however __getattr__ is a pain to utilize unless you have alot of > > infrustructure. Zope has support for computed attributes, which makes > > this pretty sane, especially for read-only attributes. > > a) I think all that you need is a base class. Minidom uses one and it > seems to work. Anyhow, inherting from "node" is good practice in any DOM > extension framework. This is fine if your classes *only* want to implement DOM or other related APIs. I'd like to be able to add support for the DOM API to objects *without* making objects DOM specific. For example, I'd like to provide DOM support in most, if not all Zope objects. These objects altready implement __getattr__ for other purposes. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From fdrake@beopen.com Tue Jun 27 16:04:25 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 27 Jun 2000 08:04:25 -0700 (PDT) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <3958B5E1.F82BC081@digicool.com> References: <39577CE9.E391E8E0@prescod.net> <14679.55053.6188.621688@mailhost.beopen.com> <3958B5E1.F82BC081@digicool.com> Message-ID: <14680.49785.762419.861483@mailhost.beopen.com> Fred L. Drake, Jr. wrote: > The big issue there is the legacy code. Jim Fulton writes: > Is there much? 
*I* don't have much and would be happy to convert. But I'm more interested in hearing the answer to this question from the people doing XML work for a living -- I suspect their answer may be *very* different! > Is it late in the game? From the evidence on the XML-SIG > pages and the discussion here, it appears to me that there > is not a defined Python DOM mapping. Some people think that > it provides direct attribute, others seem to thing it provides > access based on both. Considering that all this has been hashed out several times, I'd say it is. The lack of a DOM mapping document is more because everyone is ultra-busy than because the matter hasn't been considered. If no one has started on a mapping document, I'll be glad to start one based on the W3C recommendation and what's in the PyXML tree. A question for the 4Suite team: how did you deal with the IDL roblems when CORBA support was a primary requirement for 4DOM? Did you use your own IDL derived from the Java description, or what? I'm surprised that the problems of the published W3C IDL have only now been mentioned here. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From jim@digicool.com Tue Jun 27 16:17:32 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 11:17:32 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> <3957A275.FF220A1C@prescod.net> Message-ID: <3958C58C.8CA55519@digicool.com> Paul Prescod wrote: > > Jim Fulton wrote: > > > > > > > In 4DOM, we are actually moving away from __getattr__ (for speed). 
> > > > IMO, this is strong evidence that the Python DOM should > > *not* use attributes for implementing the DOM/IDL attributes > > Direct attribute access is MUCH, MUCH faster than a method call, whethr > through getattr or not. Under the current implementation we have the > option of caching attributes for performance. An all method design would > take that away. Minidom uses direct attributes for 95% of what it does. > I think I used one getter method to lazily evaluate attributes. Does minidom create circular references? I don't see how you could use direct attributes for attributes like parentNode and previousSibling without creating circularreferences/ > > I'd like it to be as easy as possible for various objects to implement > > the DOM. (See for example StructuredTextNG.) I'd hate to make implementers > > go through the pain and performance hit of getattr or dictate an implementation > > (like caching attributes or otherwise directly storing them, creating > > memory leaks). > > I guess the question is who do we cater to? Heretofore it has been DOM > users first, DOM implementors second. I don't think that we should turn > that around based on the argument that all Python objects will have a > DOM interface soon. I agree that this is a tradeoff. I don't think we should dismiss implementation effort. > To me, this looks like Python: > > a=b.childNodes[0].attributes["abc"] > > and this looks like Java: > > a=b.getChildNodes()[0].getAttributes()["abc"] To me, both look like Python. > The second grates on me as having interface enforced because of > implementation limitations. (which is what all of this griping about > getattr being slow boils down to...isn't it better to fix that problem > once, for everyone than to work around it a hundred times?) It also > drives me crazy that the latter always invokes a method call even when > it is stored underneath as a simple attribute. The point of an interface is to avoid dictating an implementation. 
DOM is certainly an API that cries for alternate implementations. > Surely there is some imaginative way to make life easier for your > implementors using base classes. For instance, wouldn't it be nice for > you to automatically set up the attributes list based on Python > attributes? Something like: > > def __getattr__( self, name ): > if name=="attributes": > keys=self.__dict__.keys() > values=map( str, self.__dict__.values() > return JimsAttributeList( keys, values ) > > Encourage them to subclass from you but add some value that they > wouldn't get otherwise. __getattr__ is notoriously difficult to mix. I might want to mix this with other classes that *also* want to use __getattr__. __getattr__-based implemantations are likely to be too slow, as Mike pointed out. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jim@digicool.com Tue Jun 27 16:18:29 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 11:18:29 -0400 Subject: [XML-SIG] The '_' thingy References: <200006271411.IAA04994@localhost.localdomain> Message-ID: <3958C5C5.26147A0B@digicool.com> Uche Ogbuji wrote: > > > Mike Olson wrote: > > > > > > So, I think I see this as a general concensius: > > > > Are you kidding? > > > > > 1. DOM will never (in forseeable future) be used over an ORB, so the > > > IDL should be used as a guide. > > > > Uh, this doesn't make sense. > > Please elaborate. Mike says that DOM won't be used over an ORB and yet he says that (CORBA) IDL idle *should* be used as a guide. If we don't care about CORBA, why be guided by UDL? 
> > > We should focus more on useability then > > > CORBA compliance. > > > > > > 2. Most people will access the DOM via attributes. > > > > Who says? What do you have to support this? Most people > > will access the DOM through whatever interface we define. > > Mike's support is that back when this SIG agreed upon attribute access as well > as _get/_set ops, most people said they'd prefer to just use plain attribute > access anyway, and the _get/_set was only needed for completeness. > > Do you have any support to contradict his assertion? In the discussion over the last few days, I've seen alot of confusion over what the Python DOM API actually is. It's hard to believe that real consensus exists in such an environment. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jerome@IDEALX.com Tue Jun 27 16:14:50 2000 From: jerome@IDEALX.com (Jérôme Marant) Date: 27 Jun 2000 17:14:50 +0200 Subject: [XML-SIG] Test Message-ID: <64hfafgnpx.fsf@amboise.ird.idealx.com> Y suis-je ? -- Jérôme Marant ----------------------------------------------------------- | IDEALX - Open Source Engineering / Ingénierie Open Source | | http://IDEALX.com | ----------------------------------------------------------- From uogbuji@fourthought.com Tue Jun 27 16:21:52 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 09:21:52 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Uche Ogbuji of "Mon, 26 Jun 2000 18:47:50 MDT." 
<200006270047.SAA02755@localhost.localdomain> Message-ID: <200006271521.JAA05260@localhost.localdomain> > Tom Passim: > > > Jim Fulton continued the attributes thread - > > > > I still don't see why anyone is still arguing about whether the DOM rec > > makes Python use attributes. I doesn't. In fact, it says that what are > > called "attributes" in the IDL definitions are NOT supposed to be attributes > > in implementations, and that the get/set accessor functions don't have to > > store/retrieve from actual objects, let alone attributes of objects. > > > > So can we at least lay this part of it to rest? Now if most people think it > > is more 'Pythonic' to use attributes, or if there are clearcut performance > > benefits, then we have a basis for discussion. But let's quit talking about > > whether the DOM rec makes us do attributes. > > Now I have no idea what you lot are arguing. The first argument was against > leading underscore because it's "not Python idiom". The point was made that > we should simply cock a snook at the Python/CORBA binding. Once that point > was allowed, the same lot are arguing against using attributes, which are > indisputable Python idiom on the grounds that it goes against the spirit of > the W3C spec. > > I hope I can be blunt without antagonism, but it seems as if a particular goal > is in mind: i.e. DOM attribute access through accessor/mutators only, and any > available argument is being thrown at that goal. > > I'll note that I claim to have no agenda except to do what's sensible for > Python and DOM (we've already put a great deal of work into making 4DOM > conform to the earlier list consensus, and we could put in more work if it > made sense.) > > The course that does make sense is to allow attribute access only because it's > most Pythonic. I think I was misunderstanding matters. 
I was genuinely confused at what was going on when I wrote the above message (the weird lag problems on the list don't help one bit), but I now think I understand at least Jim's train of argument (and respect it thoroughly). As I see it, there are three arguments: 1 the leading underscore in the Python/CORBA binding is non-pythonic and we should avoid following it 2 it is desirable to respect the W3C's use of IDL, even if this somewhat contradicts one above 3 it is easier for implementors to map DOM attributes to accessor/mutator methods The jump from 1 to 2 confused me because it seemed contradictory, but I see now that the core argument seems to be 3, with 1 and 2 just consequences of 3. I still disagree with 3 because I think users' convenience is more important than implementors', but that's a fair argument. I apologize for any misunderstanding. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Tue Jun 27 16:13:45 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 11:13:45 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <200006270047.SAA02755@localhost.localdomain> Message-ID: <3958C4A9.2283FD0C@digicool.com> Uche Ogbuji wrote: > > Tom Passim: > > > Jim Fulton continued the attributes thread - > > > > I still don't see why anyone is still arguing about whether the DOM rec > > makes Python use attributes. I doesn't. In fact, it says that what are > > called "attributes" in the IDL definitions are NOT supposed to be attributes > > in implementations, and that the get/set accessor functions don't have to > > store/retrieve from actual objects, let alone attributes of objects. > > > > So can we at least lay this part of it to rest? 
Now if most people think it > > is more 'Pythonic' to use attributes, or if there are clearcut performance > > benefits, then we have a basis for discussion. But let's quit talking about > > whether the DOM rec makes us do attributes. > > Now I have no idea what you lot are arguing. The first argument was against > leading underscore because it's "not Python idiom". The point was made that > we should simply cock a snook at the Python/CORBA binding. Once that point > was allowed, the same lot are arguing against using attributes, which are > indisputable Python idiom on the grounds that it goes against the spirit of > the W3C spec. People who read the W3C spec have reason to expect accessor functions. This is a resonable argument for them. While this argument isn't necessarily conclusive, there's no reason to dismiss it. I think a much stronger argument is that requireing that DOM attributes be published as Python attributes puts an undue burden on DOM implementors. The abstraction of attribute access to allow computation *is* a Python idiom, but so is use of accessor functions. Avoidance of circular references is *certainly* a common Python design practice. There are techniques for implementing the DOM API that avoid circular references that require compting some DOM attributes. Requiring that DOM attributes be published as Python attributes makes this harder. > I hope I can be blunt without antagonism, but it seems as if a particular goal > is in mind: i.e. DOM attribute access through accessor/mutators only, and any > available argument is being thrown at that goal. Is there something wrong with presenting arguments to support a position? > I'll note that I claim to have no agenda But you have an opinion, which is fine. > except to do what's sensible for > Python and DOM (we've already put a great deal of work into making 4DOM > conform to the earlier list consensus, and we could put in more work if it > made sense.) 
I question the value of list consensus without follow through. I, as a DOM implementor have no way of knowing what the list consensus was unless I happened to be paying attention at the time, which I wasn't. If there was consensus, then it should be published. When I asked about the API a few days ago, there didn't seem to me to be much consensus about what the Python DOM API actually is. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jim@digicool.com Tue Jun 27 16:33:06 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 11:33:06 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> <3957A920.C8A6CB76@FourThought.com> <3957B720.9C6768D6@digicool.com> <00a701bfdfc7$7fcb9e80$7cac1218@reston1.va.home.com> <39589C3F.79599D64@prescod.net> Message-ID: <3958C932.9FAD143C@digicool.com> Paul Prescod wrote: > > tpassin@home.com wrote: > > > > Jim Fulton continued the attributes thread - > > > > I still don't see why anyone is still arguing about whether the DOM rec > > makes Python use attributes. It doesn't. > > Nobody is arguing that. Some people *were* arguing that the DOM rec > mandates the use of methods (or, more precisely, that DOM ID + Python > IDL mapping = methods). But the DOM IDL is clearly not normative because > it doesn't even parse as IDL. So we can put that argument to bed. 
We
> need to make the decision on technical and aesthetic merits.

Thanks for laying the issue out in a nice table. Sorry for messing
it up.

Some people have argued that a decision was already made some time
ago, and that we shouldn't have to cover old ground. (Somebody even
sent me a pointer to the discussions, which I've misplaced. :()
If a decision *really* was made, then I'm willing to abide by
the decision. I have no idea how SIG decisions are made unless
Guido gets involved ..... In any case, a decision needs to be
published and actually known to have any value.

> Attributes:
> * arguably more Pythonic (=easier to use)

I think that this is extremely arguable. Many people would argue that
it's less OO and, therefore, less Pythonic. I obviously (from other work)
think computed attributes are OK, however, I think they have
significant downsides, especially for an API that we might want
people to implement.

  + easier to use

    This should be added as a separate bullet.

> * faster for non-computed attributes
> * slower for computed attributes
> * more like Javascript, VB and COM-like languages (C# :) )

Who cares?

  + much harder to implement for computed attributes

> Methods:
> * slower for non-computed attributes
> * faster for computed attributes
> * harder to implement

Do you really mean this? What's so hard about:

  def getFoo(self): return self.foo

???

> * more like Java

Who cares unless it is *less* like Python, which it isn't, IMO.

Here is an additional, admittedly Zope-specific argument for
accessor methods:

  + Easier to assign access meta-data, such as security assertions.

    In Zope, we can assign attributes to methods, which allows us
    more control over access. Accessor functions are much more friendly
    to the Zope security machinery. Obviously, the API should not be
    governed solely by Zope's needs.

> There are no killer arguments here, just different weights applied to
> the various features. I don't think that we are going to agree to break
> code today.
Maybe later we'll see that there are more DOM implementors
> than clients and their ease of implementation will take precedence.

Do you really think that "x.getY()" is significantly harder to use
than "x.y"?

I can definitely tell you that computed attributes are very
significantly harder to implement than accessor functions.

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com   http://www.zope.org

From jim@digicool.com Tue Jun 27 16:49:36 2000
From: jim@digicool.com (Jim Fulton)
Date: Tue, 27 Jun 2000 11:49:36 -0400
Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'?
References: <39577CE9.E391E8E0@prescod.net> <14679.55053.6188.621688@mailhost.beopen.com> <3958B5E1.F82BC081@digicool.com> <14680.49785.762419.861483@mailhost.beopen.com>
Message-ID: <3958CD10.5A36C42A@digicool.com>

"Fred L. Drake, Jr." wrote:
>
> Fred L. Drake, Jr. wrote:
> > The big issue there is the legacy code.
>
> Jim Fulton writes:
> > Is there much?
>
> *I* don't have much and would be happy to convert. But I'm more
> interested in hearing the answer to this question from the people
> doing XML work for a living -- I suspect their answer may be *very*
> different!
>
> > Is it late in the game? From the evidence on the XML-SIG
> > pages and the discussion here, it appears to me that there
> > is not a defined Python DOM mapping. Some people think that
> > it provides direct attribute access, others seem to think it provides
> > access based on both.
>
> Considering that all this has been hashed out several times, I'd say
> it is.
I'll be willing to give on this, however, I assert that a decision isn't a decision unless it's published. It doesn't help when the people who made the decision can't seem to remember what it was. Supposedly, the decision was that DOM attributes are accessed as ordinary Python attributes, as in:: foo.nodeName yet several people seemed to think that attributes are obtained via accessor functions: foo._get_nodeName() You argued that this was appropriate based on IDL conformance. Why make this argument if the decision was to use attribute access? Or was it? What was the decision anyway? :) > The lack of a DOM mapping document is more because everyone is > ultra-busy than because the matter hasn't been considered. If the decision is important, someone should find the time to publish it. Presumably people found time to write code. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From case@appliedtheory.com Tue Jun 27 16:49:32 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Tue, 27 Jun 2000 11:49:32 -0400 (EDT) Subject: [XML-SIG] Interested in feedback In-Reply-To: Message-ID: Today, Robin Becker wrote: In article , Benjamin Saller writes > >I am not really sure how to prefix this. I have something that is while >not general purpose does seem simple to use in the 80% case. If people >disagree, or can think of things that might make this better I would like >some feedback. .... I like this a lot. 
I changed the parse method to

    def parse(self, fn):
        if type(fn) is StringType:
            self._parser.parseFile(open(fn))
        else:
            self._parser.parseFile(fn)
        return self._handler.object()

I think this is a good change. I will add that.

and added this to the bottom to make the module self testing. I would
prefer it if the name were all lower case as that makes life slightly
more robust with win32.

The filenames or the class names? I thought the BiCap scheme was common
for C++ which is common for win32? As for the testing I didn't release
it, but I have a PyUnit testsuite that goes along with it. The base
functionality is all covered by tests.

I have made some incremental improvements over the posted version, which
are checked into cvs at: (as part of a larger project)

http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/src/xmlConfig/?cvsroot=PASS

--
Benjamin Saller Technical Strategist AppliedTheory
Where tire hits pavement on the Information super-highway,
that's where my head is...

From mclay@nist.gov Tue Jun 27 18:48:51 2000
From: mclay@nist.gov (Michael McLay)
Date: Tue, 27 Jun 2000 13:48:51 -0400 (EDT)
Subject: [XML-SIG] Reconsidering the DOM API
Message-ID: <14680.59651.130151.140876@fermi.eeel.nist.gov>

Paul Prescod writes:
> tpassin@home.com wrote:
> >
> > Jim Fulton continued the attributes thread -
> >
> > I still don't see why anyone is still arguing about whether the DOM rec
> > makes Python use attributes. It doesn't.
>
> Nobody is arguing that. Some people *were* arguing that the DOM rec
> mandates the use of methods (or, more precisely, that DOM ID + Python
> IDL mapping = methods). But the DOM IDL is clearly not normative because
> it doesn't even parse as IDL. So we can put that argument to bed. We
> need to make the decision on technical and aesthetic merits.
> > Attributes: > * arguably more Pythonic (=easier to use) In your last post you gave the example: > a=b.childNodes[0].attributes["abc"] > > and this looks like Java: > > a=b.getChildNodes()[0].getAttributes()["abc"] Why not use the follows notation: a=b.get_childNode(0).get_attribute("abc") or perhaps the call chain should be reduced by merging methods: c = b.get(childNode=0, attribute="abc") > * faster for non-computed attributes > * slower for computed attributes > * more like Javascript, VB and COM-like languages (C# :) ) > > Methods: > * slower for non-computed attributes > * faster for computed attributes > * harder to implement > * more like Java The implementation details would favor one choice over another so perhaps these aren't the appropriate metrics to use in making the decision. > There are no killer arguments here, just different weights applied to > the various features. I don't think that we are going to agree to break > code today. Maybe later we'll see that there are more DOM implementors > than clients and their ease of implementation will take precedence. The standard Python interface may end up having to support both access approaches (direct and through methods), which will really make the interface ugly. If we had to choose one, which one will allow the greater flexiblity? A methods based interface provides a level of abstraction that easily allows for future changes in the underlying implementation. Direct attribute access will require the future reimplementation to use getattr and setattr to hide the implementation details from the API. If making the interface more Pythonic is a priority then should keyword arguments be considered? Python is distinguished from most other languges by this feature. Does some variation on the following make the Pythonic DOM easier to use? 
c = b.get(childNode=0, attribute="abc")   # returns a specific attribute
c = b.get(childNode=0)                    # returns a dictionary of attributes
c = b.get(attribute="abc")                # return all childNodes with
                                          # an "abc" attribute
c = b.set(childNode=0, attribute="abc",value="1.0")

Benjamin Saller's "getValues" idea looks interesting. Perhaps it is
time to step back and ask how easy XML could be if the Python
interface had nothing to do with SAX or DOM.

Of course we could have two different modules available in the xml
package:

    xml.dom_by_attributes
    xml.dom_by_methods

From fwang2@yahoo.com Tue Jun 27 17:00:33 2000
From: fwang2@yahoo.com (oliver)
Date: Tue, 27 Jun 2000 12:00:33 -0400 (EDT)
Subject: [XML-SIG] will SAX be suitable for this task?
Message-ID:

hi,

Being pretty much a beginner in both Python and XML, I am facing the
choice of which API I should use for the following task and hope to get
some help.

I have a potentially very large data file, composed of a fixed number
of record formats, each of which can appear multiple times. The task
involves searching for a certain record (matching certain criteria) and
then searching backward or forward from that point (depending on the
situation).

My impression of SAX is that it is more suitable for one-pass
processing; there is no "stopping point" from which you can go backward.
Am I right? I don't know much about DOM, but reading the file into memory
and manipulating a tree sounds like overkill and may not be practical
when the file is very large.

Since all of these comments are speculation, not first-hand experience,
please correct me if I am wrong, and any suggestions are also appreciated.
Thanks oliver From molson@fourthought.com Tue Jun 27 18:02:05 2000 From: molson@fourthought.com (Mike.Olson) Date: Tue, 27 Jun 2000 11:02:05 -0600 (MDT) Subject: [XML-SIG] The '_' thingy In-Reply-To: <3958C5C5.26147A0B@digicool.com> Message-ID: On Tue, 27 Jun 2000, Jim Fulton wrote: > Uche Ogbuji wrote: > > > > > Mike Olson wrote: > > > > > > Mike says that DOM won't be used over an ORB and yet > he says that (CORBA) IDL idle *should* be used as a guide. > If we don't care about CORBA, why be guided by UDL? I meant that we should use the IDL as an interface guide (ie we need to support parentNode in some way), however it doesn't need to be an attribute, and it doesn't need to be a method. Mike From molson@fourthought.com Tue Jun 27 18:06:47 2000 From: molson@fourthought.com (Mike.Olson) Date: Tue, 27 Jun 2000 11:06:47 -0600 (MDT) Subject: [XML-SIG] The '_' thingy In-Reply-To: <200006271316.PAA21142@statistik.cinetic.de> Message-ID: On Tue, 27 Jun 2000, Juergen Hermann wrote: > On Mon, 26 Jun 2000 18:40:31 -0400, Jim Fulton wrote: > > I'd prefer to have both. Those people that need, know about and care for > the speedier accessor functions can use those. Those that simply want an > easy interface can use attribute style. The only problem I see with this, is in some implementations (Jim's) function access would be quicker because that would be the native access method, while in 4DOM, attribute access would be quicker. Probably minor and covered with good documentation though. Mike From paul@prescod.net Tue Jun 27 20:11:14 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 27 Jun 2000 12:11:14 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: <14680.59651.130151.140876@fermi.eeel.nist.gov> Message-ID: <3958FC52.A747A970@prescod.net> Michael McLay wrote: > > .. > > a=b.get_childNode(0).get_attribute("abc") How do you get the list of childnodes and attributes? 
> or perhaps the call chain should be reduced by merging methods: > > c = b.get(childNode=0, attribute="abc") How about childNodes[0].childNodes[1].childNodes[0].attributes["abc"] > The standard Python interface may end up having to support both access > approaches (direct and through methods), which will really make the > interface ugly. That's the status quo!!!! > If we had to choose one, which one will allow the > greater flexiblity? Neither. Python attribute accesses can really be method calls and python method calls can really be attribute accesses. That's why I love Python. > Benjamin Saller "getValues" idea looks interesting. Perhaps it is > time to step back and ask how easy XML could be if the Python > interface had nothing to do with SAX or DOM. We have already done so. I've seen around 7 or 8 APIs for Python XML. There was no *general purpose* API that was significantly easier than the DOM. If you restrict your problem domain (e.g. no editing, no character data, etc.) then you can make easier APIs. The whole reason we are arguing about this is because some people like the DOM *so much* that they want to apply it to non-XML information like ZObjects. All we disagree on is trivial syntactic issues. In my opinion, going back to first principles would be, well, going backwards. -- Paul Prescod - Not encumbered by corporate consensus When George Bush entered office, a Washington Post-ABC News poll found that 62 percent of Americans "would be willing to give up a few of the freedoms we have" for the war effort. They have gotten their wish. - "This is your bill of rights...on drugs", Harpers, Dec. 1999 From paul@prescod.net Tue Jun 27 20:12:03 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 27 Jun 2000 12:12:03 -0700 Subject: [XML-SIG] will SAX be suitable for this task? 
References:
Message-ID: <3958FC83.39CD7BE7@prescod.net>

Without pre-indexing or tree-building, an XML file is simply not a
random access data structure and cannot be processed in that way. No API
will allow you to go forward and then go backwards unless the API is
buffering and in that case, it will only be able to go backwards some
fixed number of records. Of course if your buffer is "the whole
document" (i.e. DOM) then you won't have any problems with going
backwards but you will have performance problems.

If you know in advance the maximum amount you need to go back, then you
could use PullDOM. Otherwise, I would suggest a two-pass algorithm.

--
Paul Prescod - Not encumbered by corporate consensus

From paul@prescod.net Tue Jun 27 19:51:08 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 27 Jun 2000 11:51:08 -0700
Subject: [XML-SIG] Interested in feedback
References:
Message-ID: <3958F79B.266B0129@prescod.net>

Some interesting ideas...to a certain extent, though you should watch
out for reinventing the wheel.

Benjamin Saller wrote:
>
> xml.getValue("container1.allow.host") # == 'loki'
> xml.getValue("container2.allow.host[1]") # == 'foo.bar.com'

What happens if you don't give an index but there are multiple
sub-elements of a type? Do you get a list or the first one?

The "." is a legal XML name character. You could have an element type
named "container1.allow.host." "/" would be a better choice. If you do
use "/" then you will have reinvented a tiny subset of XPath. No harm in
that: you should probably look at the spec and try to rationalize your
mini-language with it explicitly. I think that Python 1.7 should have an
XPath subset in it and this is probably a good start.
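
For readers following along today, the "/"-separated lookup suggested above can be sketched with the limited XPath subset in the standard library's xml.etree.ElementTree (a module that did not exist when this thread was written). The configuration document below is invented for illustration; only the names container1, container2, allow and host and the values 'loki' and 'foo.bar.com' come from the quoted getValue examples, and the 'thor' entry is a made-up filler value. This is not Benjamin Saller's module, just one possible reading of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document, invented for illustration.
CONFIG = """
<config>
  <container1>
    <allow><host>loki</host></allow>
  </container1>
  <container2>
    <allow>
      <host>thor</host>
      <host>foo.bar.com</host>
    </allow>
  </container2>
</config>
"""

root = ET.fromstring(CONFIG)

# "/" separates the steps, so element names containing "." stay unambiguous.
first = root.findtext("container1/allow/host")

# XPath positions are 1-based: host[2] selects the second <host> child.
second = root.findtext("container2/allow/host[2]")

# Without an index, findall() returns every match as a list.
all_hosts = [h.text for h in root.findall("container2/allow/host")]

print(first, second, all_hosts)
```

Note the answer this gives to the "list or the first one?" question: find() and findtext() return the first match, findall() returns all of them, so the caller chooses explicitly.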
Also, insofar as you have an in-memory tree data structure, it might be better to use a DOM rather than building your own structure using SAX. -- Paul Prescod - Not encumbered by corporate consensus When George Bush entered office, a Washington Post-ABC News poll found that 62 percent of Americans "would be willing to give up a few of the freedoms we have" for the war effort. They have gotten their wish. - "This is your bill of rights...on drugs", Harpers, Dec. 1999 From akuchlin@mems-exchange.org Tue Jun 27 19:58:06 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 27 Jun 2000 14:58:06 -0400 Subject: [XML-SIG] Mail backlog clearing Message-ID: <20000627145806.I19033@kronos.cnri.reston.va.us> The mail backlog on python.org is now clearing. Some goofball at infonie.fr was running a stupid robot that kept continually hitting the Mailman pages, driving the CPU load through the roof; they now seem to have stopped. However, the mail seems to be showing up in random order, so the thread of discussion in the DOM/IDL thread is going to trickle into your mailbox in a confusing fashion. Some participants may seem to have multiple personalities as they assert X, then assert !X, then assert X again. You're better off reading the archive to get a clear idea of everyone's position. --amk From paul@prescod.net Tue Jun 27 20:29:35 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 27 Jun 2000 12:29:35 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <200006270047.SAA02755@localhost.localdomain> <3958C4A9.2283FD0C@digicool.com> Message-ID: <3959009F.4676F561@prescod.net> Jim Fulton wrote: > > ... > > The abstraction of attribute access to allow computation *is* a Python > idiom, but so is use of accessor functions. Avoidance of circular references > is *certainly* a common Python design practice. 
There are techniques for
> implementing the DOM API that avoid circular references that require
> computing some DOM attributes. Requiring that DOM attributes be published
> as Python attributes makes this harder.

Avoiding circular references in a DOM environment is hard and almost
always has a negative impact on performance. If your implementors are
"up to" doing all of this proxy magic, using attribute accessor
functions will be comparatively a snap. Or if *you* are implementing the
proxy for them, then you can implement the attribute accessor functions
for them.

Note also that the Python bug of having trouble with circular references
is scheduled to be "experimentally" fixed in 1.6 and totally fixed in
1.7. I am reluctant to design around it. If we put our heads together,
the efficiency hit of accessor functions could also be solved in the 1.7
timeline (by making them a first-class language feature).

> I question the value of list consensus without follow through. I, as a DOM
> implementor have no way of knowing what the list consensus was unless I
> happened to be paying attention at the time, which I wasn't. If there was
> consensus, then it should be published.

You are asking for a more formal process than is used in the Python
world. I don't think that there is even formal documentation for the
"file" interface used in hundreds of places. comp.lang.python and /lib
are the documentation. In open source land, "go look at the code" is a
valid answer (though sometimes suboptimal).

> When I asked about the API a few days ago, there didn't seem to me
> to be much consensus about what the Python DOM API actually is.

I believe that there was consensus among people who have implemented
DOMs. I expressed the opinion that I wasn't entirely happy with the
leading underscore thing, but that's what I implemented in my DOM
anyhow. As far as I know, the three Python DOMs all allowed the use of
either methods or direct attribute access.
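
A minimal sketch of how one class can offer both spellings at once, written in present-day Python. The Node class here is hypothetical and is not taken from 4DOM, PyDOM or any other implementation mentioned in this thread; the _get_ prefix follows the accessor naming quoted in the discussion, and everything else is an assumption made for illustration.

```python
class Node:
    """Hypothetical DOM-like node; not taken from any real implementation."""

    def __init__(self, name, parent=None):
        self._name = name
        self._parent = parent

    # Accessor-method spelling, as in the Python/CORBA mapping.
    def _get_nodeName(self):
        return self._name

    def _get_parentNode(self):
        return self._parent

    def __getattr__(self, attr):
        # Called only when normal lookup fails, so real attributes and
        # methods are unaffected. Route node.nodeName to _get_nodeName().
        if attr.startswith("_"):
            raise AttributeError(attr)
        try:
            getter = getattr(type(self), "_get_" + attr)
        except AttributeError:
            raise AttributeError(attr) from None
        return getter(self)


root = Node("doc")
child = Node("para", parent=root)
print(child.nodeName, child._get_nodeName())
print(child.parentNode is root)
```

The attribute spelling costs one failed lookup plus a __getattr__ call on every access, which is the speed concern raised earlier in the thread; an implementation that stores nodeName as a plain attribute pays that cost on the method spelling instead.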
--
Paul Prescod - Not encumbered by corporate consensus

From paul@prescod.net Tue Jun 27 20:39:10 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 27 Jun 2000 12:39:10 -0700
Subject: [XML-SIG] Re: PyXML writer
References:
Message-ID: <395902DE.B38A96BF@prescod.net>

Eric Freese wrote:
>
> Paul:
>
> Perhaps I'm missing something, but when I use the PyXML/xml/sax/writer.py
> Xmlwriter class, the beginning PI does not include the version value in the
> startDocument method. Is there some way to tell it to put that value out,
> since I believe it is required?

It's a bug here. Just fix it in your local copy and I'll get someone to
fix the official version. Put in the version declaration between the
words "xml" and "encoding".

    def startDocument(self):
        if self.__syntax.pic == "?>":
            lit = self.__syntax.lit
            s = '%sxml encoding%s%siso-8859-1%s' % (
                self.__syntax.pio, self.__syntax.vi, lit, lit)
            if self.__standalone:
                s = '%s standalone%s%s%s%s' % (
                    s, self.__syntax.vi, lit, self.__standalone, lit)
            self._write("%s%s\n" % (s, self.__syntax.pic))

This code must have been written by a Norwegian SGML hacker. :)

--
Paul Prescod - Not encumbered by corporate consensus

From paul@prescod.net Tue Jun 27 20:35:17 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 27 Jun 2000 12:35:17 -0700
Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'?
References: <39577CE9.E391E8E0@prescod.net> <14679.55053.6188.621688@mailhost.beopen.com> <3958B5E1.F82BC081@digicool.com> <14680.49785.762419.861483@mailhost.beopen.com> <3958CD10.5A36C42A@digicool.com> Message-ID: <395901F5.C989916F@prescod.net> Jim Fulton wrote: > > ... > > I'll be willing to give on this, however, I assert that > a decision isn't a decision unless it's published. Consider it published as soon as this email appears in the archives. :) > It doesn't help when the people who made the decision > can't seem to remember what it was. Supposedly, the > decision was that DOM attributes are accessed as ordinary > Python attributes, as in:: > > foo.nodeName > > yet several people seemed to think that attributes are obtained > via accessor functions: > > foo._get_nodeName() We agreed to support both. The former was more Pythonic and the latter was CORBA compliant. Implementing both is only a very slight bit more work than implementing one (not double the code or anything thanks to __getattr__) so we decided to be both CORBA compatible and "friendly". CORBA compliance has faded in importance so the getter function version could be phased out but the usability argument for attributes remains the same. -- Paul Prescod - Not encumbered by corporate consensus When George Bush entered office, a Washington Post-ABC News poll found that 62 percent of Americans "would be willing to give up a few of the freedoms we have" for the war effort. They have gotten their wish. - "This is your bill of rights...on drugs", Harpers, Dec. 1999 From uogbuji@fourthought.com Tue Jun 27 21:00:59 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 14:00:59 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from "Fred L. Drake, Jr." of "Tue, 27 Jun 2000 08:04:25 PDT." 
<14680.49785.762419.861483@mailhost.beopen.com> Message-ID: <200006272000.OAA06198@localhost.localdomain> > A question for the 4Suite team: how did you deal with the IDL > problems when CORBA support was a primary requirement for 4DOM? Did > you use your own IDL derived from the Java description, or what? I'm > surprised that the problems of the published W3C IDL have only now > been mentioned here. We used our own version of the DOM IDL, with a few mods. Fnorb was easy, but for ILU we had to make liberal use of underscores to escape clashing names, although as Martin von Lowis has pointed out, most of the errors are for CORBA 2.3 only and were not errors for CORBA 2.2. Besides changing the few clashing names, the DOM IDL in general compiled just fine. I think people have been making far too much of the few name-clash errors Duncan Grisby turned up. Note that Duncan Grisby maintains the unsurpassed omniORBpy, by far the most compliant 2.3 ORB for Python. It is far stricter than Fnorb/ILU, but only emerged after we took CORBA out of the 4DOM core (and, IIRC, after our earlier discussion on Python/DOM binding). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Tue Jun 27 21:06:00 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 14:06:00 -0600 Subject: [XML-SIG] The '_' thingy In-Reply-To: Message from Jim Fulton of "Tue, 27 Jun 2000 11:18:29 EDT." <3958C5C5.26147A0B@digicool.com> Message-ID: <200006272006.OAA06220@localhost.localdomain> > Uche Ogbuji wrote: > > > > > Mike Olson wrote: > > > > > > > > So, I think I see this as a general consensus: > > > > > > Are you kidding? > > > > > > > 1. DOM will never (in foreseeable future) be used over an ORB, so the > > > > IDL should be used as a guide. 
> > > > > > Uh, this doesn't make sense. > > > > Please elaborate. > > Mike says that DOM won't be used over an ORB and yet > he says that (CORBA) IDL *should* be used as a guide. > If we don't care about CORBA, why be guided by IDL? I think you mis-read him. I think by "guide", he means that the IDL should be a guiding formalism, not that the API must coincide with what you get when you compile the IDL. I thought we'd all got on the same side of this. > > > > We should focus more on usability than > > > > CORBA compliance. > > > > > > > > 2. Most people will access the DOM via attributes. > > > > > > Who says? What do you have to support this? Most people > > > will access the DOM through whatever interface we define. > > > > Mike's support is that back when this SIG agreed upon attribute access as well > > as _get/_set ops, most people said they'd prefer to just use plain attribute > > access anyway, and the _get/_set was only needed for completeness. > > > > Do you have any support to contradict his assertion? > > In the discussion over the last few days, I've seen a lot of confusion > over what the Python DOM API actually is. It's hard to believe that > real consensus exists in such an environment. If I understand how strict you're setting up the conditions for "real consensus" to be, I hardly think there is any area of Python development (or of development in general) where you'll find real consensus. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From case@appliedtheory.com Tue Jun 27 21:08:13 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Tue, 27 Jun 2000 16:08:13 -0400 (EDT) Subject: [XML-SIG] Interested in feedback In-Reply-To: <3958F79B.266B0129@prescod.net> Message-ID: Today, Paul Prescod wrote: Some interesting ideas...to a certain extent, though you should watch out for reinventing the wheel. Benjamin Saller wrote: > > xml.getValue("container1.allow.host") # == 'loki' > xml.getValue("container2.allow.host[1]") # == 'foo.bar.com' What happens if you don't give an index but there are multiple sub-elements of a type? Do you get a list or the first one? You get the first element of a set on a call to getValue. If your code supports multiple values getValues is your friend and can return a list of typed data. The "." is a legal XML name character. You could have an element type named "container1.allow.host." "/" would be a better choice. That's a good point. Making that switch would make sense at this point. If you do use "/" then you will have reinvented a tiny subset of XPath. No harm in that: you should probably look at the spec and try to rationalize your mini-language with it explicitly. I think that Python 1.7 should have an XPath subset in it and this is probably a good start. Also, insofar as you have an in-memory tree data structure, it might be better to use a DOM rather than building your own structure using SAX. I think that XPath is a more approachable idea than navigating a tree of child nodes in a DOM tree. Given that I don't expose any of that I thought the DOM might be a bit heavyweight for the very simple way it would be used. When I started using XML in Python I was really surprised to find that there was no easy way to navigate an XML document. 
My hope was that even if the implementation was questionable or didn't reuse all the tools in PyXML people would see the idea and think about how to provide this interface to people. Thanks for your feedback. -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... From uogbuji@fourthought.com Tue Jun 27 21:12:25 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 14:12:25 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Jim Fulton of "Tue, 27 Jun 2000 11:33:06 EDT." <3958C932.9FAD143C@digicool.com> Message-ID: <200006272012.OAA06234@localhost.localdomain> > > Attributes: > > * arguably more Pythonic (=easier to use) > > I think that this extremely arguable. Many people would argue > that it's less OO and, therefore less Pythonic. I obviously > (from other work) think computed attributes are OK, however, I > think they have significant downsides, especially for an API > that we might want people to implement. There are many features of Python that are not OO. That's why Python is quite popular among those (including me, as you know), who think that OO at any cost is dangerous. IMO, direct attribute access is very Pythonic, though it may not be C++-like or Java-like or Smalltalk-like... > > There are no killer arguments here, just different weights applied to > > the various features. I don't think that we are going to agree to break > > code today. Maybe later we'll see that there are more DOM implementors > > than clients and their ease of implementation will take precedence. > > Do you really think that "x.getY()" is significantly harder to use > than "x.y"? As a _heavy_ user of DOM (in XPath and XSLT as well as in client code), and having made the conversion from the former to the latter, I'd say "yes". It's also less readable and less natural for a good deal of the attributes. 
> I can definitely tell you that computed attributes are very significantly > much harder to implement than accessor functions. They are harder, but again, having implemented them, not so much harder as to be forbidding. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Tue Jun 27 21:32:23 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 14:32:23 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Jim Fulton of "Tue, 27 Jun 2000 11:49:36 EDT." <3958CD10.5A36C42A@digicool.com> Message-ID: <200006272032.OAA06271@localhost.localdomain> > I'll be willing to give on this, however, I assert that > a decision isn't a decision unless it's published. > > It doesn't help when the people who made the decision > can't seem to remember what it was. _Completely_ untrue. Can you point me to a post where someone said they don't know what the decision was? I _can_ point you to posts where the consensus was summarized. See, for instance http://www.python.org/pipermail/xml-sig/1999-November/003281.html (and following posts) http://www.python.org/pipermail/xml-sig/1999-November/003313.html The whole affair started in the "4DOM future" thread and continued in the "foo.bar vs. foo.get_bar()", rounding out in the "CORBA compliance for the DOM in Python?" thread and other sundry conversations. > Supposedly, the > decision was that DOM attributes are accessed as ordinary > Python attributes, as in:: > > foo.nodeName Yes. > yet several people seemed to think that attributes are obtained > via accessor functions: > > foo._get_nodeName() Yes. The decision was "both". See the links. 
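The "both" arrangement discussed in this thread (plain attribute access layered over `_get_*` accessors by way of `__getattr__`) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Node` class and its fields are hypothetical and are not taken from 4DOM, PyDOM, or any other implementation mentioned here, and it is written for present-day Python rather than the 1.5/1.6 of the thread.

```python
class Node:
    """Hypothetical toy node; not the code of any real Python DOM."""

    def __init__(self, name):
        # Stored under private-looking names so that normal lookup of
        # "nodeName" fails and falls through to __getattr__ below.
        self._name = name
        self._children = []

    # CORBA-style accessors, spelled as in the thread.
    def _get_nodeName(self):
        return self._name

    def _get_childNodes(self):
        return list(self._children)

    def __getattr__(self, attr):
        # Called only when ordinary attribute lookup has already failed.
        # Underscore names are refused so that a missing _name can never
        # send us into infinite recursion.
        if attr.startswith("_"):
            raise AttributeError(attr)
        getter = getattr(type(self), "_get_" + attr, None)
        if getter is None:
            raise AttributeError(attr)
        return getter(self)


foo = Node("para")
print(foo.nodeName)         # attribute spelling
print(foo._get_nodeName())  # accessor spelling
```

One `__getattr__` serves every accessor, which is the "only a very slight bit more work" point made earlier in the thread; the mixin objection raised against it is that a class can have only one effective `__getattr__`.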
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Tue Jun 27 21:34:45 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 16:34:45 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <200006270047.SAA02755@localhost.localdomain> <3958C4A9.2283FD0C@digicool.com> <3959009F.4676F561@prescod.net> Message-ID: <39590FE4.518C2590@digicool.com> Paul Prescod wrote: > > Jim Fulton wrote: > > > > ... > > > > The abstraction of attribute access to allow computation *is* a Python > > idiom, but so is use of accessor functions. Avoidance of circular references > > is *certainly* a common Python design practice. There are techniques for > > implementing the DOM API that avoid circular references that require > > computing some DOM attributes. Requiring that DOM attributes be published > > as Python attributes makes this harder. > > Avoiding circular references in a DOM environment is hard and almost > always has a negative impact on performance. It's not *that* hard. I think that the StructuredTextNG DOM will provide an interesting example of this. > If your implementors are > "up to" doing all of this proxy magic, using attribute accessor > functions will be comparatively a snap. No, it won't, because __getattr__ doesn't mix well. I can add a bunch of new methods and mix them in without much effect on other mixins. This is not true of __getattr__. > Or if *you* are implementing the proxy for them, then you can implement > the attribute accessor functions for them. Not if they have their *own* __getattr__. > Note also that the Python bug of having trouble with circular references > is scheduled to be > "experimentally" fixed in 1.6 and totally fixed in 1.7. I am reluctant > to design around it. 
If we put our heads together, the efficiency hit of > accessor functions could also be solved in the 1.7 timeline (by making > them a first-class language feature). Circular references are harmful in other ways than just GC. For example, deep copy, and similar applications are certainly complicated by them. In any case, designing an API that encourages memory leaks in current Python seems like a pretty bad idea to me. > > I question the value of list consensus without follow through. I, as a DOM > > implementor have no way of knowing what the list consensus was unless I > > happened to be paying attention at the time, which I wasn't. If there was > > consensus, then it should be published. > > You are asking for a more formal process than is used in the Python > world. I don't think that there isn't even formal documentation for the > "file" interface used in hundreds of places. comp.lang.python and /lib > are the documentation. But there *is* formal documentation for sequence, mapping, and number interfaces. There's a DBI spec that provides an interface for relational databases. There's a ZODB Storage interface that allows people to implement alternate storages for ZODB. Any time you need interoperability, you need to define and document interfaces. > In open source land, "go look at the code" is a > valid answer (though sometimes suboptimal). Sorry, that just doesn't work. If I looked at the code in the XML distribution I'd have a lot of trouble figuring out what the Python DOM API is. > > When I asked about the API a few days ago, there didn't seem to me > > to be much consensus about what the Python DOM API actually is. > > I believe that there was consensus among people who have implemented > DOMs. Hm, I really don't think I believe this. Then again, I'm still not sure I know what the Python DOM mapping is. I have a sense that the mapping calls for use of attribute syntax, yet people bothered to defend the accessor method spelling as though it was in the API. 
> I expressed the opinion that I wasn't entirely happy with the > leading underscore thing, but that's what I implemented in my DOM > anyhow. > As far as I know, the three Python DOMs all allowed the use of either > methods or direct attribute access. "As far as I know"? QED :) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From uogbuji@fourthought.com Tue Jun 27 21:37:31 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 14:37:31 -0600 Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: Message from Michael McLay of "Tue, 27 Jun 2000 13:48:51 EDT." <14680.59651.130151.140876@fermi.eeel.nist.gov> Message-ID: <200006272037.OAA06291@localhost.localdomain> > Eh, exactly what sort of "pot" is that? [snip] > > Attributes: > > * arguably more Pythonic (=easier to use) > > In your last post you gave the example: > > > a=b.childNodes[0].attributes["abc"] > > > > and this looks like Java: > > > > a=b.getChildNodes()[0].getAttributes()["abc"] > > Why not use the follows notation: > > a=b.get_childNode(0).get_attribute("abc") > > or perhaps the call chain should be reduced by merging methods: > > c = b.get(childNode=0, attribute="abc") This is why I asked the question above. Way psychedelic, dude. > > There are no killer arguments here, just different weights applied to > > the various features. I don't think that we are going to agree to break > > code today. Maybe later we'll see that there are more DOM implementors > > than clients and their ease of implementation will take precedence. 
> > The standard Python interface may end up having to support both access > approaches (direct and through methods), which will really make the > interface ugly. If we had to choose one, which one will allow the > greater flexiblity? That is how things stand now. 4DOM (and PyDOM) support both. > Benjamin Saller "getValues" idea looks interesting. Perhaps it is > time to step back and ask how easy XML could be if the Python > interface had nothing to do with SAX or DOM. But there's plenty of (good) effort in that direction already. It's orthogonal to the DOM mapping decision. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From dieter@handshake.de Tue Jun 27 18:31:52 2000 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 27 Jun 2000 19:31:52 +0200 (CEST) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <14679.38180.290118.619833@fermi.eeel.nist.gov> References: <200006260037.SAA17412@localhost.localdomain> <14679.38180.290118.619833@fermi.eeel.nist.gov> Message-ID: <14680.58511.859304.862711@lindm.dm> Michael McLay writes: > The special meaning of _* is defined here: > http://www.python.org/doc/current/ref/id-classes.html This only defines "_*" as special for *module* variables. Here, we speak about the names of class methods/attributes and not module variables. Dieter From uogbuji@fourthought.com Tue Jun 27 21:48:49 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 14:48:49 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Paul Prescod of "Tue, 27 Jun 2000 12:29:35 PDT." 
<3959009F.4676F561@prescod.net> Message-ID: <200006272048.OAA06336@localhost.localdomain> > Jim Fulton wrote: > > The abstraction of attribute access to allow computation *is* a Python > > idiom, but so is use of accessor functions. Avoidance of circular references > > is *certainly* a common Python design practice. There are techniques for > > implementing the DOM API that avoid circular references that require > > computing some DOM attributes. Requiring that DOM attributes be published > > as Python attributes makes this harder. > > Avoiding circular references in a DOM environment is hard and almost > always has a negative impact on performance. If your implementors are > "up to" doing all of this proxy magic, using attribute accessor > functions will be comparatively a snap. > > Or if *you* are implementing the proxy for them, then you can implement > the attribute accessor functions for them. > > Note also that the Python bug of having trouble with circular references > is scheduled to be > "experimentally" fixed in 1.6 and totally fixed in 1.7. I am reluctant > to design around it. If we put our heads together, the efficiency hit of > accessor functions could also be solved in the 1.7 timeline (by making > them a first-class language feature). I'll note that IMO the whole circular references argument for a particular DOM implementation is a red herring. We have a huge project running at a client's: well over 500,000 lines of Python, long-running required, etc. It uses 4DOM and other code that can introduce circular references, but a few hours with Cyclops led us very quickly to any remaining memory leaks. It's not that hard of a problem with explicit freeing. Anyone running a long-running process, whether in C++, Java or Python, had better be sophisticated enough to be able to debug memory allocation errors, garbage-collection latency, and circular references respectively. (As an aside, C++ has tools, Nu-Mega, Purify... 
to help and Python has tools: Plumbo, Cyclops to help... it looks as if Java is the only case in which you're just SOL; yet another reason for me to dislike Java). > > I question the value of list consensus without follow through. I, as a DOM > > implementor have no way of knowing what the list consensus was unless I > > happened to be paying attention at the time, which I wasn't. If there was > > consensus, then it should be published. > > You are asking for a more formal process than is used in the Python > world. I don't think that there isn't even formal documentation for the > "file" interface used in hundreds of places. comp.lang.python and /lib > are the documentation. In open source land, "go look at the code" is a > vald answer (though sometimes suboptimal). Quite agreed. I do agree that some effort at documenting would be nice, but that's the nature of the OSS beast. Pretty much everyone is a volunteer so as much documentation gets done as there are willing hands to do it. The only other way is to give one entity (say, Sun) a stranglehold on the process, and in that case they have to do all the work > > When I asked about the API a few days ago, there didn't seem to me > > to be much consensus about what the Python DOM API actually is. > > I believe that there was consensus among people who have implemented > DOMs. I expressed the opinion that I wasn't entirely happy with the > leading underscore thing, but that's what I implemented in my DOM > anyhow. > > As far as I know, the three Python DOMs all allowed the use of either > methods or direct attribute access. Yes. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From dieter@handshake.de Tue Jun 27 20:10:19 2000 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 27 Jun 2000 21:10:19 +0200 (CEST) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <3957B720.9C6768D6@digicool.com> References: <3953A717.5289DCC8@digicool.com> <3957B720.9C6768D6@digicool.com> Message-ID: <14680.64297.110297.762192@lindm.dm> Jim Fulton writes: > We seem to be arguing two issues: > > - Whether to expose DOM attributes as Python attributes or > accessor functions, and > > - How to spell the accessor functions. > > If we go with accessor functions, which I think would be > a good idea, then the accessor functions should be > names in a way that is consistent with Python practice. Python, unlike Zope, does *not* treat *methods/attributes* with leading '_' specially. Only objects in modules with names starting with a '_' are in some way treated as private. Dieter From dieter@handshake.de Tue Jun 27 20:15:47 2000 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 27 Jun 2000 21:15:47 +0200 (CEST) Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <395789E2.A5A6D789@digicool.com> References: <200006251553.JAA13600@localhost.localdomain> <395789E2.A5A6D789@digicool.com> Message-ID: <14680.64552.90330.216017@lindm.dm> Jim Fulton writes: > Uche Ogbuji wrote: > > > > > Traditionally, Python attributes (including methods) with > > > names starting with '_' were treated as private. > > > > This is an informal tradition, not universal, and hardly normative. > > I disagree on two points. > > - It is not entirely informal: > > o import * from foo > imports only names that don't start with '_'. This affects only objects in modules. There is no such statement for class attributes/methods. > o Private attributes are based on a leading '_' > spelling Who says? 
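For readers trying to follow the underscore argument, the two class-level cases being contrasted can be sketched as below. This is a minimal illustration, not code from any DOM package: the `Node` and `Element` classes and their attribute names are hypothetical. A single leading underscore on a class attribute is a convention only, while a double leading underscore triggers name mangling inside the class body.

```python
class Node:
    def __init__(self):
        self._single = "single"    # convention only; nothing is enforced
        self.__double = "double"   # actually stored as _Node__double

    def _get_single(self):
        # A '_'-prefixed method is callable from anywhere.
        return self._single


class Element(Node):
    def peek(self):
        # Inside Element this name is rewritten to self._Element__double,
        # which was never set, so the lookup fails.
        return self.__double


n = Element()
print(n._single)          # reachable from outside the class
print(n._get_single())    # so is the underscore-prefixed accessor
print(n._Node__double)    # the mangled name is still reachable if you know it

try:
    n.peek()
except AttributeError as exc:
    print("derived class cannot see it:", exc)
```

The `from module import *` rule mentioned above is separate: it applies to module-level names only and has no counterpart for class attributes.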
> - Normative is hard to judge, but I think that this > is a pretty widely used practice. In fact, I do it to indicate that I do not expect that the attribute/method is used outside of the class or its derived classes. However, I would not blame the DO-SIG for using '_' prefixed accessor functions, because the rule is never explicitly stated. Dieter From dieter@handshake.de Tue Jun 27 20:05:22 2000 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 27 Jun 2000 21:05:22 +0200 (CEST) Subject: [XML-SIG] The '_' thingy In-Reply-To: <39578DBA.8FB64449@FourThought.com> References: <39578DBA.8FB64449@FourThought.com> Message-ID: <14680.63215.46641.68830@lindm.dm> Mike Olson writes: > So, I think I see this as a general consensus: I do not consent. > 1. DOM will never (in foreseeable future) be used over an ORB, so the > IDL should be used as a guide. We should focus more on usability than > CORBA compliance. I would like very much that database suppliers (Oracle, Poet, Zope) would support DOM access to the objects in the database through CORBA. AFAIK, Oracle already uses an ORB to facilitate integration. > 2. Most people will access the DOM via attributes. I would not mind using attributes. However, I would like your experience and techniques for using Python attributes for IDL attributes to be communicated to the DO-SIG, and the IDL->Python mapping to use attributes, too. > 3. We need a DOM language mapping document. I would prefer that this document specify: take the DOM IDL specification, combine it with the IDL->Python mapping and you get the Python DOM API. > 4. Computed attribute callback function names should be left up to the > implementor (or do we want to define this). If we do define this, > then they should be private, and start with an '_' or two. You should not use "__". 
While "_" has no special meaning in *method/attribute* names (unlike module variable names!), "__" prefix makes the method/attribute really private, it can only be used inside the class and no longer (without special knowledge) from outside or even in a derived class. > If all are good with this, then we should start down this path. A > langauge mapping is something we can put into the next release of 4DOM > (something we've been meaning to do any ways). The rest of the cahnges > are actually in place (unless we define a different callback naming > convention). We will be slowly depricating _get_* soon as well. > However we will still need __setattr__ callbacks in some cases.... You may depricate it, but please continue to support it -- for people that know IDL and use the IDL->Python mapping. Dieter From dieter@handshake.de Tue Jun 27 20:20:29 2000 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 27 Jun 2000 21:20:29 +0200 (CEST) Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: <3958B4D1.4EA25505@digicool.com> References: <200006270028.SAA02708@localhost.localdomain> <3958B4D1.4EA25505@digicool.com> Message-ID: <14680.64976.255884.423734@lindm.dm> Jim Fulton writes: > It is pretty clear that the Python DOM API should not be bound to > the Python CORBA bining, so I think we can excuse the do-sig from further > discussions. ;) It is not at all clear to me. DOM is specified with an IDL interface specification. There is an IDL->Python mapping. The most natural thing is to combine the two to obtain the Python DOM interface. All arguments we have to either use Python attributes for IDL attributes or to use accessors without leading '_' apply to other CORBA interfaces as well. 
Dieter From dieter@handshake.de Tue Jun 27 19:47:31 2000 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 27 Jun 2000 20:47:31 +0200 (CEST) Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: <20000626122319.K29590@lyra.org> References: <20000626122319.K29590@lyra.org> Message-ID: <14680.62613.131407.313720@lindm.dm> Greg Stein writes: > On Mon, Jun 26, 2000 at 08:38:49AM -0700, Paul Prescod wrote: > >... > > So let's design for the market we know we have (Python programmers who > > want an easy API) and not the market that I don't think we have (people > > who want to use Python DOMs from other languages and other language DOMs > > from Python). Interoperability among Python DOMs is enough. Bridges to > > Java and Microsoft COM DOMs would also be useful (and easy to write). > > Well said! > > I "violently agree" :-) with this position. Who the heck is going to expect > their Python code to be compiled by a C++ compiler? The code simply is not > going to port. I disagree. Of course, the same code will not work in Python and C++. However, when I look at the DOM recommendation, I see an IDL interface specification. I strongly favor that the Python DOM API is composed of this official standard document and an (official) IDL->Python mapping. All the arguments we give here for the use of attributes or accessor functions without a leading '_' also hold for other IDL mappings. Thus, maybe change the IDL->Python mapping, but please keep the "Python-API = IDL->Python-Mapping(IDL-Spec)". Dieter From uogbuji@fourthought.com Tue Jun 27 22:09:34 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 15:09:34 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Jim Fulton of "Tue, 27 Jun 2000 16:34:45 EDT." <39590FE4.518C2590@digicool.com> Message-ID: <200006272109.PAA06402@localhost.localdomain> > Circular references are harmful in other ways than just GC. 
> For example, deep copy, and similar applications are certainly complicated > by them. In any case, designing an API that encourages memory leaks > in current Python seems like a pretty bad idea to me. I hardly see how it encourages them as long as there is a documented method for avoiding them. > > In open source land, "go look at the code" is a > > valid answer (though sometimes suboptimal). > > Sorry, that just doesn't work. If I looked at the code in > the XML distribution I'd have a lot of trouble figuring > out what the Python DOM API is. I disagree. I think it's quite obvious in 4DOM and PyDOM. > > > > When I asked about the API a few days ago, there didn't seem to me > > > to be much consensus about what the Python DOM API actually is. > > > > I believe that there was consensus among people who have implemented > > DOMs. > > Hm, I really don't think I believe this. Then again, I'm still not > sure I know what the Python DOM mapping is. I have a sense that the > mapping calls for use of attribute syntax, yet people bothered to defend > the accessor method spelling as though it was in the API. I've posted the links. It's pretty clear. > > I expressed the opinion that I wasn't entirely happy with the > > leading underscore thing, but that's what I implemented in my DOM > > anyhow. > > > As far as I know, the three Python DOMs all allowed the use of either > > methods or direct attribute access. > > "As far as I know"? It's pretty idiomatic for this phrase to have none of the evil connotation I think you're attaching to it. And, yes, all three DOMs _do_ support both methods. > QED :) Nil demonstravis, in my opinion. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paul@prescod.net Tue Jun 27 22:41:12 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 27 Jun 2000 14:41:12 -0700 Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft References: <20000626122319.K29590@lyra.org> <14680.62613.131407.313720@lindm.dm> Message-ID: <39591F78.94C4256B@prescod.net> Dieter Maurer wrote: > > ... > > All arguments, we give here for use of attributes or > accessor function without leading '_', hold also for > other IDL mappings. The killer argument for us is that we have no need for CORBA in this application. If we needed CORBA interoperability then of course we need to follow the letter of the IDL. Right now, there is no concrete benefit in doing so! I would really like to see Python have a built-in solution for computed attributes and I would like to see some future version of the CORBA mapping use that solution. In the meantime, the CORBA folks have different priorities than we do and trying to force either group to compromise would not buy anything. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From jim@digicool.com Tue Jun 27 22:47:15 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 17:47:15 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <200006272109.PAA06402@localhost.localdomain> Message-ID: <395920E3.816E2897@digicool.com> Uche Ogbuji wrote: > > > Circular teferences are harmful in other ways that just GC. > > For example, deep copy, and similar applications are certainly complicated > > by them. 
In any case, designing an API that encourages memory leaks > > in current Python seems like a pretty bad idea to me. > > I hardly see how it encourages them as long as there is a documented method to > avoiding them. If you store the parent in the child, then you are causing a circular reference which is very likely to cause a leak. An API that requires implementation of __getattr__ to avoid storing the parent encourages storing the parent. (snip) > > > > "As far as I know"? > > It's pretty idiomatic for this phrase to have none of the evil connotation I > think you're attaching to it. I didn't attach any evil connotations to anything. From what I've seen over the last few days, the Python DOM API is well understood. There's no reason why someone should have to say "As far as I know". They should be able to point to the spec. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From uogbuji@fourthought.com Tue Jun 27 22:50:06 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 15:50:06 -0600 Subject: [XML-SIG] The '_' thingy In-Reply-To: Message from Dieter Maurer of "Tue, 27 Jun 2000 21:05:22 +0200." <14680.63215.46641.68830@lindm.dm> Message-ID: <200006272150.PAA06528@localhost.localdomain> > Mike Olson writes: > > So, I think I see this as a general concensius: > I am not consent. > > > 1. DOM will never (in forseeable future) be used over an ORB, so the > > IDL should be used as a guide. We should focus more on useability then > > CORBA compliance. 
> I would like very much that database suppliers (Oracle, Poet, Zope) > would support DOM access to the objects in the database through > CORBA. AFAIK, Oracle already uses an ORB to facilitate integration. I agree. I'm actually quite surprised Mike said that, since we have had good reason to use DOM over CORBA before. We should also nota that as Jim points out, sometimes DOM is just a convenient notation for a more complex beast. If "childNodes" is doing a large database join, for instance, I think the arguments about the DOM IDL being too fine-grained tend to evaporate. There is a place for DOM-over-CORBA. Not a common one, but it's there. > > If all are good with this, then we should start down this path. A > > langauge mapping is something we can put into the next release of 4DOM > > (something we've been meaning to do any ways). The rest of the cahnges > > are actually in place (unless we define a different callback naming > > convention). We will be slowly depricating _get_* soon as well. > > However we will still need __setattr__ callbacks in some cases.... > You may depricate it, but please continue to support it -- > for people that know IDL and use the IDL->Python mapping. I don't think we'll be making _any_ rash moves in the near future, rest assured. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Tue Jun 27 23:08:29 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 18:08:29 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <200006272032.OAA06271@localhost.localdomain> Message-ID: <395925DD.2CF1ACC4@digicool.com> Uche Ogbuji wrote: > > > I'll be willing to give on this, however, I assert that > > a decision isn't a decision unless it's published. 
> > > > It doesn't help when the people who made the decision > > can't seem to remember what it was. This was too strongly worded > _Completely_ untrue. That's strong, and I can't agree with this characterization. > Can you point me to a post where someone said they don't > kow what the decision was? There were a number of posts that made it seem that there wasn't a consensus. For example, http://www.python.org/pipermail/xml-sig/2000-June/004258.html Mike says "The methods really are not part of the DOM API". That is, just use the attributes. http://www.python.org/pipermail/xml-sig/2000-June/004261.html Fred says "The W3C documentation gives the IDL mapping, which requires the Python specific mapping.". Fred reinforces this in http://www.python.org/pipermail/xml-sig/2000-June/004299.html Now since the Python CORBA mapping doesn't provide for direct attribute access, a reasonable person would expect that Python DOM uses accessor methods (with leading '_'s). http://www.python.org/pipermail/xml-sig/2000-June/004285.html Dieter says "DOM is specified in terms of IDL. Python has an IDL -> Python mapping. Deviating from this mapping for DOM only would require special knowledge -- a thing I do not like." Hm, based on the Python IDL mapping, we use accessor methods I could go on, but I won't. If there was a published mapping, we wouldn't be covering this ground again. > I _can_ point you to posts where the consensus was > summarized. See, for instance > > http://www.python.org/pipermail/xml-sig/1999-November/003281.html (and > following posts) > > http://www.python.org/pipermail/xml-sig/1999-November/003313.html > > The whole affair started in the "4DOM future" thread and continued in the > "foo.bar vs. foo.get_bar()", rounding out in the "CORBA compliance for the DOM > in Python?" thread and other sundry conversations. Thanks. How would anyone know about this without living in the SIG or searching the archives? Obviously they wouldn't. 
Even after reading these, I don't see a very strong consensus. The strongest indication of consensus is in the second link you provided, which points to an "unofficial Python binding". > > Supposedly, the > > decision was that DOM attributes are accessed as ordinary > > Python attributes, as in:: > > > > foo.nodeName > > Yes. > > > yet several people seemed to think that attributes are obtained > > via accessor functions: > > > > foo._get_nodeName() > > Yes. The decision was "both". See the links. OK, I'll try to summarize this in a sperate message. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From paul@prescod.net Tue Jun 27 23:08:10 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 27 Jun 2000 15:08:10 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <200006272109.PAA06402@localhost.localdomain> <395920E3.816E2897@digicool.com> Message-ID: <395925CA.ABD48D83@prescod.net> Jim Fulton wrote: > > ... > > If you store the parent in the child, then you are causing a circular reference > which is very likely to cause a leak. An API that requires implementation of > __getattr__ to avoid storing the parent encourages storing the parent. An API that has a getParent() method also encourages storing the parent. The DOM is the source ofthe problem. > There's no reason why someone should have to say "As far as I know". > They should be able to point to the spec. I know what we agreed to. I didn't know if everyone had got around to implementing it correctly. 
"As far as I know, every Python XML parser handles namespaces" does not imply an amgiuity in the XML namespaces specification! I think we have to agree to disagree here. If the FourThought guys are up to it, we can drop the leading underscore from the method versions. You can implement a DOM subset that only supports the method versions. We can easily whip up adapter proxies that make your DOMs compliant with the full attributes+methods API if interoperability becomes an issue. If you get hundreds of Python objects to support your DOM interface then we may think twice about being partially incompatible with them and migrate our users to the method version. Alternately, in Python 1.7, we could get first-class computed attribute support into Python and agree to use that. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From jim@digicool.com Tue Jun 27 23:13:15 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 18:13:15 -0400 Subject: [XML-SIG] Consensus: Python mapping for DOM attributes? Message-ID: <395926FB.A20538E0@digicool.com> OK, I'm told that there was a decision made last November that attributes defined by DOM are mapped to both Python attributes and accessor functions. This means that DOM clients can use either "direct attribute access": node.parent or an accessor method: node._get_parent() An implementor of the Python DOM API must support both methods of access. I don't personally like this API very much, but I guess it's to late to change it. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From uogbuji@fourthought.com Tue Jun 27 23:40:52 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 16:40:52 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Jim Fulton of "Tue, 27 Jun 2000 18:08:29 EDT." <395925DD.2CF1ACC4@digicool.com> Message-ID: <200006272240.QAA06679@localhost.localdomain> > Uche Ogbuji wrote: > Thanks. How would anyone know about this without living in the SIG or > searching the archives? Obviously they wouldn't. Agreed that this is not the best. I've been tooling around with a basic document to help rectify. Don't worry, I haven't addressed the current controversial point yet. > Even after reading these, I don't see a very strong > consensus. The strongest indication of consensus is in the > second link you provided, which points to an > "unofficial Python binding". Yes, I considered it unofficial (and still do) because it is not formally stated and published. However, I do think it represented the consensus (I hope I've never tried to pass off "consensus" and "official" as the same thing). Paul's point about implementor consensus comes from the fact that both the 4DOM and PyDOM teams pledged, and soon afterwards effected a change to support the dual foo/_get_foo convention. Note that there were no howls of protest. You can see in the threads that pretty much everyone ended up liking the attribute approach because of its pythonicity, whough there were a few mentions of the implementation difficulties you've brought up. 
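The dual foo/_get_foo convention discussed here can be sketched in a few lines. This is only an illustration under stated assumptions, not code taken from 4DOM or PyDOM: the class name DualAccessNode and its single nodeName property are hypothetical, and the syntax is modern Python rather than 1.5/1.6. The accessor method holds the real logic, and __getattr__ forwards plain attribute reads to it, so nothing extra is stored on the instance.

```python
class DualAccessNode:
    """Hypothetical node supporting node.nodeName and node._get_nodeName()."""

    def __init__(self, name):
        # Stored under a private name, so reading "nodeName"
        # falls through to __getattr__ below.
        self._name = name

    def _get_nodeName(self):
        # Accessor spelling, in the style of the IDL mapping.
        return self._name

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        # Private names are never forwarded, which avoids recursion.
        if attr.startswith("_"):
            raise AttributeError(attr)
        try:
            getter = object.__getattribute__(self, "_get_" + attr)
        except AttributeError:
            raise AttributeError(attr) from None
        return getter()


node = DualAccessNode("para")
print(node.nodeName)         # attribute spelling
print(node._get_nodeName())  # accessor spelling
```

Both spellings go through the same accessor, so a value such as a parent node could be computed on demand instead of being stored.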
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Tue Jun 27 23:45:52 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 27 Jun 2000 16:45:52 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Paul Prescod of "Tue, 27 Jun 2000 15:08:10 PDT." <395925CA.ABD48D83@prescod.net> Message-ID: <200006272245.QAA06699@localhost.localdomain> Paul Prescod: > I think we have to agree to disagree here. If the FourThought guys are > up to it, we can drop the leading underscore from the method versions. I personally don't have a tremendous problem with this, though it could become a bit of a pain if we ever tried to use our CORBA/4DOM code with DOM Level 2, or to initate another project using CORBA/4DOM (never fear: it's no longer in the core, but an add-on layer). Dieter Maurer seems to have similar concerns. I'm wary about making all the changes until we've all had some time to do some Tai Chi and decide whether that is indeed the Dao. > Alternately, in Python 1.7, we could > get first-class computed attribute support into Python and agree to use > that. I remember when you first brought up CAs. My response was "huh?". No longer I assure you. Where do I sign the petition for these buggers? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Wed Jun 28 00:06:24 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 27 Jun 2000 19:06:24 -0400 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
References: <200006272109.PAA06402@localhost.localdomain> <395920E3.816E2897@digicool.com> <395925CA.ABD48D83@prescod.net> Message-ID: <39593370.7DD03DDE@digicool.com> Paul Prescod wrote: > > Jim Fulton wrote: > > > > ... > > > > If you store the parent in the child, then you are causing a circular reference > > which is very likely to cause a leak. An API that requires implementation of > > __getattr__ to avoid storing the parent encourages storing the parent. > > An API that has a getParent() method also encourages storing the parent. > The DOM is the source ofthe problem. I don't agree. It fact, it certainly suggests that some computation is possible. If I want to compute the parent, I can do so without mucking with __getattr__. OTOH, if I have to support "foo.parent" (as I apparently do) then the only way to avoid implementing __getattr__ is to store the darn thing. (snip) > You can implement a DOM subset that only supports the method versions. Noooooooooo. If I'm going to bother to implement DOM, I want it to work with all DOM clients. > We can easily whip up adapter proxies that make your DOMs compliant with > the full attributes+methods API if interoperability becomes an issue. If > you get hundreds of Python objects to support your DOM interface then we > may think twice about being partially incompatible with them and migrate > our users to the method version. Whimper. I don't want *my* DOM API. I want there to be exactly *one* official documented Python DOM API mapping. I'd prefer that the mapping be as easy as possible to implement and use and, as a bonus, it would be swell if it worked well with Zope, which the current API doesn't. Oh well. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From Fredrik Lundh" thought some of you might find this one a bit interesting: http://hem.passagen.se/eff/2000_06_01_bot-archive.htm#397730 cheers /F From janssen@parc.xerox.com Wed Jun 28 00:20:07 2000 From: janssen@parc.xerox.com (Bill Janssen) Date: Tue, 27 Jun 2000 16:20:07 PDT Subject: [XML-SIG] Re: [DO-SIG] Python language bidning January 2000 Draft In-Reply-To: Your message of "Tue, 27 Jun 2000 05:04:00 PDT." <395797BB.E71AB751@digicool.com> Message-ID: <00Jun27.162008pdt."3438"@watson.parc.xerox.com> > The Python mapping does *not* provide for using IDL attributes > as Python attributes. In fact, some folks in the DO-SIG seem to > feel strongly that doing so would be a really bad idea. Count me in as thinking it's a bad idea, too. Bill From Fredrik Lundh" following up on myself: has anyone benchmarked pyexpat against xmllib (as of 1.6a2). how much faster is it? just curious /F From paul@prescod.net Wed Jun 28 01:23:59 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 27 Jun 2000 17:23:59 -0700 Subject: [XML-SIG] XML parsing performance References: <00a601bfe08d$8f5cd6e0$f2a6b5d4@hagrid> Message-ID: <3959459F.24C7FE10@prescod.net> Your benchmarks look good. The shallow parser appropach may be interesting for XML vocabularies that don't make heavy use of attributes, entities and so forth. No, I haven't done xmllib benchmarking lately. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. 
- The Advent of the Algorithm (pending), by David Berlinski From fdrake@beopen.com Wed Jun 28 02:19:23 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 27 Jun 2000 21:19:23 -0400 (EDT) Subject: [XML-SIG] Re: PyXML writer In-Reply-To: <395902DE.B38A96BF@prescod.net> References: <395902DE.B38A96BF@prescod.net> Message-ID: <14681.21147.418947.623762@cj42289-a.reston1.va.home.com> Paul Prescod writes: > This code must have been written by a Norwegian SGML hacker. :) Actually, no. Surprisingly enough, I *have* offered more than opinionated rhetoric. ;) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From akuchlin@mems-exchange.org Wed Jun 28 03:52:20 2000 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Tue, 27 Jun 2000 22:52:20 -0400 Subject: [XML-SIG] Contents of xmlcore? Message-ID: <200006280252.WAA01464@mira.erols.com> We need to decide what will go into the xmlcore package in Python 1.6. SAX, yes; what else? pulldom? The choice needs to be finalized as soon as possible in order to stand a chance of getting things checked in with minimal test suites for 1.6b1, which is only a few days away. Would an IRC session on Wednesday let us settle the question faster than through e-mail? --amk From fdrake@beopen.com Wed Jun 28 04:42:00 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 27 Jun 2000 23:42:00 -0400 (EDT) Subject: [XML-SIG] Contents of xmlcore? In-Reply-To: <200006280252.WAA01464@mira.erols.com> References: <200006280252.WAA01464@mira.erols.com> Message-ID: <14681.29704.194719.579529@cj42289-a.reston1.va.home.com> A.M. Kuchling writes: > We need to decide what will go into the xmlcore package in Python 1.6. > SAX, yes; what else? pulldom? The choice needs to be finalized as > soon as possible in order to stand a chance of getting things checked Based on the recent mess regarding the DOM API, I'm not inclined to include a DOM-like API until we have a specification for the Python DOM API. 
I won't have time to write one before 1.6 is done. ;( > Would an IRC session on Wednesday let us settle the question faster > than through e-mail? No, because not everyone has a clue as to how to operate a chat program these days. I haven't used IRC since the "I" was added! (What? You don't remember BitNet?) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From hemangee@pspl.co.in Wed Jun 28 03:49:22 2000 From: hemangee@pspl.co.in (Hemangee) Date: Wed, 28 Jun 2000 09:49:22 +0700 Subject: [XML-SIG] Installation problem Message-ID: <000001bfe0ab$76926bc0$6102a8c0@intranet.pspl.co.in> Hello, I tried installing the PyXML-0.5.4 on my Windows NT workstation When i run the command python setup.py build it gives many errors like functions not found....etc and does not install the same What should I do ? Or have i downloaded a wrong installable ? In that case please suggest one. Thanks, Hemangee. From gstein@lyra.org Wed Jun 28 07:31:07 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 27 Jun 2000 23:31:07 -0700 Subject: [XML-SIG] Contents of xmlcore? In-Reply-To: <14681.29704.194719.579529@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Tue, Jun 27, 2000 at 11:42:00PM -0400 References: <200006280252.WAA01464@mira.erols.com> <14681.29704.194719.579529@cj42289-a.reston1.va.home.com> Message-ID: <20000627233107.L29590@lyra.org> On Tue, Jun 27, 2000 at 11:42:00PM -0400, Fred L. Drake, Jr. wrote: > > A.M. Kuchling writes: > > We need to decide what will go into the xmlcore package in Python 1.6. > > SAX, yes; what else? pulldom? The choice needs to be finalized as > > soon as possible in order to stand a chance of getting things checked > > Based on the recent mess regarding the DOM API, I'm not inclined to > include a DOM-like API until we have a specification for the Python > DOM API. I won't have time to write one before 1.6 is done. ;( I'd have to agree with this. Punt the DOM. Seems like that leaves expat and sax? 
Not much of a core :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From dgrisby@uk.research.att.com Wed Jun 28 10:30:54 2000 From: dgrisby@uk.research.att.com (Duncan Grisby) Date: Wed, 28 Jun 2000 10:30:54 +0100 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Uche Ogbuji of "Tue, 27 Jun 2000 14:00:59 MDT." <200006272000.OAA06198@localhost.localdomain> Message-ID: <200006280930.KAA01726@pineapple.uk.research.att.com> On Tuesday 27 June, Uche Ogbuji wrote: > We used our own version of the DOM IDL, with a few mods. Fnorb was > easy, but for ILU we had to make liberal use of underscores to > escape clashing names, although as Martin von Lowis has pointed out, > most of the errors are for CORBA 2.3 only and were not errors for > CORBA 2.2. Besides changing the few clashing names, the DOM IDL in > general compiled just fine. I think people have been making far too > much of the few errors name-clashes Duncan Grisby turned up. Note > that Duncan Grisby maintains the unsurpassed omniORBpy, by far the > most compliant 2.3 ORB for Python. It is far stricter than > Fnorb/ILU, but only emerged after we took CORBA out of the 4DOM core > (and, IIRC, after our earlier discussion on Python/DOM binding). I'm sorry I introduced the stuff about the name clashes. I just meant it to be evidence that aiming for compliance with the Python CORBA mapping was more trouble than it's worth. As Martin pointed out, all but one of the clashes are trivial to fix, anyway [1]. The only issue is really with the W3C since they claim their IDL is CORBA 2.3.1 compliant, which it plainly isn't. I don't think people who are using/implementing DOM should religiously stick to the CORBA mapping. As long as you keep to the intent of the IDL, it will be trivial to map whatever you use into the full CORBA mapping. This could even be done at the IDL compiler level, so there wouldn't need to be any proxy objects doing the translation. 
As an example, dom::Attr is defined as:

  module dom {
    interface Attr : Node {
      readonly attribute DOMString name;
      readonly attribute boolean specified;
      attribute DOMString value;
                                      // raises(DOMException) on setting
      // Introduced in DOM Level 2:
      readonly attribute Element ownerElement;
    };
  };

omniORBpy currently maps that to the following (sort of -- I've cut out a few things which aren't relevant, and wrapped some long lines):

  # Attr object reference
  class _objref_Attr (_0_dom._objref_Node):

      def __init__(self):
          _0_dom._objref_Node.__init__(self)

      def _get_name(self, *args):
          return _omnipy.invoke(self, "_get_name",
                                _0_dom.Attr._d__get_name, args)

      def _get_specified(self, *args):
          return _omnipy.invoke(self, "_get_specified",
                                _0_dom.Attr._d__get_specified, args)

      def _get_value(self, *args):
          return _omnipy.invoke(self, "_get_value",
                                _0_dom.Attr._d__get_value, args)

      def _set_value(self, *args):
          return _omnipy.invoke(self, "_set_value",
                                _0_dom.Attr._d__set_value, args)

      def _get_ownerElement(self, *args):
          return _omnipy.invoke(self, "_get_ownerElement",
                                _0_dom.Attr._d__get_ownerElement, args)

It would be very easy to add a flag to the IDL compiler which generated the obvious __getattr__ and __setattr__ methods. The server side would still have to use the _get and _set methods. Cheers, Duncan.

[1] The only clash which can't be fixed by escaping the IDL identifiers with an underscore is the interface named "Range" in module "range". This is really a problem for the W3C to fix, but it would be easy enough to relax the name scoping rules to allow range::Range to pass the IDL compiler.

-- -- Duncan Grisby \ Research Engineer -- -- AT&T Laboratories Cambridge -- -- http://www.uk.research.att.com/~dpg1 --

From jerome@IDEALX.com Wed Jun 28 11:20:45 2000 From: jerome@IDEALX.com (Jérôme Marant) Date: 28 Jun 2000 12:20:45 +0200 Subject: [XML-SIG] xbel doc Message-ID: <64zoo6ayyq.fsf@amboise.ird.idealx.com> Hi, I can't find the xbel doc in version 0.5.5 of pyxml.
It used to exist in version 0.5.1. Why has it disappeared? thx. -- Jérôme Marant ----------------------------------------------------------- | IDEALX - Open Source Engineering / Ingénierie Open Source | | http://IDEALX.com | ----------------------------------------------------------- From walter@livinglogic.de Wed Jun 28 11:48:59 2000 From: walter@livinglogic.de (Walter Doerwald) Date: Wed, 28 Jun 2000 12:48:59 +0200 Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <3958FC52.A747A970@prescod.net> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> Message-ID: <4.3.1.0.20000628124707.00b1a100@mail.tmt.de> At 21:11 27.06.00, you wrote: >Michael McLay wrote: > > > > .. > > > > a=b.get_childNode(0).get_attribute("abc") > >How do you get the list of childnodes and attributes? > > > or perhaps the call chain should be reduced by merging methods: > > > > c = b.get(childNode=0, attribute="abc") > >How about childNodes[0].childNodes[1].childNodes[0].attributes["abc"] Why not put children and attribute access into __getitem__ c = b[0][1][0]["abc"] Bye, Walter Dörwald -- Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de From fdrake@beopen.com Wed Jun 28 12:16:53 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 28 Jun 2000 07:16:53 -0400 (EDT) Subject: [XML-SIG] back up & running Message-ID: <14681.56997.50955.892954@cj42289-a.reston1.va.home.com> My mega-laptop got fixed faster than I'd expected, so I'm back up to doing useful work. My top priorities are handling patches and going back through my email to find all the documentation patches that have sat idle for too long. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Wed Jun 28 13:23:32 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 28 Jun 2000 08:23:32 -0400 (EDT) Subject: [XML-SIG] Contents of xmlcore?
In-Reply-To: <20000627233107.L29590@lyra.org> References: <200006280252.WAA01464@mira.erols.com> <14681.29704.194719.579529@cj42289-a.reston1.va.home.com> <20000627233107.L29590@lyra.org> Message-ID: <14681.60996.19842.149115@cj42289-a.reston1.va.home.com> Greg Stein writes: > I'd have to agree with this. Punt the DOM. > > Seems like that leaves expat and sax? Not much of a core :-( It's better than we have now! I don't think there's any need to say this is all it'll ever contain; I wouldn't mind if the XML package became part of the standard library, but that makes more sense for Python 1.7. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From case@appliedtheory.com Wed Jun 28 14:22:56 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Wed, 28 Jun 2000 09:22:56 -0400 (EDT) Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <4.3.1.0.20000628124707.00b1a100@mail.tmt.de> Message-ID: Today, Walter Doerwald wrote: Why not put children and attribute access into __getitem__ c = b[0][1][0]["abc"] Try to maintain that a month after you wrote it ;> -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... From davecosta@netscape.net Wed Jun 28 14:27:31 2000 From: davecosta@netscape.net (Dave Costa) Date: 28 Jun 00 06:27:31 PDT Subject: [[XML-SIG] Installation problem] Message-ID: <20000628132731.12756.qmail@www0w.netaddress.usa.net> Hemangee, If you do not have the distutils package installed, I suggest you install that first, the PyXML install was much smoother with it. http://www.python.org/sigs/distutils-sig/ Also, even if you do have distutils installed, you may have trouble getting the C extensions to compile. I have tried it on Windows 98 with Borland C++ (simply doesn't work) and NT Workstations with MS Visual Studio (gives a link error because it is looking for a file in the wrong place, as far as I can tell).
Even when using the "install_lib" option, which is supposed to only install the pure-Python pieces, the same problem occurs. Supposedly, there is a precompiled version for Windows, but I have not found it. Does anyone on the list know where it is? "Hemangee" wrote: Hello, I tried installing the PyXML-0.5.4 on my Windows NT workstation When i run the command python setup.py build it gives many errors like functions not found....etc and does not install the same What should I do ? Or have i downloaded a wrong installable ? In that case please suggest one. Thanks, Hemangee. _______________________________________________ XML-SIG maillist - XML-SIG@python.org http://www.python.org/mailman/listinfo/xml-sig ____________________________________________________________________ Get your own FREE, personal Netscape WebMail account today at http://webmail.netscape.com. From paul@prescod.net Wed Jun 28 14:34:37 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 06:34:37 -0700 Subject: [XML-SIG] Re: PyXML writer References: <395902DE.B38A96BF@prescod.net> <14681.21147.418947.623762@cj42289-a.reston1.va.home.com> Message-ID: <3959FEED.60151181@prescod.net> "Fred L. Drake, Jr." wrote: > > Paul Prescod writes: > > This code must have been written by a Norwegian SGML hacker. :) > > Actually, no. Surprisingly enough, I *have* offered more than > opinionated rhetoric. ;) You must have spent too much time with the SGML Handbook. "pio", "pic", "vi" Reminds me of the halcyon days of youth. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Wed Jun 28 14:40:06 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 06:40:06 -0700 Subject: [XML-SIG] Contents of xmlcore?
References: <200006280252.WAA01464@mira.erols.com> Message-ID: <395A0036.C06AD8A@prescod.net> I'm not really "into" IRC and have a doctor's appointment today. I am writing a very thorough test suite for minidom and pulldom. The two modules are mutually dependent (really only split for logistical reasons). Therefore the only question is whether the interfaces are "right". What if I describe the interfaces in a series of messages so that people can critique them rather than having to download the modules explicitly. I think that the test suites and all can be finished tomorrow but I may have to put off the completion of SAX-izing PyExpat until after beta 1. qp_xml is not going to make it. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From fdrake@beopen.com Wed Jun 28 14:55:10 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 28 Jun 2000 09:55:10 -0400 (EDT) Subject: [XML-SIG] Re: PyXML writer In-Reply-To: <3959FEED.60151181@prescod.net> References: <395902DE.B38A96BF@prescod.net> <14681.21147.418947.623762@cj42289-a.reston1.va.home.com> <3959FEED.60151181@prescod.net> Message-ID: <14682.958.561741.611364@cj42289-a.reston1.va.home.com> Paul Prescod writes: > You must have spent too much time with the SGML Handbook. "pio", "pic", > "vi" Actually, I've never had good access to a copy of that -- it costs too much. But "Practical SGML" (I think that's right) uses all those abbreviations as well, so I adopted those so I didn't have to make up anything new. The SGML parser in Grail does much the same thing. ;) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Wed Jun 28 15:02:12 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) 
Date: Wed, 28 Jun 2000 10:02:12 -0400 (EDT) Subject: [XML-SIG] Contents of xmlcore? In-Reply-To: <395A0036.C06AD8A@prescod.net> References: <200006280252.WAA01464@mira.erols.com> <395A0036.C06AD8A@prescod.net> Message-ID: <14682.1380.325958.704518@cj42289-a.reston1.va.home.com> Paul Prescod writes: > I am writing a very thorough test suite for minidom and pulldom. The two > modules are mutually dependent (really only split for logistical > reasons). Therefore the only question is whether the interfaces are Excellent! > "right". What if I describe the interfaces in a series of messages so > that people can critique them rather than having to download the modules This would be very good, and could serve as the basis for the documentation as well. > explicitly. I think that the test suites and all can be finished > tomorrow but I may have to put off the completion of SAX-izing PyExpat > until after beta 1. > > qp_xml is not going to make it. What, you mean something isn't going to make it by Friday??? You slacker! ;) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From walter@livinglogic.de Wed Jun 28 15:32:16 2000 From: walter@livinglogic.de (Walter Doerwald) Date: Wed, 28 Jun 2000 16:32:16 +0200 Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: References: <4.3.1.0.20000628124707.00b1a100@mail.tmt.de> Message-ID: <4.3.1.0.20000628152845.00b189b0@mail.tmt.de> At 15:22 28.06.00, Benjamin Saller wrote: >Today, Walter Doerwald wrote: > > Why not put children and attribute access into __getitem__ > c = b[0][1][0]["abc"] > >Try to maintain that a month after you wrote it ;> I don't see a problem here. When you want something that is short, because you use it *all* *the* *time*, then you won't forget how it works. What's the difference between remembering that [index] gives you the index'th child and remembering that _get_childNodes(index) does it?
And the implementation is straightforward:

    def __getitem__(self, index):
        if type(index) == types.IntType:
            return self._get_childNodes(index)
        elif type(index) == types.StringType:
            return self._get_attribute(index)
        else:
            raise ValueError("wrong type for index")

(or whatever the DOM API will be)

Bye, Walter Dörwald

--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de

From paul@prescod.net Wed Jun 28 15:30:40 2000
From: paul@prescod.net (Paul Prescod)
Date: Wed, 28 Jun 2000 07:30:40 -0700
Subject: [XML-SIG] Contents of xmlcore?
References: <200006280252.WAA01464@mira.erols.com> <14681.29704.194719.579529@cj42289-a.reston1.va.home.com>
Message-ID: <395A0C10.FB5A81F0@prescod.net>

"Fred L. Drake, Jr." wrote:
>
> ...
>
> Based on the recent mess regarding the DOM API, I'm not inclined to
> include a DOM-like API until we have a specification for the Python
> DOM API. I won't have time to write one before 1.6 is done. ;(

Argh. We have four such specifications:

 * 4dom
 * pydom
 * minidom
 * http://www.python.org/doc/howto/xml/node14.html

Nothing has changed for MONTHS. I'm somewhat dismayed that the XML stuff is being held to a much higher status than anything else. In what other area would an API remain the same for months, fend off an attack from one of the leading gurus in the Python world and then be refused entry because it is not written up in a more formal document than a howto?

Okay, fine.

Here's the Python DOM 1 mapping:

DOM nodes must be represented by Python objects. The DOM describes node types. Node type equality is defined in the DOM according to integer equality based on a .nodeType property. Nodetype equality should be checked in the same way in the Python DOM. Using isinstance is not recommended.

DOM nodelists should be represented by Python sequence (typically list) objects. These lists must be considered read-only (as in the DOM) but implementations need not enforce this.
The DOM "item" and "length" methods are not required because the Python equivalents are sufficient.

DOM namednodemaps must be represented by Python mapping objects. Once again, the DOM methods are optional because Python equivalents are sufficient. Integer indexing of these objects is not allowed. Index into the keys() list instead. These objects should be read-write (as in the DOM).

The DOMString interface must be represented by Python string or Unicode string objects.

There is no provision for interoperability between DOMs. It may or may not be possible to move objects between DOMs. The DOM does not require this interoperability and most DOM implementations (for all languages) do not provide it.

DOM attributes must be provided to client software both as Python attributes and through _get_XXX and _set_XXX methods. The underlying implementation may use Python object attributes or something more sophisticated (with __getattr__). The _get_XXX syntax is subject to change in the future because many have expressed dissatisfaction with it. Therefore client software is encouraged to use the attribute syntax.

Exceptions thrown by a DOM should either be built-in Python exceptions or should inherit from the appropriate Python exception.

INDEX_SIZE_ERR is a kind of IndexError
DOMSTRING_SIZE_ERR is a kind of MemoryError
HIERARCHY_REQUEST_ERR is a kind of TypeError
WRONG_DOCUMENT_ERR is a kind of TypeError
INVALID_CHARACTER_ERR is a kind of TypeError
NO_DATA_ALLOWED_ERR is a kind of AssertionError
NO_MODIFICATION_ALLOWED_ERR is a kind of TypeError
NOT_FOUND_ERR is a kind of IndexError
NOT_SUPPORTED_ERR is a kind of AssertionError
INUSE_ATTRIBUTE_ERR is a kind of AssertionError

--
Paul Prescod - Not encumbered by corporate consensus
The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made the modern world possible.
- The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Wed Jun 28 15:31:44 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 07:31:44 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <4.3.1.0.20000628124707.00b1a100@mail.tmt.de> Message-ID: <395A0C50.A781B03F@prescod.net> Walter Doerwald wrote: > > Why not put children and attribute access into __getitem__ > c = b[0][1][0]["abc"] Not a bad idea as syntactic sugar. We should consider it for the NEXT version of the DOM API mapping. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From akuchlin@mems-exchange.org Wed Jun 28 16:04:14 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 28 Jun 2000 11:04:14 -0400 Subject: [XML-SIG] xbel doc In-Reply-To: <64zoo6ayyq.fsf@amboise.ird.idealx.com> References: <64zoo6ayyq.fsf@amboise.ird.idealx.com> Message-ID: <20000628110414.C9063@kronos.cnri.reston.va.us> On Wed, Jun 28, 2000 at 12:20:45PM +0200, Jérôme Marant wrote: >I can't find the xbel doc in version 0.5.5 of pyxml. >It used to exist in version 0.5.1. Why has it disappeared ? It probably got dropped accidentally when I started using the Distutils to make the source distribution; the MANIFEST file would need to list it, and it would be easy for me to forget a file or directory that doesn't affect the compilation. I'll fix it... --amk From mgushee@havenrock.com Wed Jun 28 16:23:10 2000 From: mgushee@havenrock.com (Matt Gushee) Date: Wed, 28 Jun 2000 11:23:10 -0400 (EDT) Subject: [XML-SIG] soaplib errors Message-ID: <14682.6238.219244.669161@kirin.architag.com> Hi, Folks-- I thought I'd try out soaplib 0.8.
But apparently I'm missing something or haven't set it up correctly. When I run the test programs, 'test2.py' works, but the other two fail with the following errors (complete backtraces included at the end of this message):

> test1.py
  File "C:\PROGRA~1\Python\SITE-P~1\soap\test1.py", line 9, in ?
    print server.call(Payload())
.....
SyntaxError: unknown type 'http://www.w3.org/1999/XMLSchema/ur-type[2]' (for now)

> test3.py
  File "C:\PROGRA~1\Python\SITE-P~1\soap\test3.py", line 21, in ?
    except Error, v:
NameError: Error

I am running Python 1.5.2 "standard" package (? -- e.g. the binary obtained thru python.org, not the Pythonware package) on Windows 2000 Professional
sgmlop not installed

Any ideas what the problem is?

Matt Gushee

----------------------------------------------------------------
Complete backtraces:

> cd C:\Program Files\Python\site-packages\soap
> test1.py
C:\Program Files\Python\site-packages\soap>test1.py
*** http://www.w3.org/1999/XMLSchema/ur-type[2]
Traceback (innermost last):
  File "C:\PROGRA~1\Python\SITE-P~1\soap\test1.py", line 9, in ?
    print server.call(Payload())
  File "C:\PROGRA~1\Python\SITE-P~1\soap\soaplib.py", line 687, in __call__
    return self.__send(self.__name, pargs, kwargs)
  File "C:\PROGRA~1\Python\SITE-P~1\soap\soaplib.py", line 798, in __request
    request
  File "C:\PROGRA~1\Python\SITE-P~1\soap\soaplib.py", line 734, in request
    response = self.parse_response(h.getfile())
  File "C:\PROGRA~1\Python\SITE-P~1\soap\soaplib.py", line 752, in parse_response
    p.feed(response)
  File "C:\Program Files\Python\Lib\xmllib.py", line 149, in feed
    self.goahead(0)
  File "C:\Program Files\Python\Lib\xmllib.py", line 247, in goahead
    k = self.parse_endtag(i)
  File "C:\Program Files\Python\Lib\xmllib.py", line 638, in parse_endtag
    self.finish_endtag(tag)
  File "C:\Program Files\Python\Lib\xmllib.py", line 677, in finish_endtag
    self.unknown_endtag(nstag)
  File "C:\PROGRA~1\Python\SITE-P~1\soap\soaplib.py", line 555, in end
    self.end_unknown(type)
  File "C:\PROGRA~1\Python\SITE-P~1\soap\soaplib.py", line 565, in end_unknown
    raise SyntaxError, ("unknown type %s (for now)" % repr(type))
SyntaxError: unknown type 'http://www.w3.org/1999/XMLSchema/ur-type[2]' (for now)

> cd C:\Program Files\Python\site-packages\soap
> test3.py
C:\Program Files\Python\site-packages\soap>test3.py
Traceback (innermost last):
  File "C:\PROGRA~1\Python\SITE-P~1\soap\test3.py", line 21, in ?
    except Error, v:
NameError: Error

From case@appliedtheory.com Wed Jun 28 16:12:35 2000
From: case@appliedtheory.com (Benjamin Saller)
Date: Wed, 28 Jun 2000 11:12:35 -0400 (EDT)
Subject: [XML-SIG] Reconsidering the DOM API
In-Reply-To: <4.3.1.0.20000628152845.00b189b0@mail.tmt.de>
Message-ID:

Today, Walter Doerwald wrote:

At 15:22 28.06.00, Benjamin Saller wrote:
>Today, Walter Doerwald wrote:
>
> Why not put children and attribute access into __getitem__
> c = b[0][1][0]["abc"]
>
>Try to maintain that a month after you wrote it ;>

I don't see a problem here.
When you want something that is short, because you use it *all* *the* *time*, then you won't forget how it works. What's the difference between remembering that [index] gives you the index'th child and remembering that _get_childNodes(index) does it? That was unfair of me. The two examples you have are obviously equivalent. My problem is not that either example is more or less expressive than the other. I just tend to think that for the same reason we use DNS rather than IP addresses referring to your document in terms of numbers and offsets is less maintainable than an XPath like notation. In many ways the DOM removes the human readable advantages of XML because code no longer reflects that the objects are human readable and carry with them domain knowledge. -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... From GSMiros@netscape.net Wed Jun 28 16:30:20 2000 From: GSMiros@netscape.net (Rosalie Dieteman) Date: 28 Jun 00 08:30:20 PDT Subject: [XML-SIG] attributes discussion Message-ID: <20000628153020.11182.qmail@www0m.netaddress.usa.net> I'm a little dense and uninformed here... There is/will be a way to get at a node's attributes. This whole long thread is just a) how to get at them and b) is there supposed to be an underscore at the beginning of the method/property (to use database terminology) name. Am I correct? Rosalie Dieteman ____________________________________________________________________ Get your own FREE, personal Netscape WebMail account today at http://webmail.netscape.com. From Juergen Hermann" Message-ID: <200006281602.SAA02323@statistik.cinetic.de> On 28 Jun 00 06:27:31 PDT, Dave Costa wrote: >Supposedly, there is a precompiled version for Windows, but I have not found >it. Does anyone on the list know where it is?
If you do not find it elsewhere, try http://www.dragon-ware.com/~jh/pub/pyxml-0.5.5-win32.zip That archive is intended to be unpacked in your "/program files/python" directory. Ciao, Jürgen -- Jürgen Hermann (jhe@webde-ag.de) WEB.DE AG, Amalienbadstr.41, D-76227 Karlsruhe Tel.: 0721/94329-0, Fax: 0721/94329-22 From paul@prescod.net Wed Jun 28 17:21:33 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 09:21:33 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: Message-ID: <395A260D.2BF8404B@prescod.net> Benjamin Saller wrote: > > ... > > In many ways the > DOM removes the human readable advantages of XML because code no longer > reflects that the objects are human readable and carry with them domain > knowledge. The DOM gives you access to element type names. You choose whether to use that feature or not. That's not to say that I am against extensions to the DOM that make the XML-structure more central to the code but it is hardly fair to take a single example that does not use attribute names and extrapolate to all DOM code. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From akuchlin@mems-exchange.org Wed Jun 28 18:04:12 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 28 Jun 2000 13:04:12 -0400 Subject: [XML-SIG] xbel doc In-Reply-To: <64ya3p3hpu.fsf@amboise.ird.idealx.com> References: <64zoo6ayyq.fsf@amboise.ird.idealx.com> <20000628110414.C9063@kronos.cnri.reston.va.us> <64ya3p3hpu.fsf@amboise.ird.idealx.com> Message-ID: <20000628130412.A23352@kronos.cnri.reston.va.us> On Wed, Jun 28, 2000 at 06:15:09PM +0200, Jérôme Marant wrote: >Andrew Kuchling writes: >Ok. Are you one of the persons who maintain the PyXML* tarball ?
>I am the new Debian Maintainer of the python XML implementation. Ah, OK! Yes, I'm one of the people with checkin privileges. Since the code is quickly moving to use the Distutils for installation, your job, and the job of Debian maintainers of Python-related packages, might be much simpler if the Distutils could automatically build .deb files, much as it can currently build RPM files. Then it would be trivial to build the .deb for any random Python package. Some work on this has been attempted: http://www.python.org/pipermail/distutils-sig/2000-April/001360.html http://www.python.org/pipermail/distutils-sig/2000-May/001431.html But nothing's been implemented yet. You might want to talk to Gregor Hoffleit, or post on the Distutils list, and see what progress is being made. >I also noticed that some html docs disapeared: all those named >xml.arch.*. I'll check that, and change the MANIFEST.in appropriately; thanks for the report. --amk From case@appliedtheory.com Wed Jun 28 18:18:44 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Wed, 28 Jun 2000 13:18:44 -0400 (EDT) Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <395A260D.2BF8404B@prescod.net> Message-ID: Today, Paul Prescod wrote: Benjamin Saller wrote: > > ... > > In many ways the > DOM removes the human readable advantages of XML because code no longer > reflects that the objects are human readable and carry with them domain > knowledge. The DOM gives you access to element type names. You choose whether to use that feature or not. That's not to say that I am against extensions to the DOM that make the XML-structure more central to the code but it is hardly fair to take a single example that does not use attribute names and extrapolate to all DOM code. Thats true, I shouldn't extrapolate. I am just trying to express my concerns that code written while in the head-space of the problem prolly wont make sense a month after rollout. 
Perhaps I am being too pessimistic, but in my experience if a[0][1][0]['foo'] makes sense at the time you are building the code and its half the length of the attribute names option people will take the shortcut. I just know that *I* am not able to go back to that later and work with it. If its just a matter of discipline to always use the more descriptive variants so be it. I just know a lot of schedule pressure driven programmers who just try to 'get it done'. Then again I work at a SEI level 0 shop and thats just symptomatic. The higher the bar to entry (and re-entry) the more I am willing to look at changes to the approach. That is why I write in Python now :) I personally have lost the ease of use vs. value of use debate too many times. I don't want/expect the DOM to go away. Its powerful and flexible. Its also somewhat more complex than the common case seems to need, but I could be wrong. -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... From paul@prescod.net Wed Jun 28 18:41:17 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 10:41:17 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: Message-ID: <395A38BD.6077296D@prescod.net> Benjamin Saller wrote: > > ... > > Perhaps I am being too pessimistic, but in my experience if > a[0][1][0]['foo'] makes sense at the time you are building the code and > its half the length of the attribute names option people will take the > shortcut. I can't speak for all people, but the real problem with that code, and the real reason to use element type names rather than numbers, is that ordinal-based code is very fragile. It's so fragile that it hardly ever works. It's so fragile that it seldom even works for your *test document*. It's so fragile that hardly anyone would do it that way. I'm sorry I brought up the example!!!! 
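To make the fragility point concrete, here is a minimal sketch, assuming the standard-library xml.dom.minidom and a made-up bookmark document (neither comes from the code discussed in this thread, and the helper names are hypothetical). The same content is parsed twice, once compact and once indented; the ordinal lookup silently lands on a different node, while the lookup by element type name does not.

```python
from xml.dom import minidom

# Hypothetical documents: identical content, with and without indentation.
COMPACT = '<folder><title>Python</title><bookmark href="http://www.python.org/"/></folder>'
PRETTY = """<folder>
  <title>Python</title>
  <bookmark href="http://www.python.org/"/>
</folder>"""


def href_by_position(doc):
    """Ordinal access: take the second child of the root element."""
    node = doc.documentElement.childNodes[1]
    # Whitespace between tags becomes text nodes, which carry no attributes.
    if node.nodeType != node.ELEMENT_NODE:
        return None
    return node.getAttribute("href")


def href_by_name(doc):
    """Name-based access: take the first <bookmark> element."""
    return doc.getElementsByTagName("bookmark")[0].getAttribute("href")


compact = minidom.parseString(COMPACT)
pretty = minidom.parseString(PRETTY)

# In PRETTY, childNodes[0] is a whitespace text node, so index 1 is <title>
# rather than <bookmark>, and the href lookup comes back empty.
print(repr(href_by_position(compact)), repr(href_by_position(pretty)))
print(repr(href_by_name(compact)), repr(href_by_name(pretty)))
```

The ordinal version only works for documents that happen to be serialized exactly like the one it was written against, which is the sense in which such code rarely survives even its own test document.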
> Its also somewhat more complex than the common case seems to > need, but I could be wrong. Full DOM 2+? Maybe. Core DOM? How is it complex? You have parents, children, elements, attributes, siblings, etc. Usability enhancements are important (I typically combine XPath with the DOM whenever I need to do DOM work) but they are extensions. The DOM itself is not that complicated. I mean the DOM was designed for knuckle-dragging JavaScript "programmers" (term used lightly). The core concepts can be taught in five minutes (see the XML Howto). I don't see the complexity. I would say rather that it isn't complex enough in that it is lacking query facilities (i.e. XPath). -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From fdrake@beopen.com Wed Jun 28 19:04:23 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 28 Jun 2000 14:04:23 -0400 (EDT) Subject: [XML-SIG] attributes discussion In-Reply-To: <20000628153020.11182.qmail@www0m.netaddress.usa.net> References: <20000628153020.11182.qmail@www0m.netaddress.usa.net> Message-ID: <14682.15911.660939.870901@cj42289-a.reston1.va.home.com> Rosalie Dieteman writes: > There is/will be a way to get at a node's attributes. This whole > long thread is just a) how to get at them and b) is there supposed > to be an underscore at the beginning of the method/property (to use > database terminology) name. > > Am I correct? You are correct. Both the _get_/_set_ method and "direct" attribute access work on the current implementations. -Fred -- Fred L. Drake, Jr. 
BeOpen PythonLabs Team Member From Mike.Olson@fourthought.com Wed Jun 28 19:08:07 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 28 Jun 2000 12:08:07 -0600 Subject: [XML-SIG] Reconsidering the DOM API References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <4.3.1.0.20000628124707.00b1a100@mail.tmt.de> <395A0C50.A781B03F@prescod.net> Message-ID: <395A3F07.7A812984@FourThought.com> Paul Prescod wrote: > > Walter Doerwald wrote: > > > > Why not put children and attribute access into __getitem__ > > c = b[0][1][0]["abc"] > > Not a bad idea as syntactic sugar. We should consider it for the NEXT > version of the DOM API mapping. I like it for syntactic sugar as well, but namespaces would make the attribute access interesting c = b[("http://www.fourthought.com","abc")] or c = b["http://www.fourthought.com:abc"] Also, for childNode access would we do anything about node types? Is b[0] the first child, first element, first element with a certain tagname? All I see as useful. Mike > > -- > Paul Prescod - Not encumbered by corporate consensus > The calculus and the rich body of mathematical analysis to which it > gave rise made modern science possible, but it was the algorithm that > made the modern world possible. > - The Advent of the Algorithm (pending), by David Berlinski > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@beopen.com Wed Jun 28 12:16:53 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 28 Jun 2000 07:16:53 -0400 (EDT) Subject: [XML-SIG] [Python-Dev] back up & running Message-ID: <14681.56997.50955.892954@cj42289-a.reston1.va.home.com> My mega-laptop got fixed faster than I'd expected, so I'm back up to doing useful work.
My top priorities are handling patches and going back through my email to find all the documentation patches that have sat idle for too long. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://www.python.org/mailman/listinfo/python-dev From paul@prescod.net Wed Jun 28 19:12:38 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 11:12:38 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <4.3.1.0.20000628124707.00b1a100@mail.tmt.de> <395A0C50.A781B03F@prescod.net> <395A3F07.7A812984@FourThought.com> Message-ID: <395A4016.5F7B6467@prescod.net> > I like it for syntatic sugar as well, but namespaces would make the > attribute access interesting ... > Also, for childNode access would we do anything about node types? is > b[0] the first child, first element, first elment with a certain > tagname? All I see as useful.... Once we go down this path we end up reinventing XPath. That's why my EventDOM depends on it so heavily. XPath in Python 1.7! DOM in Python 1.6. :) -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From case@appliedtheory.com Wed Jun 28 19:14:59 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Wed, 28 Jun 2000 14:14:59 -0400 (EDT) Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <395A38BD.6077296D@prescod.net> Message-ID: Today, Paul Prescod wrote: Core DOM? How is it complex? You have parents, children, elements, attributes, siblings, etc. Usability enhancements are important (I typically combine XPath with the DOM whenever I need to do DOM work) but they are extensions. The DOM itself is not that complicated. 
I mean the DOM was designed for knuckle-dragging JavaScript "programmers" (term used lightly). The core concepts can be taught in five minutes (see the XML Howto). I don't see the complexity. I would say rather that it isn't complex enough in that it is lacking query facilities (i.e. XPath). You are totally correct that the concept of the DOM is simple. The complexity is the usage in terms of things that you should be able to do with 2 or 3 lines of code need 2 or 3 times that without something like XPath. I just think the more common the usage pattern the easier you need it to be. I don't mean to sound at odds with you and I obviously don't expect you to disagree with the last statement. I am just saying that we both see a common usage pattern that is not supported out of the box in Python and it looks like it won't be in 1.6. I am just trying to encourage solutions with very simplistic usage patterns and would like to be able to 'sell' that. When people ask what a solution looks like in Java or ColdFusion or PHP I want to be able to point to a Python solution that is simpler and uses less code. -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... From case@appliedtheory.com Wed Jun 28 19:30:23 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Wed, 28 Jun 2000 14:30:23 -0400 (EDT) Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <395A4016.5F7B6467@prescod.net> Message-ID: Today, Paul Prescod wrote: Once we go down this path we end up reinventing XPath. That's why my EventDOM depends on it so heavily. XPath in Python 1.7! DOM in Python 1.6. :) Fair enough. -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... From bjorn@roguewave.com Wed Jun 28 19:45:46 2000 From: bjorn@roguewave.com (Bjorn Pettersen) Date: Wed, 28 Jun 2000 12:45:46 -0600 Subject: [XML-SIG] Contents of xmlcore? 
References: <200006280252.WAA01464@mira.erols.com> <14681.29704.194719.579529@cj42289-a.reston1.va.home.com> <395A0C10.FB5A81F0@prescod.net> Message-ID: <395A47DA.C68841FD@roguewave.com> Paul Prescod wrote: > > "Fred L. Drake, Jr." wrote: > > > > ... > > > > Based on the recent mess regarding the DOM API, I'm not inclined to > > include a DOM-like API until we have a specification for the Python > > DOM API. I won't have time to write one before 1.6 is done. ;( > > Argh. We have four such specifications: > > * 4dom > * pydom > * minidom > * http://www.python.org/doc/howto/xml/node14.html > [snip] Unfortunately I don't know too much about XML (but I'm being forced to learn quickly). From a beginner's perspective, any of the dom APIs are more approachable than the SAX APIs. I'm comfortable with event driven programming, but still it feels much easier to think of an xml document as a tree structure rather than a set of events (investigative programming in the interpreter is tons easier too...) I therefore think it would be a mistake to not include a DOM like API in xmlcore. I don't think I have enough experience to offer an opinion about which is better (but as a datapoint, I'm using qp_xml since it is fast...) -- bjorn From paul@prescod.net Wed Jun 28 19:43:21 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 11:43:21 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: Message-ID: <395A4749.6E446FC9@prescod.net> Benjamin Saller wrote: > > ... > > I am just trying to encourage solutions with very simplistic usage > patterns and would like to be able to 'sell' that. When people ask what a > solution looks like in Java or ColdFusion or PHP I want to be able to > point to a Python solution that is simpler and uses less code. I agree, but #1. I have no time to write it. #2. People here are already nervous about including the DOM because it isn't stable enough/tested enough for them.
If you think you will be embarrassed to put our DOM against Java's DOM, imagine the situation if we don't put in a DOM at all? At this point there are more nay votes than yay and without a benevolent dictator it isn't clear what other mechanism would decide inclusion.

Speaking only for myself, and not for anyone else in the group, I would welcome a module that did simple XPath processing on top of our DOM. I would support its introduction in 1.6 if

a) it came with a complete test suite that did reasonably full coverage
b) it was "vanilla XPath" with few or no extensions
c) it was very little code (i.e. a very small XPath subset)

In order to get the conservatives on our side, we might have to call the module:

    import minidom
    import experimental_xpath
    experimental_xpath.addSupport( minidom )
    dom.xpath( "a/b" )

I'm actually not kidding about going that far if that's what it takes to convince people that we won't be locked into the API forever.

--
Paul Prescod - Not encumbered by corporate consensus
The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made the modern world possible.
 - The Advent of the Algorithm (pending), by David Berlinski

From wunder@ultraseek.com Wed Jun 28 19:56:54 2000
From: wunder@ultraseek.com (Walter Underwood)
Date: Wed, 28 Jun 2000 11:56:54 -0700
Subject: [XML-SIG] Contents of xmlcore?
In-Reply-To: <20000627233107.L29590@lyra.org>
Message-ID: <120740354.962193414@serrano.infoseek.com>

--On Tuesday, June 27, 2000 11:31 PM -0700 Greg Stein wrote:
>
> I'd have to agree with this. Punt the DOM.
>
> Seems like that leaves expat and sax? Not much of a core :-(

Well, it is "core", not "kitchen sink".

Turns out that including Expat and SAX would be a total solution for our product (Ultraseek Server search engine). We don't use DOM interfaces and have no plans to.
We've done some prototyping on another product using XSLT, but the XSLT engines are evolving towards using custom, non-DOM interfaces, and we were doing that in Java anyway. wunder -- Walter R. Underwood Senior Staff Engineer, Ultraseek Corp. http://www.ultraseek.com/ From uogbuji@fourthought.com Wed Jun 28 20:07:44 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 28 Jun 2000 13:07:44 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Duncan Grisby of "Wed, 28 Jun 2000 10:30:54 BST." <200006280930.KAA01726@pineapple.uk.research.att.com> Message-ID: <200006281907.NAA10079@localhost.localdomain> > [1] The only clash which can't be fixed by escaping the IDL > identifiers with an underscore is the interface named "Range" in > module "range". This is really a problem for the W3C to fix, but > it would be easy enough to relax the name scoping rules to allow > range::Range to pass the IDL compiler. Ah, that would be why we never experienced it. We don't implement range, and I'm wondering if we'll ever be able to. I used to think ranges were useless except in browsers, but XPointer uses ranges, so we're still wondering what to do about this. The thought of implementing ranges makes my hair stand on end. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Wed Jun 28 20:13:16 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 28 Jun 2000 13:13:16 -0600 Subject: [XML-SIG] Contents of xmlcore? In-Reply-To: Message from Paul Prescod of "Wed, 28 Jun 2000 07:30:40 PDT." <395A0C10.FB5A81F0@prescod.net> Message-ID: <200006281913.NAA10101@localhost.localdomain> > "Fred L. Drake, Jr." wrote: > > > > ... 
> > > > Based on the recent mess regarding the DOM API, I'm not inclined to > > include a DOM-like API until we have a specification for the Python > > DOM API. I won't have time to write one before 1.6 is done. ;( > > Argh. We have four such specifications: > > * 4dom > * pydom > * minidom > * http://www.python.org/doc/howto/xml/node14.html > > Nothing has changed for MONTHS. I'm somewhat dismayed that the XML stuff > is being held to a much higher status than anything else. In what other > area would an API remain the same for months, fend off an attack from > one of the leading gurus in the Python world and then be refused entry > because it is not written up in a more formal document than a howto? > > Okay, fine. > > Here's the Python DOM 1 mapping: [snip] Blimey, Paul. I thought you had a doctor's appointment today. Did you ditch to write the binding? It's a decent start, and I'll try to merge the few paras I had (mostly cobble together after reading the Javastuff) and make any comments. We might have a formal document yet. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From case@appliedtheory.com Wed Jun 28 20:38:23 2000 From: case@appliedtheory.com (Benjamin Saller) Date: Wed, 28 Jun 2000 15:38:23 -0400 (EDT) Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <395A4749.6E446FC9@prescod.net> Message-ID: Today, Paul Prescod wrote: Speaking only for myself, and not for anyone else in the group, I would welcome a module that did simple XPath processing on top of our DOM. I would support its introduction in 1.6 if a) it came with a complete test suite that did reasonably full coverage b) it was "vanilla XPath" with few or no extensions c) it was very little code (i.e. a very small XPath subset) Would others on the list accecpt this? 
Is it worth having this discussion? This would need to be done by Friday? -- Benjamin Saller Technical Strategist AppliedTheory Where tire hits pavement on the Information super-highway, that's where my head is... From fdrake@beopen.com Wed Jun 28 21:23:54 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 28 Jun 2000 16:23:54 -0400 (EDT) Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <395A4749.6E446FC9@prescod.net> References: <395A4749.6E446FC9@prescod.net> Message-ID: <14682.24282.48074.313889@cj42289-a.reston1.va.home.com> Paul Prescod writes: > I'm actually not kidding about going that far if that's what it takes to > convince people that we won't be locked into the API forever. I think we can avoid *that* extreme! Even I'm not that anal. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From uogbuji@fourthought.com Wed Jun 28 21:31:06 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 28 Jun 2000 14:31:06 -0600 Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: Message from Benjamin Saller of "Wed, 28 Jun 2000 15:38:23 EDT." Message-ID: <200006282031.OAA10268@localhost.localdomain> > Today, Paul Prescod wrote: > > Speaking only for myself, and not for anyone else in the group, I would > welcome a module that did simple XPath processing on top of our DOM. I > would support its introduction in 1.6 if > > a) it came with a complete test suite that did reasonably full coverage > b) it was "vanilla XPath" with few or no extensions > c) it was very little code (i.e. a very small XPath subset) > > Would others on the list accecpt this? Is it worth having this > discussion? This would need to be done by Friday? Just because I have to say it... 4XPath should work well with any DOM that supports the binding (it actually only uses attribute access). However, it uses a C module and I understand those are harder to get sanctioned into the Python distro. 
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paul@prescod.net Wed Jun 28 22:24:23 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 14:24:23 -0700 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? References: <200006281907.NAA10079@localhost.localdomain> Message-ID: <395A6D07.707C27E4@prescod.net> I fought the good fight against ranges in XPointer...evil, unmitigated evil. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Wed Jun 28 22:29:40 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 14:29:40 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: <200006282031.OAA10268@localhost.localdomain> Message-ID: <395A6E44.FA85FC5E@prescod.net> Uche Ogbuji wrote: > > Just because I have to say it... 4XPath should work well with any DOM that > supports the binding (it actually only uses attribute access). However, it > uses a C module and I understand those are harder to get sanctioned into the > Python distro. Because it is so much C code, and has a pretty sophisticated API, and also depends on a bunch of Python code, I would rather wait until 1.7 for that. But I think we should give it serious consideration for 1.7. Having full XPath would absolutely rock. Also, it doesn't just depend on C -- it also depends on Yacc, right? I'm not yet ready to accept that no Python-coded parser could parse XPath efficiently. I want to propose that it is impossible to Fredrick and see what happens in the next beta of SRE. 
:) http://www.w3.org/TR/xpath Above and beyond XPath, I would like to believe that reasonably efficient parsers can be written in straight Python using SRE or at least MXTools or something... -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Wed Jun 28 22:56:21 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 28 Jun 2000 14:56:21 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: Message-ID: <395A7485.ACB39D0E@prescod.net> Benjamin Saller wrote: > ... > Would others on the list accecpt this? Nobody will disagree until you write it. No-one wants to discourage a coder. :) > Is it worth having this > discussion? This would need to be done by Friday? Er. Probably early Friday. Hard to tell. Andrew is probably the person who has to check it in. Andrew? -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From Mike.Olson@fourthought.com Wed Jun 28 23:00:14 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 28 Jun 2000 16:00:14 -0600 Subject: [XML-SIG] Reconsidering the DOM API References: <200006282031.OAA10268@localhost.localdomain> <395A6E44.FA85FC5E@prescod.net> Message-ID: <395A756E.99A3852F@FourThought.com> Paul Prescod wrote: > > Uche Ogbuji wrote: > > > > Just because I have to say it... 4XPath should work well with any DOM that > > supports the binding (it actually only uses attribute access). However, it > > uses a C module and I understand those are harder to get sanctioned into the > > Python distro. 
> > Because it is so much C code, and has a pretty sophisticated API, and > also depends on a bunch of Python code, I would rather wait until 1.7 > for that. But I think we should give it serious consideration for 1.7. > Having full XPath would absolutely rock. Also, it doesn't just depend on > C -- it also depends on Yacc, right? > > I'm not yet ready to accept that no Python-coded parser could parse > XPath efficiently. I want to propose that it is impossible to Fredrik > and see what happens in the next beta of SRE. :) > > http://www.w3.org/TR/xpath > We can have LEX and YACC spit out ANSI-compliant C code to remove the dependency on LEX and YACC. That's currently how we do our Windows build. > I'm not yet ready to accept that no Python-coded parser could parse > XPath efficiently. I want to propose that it is impossible to Fredrik > and see what happens in the next beta of SRE. :) I actually started a conversation with Fredrik at IPC8 about SRE for parsing XPath. I'll try to pick it up where it was left off. He made some comments about not using the "public" API to get around some of the problems. -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Thu Jun 29 06:01:18 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 28 Jun 2000 23:01:18 -0600 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? 
In-Reply-To: Message from Paul Prescod of "Wed, 28 Jun 2000 14:24:23 PDT." <395A6D07.707C27E4@prescod.net> Message-ID: <200006290501.XAA11132@localhost.localdomain> > I fought the good fight against ranges in XPointer...evil, unmitigated > evil. And given the stark fact of the XML infoset as fundamental model for XML, completely irresponsible of the W3C. It sniffs of something a Microsoft or Netscape wangled in. I wish I'd witnessed your fight. I would have joined in. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Thu Jun 29 06:09:49 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 28 Jun 2000 23:09:49 -0600 Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: Message from Paul Prescod of "Wed, 28 Jun 2000 14:29:40 PDT." <395A6E44.FA85FC5E@prescod.net> Message-ID: <200006290509.XAA11152@localhost.localdomain> > Uche Ogbuji wrote: > > > > Just because I have to say it... 4XPath should work well with any DOM that > > supports the binding (it actually only uses attribute access). However, it > > uses a C module and I understand those are harder to get sanctioned into the > > Python distro. > > Because it is so much C code, and has a pretty sophisticated API, and > also depends on a bunch of Python code, I would rather wait until 1.7 > for that. But I think we should give it serious consideration for 1.7. > Having full XPath would absolutely rock. Also, it doesn't just depend on > C -- it also depends on Yacc, right? True (it depends on Bison, actually). But then again, so does Python. > I'm not yet ready to accept that no Python-coded parser could parse > XPath efficiently. I want to propose that it is impossible to Fredrick > and see what happens in the next beta of SRE. :) We do plan to try again with SRE. 
Mike was pretty excited after talking to the Effbot (I think) at IPC8 and he thought SRE might be fast enough. We'll see. If /F beats us to it, tant mieux. > http://www.w3.org/TR/xpath > > Above and beyond XPath, I would like to believe that reasonably > efficient parsers can be written in straight Python using SRE or at > least MXTools or something... Hmm... regular expressions for XPath seem a bit sticky, but feasible, but full XML?... -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@beopen.com Thu Jun 29 06:15:57 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 29 Jun 2000 01:15:57 -0400 (EDT) Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <200006290509.XAA11152@localhost.localdomain> References: <395A6E44.FA85FC5E@prescod.net> <200006290509.XAA11152@localhost.localdomain> Message-ID: <14682.56205.374628.166626@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > True (it depends on Bison, actually). But then again, so does Python. No, Python includes a hand-coded tokenizer and its own parser generator, so it doesn't depend on external packages for these. (I'm not saying that's ideal, just that that's how it is.) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From xml-sig@teleo.net Thu Jun 29 07:03:10 2000 From: xml-sig@teleo.net (Patrick Phalen) Date: Wed, 28 Jun 2000 23:03:10 -0700 Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <14680.59651.130151.140876@fermi.eeel.nist.gov> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> Message-ID: <00062823431300.11217@quadra.teleo.net> [Michael McLay, on Tue, 27 Jun 2000] :: Perhaps it is time to step back and ask how easy XML could be if :: the Python interface had nothing to do with SAX or DOM. How easy? 
Having just read through Sean McGrath's impressive new book, _XML Processing with Python_, I'd have to answer "very easy indeed." Sean's notion of using the very simple, Python-friendly Pyxie API is attractive. I admit I was suspicious of the idea at first, but became a convert after seeing it in action. It allows me to do the great majority of XML work I face in a comfortable, Pythonic way. His further notion that Pyxie should support language-independent XML processing APIs (namely SAX and DOM) in a compatibility layer is also a natural. I'm frankly surprised to have seen no discussion of Pyxie for inclusion in the XML core. Why is that? Let's review its pedigree: * Based on James Clark's ESIS-compatible parsers, sgmls and nsgmls. * Open source (see http://www.pyxie.org). * Written by the author of _XML by Example_, a computer scientist who teaches Smalltalk and Java at college, but prefers Python for his own work. * Showcased in the first (and so far only) published book about Python and XML. In other words, Pyxie has a lot going for it, both as a marketing vehicle for Python as an XML language and as a great tool. It certainly seems to deserve a place beside the other weapons in the arsenal. Python has a growing reputation as a natural language for XML work. It seems to be attracting a lot of new users for this reason. Some, of course, will want to work immediately with SAX and the DOM, for job-related or other reasons. But if others have a chance to see how much they can accomplish quickly and painlessly, without hassling with "lowest common denominator" DOM and SAX issues, so much the better, eh? I've been fooling around with Pyxie and I have to attest to the fact that it's effective and fun, befitting a Pythonic solution. With Pyxie as potential "middleware" between Python and the DOM, perhaps we don't immediately have to worry about precise interface definitions. 
From Fredrik Lundh" Message-ID: <004801bfe1ab$c5f3a700$f2a6b5d4@hagrid> matt wrote: > I thought I'd try out soaplib 0.8. But apparently I'm missing > something or haven't set it up correctly. When I run the test > programs, 'test2.py' works, but the other two fail. my fault -- those scripts shouldn't have been in the distribution. the test1.py script indicates a real bug in the marshalling code (probably introduced when we tweaked things to work with the userland server). I'll look into this for the maintenance release (out soon). the test3.py script is broken; change "Error" to "soaplib.Error", and it'll work a little bit better (see comments in the file for why the output doesn't make much sense...) thanks /F From walter@livinglogic.de Thu Jun 29 10:26:12 2000 From: walter@livinglogic.de (Walter Doerwald) Date: Thu, 29 Jun 2000 11:26:12 +0200 Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <395A3F07.7A812984@FourThought.com> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <4.3.1.0.20000628124707.00b1a100@mail.tmt.de> <395A0C50.A781B03F@prescod.net> Message-ID: <4.3.1.0.20000629111052.00b24830@mail.tmt.de> At 20:08 28.06.00, Mike Olson wrote: >Paul Prescod wrote: > > > > Walter Doerwald wrote: > > > > > > Why not put children and attribute access into __getitem__ > > > c = b[0][1][0]["abc"] > > > > Not a bad idea as syntactic sugar. We should consider it for the NEXT > > version of the DOM API mapping. > >I like it for syntactic sugar as well, but namespaces would make the >attribute access interesting > >c = b[("http://www.fourthought.com","abc")] > >or > >c = b["http://www.fourthought.com:abc"] > > >Also, for childNode access would we do anything about node types? is >b[0] the first child, first element, first element with a certain >tagname? All I see as useful. b[0] is the first child. b.find(type = xml.Element)[0] or b.find(test = lambda x: type(x)==xml.Element)[0] is the first child that's an element. 
b.find(type = html.a)[0] or b.find(test = lambda x: type(x)==html.a)[0] is the first child with an element type of html.a. Of course this means that every element type corresponds to a Python class (as the "abstract" Element already does). (But again we have the problem of bringing namespaces into this scheme). Bye, Walter Dörwald -- Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de From jdnier@execpc.com Thu Jun 29 13:17:00 2000 From: jdnier@execpc.com (David Niergarth) Date: Thu, 29 Jun 2000 07:17:00 -0500 (CDT) Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: <200006290509.XAA11152@localhost.localdomain> Message-ID: Uche Ogbuji wrote: > Hmm... regular expressions for XPath seems a bit sticky, but feasible, but > full XML?... I made a post some months back pointing out a "shallow parsing" regular expression that can break an XML document into a list of its markup and text items. I thought it was a pretty interesting example of just what you can do with a regex! I put some details at http://starship.python.net/crew/dni/REX/index.html Incidentally (for Fredrik), sre included with Python 1.6a fails to compile the regex mentioned above (although re is able to compile it) -- I was hoping to see just how much faster it is! There have been a couple of sre patches checked into CVS since I last compiled the source. I've been meaning to try again but haven't had time yet. The regex is pretty large and might make a good test case. --David Niergarth From fredrik@pythonware.com Thu Jun 29 14:43:29 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 29 Jun 2000 15:43:29 +0200 Subject: [XML-SIG] Reconsidering the DOM API References: Message-ID: <00ca01bfe1d0$0459bba0$0900a8c0@SPIFF> paul wrote: > I'm not yet ready to accept that no Python-coded parser could parse > XPath efficiently. I want to propose that it is impossible to Fredrik > and see what happens in the next beta of SRE. 
:) > > http://www.w3.org/TR/xpath challenge accepted. david: > I made a post some months back pointing out a "shallow parsing" regular > expression that can break an XML document into a list of its markup and > text items. I thought it was pretty interesting example of just what you > can do with a regex! I put some details at > > http://starship.python.net/crew/dni/REX/index.html the thing I call SREX is something similar (the pattern isn't as complete as REX, and it squeezes some extra performance out of SRE by using something called "template mode"). some notes on xmllib/sgmlop/sre performance can be found here: http://hem.passagen.se/eff/2000_06_01_bot-archive.htm#397730 and: http://hem.passagen.se/eff/2000_06_01_bot-archive.htm#399596 > Incidently (for Fredrik), sre included with python 1.6a fails to compile > the regex mentioned above (although re is able to compile it) -- I was > hoping to see just how much faster it is! There have been a couple sre > patches cheked into cvs since I last compiled the source. I've spent the last three days working on SRE -- the current snapshot is *much* better. > The regex is pretty large and might make a good test case. I'll take a look at it; if it doesn't work, I'll consider that as a critical bug. cheers /F From paul@prescod.net Thu Jun 29 14:41:42 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 06:41:42 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: <200006290509.XAA11152@localhost.localdomain> Message-ID: <395B5216.2A8D7E19@prescod.net> Uche Ogbuji wrote: > > > Above and beyond XPath, I would like to believe that reasonably > > efficient parsers can be written in straight Python using SRE or at > > least MXTools or something... > > Hmm... regular expressions for XPath seems a bit sticky, but feasible, but > full XML?... No, I didn't mean full XML. I meant, e.g., IDL or other languages where the data files do not tend to be megabytes long and arriving once per second. 
:) But hey, if I say it's impossible it will probably be possible with the next version of SRE. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Thu Jun 29 14:47:27 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 06:47:27 -0700 Subject: [XML-SIG] Pyxie References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> Message-ID: <395B536F.5B2E2ADC@prescod.net> Patrick Phalen wrote: > > ... > > * Based on James Clark's ESIS-compatible parsers, sgmls and nsgmls. There's your problem right there. These parsers are slow and large compared to expat. > In other words, Pyxie has a lot going for it, both as a marketing > vehicle for Python as an XML language and as a great tool. It > certainly seems to deserve a place beside the other weapons in the > arsenal. I agree; that's why I proposed it should be in the XML-SIG's package along with Sean's RAX. Perhaps you can post some examples of things that are substantially easier with Pyxie than with the DOM. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From ken@bitsko.slc.ut.us Thu Jun 29 16:41:23 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 29 Jun 2000 10:41:23 -0500 Subject: [XML-SIG] Pyxie In-Reply-To: Paul Prescod's message of "Thu, 29 Jun 2000 06:47:27 -0700" References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> Message-ID: Paul Prescod writes: > Patrick Phalen wrote: > > > > ... 
> > > > * Based on James Clark's ESIS-compatible parsers, sgmls and nsgmls. > > There's your problem right there. These parsers are slow and large > compared to expat. "Based [_in spirit_] on [ESIS from] sgmls and nsgmls." The parsers on the Pyx site are expat and rxp ('xmln' and 'xmlv'; non-validating and validating, respectively), so they are fast and small. Pyx's ESIS puts attributes (A-lines) after the elements instead of before (see the FAQ for why). I recall patches for nsgmls somewhere but they're not on the Pyx site. -- Ken From fdrake@beopen.com Thu Jun 29 16:50:26 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 29 Jun 2000 11:50:26 -0400 (EDT) Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) Message-ID: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> Last night I decided that in order to get the XML support into Python that needs to be there, I'd better get it into CVS on Thursday. Well, today is Thursday, so I'll either have it in today or Python 1.6 may well ship without the XML support we've been working so hard on. Since there hasn't been a lot of discussion here about the "xmlcore" proposal, and since some of us aren't terribly happy about the name and potential confusion over what people should and shouldn't be able to rely on, I spent a little(!) time on the phone last night with someone who knows more about what's useful than I do, and understands the stability concerns we have for the standard library ("if it's buggy, we'll still have to support it forever"): our favorite, Paul Prescod. Here's what we came up with:

1. Create a new package in the standard library, with the following structure:

    xml/
        dom/
            __init__       # provides parse(), parseFile(), and Document
            minidom        # Paul's basic DOM 1 + namespaces implementation
            ???            # driver to load a DOM from a SAX parser?
        parsers/
            expat          # Python Expat wrapper with namespace support
        sax/
            __init__       # provides parse(), parseFile(), and some classes from the handler module
            xmlreader      # used by parser writers
            handler        # base classes for handlers
            expatreader    # SAX driver for Expat
            saxutils       # pretty much the same as now

The advantage of using the "xml" name for the package is that most users will be using the most acceptable name. Additional facilities (XSLT, XPath, 4DOM, etc.) can be added as they stabilize and become widely recognized as "core" in the XML community.

2. Deal with PyXML -- two options:

    a. Rename the base package to something else, "pyxml", "xmlextras", ???. This was the option Paul & I discussed.
    b. Keep the "xml" name but treat it as a "testbed" version only suitable for use by Python+XML development experts. Not too bad, but not as good I think.

Please send comments on the structure; I'm going to try to get this in sometime this afternoon, modulo updates/objections from the group as best I can. We can still change things after this weekend's release, but the sooner we get the package structure down the easier it'll be to work with. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From ken@bitsko.slc.ut.us Thu Jun 29 17:02:35 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 29 Jun 2000 11:02:35 -0500 Subject: [XML-SIG] Reconsidering the DOM API In-Reply-To: Patrick Phalen's message of "Wed, 28 Jun 2000 23:03:10 -0700" References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> Message-ID: Patrick Phalen writes: > [Michael McLay, on Tue, 27 Jun 2000] > :: Perhaps it is time to step back and ask how easy XML could be if > :: the Python interface had nothing to do with SAX or DOM. > > How easy? Having just read through Sean McGrath's impressive new book, > _XML Processing with Python_, I'd have to answer "very easy indeed." > > Sean's notion of using the very simple, Python-friendly Pyxie API is > attractive. 
I admit I was suspicious of the idea at first, but became > a convert after seeing it in action. It allows me to do the great > majority of XML work I face in a comfortable, Pythonic way. > > His further notion that Pyxie should support language-independent XML > processing APIs (namely SAX and DOM) in a compatibility layer is also a > natural. Pyxie basically provides three things: * the ESIS format so Pyxie utilities can be easily chained together on the command line (the CLI equivalent of chaining SAX filters) * an event based parser API VERY similar to SAX (plus what would be a SAX filter) but not compatible with SAX. * a tree based API VERY similar to DOM (plus some convenience methods) but not compatible with DOM. It's these last two that bug me about Pyxie. Looking at Pyxie.py, I don't see why Pyxie could not have used Python SAX and DOM and have been just as simple. As far as I can tell, Pyxie is merely different. This is not an argument against Pyxie's convenience functions which make Pyxie so easy to use and what draw so many people to it! My concern is that this could have easily been done _with_ SAX and DOM and avoided unnecessary incompatibility. -- Ken From paul@prescod.net Thu Jun 29 17:20:11 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 09:20:11 -0700 Subject: [XML-SIG] Pyxie References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> Message-ID: <395B773B.74E5007F@prescod.net> Ken MacLeod wrote: > > The parsers on the Pyx site are expat and rxp ('xmln' and 'xmlv'; > non-validating and validating, respectively), so they are fast and > small. But you have to run them in a different process and use pipes and "double parsing." For a long time I couldn't understand why you would choose to do that and then I realized that the real reason is to allow Pyxie to use a pull model rather than a push model of parsing. 
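For readers unfamiliar with the push/pull distinction, here is a minimal sketch. It assumes the `xml.sax` and `xml.dom.pulldom` modules as they exist in today's Python standard library, not the 2000-era code under discussion; the sample document and the names `PushHandler`, `push_ids`, and `pull_ids` are made up for illustration.

```python
import xml.sax
from xml.dom import pulldom

# Hypothetical sample document, used only for this illustration.
DOC = "<log><entry id='1'>ok</entry><entry id='2'>fail</entry></log>"


class PushHandler(xml.sax.ContentHandler):
    """Push model: the parser drives and calls back into the handler."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "entry":
            self.ids.append(attrs["id"])


def push_ids(text):
    """Collect entry ids by handing control to the SAX parser."""
    handler = PushHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.ids


def pull_ids(text):
    """Pull model: the caller's loop drives and asks for the next event."""
    ids = []
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT and node.tagName == "entry":
            ids.append(node.getAttribute("id"))
    return ids


if __name__ == "__main__":
    print(push_ids(DOC))
    print(pull_ids(DOC))
```

In the pull version the application keeps its own control flow (it could stop early with `break`, for instance), which is the property being attributed to Pyxie here; neither version needs a separate parser process.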
But Fredrik has shown that you can do pull parsing without a separate executable so I don't really see the benefit in pyx anymore. If you want a simplified XML interchange format, you might as well use canonical XML: http://www.w3.org/TR/xml-c14n The problem with comparing the various APIs that are popping up is that the proponents of one view or the other are not typically very knowledgeable about the other's approach. Consequently, I haven't heard clear descriptions of why the Pyxie approach is better; i.e., I would like to see some ugly DOM code transformed into beautiful Pyxie code. I would like to be convinced that these advantages are somehow inherently tied to the fact that Pyxie uses its own tree data structure rather than the DOM and its own event model rather than SAX. If Pyxie's features can be added to established APIs then that's the better way to go. In my opinion, it is mostly a difference in philosophy. I presume that APIs like SAX and DOM are fundamentally fine and just need a few good ideas added to them. I think that the "cross-language" versus "Pythonic" dichotomy is a false one. Pyxie would be a great Java API. DOM is also a great Python API. Languages are not *that different*. You tweak the API a little to make it "at home" and they typically work just as well in one language as in another. What is it that Pyxie really does that is cool? From my perusal, it is pull parsing and the event/tree hybrid. Based on inspiration from Pyxie I've added these features to the DOM through pulldom. I'm not as sure about the cut and paste model of tree modification -- I don't typically write tree-mutating applications. The same holds for the other new API proposed this week. It adds a good idea (XPath addressing) to a tree data structure but invents a whole new tree data structure rather than using the DOM! There's no need to do that. I do not personally feel that basing our APIs on SAX and DOM requires any compromises in terms of "Pythonicity." 
As we have been discussing this week, I am in favor of using every last feature of Python (including __getattr__, __getitem__, etc.). I don't see why using those features should require abandoning the core of established, popular APIs like DOM and SAX. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Thu Jun 29 17:27:32 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 09:27:32 -0700 Subject: [XML-SIG] Reconsidering the DOM API References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> Message-ID: <395B78F4.C7D8E3E3@prescod.net> Ken MacLeod wrote: > > ... > > This is not an argument against Pyxie's convenience functions which > make Pyxie so easy to use and what draw so many people to it! My > concern is that this could have easily been done _with_ SAX and DOM > and avoided unnecessary incompatibility. I guess I responded to the wrong guy. We agree! In Sean's defence, I do not think that Pyxie could be written on top of "raw SAX" because the tree building requires a pull model. Python SAX now has a tree building extension, but that might be newer than Pyxie. > * the ESIS format so Pyxie utilities can be easily chained together > on the command line (the CLI equivalent of chaining SAX filters) I think that XML itself (or at least canonical XML) can be the format passed from program to program! -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. 
- The Advent of the Algorithm (pending), by David Berlinski From ken@bitsko.slc.ut.us Thu Jun 29 17:30:35 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 29 Jun 2000 11:30:35 -0500 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: Paul Prescod's message of "Wed, 28 Jun 2000 14:24:23 -0700" References: <200006281907.NAA10079@localhost.localdomain> <395A6D07.707C27E4@prescod.net> Message-ID: Paul Prescod writes: > I fought the good fight against ranges in XPointer...evil, > unmitigated evil. Aren't ranges necessary to link to a section of an external document? Isn't there a similar feature in the grove model? I know that I would find ranges useful for "clipping" content out of other documents for inclusion in summary docs. Insights needed here... -- Ken From wunder@ultraseek.com Thu Jun 29 17:58:40 2000 From: wunder@ultraseek.com (Walter Underwood) Date: Thu, 29 Jun 2000 09:58:40 -0700 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> Message-ID: <200045835.962272720@serrano.infoseek.com> --On Thursday, June 29, 2000 11:50 AM -0400 "Fred L. Drake, Jr." wrote: > 1. Create a new package in the standard library, with the following > structure: Looks good. We're using 1.6 for the next rev of our search engine, and pulling Expat officially into Python would be really nice. We'll pound on the Unicode/XML support on NT, Linux/intel, Solaris/SPARC, and HP-UX. > parsers/ > expat # Python Expat wrapper with namespace > # support How about "parser", since we don't say "doms". > 2. Deal with PyXML -- two options: > a. Rename the base package to something else, "pyxml", > "xmlextras", ???. "xmlextra". wunder -- Walter R. Underwood Senior Staff Engineer, Ultraseek Corp. http://www.ultraseek.com/ From fdrake@beopen.com Thu Jun 29 18:20:52 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) 
Date: Thu, 29 Jun 2000 13:20:52 -0400 (EDT) Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <200045835.962272720@serrano.infoseek.com> References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> <200045835.962272720@serrano.infoseek.com> Message-ID: <14683.34164.967706.461751@cj42289-a.reston1.va.home.com> Walter Underwood writes: > Looks good. We're using 1.6 for the next rev of our search engine, > and pulling Expat officially into Python would be really nice. > We'll pound on the Unicode/XML support on NT, Linux/intel, > Solaris/SPARC, and HP-UX. Great! We'll look forward to bug reports! ;) > How about "parser", since we don't say "doms". Good -- I like that better. > > 2. Deal with PyXML -- two options: > > a. Rename the base package to something else, "pyxml", > > "xmlextras", ???. > > "xmlextra". I'll leave the final name to Andrew on this one; this big issue is that it involves a name change, which is annoying in many ways. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From ken@bitsko.slc.ut.us Thu Jun 29 18:32:51 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 29 Jun 2000 12:32:51 -0500 Subject: [XML-SIG] Pyxie In-Reply-To: Paul Prescod's message of "Thu, 29 Jun 2000 09:20:11 -0700" References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> <395B773B.74E5007F@prescod.net> Message-ID: Paul Prescod writes: > Ken MacLeod wrote: > > > > The parsers on the Pyx site are expat and rxp ('xmln' and 'xmlv'; > > non-validating and validating, respectively), so they are fast and > > small. > > But you have to run them in a different process and use pipes and > "double parsing." For a long time I couldn't understand why you > would choose to do that and then I realized that the real reason is > to allow Pyxie to use a pull model rather than a push model of > parsing. 
> > But Fredrick has shown that you can do pull parsing without a > separate executable so I don't really see the benefit in pyx > anymore. If you want a simplified XML interchange format, you might > as well use canonical XML: > > http://www.w3.org/TR/xml-c14n The number of XML questions I've seen answered with "here's how to do that with PYX" using standard Unix CLI tools (grep, sed, awk, sh, etc.) leads me to believe that there is, in fact, great benefit in using the ESIS format directly for many tasks. Canonical XML wouldn't have that benefit because it still requires character based parsing, whereas ESIS fits in nicely with Unix's line mode tools. On the pull model, I realize it's not related to PYX or the API (per se). -- Ken From akuchlin@mems-exchange.org Thu Jun 29 18:27:51 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 29 Jun 2000 13:27:51 -0400 Subject: [XML-SIG] Pyxie In-Reply-To: <395B773B.74E5007F@prescod.net> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> <395B773B.74E5007F@prescod.net> Message-ID: <20000629132751.G29661@kronos.cnri.reston.va.us> On Thu, Jun 29, 2000 at 09:20:11AM -0700, Paul Prescod wrote: >But Fredrick has shown that you can do pull parsing without a separate >executable so I don't really see the benefit in pyx anymore. If you want >a simplified XML interchange format, you might as well use canonical >XML. A line-oriented format like PYX, however, is also better suited to existing line-oriented tools such as grep, and it's much easier to deal with a line-at-a-time format. For example, I have some unfinished and unreleased Emacs Lisp code that tries to build a data structure for an XML document by running xmln on a buffer, and then parsing the resulting PYX output. 
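To make the line-at-a-time point concrete, here is a minimal Python sketch of a PYX reader. It assumes the commonly described PYX conventions, in which the first character of each line gives its type: "(" for a start tag, ")" for an end tag, "A" for an attribute, "-" for character data, and "?" for a processing instruction. The name parse_pyx and the event tuples are illustrative only; they are not part of Pyxie or xmln, and escape sequences inside character data are left untouched.

```python
def parse_pyx(lines):
    """Yield (kind, payload) events from an iterable of PYX lines.

    Assumed notation: '(' start tag, ')' end tag, 'A' attribute,
    '-' character data, '?' processing instruction.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        tag, rest = line[0], line[1:]
        if tag == "(":
            yield ("start", rest)
        elif tag == ")":
            yield ("end", rest)
        elif tag == "A":
            # attribute lines look like "Aname value"
            name, _, value = rest.partition(" ")
            yield ("attr", (name, value))
        elif tag == "-":
            yield ("text", rest)  # escapes such as \n are not expanded here
        elif tag == "?":
            yield ("pi", rest)
        else:
            raise ValueError("unrecognised PYX line: %r" % line)


sample = ["(doc", "Aid 1", "-hello", ")doc"]
events = list(parse_pyx(sample))
start_tags = [payload for kind, payload in events if kind == "start"]
```

Because every event sits on its own line, the same selection could be made with grep on the first character, which is the property being argued for here.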
--amk From uogbuji@fourthought.com Thu Jun 29 18:59:06 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 29 Jun 2000 11:59:06 -0600 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: Message from "Fred L. Drake, Jr." of "Thu, 29 Jun 2000 11:50:26 EDT." <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> Message-ID: <200006291759.LAA12609@localhost.localdomain>

> Here's what we came up with:
>
> 1. Create a new package in the standard library, with the following
>    structure:
>
>       xml/
>           dom/
>               __init__     # provides parse(), parseFile(),
>                            # and Document
>               minidom      # Paul's basic DOM 1 + namespaces
>                            # implementation
>               ???          # driver to load a DOM from a SAX parser?
>           parsers/
>               expat        # Python Expat wrapper with namespace
>                            # support
>           sax/
>               __init__     # provides parse(), parseFile(), and
>                            # some classes from the handler module
>               xmlreader    # used by parser writers
>               handler      # base classes for handlers
>               expatreader  # SAX driver for Expat
>               saxutils     # pretty much the same as now
>
>    The advantage of using the "xml" name for the package is that
>    most users will be using the most acceptable name. Additional
>    facilities (XSLT, XPath, 4DOM, etc.) can be added as they
>    stabilize and become widely recognized as "core" in the XML
>    community.

I concur with this. Though I expressed some concerns about interface, I think the non-DOM interfaces are so few and so basic that there should be little problem. Unless someone has done substantial work on a driver to create a DOM from a SAX parser (4DOM's won't do as they aren't up to the latest Sax2 yet), I'd say we leave this out until Python 1.7. I guess we're going to have to re-package 4DOM as xml/dom2 or something to avoid clashes.

> 2. Deal with PyXML -- two options:
>    a. Rename the base package to something else, "pyxml",
>       "xmlextras", ???.

This was the option Paul & I discussed.

>    b. Keep the "xml" name but treat it as a "testbed" version only
>       suitable for use by Python+XML development experts.
Not too > bad, but not as good I think. I think we should continue to put all XML things in the xml package. Why muddy the waters? Why can't the PyXML package just know to add its extras into the existing xml package? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@beopen.com Thu Jun 29 19:22:52 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 29 Jun 2000 14:22:52 -0400 (EDT) Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <200006291759.LAA12609@localhost.localdomain> References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> <200006291759.LAA12609@localhost.localdomain> Message-ID: <14683.37884.553770.644120@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > Unless someone has done substantial work on a driver to create a > DOM from a SAX parser (4DOM's won't do as they aren't up to the > latest Sax2 yet), I'd say we leave this out until Python 1.7. I'm expecting Lars to provide SAX 2 support "out of the box" for Python 1.6. > I think we should continue to put all XML things in the xml package. Why > muddy the waters? Why can't the PyXML package just know to add its extras > into the existing xml package? The catch is that *requires* that PyXML either be installed on top of the Python distribution (since the xml package can only exist in one place in the directory tree), or that it include all the standard portions of the xml package. I'm not real happy with either approach. 
Another possibility may be to have PyXML provide a package "xmlextra" (or whatever), and xml.__init__ can check for its presence and "incorporate" it somehow:

------------------------------------------------------------
import os

try:
    import xmlextra
except ImportError:
    # not present
    pass
else:
    # add new subpackages
    __path__.append(os.path.dirname(xmlextra.__file__))
    del xmlextra
------------------------------------------------------------

There are two problems with this approach:

1. The packages/modules provided by xmlextra are importable by two names: xml. and xmlextra.. I'm not entirely sure how much we should care about this one.

2. Old versions of xmlextra may be installed in a user's (or application's) private area, but a more recent version of Python's standard xml package is installed (with a more recent version of Python). Again, I'm not sure how relevant this is, but it does offer potential breakage.

-Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From akuchlin@mems-exchange.org Thu Jun 29 19:14:49 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 29 Jun 2000 14:14:49 -0400 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <14683.34164.967706.461751@cj42289-a.reston1.va.home.com> References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> <200045835.962272720@serrano.infoseek.com> <14683.34164.967706.461751@cj42289-a.reston1.va.home.com> Message-ID: <20000629141449.J29661@kronos.cnri.reston.va.us> On Thu, Jun 29, 2000 at 01:20:52PM -0400, Fred L. Drake, Jr. wrote: > I'll leave the final name to Andrew on this one; this big issue is >that it involves a name change, which is annoying in many ways. "xmlextra" is fine. It would be nice to invent a way to automatically add things to the xml.* namespace, but that seems difficult and error-prone. The name change simplifies a good many things, and we don't want to break things *after* 1.6, so let's get it over with now.
My great fear is breaking existing code and what's worse, Sean's book. (I'll have to buy a copy and see what would need to be preserved to keep the examples running.) --amk From paul@prescod.net Thu Jun 29 19:45:35 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 11:45:35 -0700 Subject: [XML-SIG] Pyxie References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> <395B773B.74E5007F@prescod.net> Message-ID: <395B994E.6A1EC40C@prescod.net> Ken MacLeod wrote: > > > http://www.w3.org/TR/xml-c14n > > The number of XML questions I've seen answered with "here's how to do > that with PYX" using standard Unix CLI tools (grep, sed, awk, sh, > etc.) leads me to believe that there is, in fact, great benefit in > using the ESIS format directly for many tasks. Canonical XML wouldn't > have that benefit because it still requires character based parsing, > whereas ESIS fits in nicely with Unix's line mode tools. You are right. I thought that Canonical XML was more line-oriented than it is. I still claim that we could improve upon pyx by making it a line-oriented *xml subset*. Then you could pipe through both XML-understanding and XML-stupid tools. Several of James Clark's tools already produce that kind of thing. A couple of years ago when I asked him about improvements to his ESIS output he said that xml subsets were where he was planning to go (none of his future tools supported either ESIS nor a formally defined XML subset, though). -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From uogbuji@fourthought.com Thu Jun 29 19:05:10 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 29 Jun 2000 12:05:10 -0600 Subject: [XML-SIG] Ugh! 
Why are DOM access methods spelled with a leading '_'? In-Reply-To: Message from Ken MacLeod of "29 Jun 2000 11:30:35 CDT." Message-ID: <200006291805.MAA12626@localhost.localdomain> > Paul Prescod writes: > > > I fought the good fight against ranges in XPointer...evil, > > unmitigated evil. > > Aren't ranges necessary to link to a section of an external document? > Isn't there a similar feature in the grove model? Any grove guru can smack me down whenever they like, but I'm pretty sure that the SGML grove model that underlies the XML infoset only allows character-level addressability in sane places (such as within character data). Ranges go above and beyond. > I know that I would find ranges useful for "clipping" content out of > other documents for inclusion in summary docs. Sure, you could do this just fine with XPointer's (via XPath's) substring function or by string manipulation on the DOM. You don't need ranges for this sensible use-case. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paul@prescod.net Thu Jun 29 20:22:58 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 12:22:58 -0700 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> <200006291759.LAA12609@localhost.localdomain> <14683.37884.553770.644120@cj42289-a.reston1.va.home.com> Message-ID: <395BA212.6137AA0C@prescod.net> "Fred L. Drake, Jr." wrote: > > I'm expecting Lars to provide SAX 2 support "out of the box" for > Python 1.6. Most of it should be in that package I sent you today. Lars did most of the work but I finished up by cutting out some things I thought were not "core"-worthy (in terms of the 80/20 rule). 
-- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Thu Jun 29 20:31:29 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 12:31:29 -0700 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> <200045835.962272720@serrano.infoseek.com> <14683.34164.967706.461751@cj42289-a.reston1.va.home.com> <20000629141449.J29661@kronos.cnri.reston.va.us> Message-ID: <395BA411.9D7CD493@prescod.net> Andrew Kuchling wrote: > > "xmlextra" is fine. It would be nice to invent a way to automatically > add things to the xml.* namespace, but that seems difficult and > error-prone. As an aside: couldn't Python's package mechanism union the contents of the various packages the way Java does? Perhaps it is a little harder with the __init__.py stuff.... > The name change simplifies a good many things, and we don't want to > break things *after* 1.6, so let's get it over with now. My great > fear is breaking existing code and what's worse, Sean's book. (I'll > have to buy a copy and see what would need to be preserved to keep the > examples running.) Well, Pyxie uses pyexpat, which we probably broke when we "fixed" the attribute handling to use a dictionary instead of a list and it can also use SAX1, which we would break if we renamed the package. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From fdrake@beopen.com Thu Jun 29 20:58:12 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) 
Date: Thu, 29 Jun 2000 15:58:12 -0400 (EDT) Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <395BA212.6137AA0C@prescod.net> References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> <200006291759.LAA12609@localhost.localdomain> <14683.37884.553770.644120@cj42289-a.reston1.va.home.com> <395BA212.6137AA0C@prescod.net> Message-ID: <14683.43604.454442.100574@cj42289-a.reston1.va.home.com> Paul Prescod writes: > Most of it should be in that package I sent you today. Lars did most of > the work but I finished up by cutting out some things I thought were not > "core"-worthy (in terms of the 80/20 rule). Sounds good -- I'm sure Lars will tell us if we got it wrong. ;) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Thu Jun 29 21:02:04 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 29 Jun 2000 16:02:04 -0400 (EDT) Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <395BA411.9D7CD493@prescod.net> References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> <200045835.962272720@serrano.infoseek.com> <14683.34164.967706.461751@cj42289-a.reston1.va.home.com> <20000629141449.J29661@kronos.cnri.reston.va.us> <395BA411.9D7CD493@prescod.net> Message-ID: <14683.43836.688462.146274@cj42289-a.reston1.va.home.com> Paul Prescod writes: > As an aside: couldn't Python's package mechanism union the contents of > the various packages the way Java does? Perhaps it is a little harder > with the __init__.py stuff.... I think this is something to seriously consider for Python 1.7. I don't think it's so hard technically, but there are a couple of issues. (One is: if the first directory found doesn't have __init__.py, and the second one does, is the first part of the package? Currently, it's not.) This should be brought up on python-dev after 1.6 has been released; we don't have time to deal with that much mail before then! ;) -Fred -- Fred L. Drake, Jr. 
BeOpen PythonLabs Team Member From uogbuji@fourthought.com Thu Jun 29 21:31:24 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 29 Jun 2000 14:31:24 -0600 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: Message from "Fred L. Drake, Jr." of "Thu, 29 Jun 2000 14:22:52 EDT." <14683.37884.553770.644120@cj42289-a.reston1.va.home.com> Message-ID: <200006292031.OAA13312@localhost.localdomain> > > Uche Ogbuji writes: > > Unless someone has done substantial work on a driver to create a > > DOM from a SAX parser (4DOM's won't do as they aren't up to the > > latest Sax2 yet), I'd say we leave this out until Python 1.7. > > I'm expecting Lars to provide SAX 2 support "out of the box" for > Python 1.6. If you are answering about our ability to use 4DOM's Sax2, OK, we could get it in line with Lars's latest (not that much is different from our Sax2 draft besides the namespace stuff). But I could also understand your response to indicate that Lars would be shipping a DOM builder with SAX2, or am I muddling things up? > > I think we should continue to put all XML things in the xml package. Why > > muddy the waters? Why can't the PyXML package just know to add its extras > > into the existing xml package? > > The catch is that *requires* that PyXML either be installed on top > of the Python distribution (since the xml package can only exist in > one place in the directory tree), or that it include all the standard > portions of the xml package. I'm not real happy with either > approach. All approaches seem kludgy to me as well. I guess I'm not so worried as long as whatever tricks we use, the final import for core Python _and_ xml-sig is "xml". 
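As a rough illustration of the __path__ approach under discussion, the following self-contained Python sketch builds two throwaway packages in a temporary directory and lets one absorb the other. The package names demo_xml and demo_xmlextra and the module bonus are hypothetical stand-ins chosen so that nothing shadows the real xml package; this is a demonstration of the mechanism under those assumptions, not the layout that was eventually shipped.

```python
import importlib
import os
import sys
import tempfile
import textwrap

root = tempfile.mkdtemp()


def write(relpath, source):
    """Create a source file (and its parent directories) under root."""
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(textwrap.dedent(source))


# Hypothetical add-on distribution with one extra module.
write("demo_xmlextra/__init__.py", "")
write("demo_xmlextra/bonus.py", "VALUE = 42\n")

# Hypothetical core package whose __init__ extends its own __path__.
write("demo_xml/__init__.py", """\
    import os
    try:
        import demo_xmlextra
    except ImportError:
        pass  # add-on not installed
    else:
        __path__.append(os.path.dirname(demo_xmlextra.__file__))
        del demo_xmlextra
""")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_xml.bonus        # found through the extended __path__
import demo_xmlextra.bonus   # the same file under its original name

print(demo_xml.bonus.VALUE)
print(demo_xml.bonus is demo_xmlextra.bonus)
```

The second print shows the first problem raised in the thread: the same source file ends up loaded as two distinct module objects, one per name.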
> Another possibility may be to have PyXML provide a package
> "xmlextra" (or whatever), and xml.__init__ can check for its presence
> and "incorporate" it somehow:
>
> ------------------------------------------------------------
> import os
>
> try:
>     import xmlextra
> except ImportError:
>     # not present
>     pass
> else:
>     # add new subpackages
>     __path__.append(os.path.dirname(xmlextra.__file__))
>     del xmlextra
> ------------------------------------------------------------
>
> There are two problems with this approach:
>
> 1. The packages/modules provided by xmlextra are importable by two
>    names: xml. and xmlextra.. I'm not entirely sure how
>    much we should care about this one.

I wouldn't worry about it: we can either not publish the xmlextra package, or we can explicitly forbid using it in the docs.

> 2. Old versions of xmlextra may be installed in a user's (or
>    application's) private area, but a more recent version of
>    Python's standard xml package is installed (with a more recent
>    version of Python). Again, I'm not sure how relevant this is,
>    but it does offer potential breakage.

This sort of version skew could happen in many situations and I'm not sure we should lose sleep over it. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@beopen.com Thu Jun 29 21:41:13 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Thu, 29 Jun 2000 16:41:13 -0400 (EDT) Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <200006292031.OAA13312@localhost.localdomain> References: <14683.37884.553770.644120@cj42289-a.reston1.va.home.com> <200006292031.OAA13312@localhost.localdomain> Message-ID: <14683.46185.585569.1847@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > But I could also understand your response to indicate that Lars would be > shipping a DOM builder with SAX2, or am I muddling things up? You're muddling it up; sorry! :) > All approaches seem kludgy to me as well. I guess I'm not so > worried as long as whatever tricks we use, the final import for > core Python _and_ xml-sig is "xml". Avoiding the kludginess here is exactly the hard part! -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From sean@digitome.com Thu Jun 29 21:25:46 2000 From: sean@digitome.com (Sean McGrath) Date: Thu, 29 Jun 2000 21:25:46 +0100 Subject: [XML-SIG] Pyxie In-Reply-To: <395B773B.74E5007F@prescod.net> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> Message-ID: <3.0.6.32.20000629212546.009ed100@gpo.iol.ie> At 09:20 AM 6/29/00 -0700, Paul Prescod wrote: >Ken MacLeod wrote: >> >> The parsers on the Pyx site are expat and rxp ('xmln' and 'xmlv'; >> non-validating and validating, respectively), so they are fast and >> small. > [Paul Prescod] >But you have to run them in a different process and use pipes and >"double parsing." For a long time I couldn't understand why you would >choose to do that and then I realized that the real reason is to allow >Pyxie to use a pull model rather than a push model of parsing. > This is not correct. Pyxie uses pyexpat directly. xmln and xmlv are separate, utility programs, intended for command line use in pipeline XML processing. >But Fredrick has shown that you can do pull parsing without a separate >executable so I don't really see the benefit in pyx anymore. 
If you want >a simplified XML interchange format, you might as well use canonical >XML: You cannot use canonical XML to achieve what PYX achieves. PYX is a line-oriented, utterly trivial representation of the logical structure of an XML instance. Canonical XML is neither of these things. regards, Sean, XML Processing With Python ISBN: 0 13 021119 2 Prentice Hall From sean@digitome.com Thu Jun 29 21:37:04 2000 From: sean@digitome.com (Sean McGrath) Date: Thu, 29 Jun 2000 21:37:04 +0100 Subject: [XML-SIG] Pyxie In-Reply-To: <395B994E.6A1EC40C@prescod.net> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> <395B773B.74E5007F@prescod.net> Message-ID: <3.0.6.32.20000629213704.009ee880@gpo.iol.ie> [Paul Prescod] >I still claim that we could improve upon pyx by making it a >line-oriented *xml subset*. Then you could pipe through both >XML-understanding and XML-stupid tools. Several of James Clark's tools >already produce that kind of thing. I toyed with this in the early days of Pyxie but (perhaps ironically) dropped it because it was too far removed from James Clark's ESIS notation! The idea of a simplified XML that would be its own canonical notation is an idea we discussed some months ago on the SML list. I am very much a fan of the idea of a line-oriented XML subset. (My eyebrows are still singed from the flaming I got defending the idea of line-oriented markup on the W3C XML SIG all those years ago...)
regards, Sean, XML Processing With Python ISBN: 0 13 021119 2 Prentice Hall From akuchlin@mems-exchange.org Thu Jun 29 22:15:16 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 29 Jun 2000 17:15:16 -0400 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <200006292031.OAA13312@localhost.localdomain> References: <200006292031.OAA13312@localhost.localdomain> Message-ID: <20000629171516.L29661@kronos.cnri.reston.va.us> On Thu, Jun 29, 2000 at 02:31:24PM -0600, Uche Ogbuji wrote: >All approaches seem kludgy to me as well. I guess I'm not so worried as long >as whatever tricks we use, the final import for core Python _and_ xml-sig is >"xml". I'm not sure it's possible to have the same package name for both *without* import hackery, and more and more I'm believing that import hackery is evil. MAL does it in some of his mx extensions, Zope does it in a few places, and it's evil in both cases, leading to hard-to-debug problems. ("Why isn't this module importing? Go trace through the code...") --amk From sean@digitome.com Thu Jun 29 22:30:40 2000 From: sean@digitome.com (Sean McGrath) Date: Thu, 29 Jun 2000 22:30:40 +0100 Subject: [XML-SIG] Reconsidering the DOM AP Message-ID: <3.0.6.32.20000629223040.009eda80@www.digitome.com> [Ken MacLeod] >Looking at Pyxie.py, I >don't see why Pyxie could not have used Python SAX and DOM and have >been just as simple. As far as I can tell, Pyxie is merely different. >This is not an argument against Pyxie's convenience functions which >make Pyxie so easy to use and what draw so many people to it! My >concern is that this could have easily been done _with_ SAX and DOM >and avoided unnecessary incompatibility. For the event handling stuff, the principle difference is just down to the convenience of event handlers named after element type names. 
If it were just the event-oriented stuff, then Pyxie would not offer enough to drag even me away from the industry APIs:-) The big differences come in the tree processing stuff, which is what I personally use day in, day out.

1) Pyxie uses a "cursor" location metaphor and a cut/paste approach which is very different from the DOM. I find the Pyxie approach more natural than the DOM approach.

2) Pyxie blends the ease of use of tree-oriented processing with the memory efficiency of event-oriented processing using a sparse-tree facility. There is no such facility in industry APIs (that I know of).

3) Pyxie allows you to mix logical navigation with parsing and content insertion in a way I find very useful in my day to day work. This sort of thing:-

T1.Home()        # Root of tree T1
T2 = T3.Cut()    # Cut branch out of T3
T1.PasteDown()   # Paste into T1
T1.Down()        # First child of T1
# Paste in the current time as a element
T1.PasteDown (StringxTree("%s" % time.ctime(time.time())))

I naturally think in terms of cut/paste when doing tree transformations. The Pyxie API gives me a simple syntax to express cut/paste-oriented algorithms in.

4) Pyxie is unashamedly focused on the logical model of XML documents. It does not concern itself with general entity references, DTD info etc. etc.

Pyxie achieves API simplicity by purposely leaving a lot of things out:-) I happen to believe that it keeps all the important stuff. This is a controversial opinion. regards, Sean, XML Processing With Python ISBN: 0 13 021119 2 Prentice Hall From paul@prescod.net Thu Jun 29 23:00:26 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 17:00:26 -0500 Subject: [XML-SIG] Pyxie References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> <3.0.6.32.20000629212546.009ed100@gpo.iol.ie> Message-ID: <395BC6FA.EDC11FE8@prescod.net> Sean McGrath wrote: > > ... > > This is not correct. Pyxie uses pyexpat directly.
xmln and > xmlv are separate, utility programs, intended for command > line use in pipeline XML processing. Then in what sense are "pyx" and "pyxie" related? Are they just two independent things under a single banner? -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From sean@digitome.com Thu Jun 29 22:46:41 2000 From: sean@digitome.com (Sean McGrath) Date: Thu, 29 Jun 2000 22:46:41 +0100 Subject: [XML-SIG] Pyxie In-Reply-To: <395BC6FA.EDC11FE8@prescod.net> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> <3.0.6.32.20000629212546.009ed100@gpo.iol.ie> Message-ID: <3.0.6.32.20000629224641.009e6e90@gpo.iol.ie> [Paul Prescod] >Then in what sense are "pyx" and "pyxie" related? Are they just two >independent things under a single banner? > They are tightly related but can be used independently. (For example, I recently was involved in getting Pyxie working with Python 1.4 on an IBM mainframe using a Java based PYX generator!). The common thread is the PYX notation. Pyxie is a library that is based on manipulating the notation. XMLN and XMLV are two standalone utilities for generating the notation from well-formed and valid (respectively) XML instances.
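For readers who have not seen such a generator, the following Python sketch shows roughly what a non-validating XML-to-PYX converter does, using the standard xml.parsers.expat module. It is an illustration under assumptions, not the source of xmln: the function name xml_to_pyx is made up, and writing embedded newlines as a literal backslash-n is assumed from common descriptions of the notation.

```python
import xml.parsers.expat


def xml_to_pyx(document):
    """Return a list of PYX lines for a well-formed XML string."""
    out = []

    def escape(text):
        # Assumed convention: keep one event per line by escaping newlines.
        return text.replace("\n", "\\n")

    def start(name, attrs):
        out.append("(" + name)
        for key, value in attrs.items():
            out.append("A%s %s" % (key, escape(value)))

    def end(name):
        out.append(")" + name)

    def chars(text):
        out.append("-" + escape(text))

    def instruction(target, body):
        out.append("?%s %s" % (target, body))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data in one call
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.ProcessingInstructionHandler = instruction
    parser.Parse(document, True)
    return out


pyx_lines = xml_to_pyx('<doc id="1">hi</doc>')
print("\n".join(pyx_lines))
```

The output is one event per line, which is what makes the notation convenient to feed to line-oriented tools or to a Java or Perl consumer.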
regards, Sean, XML Processing With Python ISBN: 0 13 021119 2 Prentice Hall From paul@prescod.net Thu Jun 29 23:22:42 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 29 Jun 2000 17:22:42 -0500 Subject: [XML-SIG] Pyxie References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> <3.0.6.32.20000629212546.009ed100@gpo.iol.ie> <3.0.6.32.20000629224641.009e6e90@gpo.iol.ie> Message-ID: <395BCC32.42F8F7E1@prescod.net> Sean McGrath wrote: > > ... > > The common thread is the PYX notation. Pyxie is a library > that is based on manipulating the notation. I still don't understand. You just said that Pyxie can work directly from the output of pyexpat. It seems that you could have a perfectly useful Pyxie app that doesn't use pyx, right? And a perfectly useful PYX app that doesn't use Pyxie (or Python at all). Perhaps you could expand on your statement that Pyxie is for "manipulating the notation." Does pyxie always use pyx as a) input b) output c) internal data structure -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From xml-sig@teleo.net Thu Jun 29 23:24:45 2000 From: xml-sig@teleo.net (Patrick Phalen) Date: Thu, 29 Jun 2000 15:24:45 -0700 Subject: [XML-SIG] Pyxie In-Reply-To: <3.0.6.32.20000629224641.009e6e90@gpo.iol.ie> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <3.0.6.32.20000629212546.009ed100@gpo.iol.ie> <3.0.6.32.20000629224641.009e6e90@gpo.iol.ie> Message-ID: <00062915312800.11470@quadra.teleo.net> [Sean McGrath, on Thu, 29 Jun 2000] :: [Paul Prescod] :: >Then in what sense are "pyx" and "pyxie" related? Are they just two :: >independent things under a single banner? :: > :: :: They are tightly related but can be used independently.
(For :: example, I recently was involved in getting Pyxie working with :: Python 1.4 on an IBM mainframe using a Java based PYX generator!). :: :: The common thread is the PYX notation. Pyxie is a library :: that is based on manipulating the notation. XMLN and XMLV :: are two standalone utilities for generating the :: notation from well-formed and valid (respectively) XML :: instances. PYX appears to have some general appeal. In March, Sean published an article on Pyxie on xml.com and within a few days, readers had been spurred to write PYX processors in Perl and Java. http://www.xml.com/pub/2000/03/15/feature/index.html http://www.xml.com/pub/2000/03/22/pyxie/index.html From ken@bitsko.slc.ut.us Fri Jun 30 00:02:25 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 29 Jun 2000 18:02:25 -0500 Subject: [XML-SIG] Pyxie In-Reply-To: Patrick Phalen's message of "Thu, 29 Jun 2000 15:24:45 -0700" References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <3.0.6.32.20000629212546.009ed100@gpo.iol.ie> <3.0.6.32.20000629224641.009e6e90@gpo.iol.ie> <00062915312800.11470@quadra.teleo.net> Message-ID: Patrick Phalen writes: > PYX appears to have some general appeal. In March, Sean published > an article on Pyxie on xml.com and within a few days, readers had been > spurred to write PYX processors in Perl and Java. > > http://www.xml.com/pub/2000/03/15/feature/index.html > http://www.xml.com/pub/2000/03/22/pyxie/index.html The appeal is not being questioned! Both the PYX format and the Pyxie library's value-add functions are great. I think the only big remaining question is why the value-add wasn't on top of SAX and DOM -- more on that in reply to Sean's message...
-- Ken From ken@bitsko.slc.ut.us Fri Jun 30 00:54:08 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 29 Jun 2000 18:54:08 -0500 Subject: [XML-SIG] Reconsidering the DOM AP In-Reply-To: Sean McGrath's message of "Thu, 29 Jun 2000 22:30:40 +0100" References: <3.0.6.32.20000629223040.009eda80@www.digitome.com> Message-ID: Sean McGrath writes: > [Ken MacLeod] > > >Looking at Pyxie.py, I > >don't see why Pyxie could not have used Python SAX and DOM and have > >been just as simple. As far as I can tell, Pyxie is merely different. > > >This is not an argument against Pyxie's convenience functions which > >make Pyxie so easy to use and what draw so many people to it! My > >concern is that this could have easily been done _with_ SAX and DOM > >and avoided unnecessary incompatibility. > > For the event handling stuff [...] the convenience of event handlers > named after element type names. > The big differences come in the tree process stuff which > is what I personally use day in day out. > > 1)Pyxie uses a "cursor" location metaphor and a > cut/paste approach [...] > 2)Pyxie blends the ease of use of tree-oriented processing > with the memory efficiency of event-oriented processing > using a sparse-tree facility. [...] > 3)Pyxie allows you to mix logical navigation with > parsing and content insertion in a way I find > very useful in my day to day work. [...] > 4) Pyxie is unashamedly focused on the logical > model of XML documents. It does not concern itself > with general entity references, DTD info etc. etc. Considering the amount of effort already in this module, it would definitely be argumentative of me to try to "convince" you that this could be done, possibly just as easily to use, over SAX and DOM (using pulldom, a SAX filter, or similar, for example). What I would do is ask that the next time someone is looking for a good module to write would be to take a look at pulldom (SAX+[mini]DOM) and consider writing a SAX filter that has all the Pyxie value-add. 
Effectively, this would be Pyxie but using SAX/DOM. In this way, we could wait until there's viable proof that it would work as simply and then we could look at merging these very valuable ideas back together in to a more cohesive whole. -- Ken From mgushee@havenrock.com Fri Jun 30 02:22:00 2000 From: mgushee@havenrock.com (Matt Gushee) Date: Thu, 29 Jun 2000 21:22:00 -0400 (EDT) Subject: [XML-SIG] soaplib errors In-Reply-To: <004801bfe1ab$c5f3a700$f2a6b5d4@hagrid> References: <14682.6238.219244.669161@kirin.architag.com> <004801bfe1ab$c5f3a700$f2a6b5d4@hagrid> Message-ID: <14683.63033.25949.577349@kirin.architag.com> Fredrik Lundh writes: > > I thought I'd try out soaplib 0.8. But apparently I'm missing > > something or haven't set it up correctly. When I run the test > > programs, 'test2.py' works, but the other two fail. > > my fault -- those scripts shouldn't have been in the distribution. Thank you! That puts my mind at ease. From sean@digitome.com Fri Jun 30 06:23:09 2000 From: sean@digitome.com (Sean McGrath) Date: Fri, 30 Jun 2000 06:23:09 +0100 Subject: [XML-SIG] Pyxie In-Reply-To: <395BCC32.42F8F7E1@prescod.net> References: <14680.59651.130151.140876@fermi.eeel.nist.gov> <00062823431300.11217@quadra.teleo.net> <395B536F.5B2E2ADC@prescod.net> <3.0.6.32.20000629212546.009ed100@gpo.iol.ie> <3.0.6.32.20000629224641.009e6e90@gpo.iol.ie> Message-ID: <3.0.6.32.20000630062309.00a55100@www.digitome.com> At 05:22 PM 6/29/00 -0500, Paul Prescod wrote: >Sean McGrath wrote: >> >> ... >> >> The common thread is the PYX notation. Pyxie is a library >> that is based on manipulating the notation. > >I still don't understand. > >You just said that Pyxie can work directly from the output of pyexat. Yes. Internally, in order to avoid the unnecessary overhead of forking a subprocess, Pyxie uses Pyexpat to parse XML and create a PYX stream. > It >seems that you could have a perfectly useful Pyxie app that doesn't use >pyx, right? No. 
The tree builder and event dispatcher need to be fed data in PYX notation. How you get the PYX data stream is up to you. For example, you could generate PYX from an Access database using COM and feed it to the event dispatcher. >And a perfectly useful PYX app that doesn't use Pyxie (or >Python at all). Yes. This part is true. For example, this is from Matt Sergeant:- pyx | grep ^- | perl -pe 's/^-//; s/\\n\n//;' | diction See http://www.xml.org/archives/xml-dev/2000/05/0216.html. > >Perhaps you could expand on your statement that Pyxie is for >"manipulating the notation." Does pyxie always use pyx as > > a) input > b) output > c) internal data structure > Pyxie uses PYX as input. It can generate PYX as output but more typically will generate XML as output. PYX is not used as an internal data structure. I hope this clarifies matters. regards, Sean, XML Processing With Python ISBN: 0 13 021119 2 Prentice Hall From sean@digitome.com Fri Jun 30 06:27:13 2000 From: sean@digitome.com (Sean McGrath) Date: Fri, 30 Jun 2000 06:27:13 +0100 Subject: [XML-SIG] Reconsidering the DOM AP In-Reply-To: References: <3.0.6.32.20000629223040.009eda80@www.digitome.com> Message-ID: <3.0.6.32.20000630062713.00a57100@www.digitome.com> At 06:54 PM 6/29/00 -0500, Ken MacLeod wrote: > >Considering the amount of effort already in this module, it would >definitely be argumentative of me to try to "convince" you that this >could be done, possibly just as easily to use, over SAX and DOM (using >pulldom, a SAX filter, or similar, for example). > >What I would do is ask that the next time someone is looking for a >good module to write would be to take a look at pulldom >(SAX+[mini]DOM) and consider writing a SAX filter that has all the >Pyxie value-add. Effectively, this would be Pyxie but using SAX/DOM. > >In this way, we could wait until there's viable proof that it would >work as simply and then we could look at merging these very valuable >ideas back together in to a more cohesive whole. 
> Alternatively we could be more proactive than this. Why don't we ask the community for half a dozen XML processing "use cases"? I hereby offer to implement them using Pyxie so that the implementation can be compared with implementations using other approaches. I think the code samples (plus the discussion that would ensue) would make a valuable contribution to the XML-SIG materials. regards, Sean, XML Processing With Python ISBN: 0 13 021119 2 Prentice Hall From larsga@garshol.priv.no Fri Jun 30 09:17:54 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 30 Jun 2000 10:17:54 +0200 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> Message-ID: * Fred L. Drake, Jr. | | 1. Create a new package in the standard library, with the following | structure: | | xml/ | dom/ | __init__ # provides parse(), parseFile(), | # and Document | minidom # Paul's basic DOM 1 + namespaces | # implementation | ??? # driver to load a DOM from a SAX parser? Should be there, I think. Little point in having a DOM implementation that can't load from XML documents. | parsers/ | expat # Python Expat wrapper with namespace | # support IMHO we should use the namespace support that is built-in to expat. Anything else is bound to slow us down. | sax/ | __init__ # provides parse(), parseFile(), and | # some classes from the handler module Whoops! parseFile() no longer exists! We now use the InputSource class instead. | saxutils # pretty much the same as now Probably not. There's a lot of SAX 1.0 legacy there now. That would need to be removed. The basic structure looks good to me, however. --Lars M. 
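The InputSource approach mentioned above can be tried out with the xml.sax package as it ships in current Python releases, which may differ in detail from the package layout being proposed in this thread. What follows is only a minimal sketch under that assumption: the handler class NameCollector, the helper parse_with_source and the example URL are hypothetical names chosen for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class NameCollector(ContentHandler):
    """Hypothetical handler: records every element name it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_with_source(data, system_id):
    """Hypothetical helper: bundle a byte stream and its system id
    (base URI) in one InputSource and hand that to the parser."""
    source = InputSource(system_id)
    # Because a byte stream is supplied, the parser reads from it and
    # does not try to open the system id itself.
    source.setByteStream(io.BytesIO(data))
    handler = NameCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.names


names = parse_with_source(b"<doc><item/><item/></doc>",
                          "http://example.org/doc.xml")
print(names)
```

The point of the sketch is that the stream and the identifying information travel together in a single object, so parse() needs only one argument.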
From paul@prescod.net Fri Jun 30 10:33:42 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 30 Jun 2000 04:33:42 -0500 Subject: [XML-SIG] Pyxie Message-ID: <395C6976.CD996C03@prescod.net> [from my vantage point the Internet is doing strange things right now but I'll give it a try anyhow] > >You just said that Pyxie can work directly from the output of pyexat. > > Yes. Internally, in order to avoid the unnecessary overhead > of forking a subprocess, Pyxie uses Pyexpat to parse > XML and create a PYX stream. This is the "double parsing" I mentioned. If Pyxie is parsing a one gigabyte document (as an extreme example) it needs 1 gigabyte of extra disk space for its tempfile. Fredrik's pull parsing technique can eliminate this and eliminate the need to use pyx internally. With pulldom, I can parse a gigabyte document with 0 bytes free disk space and as little as 1K of RAM (above and beyond that required by Python+modules). Python optimization is a tricky issue but I think that even in the case of small files, the fact that you don't do double the disk IO should make the pulldom approach more efficient. And to the end user there is no difference. Also, the pull approach can be used in a streaming environment. You can download the gigabyte document over a 300 baud modem and get "output" immediately. In short, PYX is okay as an XML normalization syntax (though I would prefer a line-oriented XML subset) but I still do not believe that it needs to be the core of the Pyxie XML processing library. I bet I could rewrite Pyxie without using PYX internally and Pyxie's users would not know that I had done so (except that they would get less disk IO). Sometime after Python 1.6 is shipping, I'll implement this to demonstrate. > > It > >seems that you could have a perfectly useful Pyxie app that doesn't use > >pyx, right? > > No. The tree builder and event dispatcher need to be fed > data in PYX notation. How you get the PYX data stream > is up to you. 
For example, you could generate PYX from > an Access database using COM and feed it to the event > dispatcher. Why would you generate PYX rather than XML? If we start moving PYX between XML-aware programs, it becomes an XML competitor. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From ken@bitsko.slc.ut.us Fri Jun 30 13:21:27 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 30 Jun 2000 07:21:27 -0500 Subject: [XML-SIG] Pyxie In-Reply-To: Paul Prescod's message of "Fri, 30 Jun 2000 04:33:42 -0500" References: <395C6976.CD996C03@prescod.net> Message-ID: Paul Prescod writes: > [from my vantage point the Internet is doing strange things right now > but I'll give it a try anyhow] > > > >You just said that Pyxie can work directly from the output of pyexat. > > > > Yes. Internally, in order to avoid the unnecessary overhead > > of forking a subprocess, Pyxie uses Pyexpat to parse > > XML and create a PYX stream. > > This is the "double parsing" I mentioned. If Pyxie is parsing a one > gigabyte document (as an extreme example) it needs 1 gigabyte of > extra disk space for its tempfile. Fredrick's pull parsing technique > can eliminate this and eliminate the need to use pyx > internally. With pulldom, I can parse a gigabyte document with 0 > bytes free disk space and as little as 1K of RAM (above and beyond > that required by Python+modules). Sean, is "create a PYX stream" correct? I read between the lines there and assumed Pyxie used pyexpat to parse the XML and "create PYX [events]", so no subprocess was used. In which case the double parse isn't happening. 
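The question of whether PYX has to pass through a temporary file can be made concrete with a small sketch. It is not Pyxie's code. It assumes the xml.sax module from current Python releases, and the class PyxWriter and the function pyx_escape are hypothetical names. The "(" and "A" line prefixes follow the pyxie.py excerpt quoted in this thread, and the "-" prefix for character data follows the grep pipeline quoted earlier; the ")" prefix for end tags and the backslash escaping are assumptions about the notation.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


def pyx_escape(text):
    # Assumption: backslashes and newlines are what must be escaped so
    # that each event stays on a single line.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


class PyxWriter(ContentHandler):
    """Hypothetical SAX handler that writes PYX lines to any
    file-like object: a pipe, a socket file, or an in-memory buffer."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def startElement(self, name, attrs):
        self.out.write("(%s\n" % name)
        for attr_name in attrs.getNames():
            value = pyx_escape(attrs.getValue(attr_name))
            self.out.write("A%s %s\n" % (attr_name, value))

    def endElement(self, name):
        # Assumption: ")" marks an end tag in PYX.
        self.out.write(")%s\n" % name)

    def characters(self, content):
        # The parser may deliver one run of text in several calls, in
        # which case several "-" lines are written.
        self.out.write("-%s\n" % pyx_escape(content))


buffer = io.StringIO()
xml.sax.parseString(b'<doc id="1"><p>hi</p></doc>', PyxWriter(buffer))
pyx_lines = buffer.getvalue().splitlines()
print(pyx_lines)
```

Because the handler only needs something with a write() method, the same events could go to a consumer directly, with no intermediate file.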
-- Ken From ken@bitsko.slc.ut.us Fri Jun 30 14:02:19 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 30 Jun 2000 08:02:19 -0500 Subject: [XML-SIG] Reconsidering the DOM AP In-Reply-To: Sean McGrath's message of "Fri, 30 Jun 2000 06:27:13 +0100" References: <3.0.6.32.20000629223040.009eda80@www.digitome.com> <3.0.6.32.20000630062713.00a57100@www.digitome.com> Message-ID: Sean McGrath writes: > Alternatively we could be more proactive than this. Why don't we ask > the community for half a dozen XML processing "use cases"? I hereby > offer to implement them using Pyxie so that the implementation can > be compared with implementations using other approaches. > > I think the code samples (plus the discussion that would ensue) > would make a valuable contribution to the XML-SIG materials. Excellent idea! Some sources of use cases include several of the tutorial articles on xml.com, which often especially beg for a simpler approach! Here's a list of articles we can cull examples from (note some of the freebies there for Pyxie ;-): Note: the emphasis is on what's happing _inside_ Python using the Pyxie API, what happens using PYX outside of Python is assumed to be the same in either case. -- Ken From fdrake@beopen.com Fri Jun 30 14:41:56 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Fri, 30 Jun 2000 09:41:56 -0400 (EDT) Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: References: <14683.28738.505210.766315@cj42289-a.reston1.va.home.com> Message-ID: <14684.41892.554969.327867@cj42289-a.reston1.va.home.com> Lars Marius Garshol writes: > | ??? # driver to load a DOM from a SAX parser? > > Should be there, I think. Little point in having a DOM implementation > that can't load from XML documents. Ah, but the "???" was because I wasn't interested in the name, not questioning whether we needed it! > | parsers/ > | expat # Python Expat wrapper with namespace > | # support > > IMHO we should use the namespace support that is built-in to expat. 
> Anything else is bound to slow us down. Paul and I talked about this the other day. We have a two-phase approach: use a Python wrapper so we can get both the URI and prefix with the current (stable) Expat, and modify Expat so that this information is reported directly with more recent versions. This lets us set the API and provide the functionality now, and deal with the implementation when Paul has time to make/test the changes to Expat, and get James Clark to accept them -- there's still no desire to fork Expat development. > | sax/ > | __init__ # provides parse(), parseFile(), and > | # some classes from the handler module > > Whoops! parseFile() no longer exists! We now use the InputSource class > instead. > > | saxutils # pretty much the same as now > > Probably not. There's a lot of SAX 1.0 legacy there now. That would > need to be removed. I've checked in the files Paul sent me yesterday; please take a look at the Python CVS and let me know if we need to change anything. Feel free to check in any changes whenever you're ready. We're planning to release the beta tomorrow, so the sooner the better! -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Fri Jun 30 15:25:05 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 30 Jun 2000 09:25:05 -0500 Subject: [XML-SIG] Re: Pyxie Message-ID: <395CADC1.8914668A@prescod.net> > Sean, is "create a PYX stream" correct? > > I read between the lines there and assumed Pyxie used pyexpat to parse > the XML and "create PYX [events]", so no subprocess was used. In > which case the double parse isn't happening. 
Pyxie uses a tempfile: http://www.digitome.com/pyxie.py

    import tempfile
    tempfilename = tempfile.mktemp()
    global tfo
    tfo = open (tempfilename,"w")

    def StartElementHandler(name,attrs):
        global tfo
        tfo.write ("(%s\n" % name)
        i = 0
        while i < len(attrs):
            tfo.write ("A%s %s\n" % (attrs[i] , attrs[i+1]))
            i = i + 2

-- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From jeremy@beopen.com Fri Jun 30 16:37:28 2000 From: jeremy@beopen.com (Jeremy Hylton) Date: Fri, 30 Jun 2000 11:37:28 -0400 (EDT) Subject: [XML-SIG] XML code that creates cyclic garbage Message-ID: <14684.48824.420224.338942@bitdiddle.concentric.net> I have heard several reports that there is XML-handling code that produces cyclic references that the standard reference counting garbage collection scheme can not collect. Can someone point me at some example code that has this property? I'd like to test the garbage collector that is in Python 2.0. Jeremy From ken@bitsko.slc.ut.us Fri Jun 30 16:36:05 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 30 Jun 2000 10:36:05 -0500 Subject: [XML-SIG] Reconsidering the DOM AP In-Reply-To: Ken MacLeod's message of "29 Jun 2000 18:54:08 -0500" References: <3.0.6.32.20000629223040.009eda80@www.digitome.com> Message-ID: Ken MacLeod writes: > What I would do is ask that the next time someone is looking for a > good module to write would be to take a look at pulldom > (SAX+[mini]DOM) and consider writing a SAX filter that has all the > Pyxie value-add. Effectively, this would be Pyxie but using > SAX/DOM. It looks like we're gonna move forward quickly with a Pyxie-SAX/DOM combo. I'm not expecting anyone to read my mind on what "writing a SAX filter that has all the Pyxie value-add" really means, so if you can't write that module that's OK. 
What would really save me time is if someone can pick up writing a SAX parser that reads PYX and a SAX handler that writes PYX. These would be very easy modules that anyone could volunteer for. Thanks, -- Ken From paul@prescod.net Fri Jun 30 16:44:04 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 30 Jun 2000 10:44:04 -0500 Subject: [XML-SIG] Efficient Namespace Handling Message-ID: <395CC044.3EABA518@prescod.net> [still having Internet problems...I probably won't get replies immediately] --- The various APIs all have well-defined ways for handling namespaces of elements and attributes. I want to describe a simple data structure for passing namespace information between APIs. I want it to be as efficient as possible. In namespace mode PyExpat should produce tuples: (URI, localname, rawname) Those should be passed as the "name" parameter to SAX event handlers of the form: def startElement((URI,localname,rawname), attrs):... ... Those can in turn be passed to the DOM createElement method which can check the type of its first parameter and "do the right thing" when it is a tuple. This is more efficient than the DOM's createElementNS method which requires string manipulations. --- The second issue is an efficient way to pass around attributes. Note that I am not talking about how to query or fetch attributes. Just how to pass them around. The obvious representation for an attribute value is ((URI,localname,rawname),value). Tuples and list are much, much faster to create than instances in Python. Java doesn't really have equivalents. I propose that in beta 2 of Python, PyExpat in namespace mode should pass list structures of the form: [((URI,localname,rawname),value),...] I choose not to use dictionaries because it isn't clear whether to index on rawnames or localname/URI tuples. It depends on the application so it is better to build dictionary-based indexes at the application level. 
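The proposed attribute representation and the application-level indexing can be illustrated in a few lines of plain Python. This is a sketch of the idea only, not PyExpat's actual output: the function index_attributes and the sample attribute values are hypothetical.

```python
def index_attributes(attrs):
    """Hypothetical helper: build both candidate indexes from a list of
    ((URI, localname, rawname), value) pairs."""
    by_rawname = {}
    by_expanded_name = {}
    for (uri, localname, rawname), value in attrs:
        by_rawname[rawname] = value
        by_expanded_name[(uri, localname)] = value
    return by_rawname, by_expanded_name


# Sample data in the proposed shape; None stands for "no namespace".
attrs = [
    (("http://www.w3.org/1999/xlink", "href", "xlink:href"), "chapter2.xml"),
    ((None, "id", "id"), "c1"),
]

by_rawname, by_expanded_name = index_attributes(attrs)
print(by_rawname["xlink:href"])
print(by_expanded_name[("http://www.w3.org/1999/xlink", "href")])
```

An application that only cares about one kind of lookup would build just one of the two dictionaries, which is the flexibility the list-of-tuples form is meant to preserve.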
If a particular user wants a more friendly data structure they can construct a DOM AttributeList object: def startElement((URI,localname,rawname), attrs):... attrs=xml.dom.AttributeList( attrs ) (as an optimization, AttributeList objects would probably be lazily indexed based on either qname or URI/localname, depending on what the user asked for) -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From akuchlin@mems-exchange.org Fri Jun 30 16:56:49 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Fri, 30 Jun 2000 11:56:49 -0400 Subject: [XML-SIG] XML code that creates cyclic garbage In-Reply-To: <14684.48824.420224.338942@bitdiddle.concentric.net>; from jeremy@beopen.com on Fri, Jun 30, 2000 at 11:37:28AM -0400 References: <14684.48824.420224.338942@bitdiddle.concentric.net> Message-ID: <20000630115649.B19597@kronos.cnri.reston.va.us> On Fri, Jun 30, 2000 at 11:37:28AM -0400, Jeremy Hylton wrote: >garbage collection scheme can not collect. Can someone point me at >some example code that has this property? I'd like to test the >garbage collector that is in Python 2.0. I believe 4DOM uses cyclical references, and requires that you always call a .releaseNode() method to break the cycles. So, you could just try running 4DOM and never calling .releaseNode(); with the GC, it shouldn't leak. The old PyDOM code tried to avoid creating cycles, so it wouldn't make any garbage to be collected (modulo bugs). --amk From uogbuji@fourthought.com Fri Jun 30 17:04:53 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 30 Jun 2000 10:04:53 -0600 Subject: [XML-SIG] XML code that creates cyclic garbage In-Reply-To: Message from Andrew Kuchling of "Fri, 30 Jun 2000 11:56:49 EDT." 
<20000630115649.B19597@kronos.cnri.reston.va.us> Message-ID: <200006301604.KAA17302@localhost.localdomain> > On Fri, Jun 30, 2000 at 11:37:28AM -0400, Jeremy Hylton wrote: > >garbage collection scheme can not collect. Can someone point me at > >some example code that has this property? I'd like to test the > >garbage collector that is in Python 2.0. > > I believe 4DOM uses cyclical references, and requires that you always > call a .releaseNode() method to break the cycles. So, you could just > try running 4DOM and never calling .releaseNode(); with the GC, it > shouldn't leak. The old PyDOM code tried to avoid creating cycles, so > it wouldn't make any garbage to be collected (modulo bugs). Yes. You can simply take any of the 4DOM demos, replace the ReleaseNode calls with a del of the corresponding root node, and check that all memory was reclaimed. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paul@prescod.net Fri Jun 30 17:06:52 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 30 Jun 2000 11:06:52 -0500 Subject: [XML-SIG] SAX in Python 1.6 Message-ID: <395CC59C.1BA50497@prescod.net> > IMHO we should use the namespace support that is built-in to expat. > Anything else is bound to slow us down. Unfortunately expat's namespaces support is broken from the point of view of SAX and DOM. > Whoops! parseFile() no longer exists! We now use the InputSource class > instead. InputSource seemed like overkill to me. More of a Java-ish type safety thing. I'd appreciate your opinion. In my opinion, parse() should accept a string or a stream. If a string, it should be treated as a URL or filename and opened. We will also provide a convenience method parseString() that parses an XML string (probably by wrapping it in a cStringIO. 
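The string-wrapping idea can be sketched as follows, assuming the xml.sax module in current Python releases, with io.StringIO standing in for cStringIO. The helper parse_string and the handler TextCollector are hypothetical names, and this is not the convenience function that was eventually shipped.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_string(text, handler):
    """Hypothetical convenience wrapper: present an in-memory string to
    the parser as if it were an open file."""
    stream = io.StringIO(text)
    xml.sax.parse(stream, handler)


collector = TextCollector()
parse_string("<greeting>hello</greeting>", collector)
text = "".join(collector.chunks)
print(text)
```

The wrapper keeps parse() as the single entry point; the string case is reduced to the stream case rather than handled separately.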
Also Fred was talking about convenience functions we devised of the form:

    __init__.py:

    parse( file, handler=None ):
        import pyexpat
        parser=CreateParser()
        parser.setContentHandler( handler )
        parser.parse( file )

    parse( string, handler=None ):
        import pyexpat
        parser=CreateParser()
        parser.setContentHandler( handler )
        parser.parse( file )

These convenience functions are not in the package we sent up yesterday. > | saxutils # pretty much the same as now > Probably not. There's a lot of SAX 1.0 legacy there now. That would > need to be removed. It has been removed. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From larsga@garshol.priv.no Fri Jun 30 17:26:04 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 30 Jun 2000 18:26:04 +0200 Subject: [XML-SIG] SAX in Python 1.6 In-Reply-To: <395CC59C.1BA50497@prescod.net> References: <395CC59C.1BA50497@prescod.net> Message-ID: * Lars Marius Garshol | | IMHO we should use the namespace support that is built-in to expat. | Anything else is bound to slow us down. * Paul Prescod | | Unfortunately expat's namespaces support is broken from the point of | view of SAX and DOM. I know, but it's much better to simply modify the output from expat (preferably in C source) than to implement namespaces in Python. Remember: we have to map from the 'uri localname' to a tuple for every single tag in the entire XML document. That is going to have an appreciable performance hit if you implement it in Python no matter how well you implement it. If you do something once for every element it has a performance impact. This is done twice, and it's rather complex. * Lars Marius Garshol | | Whoops! parseFile() no longer exists! We now use the InputSource class | instead. 
* Paul Prescod | | InputSource seemed like overkill to me. More of a Java-ish type | safety thing. I'd appreciate your opinion. This was what I thought initially as well, but it turns out that InputSource is in fact extremely useful. The trouble is that getting a stream is not enough in the general case. You need to know the base URI. You may want to know the public id. You may need to know the encoding. InputSource is very handy in that it bundles all that information in a single object, making both parse(...) and resolveEntity(...) much more elegant than they would otherwise be. | In my opinion, parse() should accept a string or a stream. If a | string, it should be treated as a URL or filename and opened. Accepting a string is what it does right now. Streams I think should not be directly accepted, but a convenience function or method for them is OK. | We will also provide a convenience method parseString() that parses an | XML string (probably by wrapping it in a cStringIO. Sounds good, as did the rest of the mail. --Lars M. From joel@eccelerate.com Fri Jun 30 20:26:20 2000 From: joel@eccelerate.com (Goldstein, Joel) Date: Fri, 30 Jun 2000 15:26:20 -0400 Subject: [XML-SIG] xml parser Message-ID: I would like to use the SAX parser on an NT platform and I'm having trouble doing so. Has anyone done it and if yes would you tell me how? I get an error saying "No parser found" From sean@digitome.com Fri Jun 30 21:06:01 2000 From: sean@digitome.com (Sean McGrath) Date: Fri, 30 Jun 2000 21:06:01 +0100 Subject: [XML-SIG] Re: Pyxie In-Reply-To: <395C6976.CD996C03@prescod.net> Message-ID: <3.0.6.32.20000630210601.00a596d0@www.digitome.com> [...] > >This is the "double parsing" I mentioned. If Pyxie is parsing a one >gigabyte document (as an extreme example) it needs 1 gigabyte of extra >disk space for its tempfile. Nope. Think pipes. Think os.popen(). [1] You have two choices: use pyexpat directly and write the external file. 
This avoids the sub-process but costs more disk space. Alternatively, live with the sub-process call (hardly an issue these days) and use popen() to create a piped connection to the created PYX. This is a very disk efficient way of doing things. [Other conclusions by Paul that are erroneous given this fundamental misconception about how Pyxie works elided...] > >Why would you generate PYX rather than XML? If we start moving PYX >between XML-aware programs, it becomes an XML competitor. There is obviously a fundamental misconnect here. I don't know what else I can do to explain this to you! PYX is *line oriented*, I pass it through line oriented tools using the Unix pipe philosophy. I cannot do that with XML. Sorry, but I cannot think how I can make this any simpler for you. You seem hell bent on debunking PYX for some reason. You will not succeed. Not because I am very clever (I'm not), but because James Clark is very clever and a legion of SGML developers know the benefit of a line oriented post-parse notation for hierarchical data structures thanks to ESIS. PYX is simply a simplified incarnation of a tried and trusted structured document processing technique. Why are you so hostile to it? [1] The pyxie.py on www.pyxie.org does not use os.popen because of a problem with stderr redirection on NT platforms that I am struggling with. Sean, XML Processing With Python ISBN: 0 13 021119 2 Prentice Hall From sean@digitome.com Fri Jun 30 21:09:23 2000 From: sean@digitome.com (Sean McGrath) Date: Fri, 30 Jun 2000 21:09:23 +0100 Subject: [XML-SIG] Pyxie In-Reply-To: References: <395C6976.CD996C03@prescod.net> Message-ID: <3.0.6.32.20000630210923.00a5cb50@www.digitome.com> [Ken MacLeod] > >Sean, is "create a PYX stream" correct? > >I read between the lines there and assumed Pyxie used pyexpat to parse >the XML and "create PYX [events]", so no subprocess was used. In >which case the double parse isn't happening. > There are two ways to get PYX commonly used in Pyxie apps. 
1) Use a subprocess and connect to it with a pipe:-

    fo = os.popen ("xmln foo.xml")

   This is highly disk efficient as it can chew multi-gigabyte XML files without any temporary files.

2) Use pyexpat directly and write a temporary file first. This saves a fork() but costs more in terms of disk space.

I have had occasion to use both. regards, Sean, XML Processing With Python ISBN: 0 13 021119 2 Prentice Hall From jday@csihq.com Fri Jun 30 21:51:37 2000 From: jday@csihq.com (John Day) Date: Fri, 30 Jun 2000 16:51:37 -0400 Subject: [XML-SIG] Re: Pyxie In-Reply-To: <3.0.6.32.20000630210601.00a596d0@www.digitome.com> References: <395C6976.CD996C03@prescod.net> Message-ID: <4.3.1.0.20000630164505.00afb220@mail.csihq.com> At 09:06 PM 6/30/00 +0100, Sean McGrath wrote: >[...] > > > >PYX is *line oriented*, I pass it through line oriented tools >using the Unix pipe philosophy. I cannot do that with >XML. Sean, I think what Paul et al. are suggesting is a line-oriented subset of XML, like a 1-to-1 mapping to ESIS. That keeps everything looking like XML, but introduces a new problem: the "<" and ">" symbols will have to be heavily escaped to prevent unintended redirection with CLI tools. IMHO: keep the ESIS look&feel. :-) John Day Palm Bay, FL From larsga@garshol.priv.no Fri Jun 30 22:26:47 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 30 Jun 2000 23:26:47 +0200 Subject: [XML-SIG] xml parser In-Reply-To: