From teep@mozart.inet.co.th Mon Jan 3 14:11:09 2000 From: teep@mozart.inet.co.th (Prateep Siamwalla) Date: Mon, 3 Jan 2000 21:11:09 +0700 (GMT+0700) Subject: [XML-SIG] Trouble installing PyXML-0.5.2 Message-ID: Hello pythoners, I have been having some problems installing the latest python xml package (downloaded from the XML-SIG pages) My system is a RedHat 6.0, and I have installed rpms of Python 1.5.2-2 and pythonlib-1.22-5. I downloaded the PyXML 0.5.2 package and extracted to /usr/local/PyXML-0.5.2 I've tried running "make -f Makefile.pre.in boot" and "python setup.py build" from /usr/local/PyXML-0.5.2/ and I seem to be repeatedly running against a wall which reads : make[1]: Entering directory `/usr/local/pythonish/xml-0.5.1' make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile', needed by `sedscript'. Stop. make[1]: Leaving directory `/usr/local/pythonish/xml-0.5.1' make: *** [boot] Error 2 Incidentally VERSION=1.5, installdir=/usr, and exec_installdir=/usr I am very new to python and I fear I am doing something completely stupid, hopefully this description is enough for someone to point out my errors, if not, please tell me what other information I should provide. -looking forward to xmling, teep From akuchlin@mems-exchange.org Mon Jan 3 15:10:49 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Mon, 3 Jan 2000 10:10:49 -0500 (EST) Subject: [XML-SIG] Trouble installing PyXML-0.5.2 In-Reply-To: References: Message-ID: <14448.48121.821005.55136@amarok.cnri.reston.va.us> Prateep Siamwalla writes: >My system is a RedHat 6.0, and I have installed rpms of Python 1.5.2-2 and >pythonlib-1.22-5. I downloaded the PyXML 0.5.2 package and extracted to >/usr/local/PyXML-0.5.2 You also need to install the python-devel RPM to get the /usr/lib/python1.5/config/ directory, which contains the files needed to compile new Python extensions. 
You shouldn't have to extract the files into /usr/local/PyXML-0.5.2,
though that shouldn't cause any problems; running 'python setup.py
install' should copy all the files into
/usr/lib/python1.5/site-packages/xml/ .

--
A.M. Kuchling http://starship.python.net/crew/amk/
Every man is wise when attacked by a mad dog; fewer when pursued by a mad
woman; only the wisest survive when attacked by a mad notion.
 -- Robertson Davies, _Marchbanks' Almanac_

From guido@CNRI.Reston.VA.US Mon Jan 3 17:37:59 2000
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Mon, 03 Jan 2000 12:37:59 -0500
Subject: [XML-SIG] Python Conference -- Early Bird Registration ends Jan 5!
Message-ID: <200001031737.MAA24342@eric.cnri.reston.va.us>

This is the last warning. The conference is getting booked full, don't wait till the last moment! If you haven't registered and paid by January 5, you will be paying full price... So, be smart and register NOW. Also don't forget to book your hotel room by January 3 to qualify for the conference rate!

Some highlights from the conference program:

- 8 tutorials on topics ranging from JPython to Fnorb;
- a keynote by Open Source evangelist Eric Raymond;
- another by Randy Pausch, father of the Alice Virtual Reality project;
- a separate track for Zope developers and users;
- live demonstrations of important Python applications;
- refereed papers, and short talks on current topics;
- a developers' day where the feature set of Python 2.0 is worked out.

Our motto, due to Bruce Eckel, is: "Life's better without braces."

Come and join us at the Key Bridge Marriott in Rosslyn (across the bridge from Georgetown), January 24-27 in 2000. Make the Python conference the first conference you attend in the new millennium!

The early bird registration deadline is January 5.

More info: http://www.python.org/workshops/2000-01/

The program is now complete with the titles of all presentations. There is still space in the demo session and in the short talks session.
--Guido van Rossum (home page: http://www.python.org/~guido/) From uche.ogbuji@fourthought.com Tue Jan 4 09:14:30 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 04 Jan 2000 02:14:30 -0700 Subject: [XML-SIG] ANN: 4DOM 0.9.1 Message-ID: <3871B9F6.F55BF3B2@fourthought.com> FourThought LLC (http://FourThought.com) announces the release of 4DOM 0.9.1 ----------------------- An XML/HTML Python library using the Document Object Model interface 4DOM is a Python library for XML and HTML processing and manipulation using the W3C's Document Object Model for interface. 4DOM implements DOM Core level 2, HTML level 2 and Level 2 Document Traversal. 4DOM should work on all platforms supported by Python. If you have any problems with a particular platform, please e-mail the authors. 4DOM is designed to allow developers rapidly design applications that read, write or manipulate HTML and XML. News ---- This is a bug-fix release. More info and Obtaining 4DOM ---------------------------- Please see http://FourThought.com/4Suite/4DOM Or you can download 4DOM from ftp://FourThought.com/pub/4Suite/4DOM 4DOM is distributed under a license similar to that of Python. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Tue Jan 4 09:29:34 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 04 Jan 2000 02:29:34 -0700 Subject: [XML-SIG] ANN: 4XPath and 4XSLT 0.8.1 Message-ID: <3871BD7E.A6AF1AC0@fourthought.com> FourThought LLC (http://FourThought.com) announces the release of 4XSLT and 4XPath 0.8.1 ---------------------- A python implementation of the W3C's XSLT language 4XSLT is an XML transformation processor based on the W3C's specification for the XSLT transform language. 4XPath implements the W3C XPath language for indicating and selecting XML document components. 
http://www.w3.org/TR/xslt

4XPath implements the full XPath recommendation except for the 'lang' core function.

Currently, 4XSLT supports a sub-set of the XSLT recommendation including the following:

Full expression support and attribute-value template expansion

xsl:include                  xsl:import
xsl:template                 xsl:apply-imports
xsl:apply-templates          xsl:copy
xsl:call-template            xsl:if
xsl:for-each                 xsl:choose
xsl:element                  xsl:when
xsl:attribute                xsl:otherwise
xsl:text                     xsl:message
xsl:value-of                 xsl:variable
xsl:processing-instruction   xsl:param
xsl:comment                  xsl:with-param
xsl:strip-space              xsl:key
xsl:preserve-space           xsl:copy-of
xsl:sort                     xsl:namespace-alias
xsl:output

and, of course, xsl:stylesheet, xsl:transform, literal elements and text

Using the xml output method, 4XSLT produces the result tree by throwing events from the emerging SAX 2 standard to a handler, so it can be easily modified to supply results to any SAX 2 consumer. For the 'html' and 'text' output methods special SAX consumers produce HTML DOM nodes and plain text respectively.

News
----

Changes in 0.8.1
----------------

- 4XSLT implements xsl:sort and xsl:namespace-alias
- 4XSLT now implements template priorities
- 4XPath now has a clear DOM-query interface
- many bug-fixes and more extensive testing

More info and Obtaining 4XPath and 4XSLT
----------------------------------------

Please see

http://FourThought.com/4Suite/4XPath
http://FourThought.com/4Suite/4XSLT

Or you can download 4XSLT from

ftp://FourThought.com/pub/4Suite/4XPath
ftp://FourThought.com/pub/4Suite/4XSLT

4XPath and 4XSLT are distributed under a license similar to that of Python.
-- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From teep@inet.co.th Tue Jan 4 14:12:10 2000 From: teep@inet.co.th (Prateep Siamwalla) Date: Tue, 4 Jan 2000 21:12:10 +0700 Subject: [XML-SIG] Trouble installing PyXML-0.5.2 Message-ID: <001101bf56bd$b3603e60$46a809c0@tarzan> Thanks very much, I've downloaded the extra rpms and have the modules compiled. -teep -----Original Message----- From: Andrew M. Kuchling To: Prateep Siamwalla Cc: xml-sig@python.org Date: Monday, January 03, 2000 10:10 PM Subject: Re: [XML-SIG] Trouble installing PyXML-0.5.2 >Prateep Siamwalla writes: >>My system is a RedHat 6.0, and I have installed rpms of Python 1.5.2-2 and >>pythonlib-1.22-5. I downloaded the PyXML 0.5.2 package and extracted to >>/usr/local/PyXML-0.5.2 > >You also need to install the python-devel RPM to get the >/usr/lib/python1.5/config/ directory, which contains the files needed >to compile new Python extensions. > >You shouldn't have to extract the files into /usr/local/PyXML-0.5.2, >though that shouldn't cause any problems; running 'python setup.py >install' should copy all the files into >/usr/lib/python1.5/site-packages/xml/ . > >-- >A.M. Kuchling http://starship.python.net/crew/amk/ >Every man is wise when attacked by a mad dog; fewer when pursued by a mad >woman; only the wisest survive when attacked by a mad notion. > -- Robertson Davies, _Marchbanks' Almanac_ > > From larsga@garshol.priv.no Wed Jan 5 11:12:07 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: Wed, 5 Jan 2000 12:12:07 +0100 Subject: [XML-SIG] dtddoc: Version 0.11 released! Message-ID: <200001051112.MAA01505@lambda.garshol.priv.no> Changes since version 0.11: The DTD is unchanged. Other than that, the following has changed: * The makeskel tool has been added. * A very experimental DocBook RefEntry backend has been added. 
* The -t strict option has been added. * dtddoc now checks for the correct xmlproc version. * All reported bugs have been fixed. (Thanks to Stig Erik Sandø, Phong Vu and Alan Karben.) This version is mainly released as a bug fix release, but some users may also find the other changes useful. The home page has moved to a permanent new location, which is: --Lars M. From larsga@garshol.priv.no Wed Jan 5 11:29:33 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 05 Jan 2000 12:29:33 +0100 Subject: [XML-SIG] Future plans In-Reply-To: <199912201809.LAA06565@localhost.localdomain> References: <199912201809.LAA06565@localhost.localdomain> Message-ID: * uche ogbuji | | Lars published a SAX2 module that pretty much covers the ground of | the current status. I've been cajoling the folks on XML-DEV to | finish the SAX2 spec, and things are coming about slowly. I've been trying to follow the current discussion on XML-DEV (of course it had to happen when I'm on vacation), and the plan is to let the dust settle a little on XML-DEV before discussion is started here. There are some things we might want to do differently from the Java folks, so there will probably need to be some discussion. In any case, things are moving along, although slowly. | 4DOM comes with a pretty complete SAX2 -> DOM reader, which is used | by 4XSLT. Do you use my package or do you use the stuff that you put together? --Lars M. From larsga@garshol.priv.no Wed Jan 5 11:34:29 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 05 Jan 2000 12:34:29 +0100 Subject: [XML-SIG] Future plans In-Reply-To: <199912201637.LAA12992@amarok.cnri.reston.va.us> References: <199912201637.LAA12992@amarok.cnri.reston.va.us> Message-ID: * Andrew M. 
Kuchling | | Some things to do: | | * I propose dropping the wstrop and xmlarch code from the CVS | tree: wstrop because Python 1.6 will have built-in Unicode | support of some strip, and xmlarch because architectual forms | are fairly rarely used, and don't need to be in the core. I agree that wstrop should be dropped. | * What about namespace support in SAX -- what's the status of SAX2? SAX2 will have namespace support, but the actual form of it is uncertain at the moment. I've also been thinking that we may want qualified names to be represented as tuples, either (namespace name (URI), localpart (element type name), prefix) or (namespace name (URI), localpart (element type name)) --Lars M. From uche.ogbuji@fourthought.com Wed Jan 5 16:09:37 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 05 Jan 2000 09:09:37 -0700 Subject: [XML-SIG] Future plans In-Reply-To: Your message of "05 Jan 2000 12:29:33 +0100." Message-ID: <200001051609.JAA02544@localhost.localdomain> > | 4DOM comes with a pretty complete SAX2 -> DOM reader, which is used > | by 4XSLT. > > Do you use my package or do you use the stuff that you put together? We use your package. We bundled it with 4Suite-base. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Wed Jan 5 16:13:23 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 05 Jan 2000 09:13:23 -0700 Subject: [XML-SIG] Future plans In-Reply-To: Your message of "05 Jan 2000 12:34:29 +0100." Message-ID: <200001051613.JAA02566@localhost.localdomain> > | * What about namespace support in SAX -- what's the status of SAX2? > > SAX2 will have namespace support, but the actual form of it is > uncertain at the moment. Actually, it has settled down, and is probably pretty much determined at this point. 
There was much good debate about it on XML-DEV. > I've also been thinking that we may want > qualified names to be represented as tuples, either > > (namespace name (URI), localpart (element type name), prefix) > > or > > (namespace name (URI), localpart (element type name)) I think it might be more natural to always make it a triple, and simply have '' as the third item when there is no namespace. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From larsga@garshol.priv.no Wed Jan 5 16:30:34 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 05 Jan 2000 17:30:34 +0100 Subject: [XML-SIG] Future plans In-Reply-To: <200001051613.JAA02566@localhost.localdomain> References: <200001051613.JAA02566@localhost.localdomain> Message-ID: * Lars Marius Garshol | | SAX2 will have namespace support, but the actual form of it is | uncertain at the moment. * uche ogbuji | | Actually, it has settled down, and is probably pretty much | determined at this point. There was much good debate about it on | XML-DEV. I've read through the debate, but I've failed to notice any agreement. However, I've printed out the main posts for perusal at home, so maybe I'll find it there. * Lars Marius Garshol | | I've also been thinking that we may want | qualified names to be represented as tuples, either | | (namespace name (URI), localpart (element type name), prefix) | | or | | (namespace name (URI), localpart (element type name)) * uche ogbuji | | I think it might be more natural to always make it a triple, and | simply have '' as the third item when there is no namespace. No prefix, you mean? I agree, but the question is whether we really want the prefix here or whether we should just always use a binary tuple. --Lars M. 
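
[Editorial sketch, not part of the original thread.] The two tuple shapes
discussed above might look roughly like this in present-day Python. The
helper name split_qname and its prefix_map argument are hypothetical,
invented for illustration; nothing here is taken from a SAX2 draft or from
PyXML. The XSLT namespace URI is used only as sample data.

```python
# Hypothetical helper; not part of any SAX2 draft or PyXML release.
def split_qname(raw_name, prefix_map):
    """Turn 'prefix:local' into (namespace_uri, localpart, prefix).

    prefix_map maps declared prefixes to namespace URIs. The empty
    string stands for "no prefix", as suggested in the thread, and an
    unknown prefix simply yields an empty namespace URI here.
    """
    if ":" in raw_name:
        prefix, local = raw_name.split(":", 1)
    else:
        prefix, local = "", raw_name
    uri = prefix_map.get(prefix, "")
    return (uri, local, prefix)


XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
prefixes = {"xsl": XSLT_NS}

triple = split_qname("xsl:template", prefixes)  # three-item form, prefix kept
pair = triple[:2]                               # two-item form, prefix dropped
```

With the three-item form a consumer that does not care about prefixes can
slice them off, as the last line does; the two-item form cannot get them
back.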
From gstein@lyra.org Wed Jan 5 22:19:09 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 5 Jan 2000 14:19:09 -0800 (PST) Subject: [XML-SIG] namespace/localpart tuples (was: Future plans) In-Reply-To: <200001051613.JAA02566@localhost.localdomain> Message-ID: On Wed, 5 Jan 2000 uche.ogbuji@fourthought.com wrote: >... > > I've also been thinking that we may want > > qualified names to be represented as tuples, either > > > > (namespace name (URI), localpart (element type name), prefix) > > > > or > > > > (namespace name (URI), localpart (element type name)) > > I think it might be more natural to always make it a triple, and simply have > '' as the third item when there is no namespace. At processing time, the prefix that was used is irrelevant. It shouldn't be preserved. You could end up in a situation where a client thinks that prefix "should" be used when regenerating XML output... the problem is that it may conflict (say, if you combined a couple XML docs) or not be defined in the (new) output (if you dropped some portion that defined the namespace). IMO, it is much better to regenerate a new set of prefixes for the set of namespace URIs that are present in an XML document. Cheers, -g -- Greg Stein, http://www.lyra.org/ From uche.ogbuji@fourthought.com Wed Jan 5 23:13:46 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 05 Jan 2000 16:13:46 -0700 Subject: [XML-SIG] namespace/localpart tuples (was: Future plans) In-Reply-To: Your message of "Wed, 05 Jan 2000 14:19:09 PST." Message-ID: <200001052313.QAA03676@localhost.localdomain> > > > I've also been thinking that we may want > > > qualified names to be represented as tuples, either > > > > > > (namespace name (URI), localpart (element type name), prefix) > > > > > > or > > > > > > (namespace name (URI), localpart (element type name)) > > > > I think it might be more natural to always make it a triple, and simply have > > '' as the third item when there is no namespace. 
> > At processing time, the prefix that was used is irrelevant. It shouldn't > be preserved. The prefix has no semantic value: it is indeed syntactic sugar. However, it is very important to maintain the "principle of least surprise" for users. If a user runs his XSLT stylesheet through a SAX processor and finds that all his "xsl:template" elements have been renamed to "prefix00001:template", he might be very confused indeed. Note that there is at least one case in which the prefix does matter: XSLT uses the prefix to match declared namespaces in the stylesheet to namespaces in the source document. Now many people have already railed against this violation of the spirit of XML Namespaces 1.0, but there is no srguing that it was the most elegant solution to a difficult problem that the XSLT WG faced in dealing with namespaces. So, in short, though prefixes are not technically part of the document, there are good arguments for including them in the SAX binding. > You could end up in a situation where a client thinks that prefix "should" > be used when regenerating XML output... the problem is that it may > conflict (say, if you combined a couple XML docs) or not be defined in the > (new) output (if you dropped some portion that defined the namespace). The best solution to this is education. If the interface documentation clearly states that prefixes are not technically part of the document, hopefully users will avoid mis-using them. This is not ideal, but there's not much better to do given the practical issues involved. > IMO, it is much better to regenerate a new set of prefixes for the set of > namespace URIs that are present in an XML document. Even as a user who knows better about the meaning of prefixes, I would be very annoyed at a processor that did this. I often deal with documented with 4 or more namespaces (this is not too unusual: very common in RDF) and I give my prefixes mnemonic names to help sort things out. 
I don't want processors renaming them to "p01a3", etc. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From gstein@lyra.org Wed Jan 5 23:30:04 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 5 Jan 2000 15:30:04 -0800 (PST) Subject: [XML-SIG] namespace/localpart tuples In-Reply-To: <200001052313.QAA03676@localhost.localdomain> Message-ID: On Wed, 5 Jan 2000 uche.ogbuji@fourthought.com wrote: >... > The prefix has no semantic value: it is indeed syntactic sugar. However, it > is very important to maintain the "principle of least surprise" for users. > > If a user runs his XSLT stylesheet through a SAX processor and finds that all > his "xsl:template" elements have been renamed to "prefix00001:template", he > might be very confused indeed. hehe... agreed on that one :-) > Note that there is at least one case in which the prefix does matter: XSLT > uses the prefix to match declared namespaces in the stylesheet to namespaces > in the source document. Now many people have already railed against this > violation of the spirit of XML Namespaces 1.0, but there is no srguing that it > was the most elegant solution to a difficult problem that the XSLT WG faced in > dealing with namespaces. Oh, kee-rist. What dickheads. All right... then there is a reason to keep the prefix. Sigh. The three-tuple containing the prefix should be used since XSLT applies semantic meaning to it. >... > > IMO, it is much better to regenerate a new set of prefixes for the set of > > namespace URIs that are present in an XML document. > > Even as a user who knows better about the meaning of prefixes, I would be very > annoyed at a processor that did this. I often deal with documented with 4 or > more namespaces (this is not too unusual: very common in RDF) and I give my > prefixes mnemonic names to help sort things out. 
I don't want processors > renaming them to "p01a3", etc. Yah. DAV typically uses a few namespace, too (DAV: itself plus product-specific properties), so I'm familiar with this. If you don't like the renaming, then avoid mod_dav :-) (it renames stuff to things like ns0, ns1, i0, i1, ...). I could explain why, but you probably don't want to hear... hehe Cheers, -g -- Greg Stein, http://www.lyra.org/ From tpassin@idsonline.com Fri Jan 7 03:31:20 2000 From: tpassin@idsonline.com (Thomas B. Passin) Date: Thu, 6 Jan 2000 22:31:20 -0500 Subject: [XML-SIG] Future plans References: <199912201637.LAA12992@amarok.cnri.reston.va.us> Message-ID: <003601bf58c0$aae89e80$de2a08d1@tomshp> Lars Marius Garshol wrote: > > * Andrew M. Kuchling > | > | Some things to do: > | > | * I propose dropping the wstrop and xmlarch code from the CVS > | tree: wstrop because Python 1.6 will have built-in Unicode > | support of some strip, and xmlarch because architectual forms > | are fairly rarely used, and don't need to be in the core. > > I agree that wstrop should be dropped. > > | * What about namespace support in SAX -- what's the status of SAX2? > > SAX2 will have namespace support, but the actual form of it is > uncertain at the moment. I've also been thinking that we may want > qualified names to be represented as tuples, either > > (namespace name (URI), localpart (element type name), prefix) > > or > > (namespace name (URI), localpart (element type name)) > I think we should follow the lead of Megginson and the XML-DEV discussions on whether there should be a separate prefix part - I personally think there should be, but let's follow what they end up with on this. Yes, tuples seem to be the perfect way to do qualified names, no matter how the others want to do them for Java or C++. > --Lars M. > > Regards, Tom Passin From tpassin@idsonline.com Fri Jan 7 03:36:40 2000 From: tpassin@idsonline.com (Thomas B. 
Passin) Date: Thu, 6 Jan 2000 22:36:40 -0500 Subject: [XML-SIG] namespace/localpart tuples (was: Future plans) References: <200001052313.QAA03676@localhost.localdomain> Message-ID: <003701bf58c0$abed8b60$de2a08d1@tomshp> uche.ogbuji@fourthought.com> wrote: > > > > I've also been thinking that we may want > > > > qualified names to be represented as tuples, either > > > > > > > > (namespace name (URI), localpart (element type name), prefix) > > > > > > > > or > > > > > > > > (namespace name (URI), localpart (element type name)) > > > > > > I think it might be more natural to always make it a triple, and simply have > > > '' as the third item when there is no namespace. > > > > At processing time, the prefix that was used is irrelevant. It shouldn't > > be preserved. > > The prefix has no semantic value: it is indeed syntactic sugar. However, it > is very important to maintain the "principle of least surprise" for users. > > If a user runs his XSLT stylesheet through a SAX processor and finds that all > his "xsl:template" elements have been renamed to "prefix00001:template", he > might be very confused indeed. > > Note that there is at least one case in which the prefix does matter: XSLT > uses the prefix to match declared namespaces in the stylesheet to namespaces > in the source document. Now many people have already railed against this > violation of the spirit of XML Namespaces 1.0, but there is no srguing that it > was the most elegant solution to a difficult problem that the XSLT WG faced in > dealing with namespaces. > > So, in short, though prefixes are not technically part of the document, there > are good arguments for including them in the SAX binding. > > > You could end up in a situation where a client thinks that prefix "should" > > be used when regenerating XML output... the problem is that it may > > conflict (say, if you combined a couple XML docs) or not be defined in the > > (new) output (if you dropped some portion that defined the namespace). 
> > The best solution to this is education. If the interface documentation > clearly states that prefixes are not technically part of the document, > hopefully users will avoid mis-using them. This is not ideal, but there's not > much better to do given the practical issues involved. > > > IMO, it is much better to regenerate a new set of prefixes for the set of > > namespace URIs that are present in an XML document. > > Even as a user who knows better about the meaning of prefixes, I would be very > annoyed at a processor that did this. I often deal with documented with 4 or > more namespaces (this is not too unusual: very common in RDF) and I give my > prefixes mnemonic names to help sort things out. I don't want processors > renaming them to "p01a3", etc. > > I'm completely with Uche on this - sugar or not, we should preserve the prefixes. After all, your software can always ignore them later if you don't care. And, again as Uche says, the prefixes sometimes are chosen to help document the meaning of the document. Tom Passin From aa8vb@yahoo.com Fri Jan 7 16:23:40 2000 From: aa8vb@yahoo.com (Randall Hopper) Date: Fri, 7 Jan 2000 11:23:40 -0500 Subject: [XML-SIG] XML Writing Tools (& escape_markup) Message-ID: <20000107112340.A27660@vislab.epa.gov> Do any Python tools exist to write (or aid in writing) indented XML? Thanks, -- Randall Hopper aa8vb@yahoo.com From fdrake@acm.org Fri Jan 7 16:48:56 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 7 Jan 2000 11:48:56 -0500 (EST) Subject: [XML-SIG] XML Writing Tools (& escape_markup) In-Reply-To: <20000107112340.A27660@vislab.epa.gov> References: <20000107112340.A27660@vislab.epa.gov> Message-ID: <14454.6392.791758.466469@weyr.cnri.reston.va.us> Randall Hopper writes: > Do any Python tools exist to write (or aid in writing) indented XML? In the CVS repository, check xml.sax.writer. 
It needs documentation and a bit of work, but my working copy at home is pretty messed up at the moment, and I don't know just when I'll get back to it. ;( On the other hand, if you try it and have any specific suggestions or comments, I'd appreciate hearing them!

 -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From uche.ogbuji@fourthought.com Fri Jan 7 18:28:21 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 07 Jan 2000 11:28:21 -0700
Subject: [XML-SIG] XML Writing Tools (& escape_markup)
In-Reply-To: Your message of "Fri, 07 Jan 2000 11:23:40 EST." <20000107112340.A27660@vislab.epa.gov>
Message-ID: <200001071828.LAA03595@localhost.localdomain>

> Do any Python tools exist to write (or aid in writing) indented XML?

4DOM (http://FourThought.com/4Suite/4DOM) has a pretty-printer which emits indented XML. Here is an example:

Pieter Aaron
404 Error Way
404-555-1234 404-555-4321 404-555-5555 pieter.aaron@inter.net
Emeka Ndubuisi
42 Spam Blvd
767-555-7676 767-555-7642 800-SKY-PAGEx767676 endubuisi@spamtron.com
--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com (970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com http://OpenTechnology.org

From chapmanb@arches.uga.edu Fri Jan 7 22:50:08 2000
From: chapmanb@arches.uga.edu (Brad Chapman)
Date: Fri, 7 Jan 2000 17:50:08 -0500
Subject: [XML-SIG] Removing a set of DOM nodes
Message-ID:

Hello all!

I have a quick question about removing a node and all its child nodes from a DOM tree. I have an application where I identify a particular node in a DOM tree and then need to remove it and all of its subnodes from the tree. The tree can branch quite far beneath the node I am deleting, so there are many sub-sub...-sub-nodes under the parent node.

So far I have just been dealing with this by destroying all of the node's immediate child nodes by iterating through them and applying nodeToDelete.removeChild(oldNode), and then deleting the parent node. Although this has the desired effect (gets the node and everything under it out of the tree) I am curious about memory leakage. Are all of the sub-sub...-sub-nodes that were not explicitly deleted removed once they are detached from the main document node, or do they continue to exist? And, if they do need to be explicitly deleted, is there any code already available that does this? (i.e. a class inheriting from the Walker class)

Thanks in advance for any advice!

Brad

From dieter@handshake.de Sun Jan 9 10:04:09 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 9 Jan 2000 11:04:09 +0100 (CET)
Subject: [XML-SIG] Removing a set of DOM nodes
In-Reply-To:
References:
Message-ID: <14456.23567.554931.949249@lindm.dm>

Hello Brad,

if you use PyDOM (i.e. the DOM implementation of the XML SIG) then it will be sufficient to just remove the parent of the subtree. PyDOM uses proxies to avoid any cyclic references, so that the usual Python garbage collection can clean up without problem.
If you use 4DOM, the FourThought people should tell you about the necessary procedure to perform manual garbage collection. I assume you must call a "destroy" method after the subtree has been removed from the tree. This will take care of any recursion down the tree.

Dieter

From JKnight496@aol.com Sun Jan 9 20:05:29 2000
From: JKnight496@aol.com (JKnight496@aol.com)
Date: Sun, 9 Jan 2000 15:05:29 EST
Subject: [XML-SIG] DATABASE PUBLISHING
Message-ID: <39.3926f5ad.25aa4409@aol.com>

To whom it may concern,

I recently checked out an XML book from a library. Within it, Microsoft ACCESS was an example database that the author used to publish an ACCESS database on line. To do this, he referenced a "python" script.

I have an ACCESS database that I want to publish on-line. Do I need to download any type of decompiler to read the python scripting language?

Any help would be greatly appreciated!!

Many thanks
Jared Knight

From mgushee@havenrock.com Sun Jan 9 21:54:18 2000
From: mgushee@havenrock.com (Matt Gushee)
Date: Sun, 9 Jan 2000 16:54:18 -0500 (EST)
Subject: [XML-SIG] DATABASE PUBLISHING
In-Reply-To: <39.3926f5ad.25aa4409@aol.com>
References: <39.3926f5ad.25aa4409@aol.com>
Message-ID: <14457.906.341911.535565@gargle.gargle.HOWL>

Hmm ... this type of question is probably best directed to comp.lang.python, rather than XML-SIG. But since we're already here:

JKnight496@aol.com writes:
 > I have an ACCESS database that I want to publish on-line. Do I need to
 > download any type of decompiler to read the python scripting language?

In general, no (I'm not sure such a thing exists for Python, anyway). Python is an interpreted language, which means the executable scripts/programs are plain text. Well, actually, Python programs *can be* distributed in a compiled, therefore unreadable, form, but it's not all that common. And the fact that your author referred to a 'script' as opposed to a 'program' suggests to me that it's almost certainly in plain text form.

Best of luck!
-- Matt Gushee Portland, Maine, USA mgushee@havenrock.com http://www.havenrock.com/ From tpassin@idsonline.com Mon Jan 10 00:01:35 2000 From: tpassin@idsonline.com (Thomas B. Passin) Date: Sun, 9 Jan 2000 19:01:35 -0500 Subject: [XML-SIG] DATABASE PUBLISHING References: <39.3926f5ad.25aa4409@aol.com> Message-ID: <001701bf5afd$df0166e0$b02a08d1@tomshp> wrote > > To whom it may conern, > > I recently checked-out an XML book from a library. Within it, Microsoft > ACCESS > was an example database that the author used to publish an ACCESS database > on line. To do this, he refrenced a "python" script. > > I have an ACCESS database that I want to publish on-line. Do I need to > dowload > any type of decompiler to read the python scripting language? > > Any help would be greatly appreciated!! > Python scripts are usually text (*.py), although they could be compiled (*.pyc). If your book references a python script, the script may require another python package or module to work. You can tell what is needed by looking at the "import" statements. Most import statements call for standard modules that come with python, but if you see one called, for example, mxODBC, or most anything with "ODBC", that is one you'd have to get. (This particular one you can find at the Database SIG page on the www.python.org site). What is the book you mentioned? Tom Passin From uche.ogbuji@fourthought.com Mon Jan 10 00:28:02 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 09 Jan 2000 17:28:02 -0700 Subject: [XML-SIG] Removing a set of DOM nodes In-Reply-To: Your message of "Sun, 09 Jan 2000 11:04:09 +0100." <14456.23567.554931.949249@lindm.dm> Message-ID: <200001100028.RAA18892@localhost.localdomain> I think he's using PyDOM, because near the end of his message he spoke about the "DOM Walker". 4DOM instead has a generic Visitor pattern implementation (although a simple document-order visitor comes with it). > if you use PyDOM (i.e. 
the DOM implementation of the XML SIG) > then it will be sufficient to just remove the parent of the > subtree. PyDOM uses proxies to avoid any cyclic references > such that the usual Python garbage collection can clean > up without problem. > > If you use 4DOM, the FourThought people should tell you > about the necessary procedure to perform manual garbage > collection. > I assume, you must call a "destroy" method after the > subtree has been removed from the tree. This will > take care of any recursion down the tree. Yes. You just call ReleaseNode() on the node when you're done with it and it will release the node and all descendants. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From larsga@garshol.priv.no Tue Jan 11 17:10:05 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 11 Jan 2000 18:10:05 +0100 Subject: [XML-SIG] Future plans In-Reply-To: <003601bf58c0$aae89e80$de2a08d1@tomshp> References: <199912201637.LAA12992@amarok.cnri.reston.va.us> <003601bf58c0$aae89e80$de2a08d1@tomshp> Message-ID: * Thomas B. Passin | | I think we should follow the lead of Megginson and the XML-DEV | discussions on whether there should be a separate prefix part - I | personally think there should be, but let's follow what they end up | with on this. They have decided that the prefix should be made available. | Yes, tuples seem to be the perfect way to do qualified names, no | matter how the others want to do them for Java or C++. They do seem attractive, but the trouble is that if we include the prefix in the tuples, then the same name with different prefixes will not be equal. We will have to solve this somehow, perhaps by making incompatible changes the way they will for Java. --Lars M. 
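The equality problem with prefix-carrying tuples can be sketched in a few lines of present-day Python. This is only an illustration: the QName class, its field names, and the same_name() helper are hypothetical and are not part of any SAX or DOM binding discussed on this list.

```python
from typing import NamedTuple, Optional


class QName(NamedTuple):
    """Hypothetical qualified name: namespace URI, local part, prefix."""
    uri: str
    local: str
    prefix: Optional[str] = None

    def same_name(self, other: "QName") -> bool:
        # Namespace-aware identity compares URI and local part only;
        # the prefix is just a document-level abbreviation.
        return (self.uri, self.local) == (other.uri, other.local)


# The same element name, written with two different prefixes.
a = QName("http://www.w3.org/1999/xhtml", "p", "html")
b = QName("http://www.w3.org/1999/xhtml", "p", "h")

plain_equal = (a == b)        # tuple comparison also looks at the prefix
name_equal = a.same_name(b)   # comparison that ignores the prefix
```

Plain tuple comparison treats the two names as different because the third field differs, which is the trouble described above. Keeping the prefix outside the compared value, or comparing through a helper like the one sketched here, is one possible way around it.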
From larsga@garshol.priv.no Tue Jan 11 17:13:35 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 11 Jan 2000 18:13:35 +0100 Subject: [XML-SIG] Groves In-Reply-To: <199912291541.IAA02695@localhost.localdomain> References: <199912291541.IAA02695@localhost.localdomain> Message-ID: * uche ogbuji | | Unfortunately, this probably puts practical use of the pure grove | model on hold for us. We are actually working heavily with RDF in a | current project, using internal Python/RDF tools that the Python | community will probably see soon in OSS form, and we'll probably | continue to use RDF directly. Just for the record: I also have an RDF tool for Python slowly brewing at home. It might be an idea for us to agree on the interface to in-memory RDF objects, and perhaps also to make it the same as the one for grove nodes. | I shall continue to study the grove model, however, as a means of | thinking as clearly as possible about data. As will I. Some sort of harmonization between groves and RDF would be most interesting, I think. --Lars M. From evangelo@pigdog.org Sat Jan 15 01:13:44 2000 From: evangelo@pigdog.org (ESP) Date: 14 Jan 2000 17:13:44 -0800 Subject: [XML-SIG] XBEL Message-ID: I dunno if anyone's keeping an eye on this stuff, but I have a patch to submit for the msie_parse.py util. If someone could tell me how to submit it, I'd be super grateful.
~ESP -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ESP | http://pigdog.org/ | RoR - Alucard ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From Jeremy Siek Sun Jan 16 02:45:40 2000 From: Jeremy Siek (Jeremy Siek ) Date: Sat, 15 Jan 2000 21:45:40 -0500 (EST) Subject: [XML-SIG] trouble shooting XSL demo Message-ID: <200001160245.VAA03829@philoctetes.lsc.nd.edu> Hi, I'm trying to learn how to the python XSL package and ran into a difficulty right away: I try to run the demo like so: python /usr/local/lib/XSL/Processor.py addr_book1.xml addr_book1.xsl And get the following error: IOError: (2, 'No such file or directory') The traceback started at: engine.include_xsl_file(name) ... Suggestions? Probably some simple install problem, yes? Thanks ahead of time, Jeremy From uche.ogbuji@fourthought.com Sun Jan 16 03:17:33 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sat, 15 Jan 2000 20:17:33 -0700 Subject: [XML-SIG] trouble shooting XSL demo In-Reply-To: Your message of "Sat, 15 Jan 2000 21:45:40 EST." <200001160245.VAA03829@philoctetes.lsc.nd.edu> Message-ID: <200001160317.UAA03338@localhost.localdomain> > I'm trying to learn how to the python XSL package > and ran into a difficulty right away: > > I try to run the demo like so: > > python /usr/local/lib/XSL/Processor.py addr_book1.xml addr_book1.xsl > > And get the following error: > > IOError: (2, 'No such file or directory') > > The traceback started at: > engine.include_xsl_file(name) > ... Whoa! It looks as if the version you are trying to use is a year old. Not only has the software changed significantly, but so has the standard as well. You can get the latest version of 4XSLT at http://FourThought.com/4Suite/4XSLT If you are using Linux, RPMs are available, and if you are running Windows, please let us know and we can send you some tips for compiling the package. 
-- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From akuchlin@mems-exchange.org Tue Jan 18 01:49:52 2000 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Mon, 17 Jan 2000 20:49:52 -0500 Subject: [XML-SIG] Developer's Day position paper Message-ID: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> Here's the position paper for the XML-SIG's Developer's Day session. The issues I'd most like to see resolved are: * Should we switch to 4DOM? Or should I seriously start pushing on implementing DOM Level 2? Attendees are encouraged to look at 4DOM's code, so they can form some opinions of its quality. * Are there any issues about a SAX2 Python binding that we should discuss? * What about xmllib.py? (I raise this issue with some trepidation :) ) Some things on the list are quite simple; for example, consensus seemed to be that qp_xml.py should go into the tree. Greg, you want to go ahead and check it in? -- A.M. Kuchling http://starship.python.net/crew/amk/ "Don't try anything." "It's all right; I won't hurt you." -- Unnamed thug and Liz Shaw, in "The Ambassadors of Death" The XML-SIG has been mostly drifting for the last year, with little attempt made to push out a 1.0 version. Part of this stemmed from external forces (little development on a Python Unicode type, no namespace support in SAX or DOM), and part stemmed from internal reasons (my getting distracted, few people having time to work on it). We need to finish a 1.0 version as soon as possible. Paul Prescod listed some standards that should be supported: XML SAX Unicode XPath XPointer DOM XSLT. XSLT seems too far afield, but the rest are probably worthwhile candidates. (Can we prioritize them?) Issues remaining: DOM Level 2 support =================== Should we switch to 4Thought's implementations of DOM, XPath, and XPointer? 
Participants should try to at least read through some of 4Thought's code, so they can form an opinion of its quality. Pros: * An existing DOM Level 2 implementation. * Maintainers use it actively for real work; PyDOM maintainer has short attention span. * Has XPath and XSLT tools built on top of it. (Paul Prescod wrote a few weeks ago that "Ideally we would have one (or at most two!) implementation of each of the major specs: XML, SAX, Unicode, XPath, XPointer, XSLT, DOM"; if you take 4DOM + 4XSL + 4Path, this would mean that Unicode is the only missing piece.) * Faster than PyDOM * Potential for CORBA support by adding some extra bits Cons: * Does anyone other than the maintainers have any experience with it? Any comments? (If you don't want to slag it off publicly, you can send me unfavorable comments privately, and I'll preserve your anonymity.) * Uses Ft.Dom package name, not xml.dom * Potential incompatibilities with existing code, Sean's book, etc. (But probably a bit of glue code will let us smooth over such problems.) * Requires releasing nodes explicitly * Requires that 4Suite base be added to XML-SIG distribution (But the only dependency, at least in the DOM, seems to be on Ft.Lib.TraceOut.) SAX Namespace support ===================== This requires that XML-DEV converges to some consensus on SAX2. Has it done so? Are there issues facing a SAX2 binding that we should discuss? Unicode ======= We've been waiting on the official Python solution. M.-A. Lemburg is implementing a proposal (http://starship.skyport.net/~lemburg/unicode-proposal.txt) but it hasn't reached the Python CVS tree yet. Therefore, we still haven't dropped wstrop. My inclination is to do this as soon as Unicode is reasonably stable in the CVS tree. Adding qp_xml.py ================ To the xml.parser package, presumably? xmllib.py ========= A touchy catfight ensued on the xml-sig mailing about xmllib.py's standards compliance (with or without sgmlop). Should xmllib.py be dropped? 
Can we do an xmllib.py compatible class on top of Expat? Glue for Java and COM parsers ============================= We don't have any particular support for Java-based parsers, or Microsoft's XML-parsing COM component, but we probably should. (Still, this is probably a low priority.) From gstein@lyra.org Tue Jan 18 12:36:56 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 18 Jan 2000 04:36:56 -0800 (PST) Subject: [XML-SIG] Developer's Day position paper In-Reply-To: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> Message-ID: On Mon, 17 Jan 2000, A.M. Kuchling wrote: >... > Some things on the list are quite simple; for example, consensus > seemed to be that qp_xml.py should go into the tree. Greg, you want > to go ahead and check it in? I did not consider myself "authoritative" regarding the XML-SIG distribution, so I never gave myself checkin privileges :-). I can go ahead and do so, though, so that I can check in (and maintain) qp_xml. >... > Adding qp_xml.py > ================ > > To the xml.parser package, presumably? Either there or xml.utils. I think xml.parsers makes more sense, but am open to opinion. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake@acm.org Tue Jan 18 18:18:20 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 18 Jan 2000 13:18:20 -0500 (EST) Subject: [XML-SIG] XBEL In-Reply-To: References: Message-ID: <14468.44652.383946.198726@weyr.cnri.reston.va.us> ESP writes: > I dunno if anyone's keeping an eye on this stuff, but I have a patch > to submit for the msie_parse.py util. > > If someone could tell me how to submit it, I'd be super grateful. If Andrew hasn't already asked for it, please post it to the XML-SIG list if it isn't too long. (If it is long, summarize the change to the list and send the patch to Andrew or myself.) Thanks! -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From larsga@garshol.priv.no Tue Jan 18 18:25:26 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 18 Jan 2000 19:25:26 +0100 Subject: [XML-SIG] Developer's Day position paper In-Reply-To: References: Message-ID: * Greg Stein | | [qp_xml location] | Either there or xml.utils. I think xml.parsers makes more sense, but | am open to opinion. Personally, I don't think it's a parser. expat is the parser, and qp_xml is a client to that parser that builds a data structure suitable for navigation. xml.utils is better, IMHO. --Lars M. From skip@mojam.com (Skip Montanaro) Tue Jan 18 18:33:19 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 18 Jan 2000 12:33:19 -0600 (CST) Subject: [XML-SIG] Developer's Day position paper In-Reply-To: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> Message-ID: <14468.45551.632590.597771@beluga.mojam.com> AMK> xmllib.py AMK> ========= AMK> A touchy catfight ensued on the xml-sig mailing about xmllib.py's AMK> standards compliance (with or without sgmlop). Should xmllib.py be AMK> dropped? Can we do an xmllib.py compatible class on top of Expat? I haven't really been following this list for awhile, because my eyes just sort of glaze over when everyone starts slinging various XML-related acronyms (does XML seem to be worse afflicted with this disease than the rest of the Internet community? seems like it to me). I will throw in my two cents on this issue (not having noticed the aforementioned catfight in this list because of optical glazing). I currently use xmllib+sgmlop+xmlrpclib to do XML-RPC stuff in a production server. If you can't find a solution that is as easy to use and that performs at least as well, I'll have to freeze on what I have now. I've tried solutions that were implemented in straight Python and Python+private C-based extensions. 
The former is useless performance-wise, the other is a major headache to maintain. I for one am thankful that Fredrik Lundh developed and released both sgmlop and xmlrpclib. They make my days much more pleasant. Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ 847-971-7098 From paul@prescod.net Tue Jan 18 19:16:47 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 18 Jan 2000 11:16:47 -0800 Subject: [XML-SIG] Developer's Day position paper References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com> Message-ID: <3884BC1F.3CBD0B2B@prescod.net> I agree with everything Skip says. We need something as fast and easy as xmllib. We also need something compliant to the Unicode and XML specifications. Does anyone disagree? Does anyone think that these goals are mutually exclusive? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Earth will soon support only survivor species -- dandelions, roaches, lizards, thistles, crows, rats. Not to mention 10 billion humans. - Planet of the Weeds, Harper's Magazine, October 1998
From fdrake@acm.org Tue Jan 18 19:56:45 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 18 Jan 2000 14:56:45 -0500 (EST) Subject: [XML-SIG] Developer's Day position paper In-Reply-To: <3884BC1F.3CBD0B2B@prescod.net> References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com> <3884BC1F.3CBD0B2B@prescod.net> Message-ID: <14468.50557.762952.669644@weyr.cnri.reston.va.us> Paul Prescod writes: > I agree with everything Skip says. We need something as fast and easy as > xmllib. We also need something compliant to the Unicode and XML > specifications. Does anyone disagree? Does anyone think that these goals > are mutually exclusive? No and no. There's absolutely no reason to drop support for xmllib. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin@mems-exchange.org Wed Jan 19 03:21:40 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 18 Jan 2000 22:21:40 -0500 (EST) Subject: [XML-SIG] Developer's Day position paper In-Reply-To: <14468.45551.632590.597771@beluga.mojam.com> References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com> Message-ID: <14469.11716.833194.855673@newcnri.cnri.reston.va.us> Skip Montanaro writes: >this list because of optical glazing). I currently use >xmllib+sgmlop+xmlrpclib to do XML-RPC stuff in a production server. If you >can't find a solution that is as easy to use and that performs at least as >well, I'll have to freeze on what I have now. OK, hold that thought. Attached is a simple benchmark script that use PyExpat and xmllib+sgmlop to parse the 279K hamlet.xml file (ummm.. it's part of Jon Bosak's XML sample data, but I'm not sure where to download it from). Could someone please verify these results, or point out some stupid error in the benchmark script?
(Since I use my system for developing the XML package, it's possible that I'm getting an old or broken version of xmllib+sgmlop.)

[amk@mira bench]$ python pyexp.py
PyExpat w/ null handlers: 279663 bytes in 0.04 seconds = 6575.70 K/sec
PyExpat w/ StartElementHandler: 279663 bytes in 0.19 seconds = 1457.33 K/sec
PyExpat w/ Start,End: 279663 bytes in 0.25 seconds = 1102.61 K/sec
PyExpat w/ Start,End,Char,PI: 279663 bytes in 0.36 seconds = 758.03 K/sec
Fast xmllib: 279663 bytes in 3.15 seconds = 86.66 K/sec
Slow xmllib: 279663 bytes in 17.77 seconds = 15.37 K/sec
Raw sgmlop: 279663 bytes in 0.02 seconds = 11004.42 K/sec
[amk@mira bench]$

Assuming no errors in the benchmark, xmllib on top of PyExpat should be around half as fast as xmllib on top of sgmlop, probably roughly 40K/sec on my machine. (That's just a guess, though.) Like economists, this benchmark probably points in several directions. :) --amk

import os, time
from xml.parsers import pyexpat

f = open('hamlet.xml', 'r')
data = f.read()
size = f.tell()

def dummy(*args):
    pass

def print_duration(parser, duration):
    print '%s: %i bytes in %.02f seconds = %.02f K/sec' % (parser, size, duration, size/duration/1024.0)

parser = pyexpat.ParserCreate( )
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ null handlers', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ StartElementHandler', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
parser.EndElementHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ Start,End', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
parser.EndElementHandler = dummy
parser.CharacterDataHandler = dummy
parser.ProcessingInstructionHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ Start,End,Char,PI', time.time() - start)

from xml.parsers import xmllib
p = xmllib.FastXMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Fast xmllib', time.time() - start)

p = xmllib.SlowXMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Slow xmllib', time.time() - start)

import sgmlop
p = sgmlop.XMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Raw sgmlop', time.time() - start)

From tpassin@idsonline.com Wed Jan 19 04:34:56 2000 From: tpassin@idsonline.com (Thomas B. Passin) Date: Tue, 18 Jan 2000 23:34:56 -0500 Subject: [XML-SIG] Developer's Day position paper References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com> <3884BC1F.3CBD0B2B@prescod.net> Message-ID: <002f01bf6236$8c674f40$342a08d1@tomshp> Paul wrote: > I agree with everything Skip says. We need something as fast and easy as > xmllib. We also need something compliant to the Unicode and XML > specifications. Does anyone disagree? Does anyone think that these goals > are mutually exclusive? > > -- I agree - easy and reasonably fast. There's lots of jobs out there that use small xml files and want to be easy to get off and running. xmllib is good for that. Also agree on Unicode and the XML standards (I think we can wait awhile longer for the dust to settle on xml-schemas, though). I think that an XPath processor would be important since it could be the basis of any number of query processors. Another thought, does anyone else think this should count for anything? That is the subject of compatibility with JPython. Right now, xmllib (minus the c-based parsers) ought to work with JPython, but the 4thought suite won't, since they need to compile C stuff with bison, etc. I think we ought to have a basic library that's easy to use and will work with both flavors of Python on any machine. (On the other hand, JPython should make it relatively easy to work with all those nice Java products).
This leads me to think that we shouldn't rely on the 4thought suite as the **sole** processors in the library. Anyone want to add some thoughts to this? Tom Passin From paul@prescod.net Fri Jan 21 11:38:01 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 21 Jan 2000 03:38:01 -0800 Subject: [XML-SIG] xmllib on expat References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com> <14469.11716.833194.855673@newcnri.cnri.reston.va.us> Message-ID: <38884519.CEB31DE0@prescod.net> I had just decided to do this "xmllib on top of PyExpat" idea this afternoon! I was planning to make big changes to pyexpat but then it looked like I wouldn't have to. The interfaces are pretty close. But I'm actually having trouble getting PyExpat to work at all. Am I doing something stupid? I wanted to get it out tonight but it is getting very late so I'll have to figure it out on a plane tomorrow and send it Saturday morning. :( Here's my code:

import pyexpat

parser = pyexpat.ParserCreate()
def show( *x ):
    print x

parser.StartElementHandler=show
parser.EndElementHandler=show
parser.CharacterDataHandler=show
parser.ProcessingInstructionHandler=show

data=open( "hamlet.xml" ).read()
parser.Parse( data, 0 )

Nothing happens. Nothing ever gets printed. I've already written the xmllib on pyexpat code so if I can get this little example to work, I would be home free. Also, I'm having trouble with pyexpat sometimes hanging depending on the buffer length. That code needs a general cleanup but it's only 500 lines so we're not talking a lot of code. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Earth will soon support only survivor species -- dandelions, roaches, lizards, thistles, crows, rats. Not to mention 10 billion humans.
- Planet of the Weeds, Harper's Magazine, October 1998 From paul@prescod.net Fri Jan 21 11:37:34 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 21 Jan 2000 03:37:34 -0800 Subject: [XML-SIG] Developer's Day position paper References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com> <14469.11716.833194.855673@newcnri.cnri.reston.va.us> Message-ID: <388844FE.2B682DC7@prescod.net> Andrew Kuchling wrote: > > [amk@mira bench]$ python pyexp.py > PyExpat w/ null handlers: 279663 bytes in 0.04 seconds = 6575.70 K/sec > PyExpat w/ StartElementHandler: 279663 bytes in 0.19 seconds = 1457.33 > K/sec > PyExpat w/ Start,End: 279663 bytes in 0.25 seconds = 1102.61 K/sec > PyExpat w/ Start,End,Char,PI: 279663 bytes in 0.36 seconds = 758.03 > K/sec > Fast xmllib: 279663 bytes in 3.15 seconds = 86.66 K/sec > Slow xmllib: 279663 bytes in 17.77 seconds = 15.37 K/sec > Raw sgmlop: 279663 bytes in 0.02 seconds = 11004.42 K/sec > [amk@mira bench]$ > > Assuming no errors in the benchmark, xmllib on top of PyExpat should > be around half as fast as xmllib on top of sgmlop, probably roughly > 40K/sec on my machine. (That's just a guess, though.) Like > economists, this benchmark probably points in several directions. :) I don't see where you get that figure. Actual parsing takes up a small fraction of xmllib's time. If you reduce that to zero you still don't speed up the entire process much. If you double it, you don't slow down the entire process much. If you double the 0.02 seconds (parsing time) in the 3.15 (xmllib processing time) you change the time to 3.17 seconds -- an increase of just 0.6%. (but its late...I may be missing something) -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Earth will soon support only survivor species -- dandelions, roaches, lizards, thistles, crows, rats. Not to mention 10 billion humans. 
- Planet of the Weeds, Harper's Magazine, October 1998 From guido@CNRI.Reston.VA.US Fri Jan 21 19:01:08 2000 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 21 Jan 2000 14:01:08 -0500 Subject: [XML-SIG] Python Conference weather Message-ID: <200001211901.OAA29501@eric.cnri.reston.va.us> If you're traveling to the Python Conference, be advised that winter has finally arrived in the Washington, DC area. We're currently experiencing *high* temperatures of 22 degrees F (-6 degrees C); with the wind chill it will feel much colder. So be sure to pack warm clothes. Yesterday, about 6 inches of snow fell, disrupting air travel; more is expected on Sunday, so expect delays flying into DC. Local transportation should be fully operational, but may experience some delays. We've placed a weather advisory in the local section of the conference website: http://www.python.org/workshops/2000-01/local.html Over 250 people have registered for the conference. See you all there! --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Fri Jan 21 20:27:58 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 21 Jan 2000 15:27:58 -0500 (EST) Subject: [XML-SIG] Developer's Day position paper In-Reply-To: <388844FE.2B682DC7@prescod.net> References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com> <14469.11716.833194.855673@newcnri.cnri.reston.va.us> <388844FE.2B682DC7@prescod.net> Message-ID: <14472.49486.805766.80052@amarok.cnri.reston.va.us> Paul Prescod writes: >I don't see where you get that figure. Actual parsing takes up a small >fraction of xmllib's time. If you reduce that to zero you still don't >speed up the entire process much. If you double it, you don't slow Of course, you're right; the vast majority of the time is spent in calling Python code, not parsing, and the calling time shouldn't scale as badly as I thought. 
I've been reading through the sgmlop.c code. While it probably wouldn't be too difficult to add the missing features (there are already FIXMEs in the code where you would do DTD-related parsing) and tighten up the parser's strictness when in XML mode, it is a significant bit of work. Also, given that Expat handles multiple encodings, while sgmlop gains its speed from being able to use code like this: while (ISALNUM(*p) || *p == '.') if (++p >= end) goto eol; I wonder if the added dereferences of a character encoding translation table would knock sgmlop's speed down to Expat's? -- A.M. Kuchling http://starship.python.net/crew/amk/ It would be nice to be unfailingly, perpetually, remorselessly funny, day in and day out, year in and year out until somebody murdered you, now wouldn't it? -- Robertson Davies, _The Diary of Samuel Marchbanks_ From gstein@lyra.org Sat Jan 22 02:48:51 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 21 Jan 2000 18:48:51 -0800 (PST) Subject: [XML-SIG] xmllib on expat In-Reply-To: <38884519.CEB31DE0@prescod.net> Message-ID: On Fri, 21 Jan 2000, Paul Prescod wrote: >... > Here's my code: > > import pyexpat > > parser = pyexpat.ParserCreate() > def show( *x ): > print x > > parser.StartElementHandler=show > parser.EndElementHandler=show > parser.CharacterDataHandler=show > parser.ProcessingInstructionHandler=show > > data=open( "hamlet.xml" ).read() > parser.Parse( data, 0 ) Try changing that 0 to a 1... meaning "end of input." It may be possible that Expat is buffering the input for some weird reason. Otherwise, the code looks fine to me. qp_xml uses pyexpat if you want a short, simple reference. http://www.lyra.org/greg/python/qp_xml.py >... > Also, I'm having trouble with pyexpat sometimes hanging depending on the > buffer length. That code needs a general cleanup but its only 500 lines > so we're not talking alot of code. Never seen this. 
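On the final-flag suggestion above, here is a minimal sketch of incremental parsing written against the present-day xml.parsers.expat module rather than the pyexpat build under discussion, so treat it as an illustration only. The collect_events() helper and the sample document are made up for this example.

```python
import xml.parsers.expat


def collect_events(chunks):
    """Feed chunks to expat one by one, then signal end of input."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("text", text))
    for chunk in chunks:
        # Second argument false: more data may follow, so expat is
        # allowed to hold back an incomplete token.
        parser.Parse(chunk, False)
    # Second argument true: end of input, flush whatever is pending.
    parser.Parse("", True)
    return events


# The split point falls in the middle of a tag on purpose.
events = collect_events(["<play><ti", "tle>Hamlet</title></play>"])
```

Character data may arrive in more than one callback, so code that needs the full text should join the pieces instead of assuming a single event per text node.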
Cheers, -g -- Greg Stein, http://www.lyra.org/ From anthony@interlink.com.au Sat Jan 22 03:13:49 2000 From: anthony@interlink.com.au (Anthony Baxter) Date: Sat, 22 Jan 2000 14:13:49 +1100 Subject: [XML-SIG] request for advice on decoding Zope XML exports. Message-ID: <200001220313.OAA30273@mbuna.arbhome.com.au> I'd like to put together something to parse the XML format export that Zope can produce. Unfortunately, I've not yet delved into the wonders (if that is the word) of the XML-SIG's work, and I was wondering if there's pointers to some simple examples of this sort of thing out there... thanks, Anthony From larsga@garshol.priv.no Sat Jan 22 09:06:19 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 22 Jan 2000 10:06:19 +0100 Subject: [XML-SIG] request for advice on decoding Zope XML exports. In-Reply-To: <200001220313.OAA30273@mbuna.arbhome.com.au> References: <200001220313.OAA30273@mbuna.arbhome.com.au> Message-ID: * Anthony Baxter | | I'd like to put together something to parse the XML format export | that Zope can produce. Unfortunately, I've not yet delved into the | wonders (if that is the word) of the XML-SIG's work, and I was | wondering if there's pointers to some simple examples of this sort | of thing out there... There are some examples in the XML-SIG package in the demo directory. The quotes and xbel directories contain SAX examples, whereas the dom directory has a html2html sample that might be useful. There doesn't seem to be an qp_xml samples, but perhaps there should be? --Lars M. From larsga@garshol.priv.no Sat Jan 22 16:47:11 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 22 Jan 2000 17:47:11 +0100 Subject: [XML-SIG] XBEL support in Jazilla Message-ID: As of Jazilla 0.2, it supports XBEL as a bookmark format. This might be worth listing on the XBEL pages. --Lars M. 
From paul@prescod.net Mon Jan 24 15:56:47 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 24 Jan 2000 09:56:47 -0600 Subject: [XML-SIG] Expat as xmllib Message-ID: <388C763F.13264AF0@prescod.net> This is a multi-part message in MIME format. --------------E2834FEC56D5F06E9B5E259A Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit The attached library allows expat to be used as a basis for a parser with the xmllib interface.

Performance: Without any xmllib-specific optimization, pyexpat runs almost as fast as sgmlop:

raw sgmlop: 13222 items; 0.426 seconds; 1281.29 kbytes per second
fast xmllib: 13222 items; 1.445 seconds; 378.03 kbytes per second
slow xmllib: 13222 items; 6.651 seconds; 82.11 kbytes per second
pyexpat: 13210 items; 1.527 seconds; 357.68 kbytes per second

I can think of several optimizations that could speed it up quite a bit. Also if you compare it to the xmllib in the standard distribution, we are talking night and day so if we bundle expat we're only improving things for them.

Conformance: Pyexpat caught more errors than xmllib, was more accepting of legal XML input (e.g. ) and handled entities (especially character entities) in a manner consistent with the XML specification. These explain the differences in the number of "items" above.

Backwards Compatibility: The only big compatibility difference between xmllib on pyexpat and xmllib on sgmlop is that expat expands entity references like &amp; to "&" instead of to a separate event. This is actually a feature of expat because it is doing entity expansion *for you*. The XML spec requires this behavior.

The library and a test program are attached. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Earth will soon support only survivor species -- dandelions, roaches, lizards, thistles, crows, rats. Not to mention 10 billion humans.
	- Planet of the Weeds, Harper's Magazine, October 1998
--------------E2834FEC56D5F06E9B5E259A
Content-Type: text/plain; charset=us-ascii;
 name="ExpatOp.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="ExpatOp.py"

from xml.parsers import xmllib
import pyexpat

handlerMap=[("finish_starttag", "StartElementHandler"),
            ("finish_endtag", "EndElementHandler"),
            ("handle_data","CharacterDataHandler"),
            ("handle_proc","ProcessingInstructionHandler")]

class ExpatPretendingToBeSGMLOp:
    def __init__(self, encoding=None):
        if encoding:
            self.pyexpat=pyexpat.ParserCreate(encoding)
        else:
            self.pyexpat=pyexpat.ParserCreate()

    def close( self ):
        self.pyexpat.Parse( "", 1 )

    def parse( self, data ):
        self.pyexpat.Parse( data, 1 )

    def feed( self, data ):
        self.pyexpat.Parse( data, 0 )

    def register( self, obj ):
        for oldname,newname in handlerMap:
            method=getattr( obj, oldname, None )
            setattr( self.pyexpat, newname, method )

class XMLParser( xmllib.FastXMLParser ):
    def reset( self ):
        xmllib.FastXMLParser.reset(self)
        self.parser=ExpatPretendingToBeSGMLOp()
        self.feed=self.parser.pyexpat.Parse
        self.parser.register( self )

if __name__=="__main__":
    import sys
    junk = open( "out.tmp","w")
    if len( sys.argv )>1:
        filename=sys.argv[1]
    else:
        filename="hamlet.xml"

    class myparser( XMLParser ):
        def handle_proc(self, target,data):
            junk.write( "\n?"+target+data )
        def handle_data( self, data):
            junk.write( "\n'"+data)
        def finish_starttag(self,gi,attrs):
            junk.write( "\n<>"+gi+ `attrs` )
        def finish_endtag( self, gi ):
            junk.write( "\n"+gi )

    myparser().feed( open( filename).read() )

--------------E2834FEC56D5F06E9B5E259A
Content-Type: text/plain; charset=us-ascii;
 name="testxml1.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="testxml1.py"

# basic tests

test_sgmlop = 1

import sys
import time, string

from xml.parsers import sgmlop, xmllib, ExpatOp

try:
    FILE, VERBOSE = sys.argv[1], 2
except IndexError:
    FILE, VERBOSE = "hamlet.xml", 1

print
print "test collecting parsers on", FILE
print

# --------------------------------------------------------------------
# sgmlop

class myCollector:
    def __init__(self):
        self.data = []
        self.text = []
    def finish_starttag(self, tag, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append("start", tag, data)
    def handle_proc(self, tag, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append("pi", tag, data)
    def handle_special(self, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append("special", data)
    def handle_entityref(self, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append("entity", data)
    def handle_data(self, data):
        self.text.append(data)
    def handle_cdata(self, data):
        self.text.append("CDATA" + data)

def doRawSGMLOp():
    global parser
    t = time.clock()
    for i in range(1):
        out = myCollector()
        fp = open(FILE)
        parser = sgmlop.XMLParser()
        parser.register(out)
        b = 0
        while 1:
            data = fp.read(512)
            if not data:
                break
            parser.feed(data)
            b = b + len(data)
        parser.close()
    t1 = time.clock() - t
    print "raw sgmlop:", len(out.data), "items;", round(t1, 3), "seconds;",
    print round(b / t1 / 512, 2), "kbytes per second"
    return t1

# --------------------------------------------------------------------
# xmllib

base=None
def makeparser( basecls ):
    global base
    base=basecls
    class FastXMLParser(base):
        def __init__(self):
            base.__init__(self)
            self.data = []
            self.text = []
        def unknown_starttag(self, tag, data):
            if self.text:
                self.data.append(repr(string.join(self.text, "")))
                self.text = []
            self.data.append("start", tag, data)
        def handle_proc(self, tag, data):
            if self.text:
                self.data.append(repr(string.join(self.text, "")))
                self.text = []
            self.data.append("pi", tag, data)
        def handle_special(self, data):
            if self.text:
                self.data.append(repr(string.join(self.text, "")))
                self.text = []
            self.data.append("special", data)
        def handle_entityref(self, data):
            if self.text:
                self.data.append(repr(string.join(self.text, "")))
                self.text = []
            self.data.append("entity", data)
        def handle_data(self, data):
            self.text.append(data)
        def handle_cdata(self, data):
            self.text.append("CDATA" + data)
    return FastXMLParser

def doFastXMLLib():
    global parser2
    FastXMLParser = makeparser( xmllib.FastXMLParser )
    t = time.clock()
    for i in range(1):
        fp = open(FILE)
        parser2 = FastXMLParser()
        b = 0
        while 1:
            data = fp.read(512)
            if not data:
                break
            parser2.feed(data)
            b = b + len(data)
        parser2.close()
    t2 = time.clock() - t
    print "fast xmllib:", len(parser2.data), "items;", round(t2, 3), "seconds;",
    print round(b / t2 / 512, 2), "kbytes per second"
    return t2

class SlowXMLParser(xmllib.SlowXMLParser):
    def __init__(self):
        xmllib.SlowXMLParser.__init__(self)
        self.data = []
        self.text = []
    def unknown_starttag(self, tag, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append("start", tag, data)
    def handle_proc(self, tag, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append("pi", tag, data)
    def handle_special(self, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append("special", data)
    def handle_entityref(self, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append("entity", data)
    def handle_data(self, data):
        self.text.append(data)
    def handle_cdata(self, data):
        self.text.append("CDATA" + data)

def doSlowXMLLib():
    global parser3
    t = time.clock()
    for i in range(1):
        fp = open(FILE)
        parser3 = SlowXMLParser()
        b = 0
        while 1:
            data = fp.read(512)
            if not data:
                break
            parser3.feed(data)
            b = b + len(data)
        parser3.close()
    t3 = time.clock() - t
    print "slow xmllib:", len(parser3.data), "items;", round(t3, 3), "seconds;",
    print round(b / t3 / 512, 2), "kbytes per second"
    return t3

def doPyExpat():
    global parser4
    # PyExpat
    FastXMLParser = makeparser( ExpatOp.XMLParser )
    t = time.clock()
    for i in range(1):
        fp = open(FILE)
        parser4 = FastXMLParser()
        b = 0
        while 1:
            data = fp.read(512)
            if not data:
                break
            parser4.feed(data)
            b = b + len(data)
        parser4.close()
    t4 = time.clock() - t
    print "pyexpat:", len(parser4.data), "items;", round(t4, 3), "seconds;",
    print round(b / t4 / 512, 2), "kbytes per second"
    return t4

t1=doRawSGMLOp()
t2=doFastXMLLib()
t3=doSlowXMLLib()
t4=doPyExpat()

print
print "normalized timing:"
print "slow xmllib", 1.0
print "fast xmllib", round(t2 / t3, 2), "(%sx)" % round(t3 / t2, 1)
print "sgmlop ", round(t1 / t3, 2), "(%sx)" % round(t3 / t1, 1)
print "pyexpat ", round(t4 / t3, 2), "(%sx)" % round(t3 / t4, 1)

print
print "looking for differences:"

items = min(len(parser2.data), len(parser4.data))

for i in xrange(items):
    if parser2.data[i] != parser3.data[i]:
        for j in range(max(i-5, 0), min(i+5, items)):
            if parser2.data[j] != parser3.data[j]:
                print "+", j+1, parser2.data[j]
                print "*", j+1, parser3.data[j]
            else:
                print "=", j+1, parser2.data[j]
        break
else:
    print "(no differences)"

--------------E2834FEC56D5F06E9B5E259A--

From akuchlin@mems-exchange.org Mon Jan 24 16:49:51 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Mon, 24 Jan 2000 11:49:51 -0500 (EST)
Subject: [XML-SIG] PyXML 0.5.3 released
Message-ID: <200001241649.LAA12996@amarok.cnri.reston.va.us>

I've released a new snapshot of the PyXML snapshot. The changes are
pretty minor:

        * Fixed setup.py to work with the Distutils, following
          suggestions from Greg Ward

        * Dropped the xmlarch code, as previously discussed

        * Started signing the distribution with my GnuPG key

That's about it. I'll try to add the Expat/xmllib code for the next
snapshot.

--
A.M. Kuchling http://starship.python.net/crew/amk/
To see the world in a grain of sand, / And a heaven in a wild flower; / Hold
infinity in the palm of your hand, / And eternity in an hour.
-- William Blake, "Auguries of Innocence" From akuchlin@mems-exchange.org Mon Jan 24 17:35:52 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Mon, 24 Jan 2000 12:35:52 -0500 (EST) Subject: [XML-SIG] Expat as xmllib In-Reply-To: <388C763F.13264AF0@prescod.net> References: <388C763F.13264AF0@prescod.net> Message-ID: <14476.36216.685678.946314@amarok.cnri.reston.va.us> Paul Prescod writes: >Without any xmllib-specific optimization, pyexpat runs almost as fast as >sgmlop: >raw sgmlop: 13222 items; 0.426 seconds; 1281.29 kbytes per second >fast xmllib: 13222 items; 1.445 seconds; 378.03 kbytes per second >slow xmllib: 13222 items; 6.651 seconds; 82.11 kbytes per second >pyexpat: 13210 items; 1.527 seconds; 357.68 kbytes per second >I can think of several optimizations that could speed it up quite a bit. 21K/sec difference, or around 6% slower; very good. Let's discuss these optimizations at IPC8; I'd like to get a version of this into the CVS tree ASAP. >Also if you compare it to the xmllib in the standard distribution, we >are talking night and day so if we bundle expat we're only improving >things for them. Note that the xmllib in 1.5.2 and xml.parsers.xmllib are different; namespace support has been added to the 1.5.2 version. This is a divergence that's needed fixing for a while, and now seems like a good opportunity.. Is Expat becoming a fairly common component of Linux and *BSD distributions? I still dislike the idea of adding Expat to the Python distribution, because of possible collisions with updated versions of Expat. -- A.M. Kuchling http://starship.python.net/crew/amk/ And at times the fact of her absence will hit you like a blow to the chest, and you will weep. But this will happen less and less as time goes on. 
-- From SANDMAN: "The Song of Orpheus" From paul@prescod.net Tue Jan 25 06:15:49 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 25 Jan 2000 01:15:49 -0500 Subject: [XML-SIG] Expat as xmllib References: <388C763F.13264AF0@prescod.net> <14476.36216.685678.946314@amarok.cnri.reston.va.us> Message-ID: <388D3F95.82F32C4@prescod.net> "Andrew M. Kuchling" wrote: > > 21K/sec difference, or around 6% slower; very good. Let's discuss > these optimizations at IPC8; I'd like to get a version of this into > the CVS tree ASAP. The optimizations I am thinking about are in pyexpat itself. It also needs better error handling. > Note that the xmllib in 1.5.2 and xml.parsers.xmllib are different; > namespace support has been added to the 1.5.2 version. This is a > divergence that's needed fixing for a while, and now seems like a good > opportunity.. Expat can do namespaces for us in C. > Is Expat becoming a fairly common component of Linux and *BSD > distributions? I still dislike the idea of adding Expat to the Python > distribution, because of possible collisions with updated versions of > Expat. I'm not convinced that this is a big problem but let's just say it is. How hard would it be to rename exported object names and the final library name. It seems like it would be useful (and reasonably doable) to create a tool that uniqifies dynamic libraries in general. You can get the names using object-file reading tools or by parsing the text. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself The new revolutionaries believe the time has come for an aggressive move against our oppressors. We have established a solid beachhead on Friday. We now intend to fight vigorously for 'casual Thursdays.' -- who says America's revolutionary spirit is dead? From ke@gnu.franken.de Tue Jan 25 18:18:30 2000 From: ke@gnu.franken.de (Karl EICHWALDER) Date: 25 Jan 2000 19:18:30 +0100 Subject: [XML-SIG] Re: Expat as xmllib In-Reply-To: "Andrew M. 
Kuchling"'s message of "Mon, 24 Jan 2000 12:35:52 -0500 (EST)"
References: <388C763F.13264AF0@prescod.net> <14476.36216.685678.946314@amarok.cnri.reston.va.us>
Message-ID:

"Andrew M. Kuchling" writes:

| Is Expat becoming a fairly common component of Linux and *BSD
| distributions?

Debian and SuSE are featuring expat; SuSE with the next release,
scheduled for March 2000 (IIRC). I didn't check other distributions.

--
work : ke@suse.de                          |
     : http://www.suse.de/~ke/             |  ------    ,__o
home : ke@gnu.franken.de                   |   ------ _-\_<,
     : http://www.franken.de/users/gnu/ke/ |  ------ (*)/'(*)

From uche.ogbuji@fourthought.com Wed Jan 26 18:35:18 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 26 Jan 2000 11:35:18 -0700
Subject: [XML-SIG] ANN: 4DOM 0.9.2
Message-ID: <388F3E66.ADEC966D@fourthought.com>

FourThought LLC (http://FourThought.com) announces the release of

4DOM 0.9.2
-----------------------
An XML/HTML Python library using the Document Object Model interface

4DOM is a Python library for XML and HTML processing and manipulation
using the W3C's Document Object Model as its interface.

4DOM implements DOM Core level 2, HTML level 2 and Level 2 Document
Traversal.

4DOM should work on all platforms supported by Python. If you have any
problems with a particular platform, please e-mail the authors.

4DOM is designed to allow developers to rapidly design applications that
read, write or manipulate HTML and XML.

News
----

- Major fixes to namespace code
- Other bug-fixes

More info and Obtaining 4DOM
----------------------------

Please see

http://FourThought.com/4Suite/4DOM

Or you can download 4DOM from

ftp://FourThought.com/pub/4Suite/4DOM

4DOM is distributed under a license similar to that of Python.

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com (970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com http://OpenTechnology.org

From uche.ogbuji@fourthought.com Wed Jan 26 18:41:48 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 26 Jan 2000 11:41:48 -0700
Subject: [XML-SIG] ANN: 4XSLT 0.8.2 and 4XPath 4.8.2
Message-ID: <388F3FEC.C2256552@fourthought.com>

FourThought LLC (http://FourThought.com) announces the release of

4XSLT and 4XPath 0.8.2
----------------------
A python implementation of the W3C's XSLT language

4XSLT is an XML transformation processor based on the W3C's
specification for the XSLT transform language. 4XPath implements the W3C
XPath language for indicating and selecting XML document components.

http://www.w3.org/TR/xslt

4XPath implements the full XPath recommendation except for the 'lang'
core function.

Currently, 4XSLT supports a sub-set of the XSLT recommendation including
the following:

Full expression support and attribute-value template expansion

xsl:include                     xsl:import
xsl:template                    xsl:apply-imports
xsl:apply-templates             xsl:copy
xsl:call-template               xsl:if
xsl:for-each                    xsl:choose
xsl:element                     xsl:when
xsl:attribute                   xsl:otherwise
xsl:text                        xsl:message
xsl:value-of                    xsl:variable
xsl:processing-instruction      xsl:param
xsl:comment                     xsl:with-param
xsl:strip-space                 xsl:key
xsl:preserve-space              xsl:copy-of
xsl:sort                        xsl:namespace-alias
xsl:output

and, of course, xsl:stylesheet, xsl:transform, literal elements and text

Using the xml output method, 4XSLT produces the result tree by throwing
events from the emerging SAX 2 standard to a handler, so it can be
easily modified to supply results to any SAX 2 consumer. For the 'html'
and 'text' output methods special SAX consumers produce HTML DOM nodes
and plain text respectively.

Note: 4XSLT and 4XPath cannot work with JPython.
News ---- Changes in 0.8.2 ---------------- - Added i18n hooks - Added support for terminate option on xsl:message - Added more error checks - Fixed attribute-value templates - Fixed params - Kludge to avoid strtod('NaN') problem on Windows and FreeBSD: hopefully temporary - Bug-fixes More info and Obtaining 4XPath and 4XSLT ---------------------------------------- Please see http://FourThought.com/4Suite/4XPath http://FourThought.com/4Suite/4XSLT Or you can download 4XSLT from ftp://FourThought.com/pub/4Suite/4XPath ftp://FourThought.com/pub/4Suite/4XSLT 4XPath and 4XSLT are distributed under a license similar to that of Python. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From forkel@arsnova.de Wed Jan 26 19:09:47 2000 From: forkel@arsnova.de (Malte Forkel) Date: Wed, 26 Jan 2000 20:09:47 +0100 Subject: [XML-SIG] precompiled version? Message-ID: <388F467B.B6758C2C@arsnova.de> Hi, any chance to find a precompiled version of the XML toolkit? I'm using Python on Windows/NT. Thanks, Malte From akuchlin@mems-exchange.org Fri Jan 28 18:07:30 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 28 Jan 2000 13:07:30 -0500 (EST) Subject: [XML-SIG] DevDay results Message-ID: <200001281807.NAA18796@amarok.cnri.reston.va.us> The XML-SIG's developer's day session went well, and, unlike most DD sessions, we actually achieved consensus on something. :) To summarize the outcome: * The current PyDOM code will be dropped and replaced with 4DOM. The precise details of how this will work are still to be resolved; will the 4DOM code move into xml.dom, or will xml.dom import from xml.Ft.dom and provide some wrappers? * PyExpat's interface will be changed to be SAX-like, and we'll lobby Guido to add PyExpat to 1.6, along with Expat itself. It will be renamed, preferably to something with SAX in the name. 
(expat_sax? pysax? pyxml? whatever...) It'll be updated to support all the features in current versions of Expat; Jim Fulton has an updated version of PyExpat inside Zope that will probably be used. * xmllib.py will be left unmodified, though it'll be deprecated in favor of PyExpat. * When 1.6 begins supporting Unicode, we'll fork the development tree into two branches; the branch that works with 1.5 will be maintained, though probably not actively developed. This will leave the other branch free to use 1.6-specific features without worrying about backward compatibility. If I've forgotten something from the session, please let me know. -- A.M. Kuchling http://starship.python.net/crew/amk/ First things first, but not necessarily in that order. -- The Doctor, in John Flanagan and Andrew McCulloch's _Meglos_ From jack@oratrix.nl Fri Jan 28 21:59:45 2000 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 28 Jan 2000 22:59:45 +0100 Subject: [XML-SIG] DevDay results In-Reply-To: Message by "Andrew M. Kuchling" , Fri, 28 Jan 2000 13:07:30 -0500 (EST) , <200001281807.NAA18796@amarok.cnri.reston.va.us> Message-ID: <20000128215951.5DC70189FE1@oratrix.oratrix.nl> Recently, "Andrew M. Kuchling" said: > * PyExpat's interface will be changed to be SAX-like, and we'll > lobby Guido to add PyExpat to 1.6, along with Expat itself. It > will be renamed, preferably to something with SAX in the name. > (expat_sax? pysax? pyxml? whatever...) It'll be updated to > support all the features in current versions of Expat; Jim > Fulton has an updated version of PyExpat inside Zope that will > probably be used. A suggestion to whoever is going to implement this: if we're going to include a private version of expat it's probably a good idea to change all the C global symbols. 
Expat is pretty popular, and I've been bitten a few times by global
symbol name clashes where Python used one version of a library and a
package used (or embedded in an application in which Python was also
embedded) had incorporated a different version.

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From paul@prescod.net Fri Jan 28 22:04:59 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 28 Jan 2000 16:04:59 -0600
Subject: [XML-SIG] Expat answers
References: <200001281807.NAA18796@amarok.cnri.reston.va.us>
Message-ID: <3892128B.4C72799F@prescod.net>

Answers to questions that people have asked me about Expat:

1. Expat can parse DTDs if we want it to. If you compile DTD support in
then you can turn on or off parameter entity parsing on a per-parser
instance basis.

"""
XML_PARAM_ENTITY_PARSING_NEVER
    Don't parse parameter entities or the external subset
XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE
    Parse parameter entities and the external subset unless standalone
    was set to "yes" in the XML declaration.
XML_PARAM_ENTITY_PARSING_ALWAYS
    Always parse parameter entities and the external subset
"""

It doesn't validate but we could probably build that in Python.

2. Expat can be compiled to output either UTF-8 or UTF-16 (which is for
our purposes the same as UCS-2). It is theoretically possible to make a
parser that understands Unicode enough to do proper well-formedness
checking yet leaves characters in their native encoding but as far as I
know, no such tool exists. I don't believe that sgmlop could ever be
that tool, even when it is rewritten on top of Fredrik's fast Unicode
regexp engine because that engine would still be UTF-16/UCS-2 specific.

If you need to process shift-JIS information then you need to allow
Expat to convert it to UTF-16 and then convert it back to shift-JIS.
I don't think that there is any XML parser in the world that allows you to work in any arbitrary native encoding with no conversions. Maybe some day. Handling for non-Unicode character sets is simply not supported. The XML world decided specifically against this based on two arguments: * one cannot argue against Unicode on the basis of character encoding *efficiency* because we allow any encoding (even those compatible with the Unicode subset of shift-JIS etc.) to be used. * one cannot argue against Unicode on the basis that it does not allow "private" characters because it does: http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-Oct-1998/0366.html http://www.ascc.net/xml/en/utf-8/faq/faq-xsl.html 3. Expat outputs UTF-16 so it is ready for 20-bit Unicode, wherein we will find: "Plane 1 is going to hold ancient and invented scripts and musical symbols, while Plane 2 (U-0002xxxx) is reserved for additional Han ideographs, Plane 14 (U-000Exxxx) is going to start with some meta characters for language tagging and there are two entire bonus private-use planes." Python itself will not handle 20bit characters yet, so the situation with them will be just like the situation with 16 bit characters in Python/xmllib today (Python will think that they are two characters). Paul Prescod "Andrew M. Kuchling" wrote: > > The XML-SIG's developer's day session went well, and, unlike most DD > sessions, we actually achieved consensus on something. :) To summarize > the outcome: > > * The current PyDOM code will be dropped and replaced with 4DOM. > The precise details of how this will work are still to be > resolved; will the 4DOM code move into xml.dom, or will xml.dom > import from xml.Ft.dom and provide some wrappers? > > * PyExpat's interface will be changed to be SAX-like, and we'll > lobby Guido to add PyExpat to 1.6, along with Expat itself. It > will be renamed, preferably to something with SAX in the name. > (expat_sax? pysax? pyxml? whatever...) 
It'll be updated to > support all the features in current versions of Expat; Jim > Fulton has an updated version of PyExpat inside Zope that will > probably be used. > > * xmllib.py will be left unmodified, though it'll be deprecated in > favor of PyExpat. > > * When 1.6 begins supporting Unicode, we'll fork the development > tree into two branches; the branch that works with 1.5 will be > maintained, though probably not actively developed. This will > leave the other branch free to use 1.6-specific features without > worrying about backward compatibility. > > If I've forgotten something from the session, please let me know. > > -- > A.M. Kuchling http://starship.python.net/crew/amk/ > First things first, but not necessarily in that order. > -- The Doctor, in John Flanagan and Andrew McCulloch's _Meglos_ > > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself The new revolutionaries believe the time has come for an aggressive move against our oppressors. We have established a solid beachhead on Friday. We now intend to fight vigorously for 'casual Thursdays.' -- who says America's revolutionary spirit is dead? From paul@prescod.net Mon Jan 31 08:46:59 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 31 Jan 2000 02:46:59 -0600 Subject: [XML-SIG] Expat strategy References: <20000128215951.5DC70189FE1@oratrix.oratrix.nl> Message-ID: <38954C03.D8C05F1@prescod.net> Jack Jansen wrote: > > A suggestion to whoever is going to implement this: if we're going to > include a private version of expat it's probably a good idea to change > all the C global symbols. Expat is pretty popular, and I've been > bitten a few times by global symbol name clashes where Python used one > version of a library and a package used (or embedded in an application > in which Python was also embedded) had incorporated a different version. 1. 
Exports

There was some debate about what would happen if we statically linked
pyexpat to xmlparse.dll. I am confident that we could, on most
reasonable platforms, export only the symbols Python needs to bootstrap
and not all of Expat's static symbols. It is routine on Windows to
statically link to a C library without worrying about conflicts with
"open". Perl's expat.dll exports exactly two names:
_boot_XML__Parser__Expat and _boot_XML__Parser__Expat. BTW, it's 112K.

Anyhow, I count 49 exported symbols and all of them begin with the
prefix XML_ so they can be safely renamed with 49 #defines if we decide
it is necessary. That's ugly but safe and effective.

2. API

We had talked of embedding SAX directly in PyExpat but in retrospect I
don't think that there is any need to do so. We can layer SAX 1 and 2 on
top of a transliterated Expat API without any loss of performance. This
is true because of Expat's handler architecture. Even if you layer
xmllib on top of sax 1 on top of another implementation of xmllib on top
of another layer of sax 2 on top of expat, you get high performance if
the "handler" is the same method at all levels. In other words, we can
"wrap" expat at the Python level without doing any proxying of events.

I'm only mentioning xmllib to emphasize the point that the number of
layers doesn't matter because you don't lose performance in the layers.
I'm not proposing that we layer xmllib on top of Expat.

If you pass a method "foo" to xmllib as finish_starttag and it passes it
to sax 2 as SAX2_StartElement and it passes it to SAX1 as
SAX1_StartElement which passes it to Expat as XML_SetElementHandler, you
still only get one Python function call per element in the document.

So let's expose the raw Expat API and build SAX 1 and SAX 2 layers on
top of it.

3. Error handling

PyExpat is one of a very few modules in the library to use setjmp. It
uses it for error handling and I'm not sure if there is any way around
it so I won't advocate its removal unless someone can propose a better
way. I'm not clear how to signal to expat that it should quit parsing
other than through setjmp/longjmp.

In general, though, error handling doesn't seem to work for me:

>>> from xml.parsers.pyexpat import ParserCreate, ErrorString
>>> p=ParserCreate()
>>> p.foo="abc"
Traceback (innermost last):
  File "", line 1, in ?
SystemError: error return without exception set
>>> p.StartElementHandler=junk
>>> p.Parse( "" )
0

>>> from xml.parsers.pyexpat import ParserCreate, ErrorString
>>> p=ParserCreate()
>>> def junk2(a,b):
...     print a,b
...     assert 0
...
>>> p.StartElementHandler=junk2
>>> print p.Parse( "", 1 )
abc []
def []
1

Errors in the Python code do not appropriately abort the process,
despite the setjmp/longjmp. I am guessing that this is due to the fact
that the call goes across Windows DLL boundaries. If that's really all
it is then it will work better once we statically link to expat. I'd
still rather not use setjmp/longjmp if there was a way around it...

    if (rv == NULL) {
        if (self->jmpbuf_valid)
            longjmp(self->jmpbuf, 1);
        My_WriteStderr("Exception in CharacterDataHandler()\n");
        PyErr_Clear();
    }

One funny thing is the code after the longjmp. I guess maybe it's a
fallback for when the long-jump doesn't work. It doesn't seem to work on
Windows, though.

--
 Paul Prescod - ISOGEN Consulting Engineer speaking for himself
The new revolutionaries believe the time has come for an aggressive
move against our oppressors. We have established a solid beachhead
on Friday. We now intend to fight vigorously for 'casual Thursdays.'
 -- who says America's revolutionary spirit is dead?
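
[Editorial sketch, not part of the original thread.] The two points
made in the message above, that a handler can be handed straight to
Expat so that each element costs a single Python call, and that an
error raised in a handler ought to abort the parse, can be tried out
roughly as follows. This is a minimal sketch that assumes a present-day
Python 3 and its standard `xml.parsers.expat` module, not the pyexpat
build discussed in the message; `ElementCounter`, `parse_with_handler`
and `failing_handler` are hypothetical names chosen for illustration.

```python
import xml.parsers.expat


class ElementCounter:
    """Hypothetical application object that owns the real handler."""

    def __init__(self):
        self.names = []

    def start(self, name, attrs):
        self.names.append(name)


def parse_with_handler(data, start_handler):
    # A thin "layer": it only forwards the callable to Expat, so no
    # per-event proxy function sits between the parser and the handler.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_handler
    parser.Parse(data, True)


def failing_handler(name, attrs):
    # Stands in for the "assert 0" handler in the session above.
    raise ValueError("handler rejected <%s>" % name)


counter = ElementCounter()
parse_with_handler("<abc><def/></abc>", counter.start)
print(counter.names)

aborted = False
try:
    parse_with_handler("<abc><def/></abc>", failing_handler)
except ValueError as exc:
    # The exception raised inside the handler reaches the caller of Parse().
    aborted = True
    print("parse aborted:", exc)
```

In the current standard-library module the exception surfaces from
`Parse()` instead of being swallowed, which is the behavior the message
asks for; how that is achieved internally is outside the scope of this
sketch.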

From larsga@garshol.priv.no Mon Jan 31 09:15:59 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 31 Jan 2000 10:15:59 +0100
Subject: [XML-SIG] DevDay results
In-Reply-To: <200001281807.NAA18796@amarok.cnri.reston.va.us>
References: <200001281807.NAA18796@amarok.cnri.reston.va.us>
Message-ID:

* Andrew M. Kuchling
|
| * PyExpat's interface will be changed to be SAX-like, and we'll
|   lobby Guido to add PyExpat to 1.6, along with Expat itself. It
|   will be renamed, preferably to something with SAX in the name.
|   (expat_sax? pysax? pyxml? whatever...)

Saxpat?

--Lars M.

From paul@prescod.net Mon Jan 31 09:08:00 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 31 Jan 2000 03:08:00 -0600
Subject: [XML-SIG] DevDay results
References: <200001281807.NAA18796@amarok.cnri.reston.va.us>
Message-ID: <389550F0.5D1AE3F@prescod.net>

I don't remember if we achieved clear consensus on whether to bundle the
DOM or anything else into 1.6 along with Python. I think we were leaning
towards bundling the DOM based on the argument that SAX and DOM were the
"two biggies" in terms of API.

--
 Paul Prescod - ISOGEN Consulting Engineer speaking for himself
The new revolutionaries believe the time has come for an aggressive
move against our oppressors. We have established a solid beachhead
on Friday. We now intend to fight vigorously for 'casual Thursdays.'
 -- who says America's revolutionary spirit is dead?

From uche.ogbuji@fourthought.com Mon Jan 31 12:24:18 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 31 Jan 2000 05:24:18 -0700
Subject: [XML-SIG] DevDay results
In-Reply-To: Your message of "Mon, 31 Jan 2000 03:08:00 CST." <389550F0.5D1AE3F@prescod.net>
Message-ID: <200001311224.FAA02204@localhost.localdomain>

> I don't remember if we achieved clear consensus on whether to bundle the
> DOM or anything else into 1.6 along with Python.
I think we were leaning > towards bundling the DOM based on the argument that SAX and DOM were the > "two biggies" in terms of API. Hmm. As much as I'd find it cool to have 4DOM bundled into Python, it is rather vast: 104 files, excluding the 84-file test-suite (which we haven't been publishing, but we shall now that the xml-sig has adopted it). I have the sense that a raised eyebrow would be the nicest we can expect from Guido. My vote would be to bundle SAX and Expat, which will do for many uses. If they need more sophisticated XML, they can download the XML package to get DOM, XPath, XSLT, etc. I think this is the way it is in Perl (in fact, I'm not even sure XML is bundled at all in Perl). Of course, Perl has CPAN, which makes finding modules much less travail, but that is a problem for Python to solve in other ways than bundling every package into the main distro. I understand the dist-utils SIG are close to a solution. -- Uche Ogbuji Fourthought, Inc., IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software-engineering, project-management, knowledge-management http://Fourthought.com http://OpenTechnology.org From fdrake@acm.org Mon Jan 31 15:53:04 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 31 Jan 2000 10:53:04 -0500 (EST) Subject: [XML-SIG] DevDay results In-Reply-To: <200001311224.FAA02204@localhost.localdomain> References: <389550F0.5D1AE3F@prescod.net> <200001311224.FAA02204@localhost.localdomain> Message-ID: <14485.45024.576090.154937@weyr.cnri.reston.va.us> uche.ogbuji@fourthought.com writes: > My vote would be to bundle SAX and Expat, which will do for many uses. If > they need more sophisticated XML, they can download the XML package to get > DOM, XPath, XSLT, etc. Uche, I agree; I think that's pretty much the consensus. It certainly seems reasonable. 
That allows existing xmllib users to convert easily to something that
will be maintained in the standard library and users of the other APIs
will still need to do what they have to do now (download something);
it'll just be easier to have one XML package for the "advanced" users.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From ken@bitsko.slc.ut.us Mon Jan 31 20:03:56 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 31 Jan 2000 14:03:56 -0600
Subject: [XML-SIG] Expat strategy
In-Reply-To: Paul Prescod's message of Mon, 31 Jan 2000 02:46:59 -0600
References: <20000128215951.5DC70189FE1@oratrix.oratrix.nl> <38954C03.D8C05F1@prescod.net>
Message-ID:

Paul Prescod writes:

> 2. API
>
> We had talked of embedding SAX directly in PyExpat but in retrospect I
> don't think that there is any need to do so. We can layer SAX 1 and 2 on
> top of a transliterated Expat API without any loss of performance. This
> is true because of Expat's handler architecture. Even if you layer
> xmllib on top of sax 1 on top of another implementation of xmllib on top
> of another layer of sax 2 on top of expat, you get high performance if
> the "handler" is the same method at all levels. In other words, we can
> "wrap" expat at the Python level without doing any proxying of events.
> So let's expose the raw Expat API and build SAX 1 and SAX 2 layers on
> top of it.

I agree that there's no reason to try to "block" the raw API from
being used, but general usage documentation should focus on SAX.
Otherwise new module authors might write to the raw interface and
lose interoperability with other SAX modules.
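
The layering in the quoted text (one handler method used at every level, with no proxying of events) might look roughly like the sketch below. It assumes `xml.parsers.expat`, the later standard-library interface to Expat, rather than the PyExpat module as it stood at the time of this thread. The `NameCollector` class, the `make_parser` helper, and the sample document are hypothetical names made up for illustration.

```python
from xml.parsers import expat


class NameCollector:
    """Hypothetical SAX-style handler (method names follow SAX conventions)."""

    def __init__(self):
        self.names = []
        self.text = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def endElement(self, name):
        pass

    def characters(self, data):
        self.text.append(data)


def make_parser(handler):
    """Install the handler's bound methods directly as Expat callbacks.

    No wrapper function sits between Expat and the handler, so each
    event costs a single Python call regardless of how the "SAX layer"
    is presented to the user.
    """
    parser = expat.ParserCreate()
    parser.StartElementHandler = handler.startElement
    parser.EndElementHandler = handler.endElement
    parser.CharacterDataHandler = handler.characters
    return parser


handler = NameCollector()
parser = make_parser(handler)
parser.Parse(b"<doc><item>one</item><item>two</item></doc>", True)
print(handler.names)
print("".join(handler.text))
```

Code written against `NameCollector` never touches the raw Expat attribute names, which is the interoperability point made above: only `make_parser` would need to change if a different parser were substituted.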
-- Ken

From jack@oratrix.nl Mon Jan 31 21:21:04 2000
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 31 Jan 2000 22:21:04 +0100
Subject: [XML-SIG] Expat strategy
In-Reply-To: Message by Paul Prescod , Mon, 31 Jan 2000 02:46:59 -0600 , <38954C03.D8C05F1@prescod.net>
Message-ID: <20000131212109.195E0D3AC2@oratrix.oratrix.nl>

Recently, Paul Prescod said:
> PyExpat is one of a very few modules in the library to use setjmp. It
> uses it for error handling and I'm not sure if there is any way around
> it so I won't advocate its removal unless someone can propose a better
> way. I'm not clear how to signal to expat that it should quit parsing
> other than through setjmp/longjmp.

I think I put in the setjmp/longjmp, basically because I could see no
other way to stop the parser, indeed. There are a couple of other
libraries I embedded in Python that have the same problem (jpeg and pbm
spring to mind).

Aside from the cross-segment longjmps, which needed a bit of massaging
on the Mac (so I assume the same could be true on Windows), there's one
very big problem with setjmp/longjmp and that is that they're not
thread-safe. However, in the case of the use in Pyexpat the Python
programmer will have to do something pretty gross to invoke this bug:
as the jmpbuf_valid flag is saved in the parser object and set/reset
around the Parse() call you'll have to create one parser and call
parser.Parse() on the one object simultaneously in two threads.

Still, putting a mutex in the object is probably a good idea.

> if (rv == NULL) {
>     if (self->jmpbuf_valid)
>         longjmp(self->jmpbuf, 1);
>     My_WriteStderr("Exception in CharacterDataHandler()\n");
>     PyErr_Clear();
> }
>
> One funny thing is the code after the longjmp. I guess maybe it's a
> fallback for when the long-jump doesn't work. It doesn't seem to work on
> Windows, though.
I think the code is better replaced by an abort(): if
myStartElementHandler and myEndElementHandler are called outside of a
Parse() invocation there's something pretty basic about expat that I
didn't understand when I wrote this code:-)

But please note that all this is based on how Pyexpat looked when I
maintained it; I haven't had the time to track developments since
then...

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From jack@oratrix.nl Mon Jan 31 21:27:02 2000
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 31 Jan 2000 22:27:02 +0100
Subject: [XML-SIG] Expat strategy
In-Reply-To: Message by Jack Jansen , Mon, 31 Jan 2000 22:21:04 +0100 , <20000131212109.195E0D3AC2@oratrix.oratrix.nl>
Message-ID: <20000131212707.626A4D3AC2@oratrix.oratrix.nl>

Recently, Jack Jansen said:
> I think the code is better replaced by an abort(): if
> myStartElementHandler and myEndElementHandler are called outside of a
> Parse() invocation there's something pretty basic about expat that I
> didn't understand when I wrote this code:-)

Whoops, there is a buglet in the code on second inspection. While I
don't think it can be triggered normally, can someone add a line to
clear the jmpbuf_valid flag when the setjmp returns in the longjmp()ed
condition?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From paul@prescod.net Mon Jan 31 22:42:53 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 31 Jan 2000 14:42:53 -0800
Subject: [XML-SIG] Expat strategy
References: <20000128215951.5DC70189FE1@oratrix.oratrix.nl> <38954C03.D8C05F1@prescod.net>
Message-ID: <38960FED.8ED5867C@prescod.net>

Ken MacLeod wrote:
>
> I agree that there's no reason to try to "block" the raw API from
> being used, but general usage documentation should focus on SAX.
> Otherwise new module authors might write to the raw interface and
> lose interoperability with other SAX modules.

Agree 100%. My documentation for the raw API would be: "read the source
code or go read Clark Cooper's article on XML.com." :)

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
"Ivory towers are no longer in order. We need ivory networks. Today,
sitting quietly and thinking is the world's greatest generator of
wealth and prosperity."
- http://www.bespoke.org/viridian/print.asp?t=140

From paul@prescod.net Mon Jan 31 22:51:51 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 31 Jan 2000 14:51:51 -0800
Subject: [XML-SIG] DevDay results
References: <200001311224.FAA02204@localhost.localdomain>
Message-ID: <38961207.EF6835F2@prescod.net>

uche.ogbuji@fourthought.com wrote:
>
> My vote would be to bundle SAX and Expat, which will do for many uses. If
> they need more sophisticated XML, they can download the XML package to get
> DOM, XPath, XSLT, etc.

My concern is that I don't consider the DOM "advanced". Hell, Visual
Basic and Javascript programmers can't even spell SAX but they all use
the DOM. If a new user asked me which to learn first, I'd say "the DOM"
because any semi-competent newbie can find their way around a tree(?)
to get the information they need whereas being smart enough to buffer the
right information in the right order takes a little more algorithmic
forethought ('scuse me). Plus, I kind of feel that an XSL-ish tree
iteration with triggers is going to be the dominant XML processing model
of the 21st century.

Nevertheless, I'll leave this for now. I don't want to jinx our chances
of getting expat in. Maybe in 1.7 we could have some kind of minimal
read-only DOM 1 with namespaces.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
"Ivory towers are no longer in order. We need ivory networks. Today,
sitting quietly and thinking is the world's greatest generator of
wealth and prosperity."
- http://www.bespoke.org/viridian/print.asp?t=140
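
The tree-versus-events contrast in the message above can be made concrete with a small comparison. This is only a sketch, and it assumes the `xml.dom.minidom` and `xml.sax` modules of today's Python standard library, neither of which had been settled when this thread was written. The sample document, `titles_dom`, `titles_sax`, and `TitleHandler` are hypothetical names chosen for illustration.

```python
import xml.sax
from xml.dom import minidom

# Made-up sample document.
DOC = (b"<catalog><book><title>Expat</title></book>"
       b"<book><title>SAX</title></book></catalog>")


def titles_dom(data):
    """Tree style: build the whole tree, then walk to the nodes of interest."""
    tree = minidom.parseString(data)
    return [node.firstChild.data for node in tree.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """Event style: the handler must track its position and buffer text itself."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not currently inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until endElement.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


print(titles_dom(DOC))
print(titles_sax(DOC))
```

Both functions return the same list of titles. The DOM version is a two-line tree lookup, while the SAX version needs explicit state (`_buffer`) to know when text belongs to a title, which is the "buffer the right information in the right order" burden described above.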