From martin@loewis.home.cs.tu-berlin.de Sat Jul 1 08:47:20 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 1 Jul 2000 09:47:20 +0200 Subject: [XML-SIG] Minidom and Unicode Message-ID: <200007010747.JAA00921@loewis.home.cs.tu-berlin.de> While trying the minidom parser from the current CVS, I found that repr apparently does not work for nodes: Python 2.0b1 (#29, Jun 30 2000, 10:48:11) [GCC 2.95.2 19991024 (release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam Copyright 1995-2000 Corporation for National Research Initiatives (CNRI) >>> from xml.dom.minidom import parse >>> d=parse("/usr/src/python/Doc/tools/sgmlconv/conversion.xml") >>> d.childNodes [Traceback (most recent call last): File "", line 1, in ? TypeError: __repr__ returned non-string (type unicode) The problem here is that __repr__ is computed as def __repr__( self ): return "" and that self.tagName is u'conversion', so the resulting string is a unicode string. I'm not sure whose fault that is: either __repr__ should accept unicode strings, or minidom.Element.__repr__ should be changed to return a plain string, e.g. by converting tagname to UTF-8. In any case, I believe __repr__ should 'work' for these objects. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Jul 1 11:18:47 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 1 Jul 2000 12:18:47 +0200 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) Message-ID: <200007011018.MAA02306@loewis.home.cs.tu-berlin.de> > The name change simplifies a good many things, and we don't want to > break things *after* 1.6, so let's get it over with now. My great > fear is breaking existing code and what's worse, Sean's book. (I'll > have to buy a copy and see what would need to be preserved to keep > the examples running.) I just downloaded pyxie.py, and noticed from xml.parsers import pyexpat So it won't run on the current CVS tree. 
However, since you'll need the full XML distribution, anyway, I guess this is not a problem - that one will provide xml.parsers, right? Regards, Martin From mal@lemburg.com Sat Jul 1 13:02:55 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 01 Jul 2000 14:02:55 +0200 Subject: [XML-SIG] Re: [Python-Dev] Minidom and Unicode References: <200007010747.JAA00921@loewis.home.cs.tu-berlin.de> Message-ID: <395DDDEF.4ABB5BFC@lemburg.com> "Martin v. Loewis" wrote: > > While trying the minidom parser from the current CVS, I found that > repr apparently does not work for nodes: > > Python 2.0b1 (#29, Jun 30 2000, 10:48:11) [GCC 2.95.2 19991024 (release)] on linux2 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > Copyright 1995-2000 Corporation for National Research Initiatives (CNRI) > >>> from xml.dom.minidom import parse > >>> d=parse("/usr/src/python/Doc/tools/sgmlconv/conversion.xml") > >>> d.childNodes > [Traceback (most recent call last): > File "", line 1, in ? > TypeError: __repr__ returned non-string (type unicode) > > The problem here is that __repr__ is computed as > > def __repr__( self ): > return "" > > and that self.tagName is u'conversion', so the resulting string is a > unicode string. > > I'm not sure whose fault that is: either __repr__ should accept > unicode strings, or minidom.Element.__repr__ should be changed to > return a plain string, e.g. by converting tagname to UTF-8. In any > case, I believe __repr__ should 'work' for these objects. Note that __repr__ has to return a string object (and IIRC this is checked in object.c or abstract.c). The correct way to get there is to simply return str(...) or to have a switch on the type of self.tagName and then call .encode(). 
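A minimal sketch of the "switch on the type" option described above. It is not the patch under discussion, and it assumes present-day Python 3, where the roles are reversed: `repr()` insists on `str`, and `bytes` is the type that gets rejected. The class names and the `<Element ...>` format are hypothetical stand-ins, not minidom's actual code.

```python
class BrokenElement:
    """Hypothetical stand-in for a DOM element; not minidom's real class."""

    def __init__(self, tag_name):
        self.tagName = tag_name

    def __repr__(self):
        # Returns whatever type tagName has; repr() fails if it is not str.
        return self.tagName


class Element(BrokenElement):
    def __repr__(self):
        name = self.tagName
        # Switch on the type and convert, so repr() always receives a str.
        if isinstance(name, bytes):
            name = name.decode("utf-8", "replace")
        return "<Element %s>" % name


try:
    repr(BrokenElement(b"conversion"))
except TypeError as exc:
    print("repr failed:", exc)

print(repr(Element(b"conversion")))
```

The conversion lives in `__repr__` itself, so callers that print a list of nodes need no changes.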
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Sat Jul 1 13:22:39 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 1 Jul 2000 14:22:39 +0200 Subject: [XML-SIG] Re: [Python-Dev] Minidom and Unicode In-Reply-To: <395DDDEF.4ABB5BFC@lemburg.com> (mal@lemburg.com) References: <200007010747.JAA00921@loewis.home.cs.tu-berlin.de> <395DDDEF.4ABB5BFC@lemburg.com> Message-ID: <200007011222.OAA11088@loewis.home.cs.tu-berlin.de> > Note that __repr__ has to return a string object (and IIRC > this is checked in object.c or abstract.c). The correct way > to get there is to simply return str(...) or to have a > switch on the type of self.tagName and then call .encode(). Ok. I believe tagName will be always a Unicode object (as mandated by the DOM API), so I propose patch 100706 (http://sourceforge.net/patch/?func=detailpatch&patch_id=100706&group_id=5470) Regards, Martin From paul@prescod.net Sat Jul 1 14:09:20 2000 From: paul@prescod.net (Paul Prescod) Date: Sat, 01 Jul 2000 08:09:20 -0500 Subject: [XML-SIG] Re: [Python-Dev] Minidom and Unicode References: <200007010747.JAA00921@loewis.home.cs.tu-berlin.de> <395DDDEF.4ABB5BFC@lemburg.com> Message-ID: <395DED80.A30FB417@prescod.net> "M.-A. Lemburg" wrote: > > Note that __repr__ has to return a string object (and IIRC > this is checked in object.c or abstract.c). The correct way > to get there is to simply return str(...) or to have a > switch on the type of self.tagName and then call .encode(). > ... I prefer the former solution and unless someone screams I will check that in in a few hours. Why can't repr have a special case that converts Unicode strings to "Python strings" automatically. This case is going to byte other people. > Ok. 
I believe tagName will be always a Unicode object (as mandated by > the DOM API), so I propose patch 100706 > (http://sourceforge.net/patch/?func=detailpatch&patch_id=100706&group_id=5470) I would like Unicode usage to be a userland option for reasons of performance and backwards compatibility. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Sat Jul 1 14:12:28 2000 From: paul@prescod.net (Paul Prescod) Date: Sat, 01 Jul 2000 08:12:28 -0500 Subject: [XML-SIG] Re: [Python-Dev] Minidom and Unicode References: <200007010747.JAA00921@loewis.home.cs.tu-berlin.de> <395DDDEF.4ABB5BFC@lemburg.com> <010601bfe35b$46727ae0$f2a6b5d4@hagrid> Message-ID: <395DEE3C.E7BAD4C3@prescod.net> Fredrik Lundh wrote: > > ... > > assuming that the goal is to get rid of this restriction in future > versions (a string is a string is a string), how about special- > casing this in PyObject_Repr: This is my prefered solution. +1 from me. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Sat Jul 1 14:28:43 2000 From: paul@prescod.net (Paul Prescod) Date: Sat, 01 Jul 2000 08:28:43 -0500 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) References: <200007011018.MAA02306@loewis.home.cs.tu-berlin.de> Message-ID: <395DF20B.326FD3BC@prescod.net> "Martin v. Loewis" wrote: > > I just downloaded pyxie.py, and noticed > > from xml.parsers import pyexpat > > So it won't run on the current CVS tree. 
However, since you'll need > the full XML distribution, anyway, I guess this is not a problem - > that one will provide xml.parsers, right? > ... It's not quite that simple. Python's package model is not like Java's in that only Python 1.6 or the XML distribution can really "own" the name "xml". It's logical for Python itself to own it but this breaks the distribution. So the distribution must be renamed pyxml ... which will break a lot, including Pyxie. Suggestions for workarounds are solicited. The best solution for Pyxie is to move it into the distribution so it can be maintained with everything else. An early PyExpat change probably broke it earlier. I didn't know (remember?) that it was using pyexpat when I made the change. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From Fredrik Lundh" <395DDDEF.4ABB5BFC@lemburg.com> <010601bfe35b$46727ae0$f2a6b5d4@hagrid> <395DEE3C.E7BAD4C3@prescod.net> Message-ID: <022701bfe376$b0325d40$f2a6b5d4@hagrid> paul wrote: > > assuming that the goal is to get rid of this restriction in future > > versions (a string is a string is a string), how about special- > > casing this in PyObject_Repr: > > This is my prefered solution. +1 from me. the repository has been updated. no need to change minidom. cheers /F
From martin@loewis.home.cs.tu-berlin.de Mon Jul 3 07:58:14 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 3 Jul 2000 08:58:14 +0200 Subject: [XML-SIG] Changes Message-ID: <200007030658.IAA01173@loewis.home.cs.tu-berlin.de> Hi Andrew, I just noticed an error in your Python 2 changes list: the PyExpat module is really named pyexpat. I don't know whether the name is cast in stone already, but I'd really prefer it to be called "expat"; Python modules don't need to indicate their Python-ness with a py prefix. For compatibility, the following pyexpat.py could be provided from expat import * The only problem with that would be that assigning to pyexpat.native_encoding would not have the desired effect (and yes, I do want to have __setattr__ for modules :) Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Jul 3 08:04:19 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 3 Jul 2000 09:04:19 +0200 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) Message-ID: <200007030704.JAA01291@loewis.home.cs.tu-berlin.de> > Suggestions for workarounds are solicited. See my patch at http://sourceforge.net/patch/?func=detailpatch&patch_id=100712&group_id=5470; it would allow PyXML to keep the xml package name. Regards, Martin From fdrake@beopen.com Mon Jul 3 08:13:46 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Mon, 3 Jul 2000 03:13:46 -0400 (EDT) Subject: [XML-SIG] Changes In-Reply-To: <200007030658.IAA01173@loewis.home.cs.tu-berlin.de> References: <200007030658.IAA01173@loewis.home.cs.tu-berlin.de> Message-ID: <14688.15658.751966.282640@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > I just noticed an error in your Python 2 changes list: the PyExpat > module is really named pyexpat. I don't know whether the name is cast > in stone already, but I'd really prefer it to be called "expat"; > Python modules don't need to indicate their Python-ness with a py > prefix. For compatibility, the following pyexpat.py could be provided I suspect the proper name is _expat, and it should publically live at xml.parser.expat (easily enough arrange from xml.parser.__init__). -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Mon Jul 3 08:28:36 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Mon, 3 Jul 2000 03:28:36 -0400 (EDT) Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <200007030704.JAA01291@loewis.home.cs.tu-berlin.de> References: <200007030704.JAA01291@loewis.home.cs.tu-berlin.de> Message-ID: <14688.16548.244113.671946@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > See my patch at > http://sourceforge.net/patch/?func=detailpatch&patch_id=100712&group_id=5470; > it would allow PyXML to keep the xml package name. I don't see how; patch 100712 (which I've checked in) affects the string module implementation. Please explain. -Fred -- Fred L. Drake, Jr. 
BeOpen PythonLabs Team Member

From jerome@IDEALX.com Mon Jul 3 08:42:51 2000
From: jerome@IDEALX.com (Jérôme Marant)
Date: 03 Jul 2000 09:42:51 +0200
Subject: [XML-SIG] PyXML in Python 1.6: Drawbacks
Message-ID: <643dlrvev8.fsf@amboise.ird.idealx.com>

Hi,

Integrating as many modules as possible in Python 1.6 is just stupid.
Every module has to be considered one after one.

PyXML is the kind of module that should stay independant, for the
following reasons:

- XML and all related standards rapidly evolve
- PyXML is made of third party software that have their own life cycle
- some pieces of software do not fully implement those standard, and
  frequently improve.
- new software could be integrated (XSL implementations, for instance)

So we must NOT make PyXML releases depend on Python releases: We cannot
wait for a new Python release to get a new PyXML version.

Please forget about it !

--
Jérôme Marant

-----------------------------------------------------------
| IDEALX - Open Source Engineering / Ingénierie Open Source |
| http://IDEALX.com |
-----------------------------------------------------------

From fdrake@beopen.com Mon Jul 3 09:29:01 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Mon, 3 Jul 2000 04:29:01 -0400 (EDT)
Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL)
In-Reply-To: <14688.16548.244113.671946@cj42289-a.reston1.va.home.com>
References: <200007030704.JAA01291@loewis.home.cs.tu-berlin.de> <14688.16548.244113.671946@cj42289-a.reston1.va.home.com>
Message-ID: <14688.20173.220834.333419@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
> See my patch at
> http://sourceforge.net/patch/?func=detailpatch&patch_id=100712&group_id=5470;
> it would allow PyXML to keep the xml package name.

Fred L. Drake, Jr. writes:
> I don't see how; patch 100712 (which I've checked in) affects the
> string module implementation. Please explain.

Never mind; I found it.
You were actually referring to patch #100705: http://sourceforge.net/patch/?func=detailpatch&patch_id=100705&group_id=5470 -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From gstein@lyra.org Mon Jul 3 10:41:53 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 02:41:53 -0700 Subject: [XML-SIG] Changes In-Reply-To: <200007030658.IAA01173@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, Jul 03, 2000 at 08:58:14AM +0200 References: <200007030658.IAA01173@loewis.home.cs.tu-berlin.de> Message-ID: <20000703024153.U29590@lyra.org> +1 on the rename. On Mon, Jul 03, 2000 at 08:58:14AM +0200, Martin v. Loewis wrote: > Hi Andrew, > > I just noticed an error in your Python 2 changes list: the PyExpat > module is really named pyexpat. I don't know whether the name is cast > in stone already, but I'd really prefer it to be called "expat"; > Python modules don't need to indicate their Python-ness with a py > prefix. For compatibility, the following pyexpat.py could be provided > > from expat import * > > The only problem with that would be that assigning to > pyexpat.native_encoding would not have the desired effect (and yes, I > do want to have __setattr__ for modules :) > > Regards, > Martin > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Greg Stein, http://www.lyra.org/ From martin@loewis.home.cs.tu-berlin.de Mon Jul 3 13:53:11 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 3 Jul 2000 14:53:11 +0200 Subject: [XML-SIG] XML in Python 1.6 (PROPOSAL) In-Reply-To: <14688.20173.220834.333419@cj42289-a.reston1.va.home.com> (fdrake@beopen.com) References: <200007030704.JAA01291@loewis.home.cs.tu-berlin.de> <14688.16548.244113.671946@cj42289-a.reston1.va.home.com> <14688.20173.220834.333419@cj42289-a.reston1.va.home.com> Message-ID: <200007031253.OAA00749@loewis.home.cs.tu-berlin.de> > Never mind; I found it. 
You were actually referring to patch > #100705: Exactly. Sorry for the confusion. Martin From paul@prescod.net Mon Jul 3 16:52:51 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 10:52:51 -0500 Subject: [XML-SIG] Changes References: <200007030658.IAA01173@loewis.home.cs.tu-berlin.de> Message-ID: <3960B6D3.CBBAD17A@prescod.net> "Martin v. Loewis" wrote: > > ... > > I just noticed an error in your Python 2 changes list: the PyExpat > module is really named pyexpat. I don't know whether the name is cast > in stone already, but I'd really prefer it to be called "expat"; > Python modules don't need to indicate their Python-ness with a py > prefix. For compatibility, the following pyexpat.py could be provided I think that the current name makes more clear to maintainers which files (pyexpat.*) are maintained by us and which are maintained by James Clark. Further, we have to be careful of name clashes in terms of expat.dll and pyexpat.dll. That isn't a problem now because the expat DLL is not named expat.dll but I can easily imagine that one of Mozilla, TCL, Perl or James Clark might also want the name expat.dll. Adding a "py" doesn't hurt anyone and most people will be using expat through the SAX API anyhow. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Mon Jul 3 16:55:26 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 10:55:26 -0500 Subject: [XML-SIG] PyXML in Python 1.6: Drawbacks References: <643dlrvev8.fsf@amboise.ird.idealx.com> Message-ID: <3960B76E.D66B54E7@prescod.net> "Jérôme Marant" wrote: > > Hi, > > Integrating as many modules as possible in Python 1.6 is just stupid. Agreed. > Every module has to be considered one after one. Agreed. 
> PyXML is the kind of module that should stay independant, for the > following reasons: PyXML is not a module. PyXML is a variety of modules in a very large package. We have selected some of the most popular and static of those modules, based primarily on industry standards, and integrated them into Python 1.6. > - XML and all related standards rapidly evolve No, XML itself has not changed in three years. That's why we have had XML support in for at least a year and a half...maybe longer. We are just improving that support. Some of XML's related standards do change rapidly. We are not providing support for most of those. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From fdrake@beopen.com Mon Jul 3 17:07:58 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Mon, 3 Jul 2000 12:07:58 -0400 (EDT) Subject: [XML-SIG] Changes In-Reply-To: <3960B6D3.CBBAD17A@prescod.net> References: <200007030658.IAA01173@loewis.home.cs.tu-berlin.de> <3960B6D3.CBBAD17A@prescod.net> Message-ID: <14688.47710.475404.225817@cj42289-a.reston1.va.home.com> Paul Prescod writes: > I think that the current name makes more clear to maintainers which > files (pyexpat.*) are maintained by us and which are maintained by James > Clark. Further, we have to be careful of name clashes in terms of > expat.dll and pyexpat.dll. That isn't a problem now because the expat > DLL is not named expat.dll but I can easily imagine that one of Mozilla, > TCL, Perl or James Clark might also want the name expat.dll. I hadn't thought about the DLL issue for Windows; that is a reasonable problem we should try to avoid proactively. 
How about this: rename pyexpat to _pyexpat and not use that as the
interface, but add xml/parser/expat.py which imports the appropriate
names from _pyexpat. This keeps the namespace fairly clean and ensures
we don't have problems with the C extension for static builds (which
doesn't support putting C extensions in packages).

Objections? Better approaches?

-Fred

--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member

From jerome@IDEALX.com Mon Jul 3 17:10:57 2000
From: jerome@IDEALX.com (Jérôme Marant)
Date: 03 Jul 2000 18:10:57 +0200
Subject: [XML-SIG] PyXML in Python 1.6: Drawbacks
In-Reply-To: Paul Prescod's message of "Mon, 03 Jul 2000 10:55:26 -0500"
References: <643dlrvev8.fsf@amboise.ird.idealx.com> <3960B76E.D66B54E7@prescod.net>
Message-ID: <64r99b2nzi.fsf@amboise.ird.idealx.com>

Paul Prescod writes:

> No, XML itself has not changed in three years. That's why we have had
> XML support in for at least a year and a half...maybe longer. We are
> just improving that support. Some of XML's related standards do change
> rapidly. We are not providing support for most of those.

DOM is changing (DOM 2.0 and 3.0), XML Schema is coming and there is
a plenty of standards that have no implementation or unfinished ones
(XForms, XPath, XSL, etc) in Python.

Moreover, tools like xmlproc are still not finished.

--
Jérôme Marant

-----------------------------------------------------------
| IDEALX - Open Source Engineering / Ingénierie Open Source |
| http://IDEALX.com |
-----------------------------------------------------------

From paul@prescod.net Mon Jul 3 17:27:29 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 03 Jul 2000 11:27:29 -0500
Subject: [XML-SIG] PyXML in Python 1.6: Drawbacks
References: <643dlrvev8.fsf@amboise.ird.idealx.com> <3960B76E.D66B54E7@prescod.net> <64r99b2nzi.fsf@amboise.ird.idealx.com>
Message-ID: <3960BEF1.2DDEEDED@prescod.net>

"Jérôme Marant" wrote:
>
> DOM is changing (DOM 2.0 and 3.0),

That's right. That's why we have a DOM 1 implementation in the
distribution and an "evolving DOM" implementation in the PyXML package.

> XML Schema is coming and there is
> a plenty of standards that have no implementation or unfinished ones
> (XForms, XPath, XSL, etc) in Python.

So? I don't understand your point? If these things become available,
they will be put into the pyxml package. It isn't going away.

--
Paul Prescod - Not encumbered by corporate consensus
The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made the modern world possible.
- The Advent of the Algorithm (pending), by David Berlinski

From jerome@IDEALX.com Mon Jul 3 17:47:11 2000
From: jerome@IDEALX.com (Jérôme Marant)
Date: 03 Jul 2000 18:47:11 +0200
Subject: [XML-SIG] PyXML in Python 1.6: Drawbacks
In-Reply-To: Paul Prescod's message of "Mon, 03 Jul 2000 11:27:29 -0500"
References: <643dlrvev8.fsf@amboise.ird.idealx.com> <3960B76E.D66B54E7@prescod.net> <64r99b2nzi.fsf@amboise.ird.idealx.com> <3960BEF1.2DDEEDED@prescod.net>
Message-ID: <64aefz2mb4.fsf@amboise.ird.idealx.com>

Paul Prescod writes:

> That's right. That's why we have a DOM 1 implementation in the
> distribution and an "evolving DOM" implementation in the PyXML package.

I think I understand now. You'll include every XML stuff that hasn't
changed for ages in a Python release and you'll keep pyxml for
evolving stuff, right ?

> So? I don't understand your point? If these things become available,
> they will be put into the pyxml package. It isn't going away.

I thought that PyXML would disappear after its integration into
the next Python release. That's why I was a bit suprised.

--
Jérôme Marant

-----------------------------------------------------------
| IDEALX - Open Source Engineering / Ingénierie Open Source |
| http://IDEALX.com |
-----------------------------------------------------------

From paul@prescod.net Mon Jul 3 19:21:13 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 03 Jul 2000 13:21:13 -0500
Subject: [XML-SIG] PyXML in Python 1.6: Drawbacks
References: <643dlrvev8.fsf@amboise.ird.idealx.com> <3960B76E.D66B54E7@prescod.net> <64r99b2nzi.fsf@amboise.ird.idealx.com> <3960BEF1.2DDEEDED@prescod.net> <64aefz2mb4.fsf@amboise.ird.idealx.com>
Message-ID: <3960D999.FE553B31@prescod.net>

"Jérôme Marant" wrote:
>
> Paul Prescod writes:
>
> > That's right. That's why we have a DOM 1 implementation in the
> > distribution and an "evolving DOM" implementation in the PyXML package.
>
> I think I understand now. You'll include every XML stuff that hasn't
> changed for ages in a Python release and you'll keep pyxml for
> evolving stuff, right ?

Right!

> > So? I don't understand your point? If these things become available,
> > they will be put into the pyxml package. It isn't going away.
>
> I thought that PyXML would disappear after its integration into
> the next Python release. That's why I was a bit suprised.

Some had proposed that for a later release (e.g. Python 1.7) but I agree
with you that the pyxml distribution is a useful home for things that
are evolving.

--
Paul Prescod - Not encumbered by corporate consensus
The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made the modern world possible.
- The Advent of the Algorithm (pending), by David Berlinski From scurvey@nwaf.com Mon Jul 3 20:25:53 2000 From: scurvey@nwaf.com (Chris Curvey) Date: Mon, 03 Jul 2000 15:25:53 -0400 Subject: [XML-SIG] help installing PyXML on Linux Message-ID: <3960E8C1.C773B2C3@nwaf.com> Hi folks, I'm trying to install the xml modules for Python, and I'm running into some issues. Any help would be appreciated. After unpacking the source stuff, I run make -f Makefile.pre.in Makefile VERSION=1.5 installdir=/usr/local from the "extensions" directory. (This seems to work OK) Then I run "make" which compiles the modules. Fine. Then I run "make install" and get "no rule to make the target install". Hmm. System stuff: running Red Hat 6.1, Python 1.5.2 and XML library 0.5.4 Thanks in advance! -Chris From paul@prescod.net Mon Jul 3 20:41:34 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 14:41:34 -0500 Subject: [XML-SIG] Pyx Message-ID: <3960EC6E.5B07D888@prescod.net> I'm catching up on some mail I missed: http://www.python.org/pipermail/xml-sig/2000-June/004505.html > Nope. Think pipes. Think os.popen(). [1] You have two choices, > use pyexat directly and write the external file. This avoids > the sub-process but costs more disk space. Alternatively, > live with the sub-process call (hardly an issue these days) > and use popen() to create a piped connection to the created > PYX. This is a very disk efficient way of doing things. Using pipes and two parsers is only efficient compared to the alternative of saving the Pyx to disk. It is not efficient compared to just using Expat in a single pass. Pyxie uses two options and neither is particularly efficient. One is space inefficient and one is time inefficient (and also doesn't work well on some versions of Windows, in my experience). Fredrick has invented a way that is portable, and time/space efficient. It would be trivial to incorporate it into Pyxie and thus relieve Pyxie of its dependence on Pyx. 
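To make the single-pass option concrete, here is a minimal sketch that drives Expat directly and emits line-oriented records, with no sub-process and no intermediate file. It is neither Pyxie's code nor the approach credited to Fredrik above. It assumes present-day Python 3 and its `xml.parsers.expat` module; the function names are hypothetical, and the record prefixes (`(` start tag, `A` attribute, `-` character data, `)` end tag) follow the PYX notation as commonly described rather than anything spelled out in this thread.

```python
import xml.parsers.expat


def _escape(text):
    # Keep each record on one line by escaping backslashes and newlines.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def xml_to_pyx(data):
    """Return a list of PYX-style lines for the XML document in `data`."""
    lines = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data in one call

    def start(name, attrs):
        lines.append("(" + name)
        for key, value in attrs.items():
            lines.append("A%s %s" % (key, _escape(value)))

    def end(name):
        lines.append(")" + name)

    def chars(text):
        lines.append("-" + _escape(text))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)  # True marks this as the final chunk
    return lines


for line in xml_to_pyx('<a x="1">hi<b/></a>'):
    print(line)
```

The same callbacks could feed a tree or stack builder instead of a list of lines, which is the sense in which the line notation is optional for a Python-only consumer.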
> >Why would you generate PYX rather than XML? If we start moving PYX > >between XML-aware programs, it becomes an XML competitor. > > There is obviously a fundamental misconncect here. I don't > know what else I can do to explain this to you! > > PYX is *line oriented*, I pass it through line oriented tools > using the Unix pipe philsophy. I cannot do that with > XML. #1. You yourself said that it is possible for an XML subset to be its own line normalized format. #2. Let's ignore that and pretend it is not possible. It is entirely possible to use XML as the interchange format between databases and applications and so forth and just use Pyx when it is necessary to make the information available as line-oriented information. Translation to Pyx can be just the end result of a chain of filters. Therefore you do not need ODBC->PYX and HTML->PYX and ...->PYX. You need *->XML and XML-> Pyx (which you already have). If you start making Pyx "drivers" for every data source in the world then you are duplicating all of the work that has already been done for XML! > Why are you so hostile to it? I'm not hostile to Pyx. I am hostile to what I see as a very fuzzy description of what Pyx does and does not do. #1. You claim that Pyxie is a Pyx procesing library but I could implement the entire Pyxie API without PYX. So let's separate Pyxie and Pyx so that we can see what are good and bad about each. The first step is to recognize that Pyx and Pyxie they are not at all dependent on each other -- except according to your current implementation scheme. So Pyxie's API is great and innovative but that derives not one whit from Pyx. #2. You claim that we should make pyx generators for ODBC and various apps. I claim that the combination of XML and XML->Pyx gives us defacto such generators. Therefore we should push for *xml generators* first, because they have a much broader utility than Pyx generators. #3. You and I agree that an XML subset can serve as its own normalized-line syntax. 
If software generated this subset then it would be automatically compatible with both line-oriented software and with XML-compatible software. I cannot see an advantage of Pyx over such a subset. I've worked with ESIS for years and it never bothered me. It was a useful hack for programming languages (most) that didn't have built-in SGML parsers. It remains that for XML and languages like SED, AWK and GREP without built-in XML parsers. Fantastic! What bothers me about it is not that it exists or is used, but merely that its importance is exaggerated. One one page you say this: http://www.digitome.com/Download.html "The entire Pyxie library revolves around a very simple, line-oriented notation for the information emitted by an XML parser. This notation is known as PYX. The first character of each line is used to say what type of information the line contains:" And yet, on the example pages, I see nothing that requires any knowledge of the Pyx notation AT ALL: http://www.digitome.com/Examples.html Why would a Pyxie programmer care about the syntax you describe on the overview page? In fact, I see the great virtues of Pyx for awk, sed and grep programmers, but if I am only interested in Python, why would I care whether the input format was line oriented or not? -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Mon Jul 3 21:33:24 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 15:33:24 -0500 Subject: [XML-SIG] SAX Namespaces Message-ID: <3960F894.2BBA026B@prescod.net> [still catching up] > ... > I think this is very wrong, and that we shouldn't do it. I've looked > through the SAX code in the Python CVS tree and this is the only thing > I'm really unhappy about. 
>
> Let me enumerate the reasons:
>
>  - the rawname is not really part of the element name, so it does not
>    feel right to have it in the tuple like that

The rawname *is* the element type name.

>  - it makes name comparison (the most common operation on names!)
>    really ugly:
>
>    if name[0] == xslt_ns and \
>       name[1] == 'template':
>        # do something useful

#1. I don't fundamentally believe that people will be doing this at the
SAX level. SAX is pretty painful for this sort of thing and I would
like to see people move to something with a stack and tree mode
(whether pulldom, Pyxie, whatever) soon.

#2. The clean way to do what you describe is:

def startElement( self, (URI, name, rawname), attrs ):
    if (URI, name)==(xslt_ns, 'template'):
        # do something useful

or:

def startElement( self, name, attrs ):
    (URI, localname, rawname)=name
    if (URI, localname)==(xslt_ns, 'template'):
        # do something useful

>  - dispatching also becomes ugly:
>
>    self._element_handlers[(name[0], name[1])] ()

Same deal.

>  - it makes it harder to make people understand that the prefix used
>    in the XML document is not part of the element name

Namespaces are complicated and nasty. The old SAX API did not change
that.

>  - I don't see anything that is made easier or faster by this
>    representation

The DOM, XPath and XSLT-based APIs will need the URI, localname,
rawname triple (or at least URI/rawname). It would be nice to pass the
same tuple from Pyexpat->SAX->... with no rebundling. In fact, I hope
to optimize the SAX layer away altogether (by making PyExpat a SAX
parser and minidom et al. SAX consumers).

>  - we already discussed this and decided for something else; I think
>    we should not change our minds without good reason

Agreed. We can go back to the way things were easily at this point. My
reason for changing is in getting a clearer picture of the needs of
higher layers. Requiring all three is not the exception: it is the
rule.

> I don't fully understand this argument.
What benefits do you see over
> the (uri, localname), qname representation? Also, don't you think any
> speed gains will be lost here in the cost from all the method calls
> and object instantiations?

It is precisely method calls and object instantiations that I'm trying
to avoid.

> However, after benchmarking the speed of a real application
> (Shakespeare to HTML converter) which created a new instance for each
> call against one that recycled the old instance (but updated the
> internal attributes) I gave this up. I think the speed increase was
> from 99 to 96 seconds for converting all of Shakespeare's plays.

First, would it have complicated your life so much to say:

attrs=AttributesImpl( attrs )

Second, it isn't fair to compare building new attribute lists versus
using existing ones when I am talking about not building them at all.
Consider applications that do not deal with domain-specific attribute
names at all: canonicalizing, normalizing, pyx-generation, DOM
generation and so forth. For these apps, the AttributeList API is
expensive both because of the object creations and because it isn't
designed for sequential access (versus name-based access).

Third, the API has no facilities for looking things up based on (URI,*)
or (*,URI) (needed for XPath). That means that XPath-type applications
may need to copy the data out into another data structure. Notice also
that .items() doesn't return the rawnames needed for XPath and so forth,
so this copying is actually pretty expensive.

Overall, I think it tries to be "friendly" at too low a level in the
application stack. It can't know what the app programmer wants "down
there". Of course I think this way because I don't think that
"application" programmers should really work at the SAX layer because
it is inconvenient on many counts. It should be the most efficient
possible API for parsers and that's it. I am also trying to make SAX as
efficient as Expat's native API so that we do not have to support two
APIs forever.
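
An illustrative sketch of the trade-off argued above (not code from
this thread). It assumes a parser that hands over attributes as a
plain dictionary keyed by (URI, localname) tuples; raw_attrs and
raw_qnames are made-up sample data. A consumer that only streams
attributes reads the dictionary directly, and only a consumer that
wants name-based lookup pays for a wrapper, here AttributesNSImpl from
the xml.sax.xmlreader module of present-day Python, which postdates
this discussion.

```python
from xml.sax.xmlreader import AttributesNSImpl

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical raw data, shaped as a low-level parser might deliver it.
raw_attrs = {(XSLT_NS, "match"): "/", (None, "id"): "t1"}
raw_qnames = {(XSLT_NS, "match"): "xsl:match", (None, "id"): "id"}


def stream(attrs):
    """Sequential access: walk the raw dictionary, build no wrapper."""
    return sorted(local for (uri, local) in attrs)


def by_namespace(attrs, uri):
    """(URI, *) lookup done directly on the raw dictionary."""
    return {local: value for (u, local), value in attrs.items() if u == uri}


def wrap(attrs, qnames):
    """Name-based access, built only for callers that ask for it."""
    return AttributesNSImpl(attrs, qnames)


streamed = stream(raw_attrs)
xslt_only = by_namespace(raw_attrs, XSLT_NS)
wrapped = wrap(raw_attrs, raw_qnames)
```

Whether skipping the wrapper saves measurable time is exactly the
point in dispute here; the sketch only shows that both styles can sit
on top of the same raw data.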
I don't know if you read my description of the different sorts of APIs but I categorized them into four quadrants along two axes: tree building versus non and object-building versus primitive-object using. If we let that guide us we come up with DOM, PullDOM, QPXML and SAX. SAX, then, shouldn't be wasting time building objects. Leave that to EasySAX, PullDOM, EventDOM, or whatever else. SAX has always had multiple personalities in this way. It isn't really simple. It isn't as efficient as it could be. It certainly isn't as usable as it could be. I'm partial to more clear-cut, single-purpose designs. > I'm sorry to write an email that may seem so harsh, but I am really > convinced that what you are proposing is really really wrong and that > it is very important that we don't do this. The end result will, I > think, be to make SAX a real pain to use, and I think the speed > benefit is likely to be less than 5% for a reasonable application. Let's say the speedup is "only" 5%, but 97% of all SAX-using programs never use the SAX API directly anyhow? Then that speedup comes "for free" from the point of view of those programmers. It's like a tweak in the bowels of Python that makes Guido's job a little bit harder but makes Python run faster for everyone. It's like rewriting a Python module in C for the sake of all of the apps built on top of it. That's where I think we are going. Heck, I woudn't be suprised if we see Python SAX producers and consumers both written in C soon. I think of it as a device driver API. > Hoping this is not too late... No, it isn't too late and I consider SAX your domain. I just did what seemed to me to be the best thing for performance when it comes to names and I haven't touched AttributeList yet. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. 
- The Advent of the Algorithm (pending), by David Berlinski From sean@digitome.com Mon Jul 3 21:37:03 2000 From: sean@digitome.com (Sean McGrath) Date: Mon, 03 Jul 2000 21:37:03 +0100 Subject: [XML-SIG] Re: Pyx In-Reply-To: <3960EC6E.5B07D888@prescod.net> Message-ID: <3.0.6.32.20000703213703.009e86f0@www.digitome.com> [Paul Prescod] (On Fredericks method) >It would be trivial to incorporate it into Pyxie and thus relieve Pyxie >of its dependence on Pyx. Although doubtless possible, I see no great benefit in doing this. The "overhead" of a fork in order to read from a pipe does not bother me. In the overall scheme of things, the overhead is not significant. Anyway, pipeline processing is awash with forked processes. C'est la vie. >#1. You yourself said that it is possible for an XML subset to be its >own line normalized format. > Yes, and I also said I pushed for line-orientation when XML 1.0 was cooking. But now XML 1.0 is cooked so the question is now moot. If we subset XML to produce a similifed line oriented versions, the powers that be will be very vexed because it will be seen as fragmenting XML, confusing developers etc. etc. Witness the SML experience. Syntactically, PYX is clearly not XML and is in no way a threat to XML 1.0 and does not confuse developers. How would you propose that an XML document conforming to your XML-subset would signal this fact to software? >#2. Let's ignore that and pretend it is not possible. And it isn't, for political but not technical reasons. > It is entirely >possible to use XML as the interchange format between databases and >applications and so forth and just use Pyx when it is necessary to make >the information available as line-oriented information. You mean "parse" it into PYX. But you are complaining about parsing overhead and here you are introducing it! >Translation to >Pyx can be just the end result of a chain of filters. Therefore you do >not need ODBC->PYX and HTML->PYX and ...->PYX. 
You need *->XML and XML-> >Pyx (which you already have). If you start making Pyx "drivers" for >every data source in the world then you are duplicating all of the work >that has already been done for XML! This paragraph pre-supposes a line oriented subset of XML which I believe to be politically if not technically infeasible at this point. > >> Why are you so hostile to it? > >I'm not hostile to Pyx. I am hostile to what I see as a very fuzzy >description of what Pyx does and does not do. > With respect, you are using your not insignificant intelligence to read too much into it! PYX is actually as simple as it looks. Python, Perl, PHP, PL/1 and Java programmers have all e-mailed me saying they understand it, use it, and are thankful for its existence. This makes me happy. Why can't you be happy too? >#1. You claim that Pyxie is a Pyx procesing library but I could >implement the entire Pyxie API without PYX. So let's separate Pyxie and >Pyx so that we can see what are good and bad about each. The first step >is to recognize that Pyx and Pyxie they are not at all dependent on each >other -- except according to your current implementation scheme. So >Pyxie's API is great and innovative but that derives not one whit from >Pyx. Yes, I accept that it is feasible to separate PYX from Pyxie but they are inextricably linked in my head. Maybe this is a bad thing... > >#2. You claim that we should make pyx generators for ODBC and various >apps. I claim that the combination of XML and XML->Pyx gives us defacto >such generators. Therefore we should push for *xml generators* first, >because they have a much broader utility than Pyx generators. > I would look at it differently. A PYX generator is a trivial piece of work even for those who know nothing about XML. PYX is trivially and automatically convertable to XML. 
PYX is perfect as a format to appear on "standard output" and "standard input" because you can flow it line by line through a chain until at the end you pipe through PYX2XML to get XML on disk. The filters in the chain do not need to know one iota about XML or XML parsing. At the same time, they can be as XML savvy as the like. Pyxie for example, makes a great base for a PYX processing filter. >#3. You and I agree that an XML subset can serve as its own >normalized-line syntax. Could have. Never happened. XML 1.0 is a done deal. Line orientated XML is a pipe dream. We have to live with it and move on. >If software generated this subset then it would >be automatically compatible with both line-oriented software and with >XML-compatible software. I cannot see an advantage of Pyx over such a >subset. > Well, PYX exists and the subset does not. PYX does not draw any wrath from the W3C whereas an XML subset would. Surely those are advantages! >I've worked with ESIS for years and it never bothered me. It was a >useful hack for programming languages (most) that didn't have built-in >SGML parsers. It remains that for XML and languages like SED, AWK and >GREP without built-in XML parsers. Fantastic! > >What bothers me about it is not that it exists or is used, but merely >that its importance is exaggerated. > >One one page you say this: http://www.digitome.com/Download.html > >"The entire Pyxie library revolves around a very simple, line-oriented >notation for the information emitted by an XML parser. This notation is >known as PYX. The first character of each line is used to say what type >of information the line contains:" > >And yet, on the example pages, I see nothing that requires any knowledge >of the Pyx notation AT ALL: http://www.digitome.com/Examples.html > >Why would a Pyxie programmer care about the syntax you describe on the >overview page? 
In fact, I see the great virtues of Pyx for awk, sed and
>grep programmers, but if I am only interested in Python, why would I
>care whether the input format was line oriented or not?
>

You seem to want to fork the universe. Text tools that understand XML
and text tools that do not. You seem happy to throw away the tools
that are not XML aware. I want to do the opposite. All text tools
understand the concept of a line. Lines are good. Lines are universal
and useful. A lot of useful work can get done with the line paradigm
-- and that includes XML work -- thanks to PYX.

regards,

http://www.pyxie.org - an Open Source XML Processing library for Python

From paul@prescod.net Mon Jul 3 22:07:35 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 03 Jul 2000 16:07:35 -0500
Subject: [XML-SIG] SAX Namespaces
Message-ID: <39610097.7932DD7E@prescod.net>

Lars:
> I know, but it's much better to simply modify the output from expat
> (preferably in C source) than to implement namespaces in Python.

I'm not clear what route you are advocating:

1. Are we going to fork and fix Expat? I've been waiting quite a while
for a response from James on this issue.

Ease of implementation: 6/10
Ease of maintenance: 3/10
Technical correctness: 10/10

2. Are we going to implement new namespace handling ourselves in C
code? (ignoring Expat's namespace features?)

Ease of implementation: 3/10
Ease of maintenance: 6/10
Technical correctness: 10/10

3. Are we going to try to reverse-map from URIs to prefixes? And what
about when there are two possible prefixes for the same namespace?

Ease of implementation: 8/10
Ease of maintenance: 10/10
Technical correctness: 2/10

Implementing something in Python is a stop-gap until we choose the
least-bad of these approaches. I would multiply ease of maintenance by
10 and technical correctness by 100 because they are so much more
important than initial difficulty. So "2" would probably be the best
unless we hear from James Clark.
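
The weighting just described is easy to check with a few lines of
arithmetic. A small sketch (not from the original message; the
dictionary layout and the names are illustrative assumptions, while
the scores and weights are the ones given above):

```python
# Scores out of 10: (ease of implementation, ease of maintenance,
# technical correctness), copied from the three options above.
options = {
    1: (6, 3, 10),   # fork and fix Expat
    2: (3, 6, 10),   # new namespace handling in C
    3: (8, 10, 2),   # reverse-map URIs to prefixes
}

# Implementation counts once, maintenance 10 times, correctness 100 times.
WEIGHTS = (1, 10, 100)


def weighted(scores):
    """Return the weighted total for one option."""
    return sum(w * s for w, s in zip(WEIGHTS, scores))


totals = {number: weighted(scores) for number, scores in options.items()}
best = max(totals, key=totals.get)

print(totals)  # {1: 1036, 2: 1063, 3: 308}
print(best)    # 2
```

Options 1 and 2 tie on correctness, so the result comes down to the
maintenance score, which is why "2" edges out "1".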
Who wants to implement it, though? -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Mon Jul 3 22:41:15 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 16:41:15 -0500 Subject: [XML-SIG] Reconsidering the DOM AP References: <3.0.6.32.20000629223040.009eda80@www.digitome.com> Message-ID: <3961087B.E668C375@prescod.net> [I dont' think that this went out before] Sean McGrath wrote: > > For the event handling stuff, the principle difference is just > down to the convenience of event handlers named after element > type names. If it were just the event-oriented stuff, then > Pyxie would not offer enough to drag even me away from > the industry APIs:-) The only reason we didn't add this to Python SAX is because it falls down when you try to use namespaces. > 1)Pyxie uses a "cursor" location metaphor and a > cut/paste approach which is very different from the > DOM. I find the Pyxie approach more natural than > the DOM approach. But why couldn't a Pyxie cursor move around a DOM? It seems that cursor.down is just syntactic sugar for cursor.current_node=cursor.current_node.childNode[0] If the former is more natural for you, then you can just write a small object that implements it. > 2)Pyxie blends the ease of use of tree-oriented processing > with the memory efficiency of event-oriented processing > using a sparse-tree facility. This is no such facility > in industry APIs (that I know of). No, but it can be added to industry APIs. 
My new pulldom (inspired in large part by Pyxie) has only three methods: parse() getNode() expandNode() (builds a tree) I believe that those functions are essentially analogous to Pyxie's equivalent but after I build the tree I can apply one of the two Python XPath implementations and other DOM packages. > 3)Pyxie allows you to mix logical navigation with > parsing and content insertion in a way I find > very useful in my day to day work. This sort of thing:- These strike me as still being features of a cursor object which could work on any tree data structure. > 4) Pyxie is unashamedly focused on the logical > model of XML documents. It does not concern itself > with general entity references, DTD info etc. etc. Many DOM implementations also restrict themselves in this way. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Mon Jul 3 23:16:47 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 17:16:47 -0500 Subject: [XML-SIG] Re: Pyx References: <3.0.6.32.20000703213703.009e86f0@www.digitome.com> Message-ID: <396110CF.CB142D02@prescod.net> Sean McGrath wrote: > > ... > >It would be trivial to incorporate it into Pyxie and thus relieve Pyxie > >of its dependence on Pyx. > > Although doubtless possible, I see no great benefit in doing this. I guess we are too far apart on this issue. My approach is that if you can reduce I/O (even piped) then you always do because it is so expensive, especially if you have to do do splits, attribute bundling and so forth in Python code. > If we subset XML to produce a similifed line oriented versions, > the powers that be will be very vexed because it will be seen > as fragmenting XML, confusing developers etc. etc. Witness > the SML experience. 
The SML experience is not illustrative. Their approach was: "XML is too complex. That's a bug. Let's fix it." That is sure to get up people's hackles. Their "simplification" did not really enable any new applications; it just scratched their perception of a general messiness in the XML spec. Line-ML would (or at least could!) be very different. You would say: "I need compatibility with these legacy tools. XML does not meet that requirement. I intend to make an XML subset *complementary to full XML* that is compatible with these tools." It would be clear when it was appropriate to use Line-ML and when to use "full XML." If you didn't care about compatibility with line-oriented tools you would use full XML. I am the wrong person to lead such a charge because I use Python for all of my XML processing needs and do not worry about line breaks. I understand and respect that some people are more comfortable with awk, grep and so forth. I'm just not the right person to promote their world-view. > Syntactically, PYX is clearly not XML and is in no way > a threat to XML 1.0 and does not confuse developers. It would be a threat to XML if people started making ODBC, LaTeX, etc. drivers in preference to XML drivers. I don't see how you can accuse me of forking the universe when you are proposing this "line-oriented XML alternative" that happens to be isomorphic with a subset of XML. > How would you propose that an XML document conforming > to your XML-subset would signal this fact to software? I wouldn't require any signal at all, but I would *allow* a processing instruction. > You mean "parse" it into PYX. But you are complaining about > parsing overhead and here you are introducing it! I complained about parsing overhead *inside of Pyxie* because it is unnecessary. It is an implementation detail that could (should!) be optimized away. On the other hand, using conversion as a means to avoid doubling the number of data format drivers in the world strikes me as good sense!
That isn't about optimization; it's about the fundamental interfaces exposed by software. > >Translation to > >Pyx can be just the end result of a chain of filters. Therefore you do > >not need ODBC->PYX and HTML->PYX and ...->PYX. You need *->XML and XML-> > >Pyx (which you already have). If you start making Pyx "drivers" for > >every data source in the world then you are duplicating all of the work > >that has already been done for XML! > > This paragraph pre-supposes a line oriented subset of XML which I believe > to be politically if not technically infeasible at this point. No it does not. The whole paragraph presumes that pyx is still the line-oriented format. > With respect, you are using your not insignificant > intelligence to read too much into it! I feel the opposite is true: > Yes, I accept that it is feasible to separate PYX from Pyxie > but they are inextricably linked in my head. Maybe this is > a bad thing... There. It is you who are reading too much into it. I think that separating pyx from pyxie *at least logically* is necessary before we can start talking about incorporating pyxie's best parts into Python. If we DID incorporate both Pyx and Pyxie I would say that they would be separate modules oriented towards their distinct and separate strengths and weaknesses. Half of pyx would probably go with the marshallers and half with the parsers. You and I are not the only people who are unclear on the Pyx/Pyxie relationship: http://www.deja.com/threadmsg_ct.xp?AN=561555955 The poor guy asked a very reasonable question but never got a clear answer back. The lack of clarity around this situation has been a long-time annoyance for me in discussion of Pyxie and I think that this discussion has been productive in clearing it up in my mind. -- Another annoyance has been the assertion that Pyxie is "Pythonic" and SAX/DOM are "language independent." I see no evidence of this dichotomy. Pyxie is innovative.
It would be innovative if it had been invented for Java or Perl too. PyDOM and PySAX are Pythonic. They make a lot of use of tuples, dictionaries, __getitem__, __getattr__ and other Python idioms. We just spent last week defending the use of __getattr__. The Python DOMs have lots of flaws but none of them derive from having been specified in IDL rather than Python. I would be happy to entertain criticisms based on real weaknesses like redundancy, performance or API design. *Evidence* of poor pythonicity would also be welcomed. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From uogbuji@fourthought.com Mon Jul 3 23:40:46 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 03 Jul 2000 16:40:46 -0600 Subject: [XML-SIG] SAX Namespaces In-Reply-To: Message from Paul Prescod of "Mon, 03 Jul 2000 15:33:24 CDT." <3960F894.2BBA026B@prescod.net> Message-ID: <200007032240.QAA13690@localhost.localdomain> [snip Paul's arguments for the (uri, local, rawname) triple] I'll just note for the moment that, based on my experience implementing SAX->DOM and XPath, I'm not convinced that using the triple makes life so much easier. Basically, I wouldn't use XPath as an argument for it. 4XPath certainly does fine without it. If Lars's approach makes life easier for "direct-to-SAX" programmers, I'd vote for that. Also, Paul seems to be arguing that "direct-to-SAX" programmers should change their ways and move to higher-level APIs. I think that's unfair. I avoid SAX myself, but I know from many people that they prefer that route and why should we strong-arm them to another approach? Let pulldom or pyxie or DOM or whatever win out on its merits, not by making SAX more difficult.
> Let's say the speedup is "only" 5%, but 97% of all SAX-using programs > never use the SAX API directly anyhow? Then that speedup comes "for > free" from the point of view of those programmers. It's like a tweak in > the bowels of Python that makes Guido's job a little bit harder but > makes Python run faster for everyone. It's like rewriting a Python > module in C for the sake of all of the apps built on top of it. That's > where I think we are going. Heck, I woudn't be suprised if we see Python > SAX producers and consumers both written in C soon. I think of it as a > device driver API. This puzzles me. A while back you seemed a bit dismayed when you found that 4XPath and 4XSLT used C modules, but here you seem to be advocating moving at least some XML tools to C. I still think that there is no reason why Python/XML tools shoudln't be in C as long as the maintainers take responsibility to port or assist in porting to all Python platforms. > No, it isn't too late and I consider SAX your domain. I just did what > seemed to me to be the best thing for performance when it comes to names > and I haven't touched AttributeList yet. Until further discussion I'd vote for changing it back to Lars's original API for q-names. Apart from all the other arguments, Lars did put up his API and invite us all to hash it out (and there _was_ some discussion of the matter). IMO it's a bad idea to suddenly change it all now. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From gstein@lyra.org Tue Jul 4 01:46:47 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 17:46:47 -0700 Subject: [XML-SIG] Changes In-Reply-To: <14688.47710.475404.225817@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Mon, Jul 03, 2000 at 12:07:58PM -0400 References: <200007030658.IAA01173@loewis.home.cs.tu-berlin.de> <3960B6D3.CBBAD17A@prescod.net> <14688.47710.475404.225817@cj42289-a.reston1.va.home.com> Message-ID: <20000703174647.Y29590@lyra.org> On Mon, Jul 03, 2000 at 12:07:58PM -0400, Fred L. Drake, Jr. wrote: > > Paul Prescod writes: > > I think that the current name makes more clear to maintainers which > > files (pyexpat.*) are maintained by us and which are maintained by James > > Clark. Further, we have to be careful of name clashes in terms of > > expat.dll and pyexpat.dll. That isn't a problem now because the expat > > DLL is not named expat.dll but I can easily imagine that one of Mozilla, > > TCL, Perl or James Clark might also want the name expat.dll. > > I hadn't thought about the DLL issue for Windows; that is a > reasonable problem we should try to avoid proactively. How about > this: rename pyexpat to _pyexpat and not use that as the interface, > but add xml/parser/expat.py which imports the appropriate names from > _pyexpat. This keeps the namespace fairly clean and ensures we don't > have problems with the C extension for static builds (which doesn't > support putting C extensions in packages). > Objections? Better approaches? Yes, I object. pyexpat is a perfectly usable interface. It is false that "people will only use SAX" to talk to the thing. And why bung up the mess with expat.py? It isn't doing anything except for shifting names from one namespace to another. That certainly is not a reason for having it. The expat.dll is good. I withdraw my +1 for that renaming :-) ... 
keep it pyexpat, but forget the notion of _pyexpat and expat.py. There is no value-add in that. Cheers, -g -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Tue Jul 4 01:50:57 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 19:50:57 -0500 Subject: [XML-SIG] SAX Namespaces References: <200007032240.QAA13690@localhost.localdomain> Message-ID: <396134F1.E41A85E1@prescod.net> Uche Ogbuji wrote: > > ... > > Also, Paul seems to be arguing that "direct-to-SAX" programmers should change > their ways and move to higher-level APIs. I think that's unfair. I avoid SAX > myself, but I know from many people that they prefer that route and why should > we strong-arm them to another approach? Let pulldom or pyxie or DOM or > whatever win out on its merits, not by making SAX more difficult. I don't see the APIs as in competition. I see it as trying to make APIs that do the best they can at a particular job. If SAX is not efficient enough, people will go around it and use the PyExpat API directly. Then SAX lives in a purgatory where it is not easy enough to be popular with programmers looking for ease of use nor efficient enough to be popular among programmers who need absolute speed. Better performance will make SAX more, not less, popular -- but quite possibly popular with a different crowd of people. > ... A while back you seemed a bit dismayed when you found that > 4XPath and 4XSLT used C modules, but here you seem to be advocating moving at > least some XML tools to C. If something needs C, it needs C. That's why we're putting PyExpat into Python 1.6. I said that I wasn't convinced that XPath parsing needed C yet. Nobody has tried to do it with SRE yet. If an SRE-based parser demonstrates the performance problems you described when you tried a Python approach before then C will be the right choice. 
> I still think that there is no reason why Python/XML tools shoudln't be in C > as long as the maintainers take responsibility to port or assist in porting to > all Python platforms. I agree. I think you'll also agree that C is harder to maintain than Python. I don't think that we disagree here. > IMO it's a bad idea to suddenly change it all now. My understanding was that the details of namespace support were still an open question. The AttributeList change is a new one intended to improve performance. I have thought things through a lot more since we discussed these issues before. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From gstein@lyra.org Tue Jul 4 01:57:22 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 17:57:22 -0700 Subject: [XML-SIG] Reconsidering the DOM AP In-Reply-To: <3961087B.E668C375@prescod.net>; from paul@prescod.net on Mon, Jul 03, 2000 at 04:41:15PM -0500 References: <3.0.6.32.20000629223040.009eda80@www.digitome.com> <3961087B.E668C375@prescod.net> Message-ID: <20000703175722.Z29590@lyra.org> Paul, What is your point? Below you seem to be demonstrating that certain Pyxie operations are possible using a DOM instead. So what? PullDOM has three methods. So? None of that invalidates Pyxie, which is what it appears you are trying to do here. Pyxie does its thing, the DOM does it. I'll choose to use whichever code that I'd like to use. Cheers, -g On Mon, Jul 03, 2000 at 04:41:15PM -0500, Paul Prescod wrote: > [I dont' think that this went out before] > Sean McGrath wrote: > > > > For the event handling stuff, the principle difference is just > > down to the convenience of event handlers named after element > > type names. 
If it were just the event-oriented stuff, then > > Pyxie would not offer enough to drag even me away from > > the industry APIs:-) > > The only reason we didn't add this to Python SAX is because it falls > down when you try to use namespaces. > > > 1)Pyxie uses a "cursor" location metaphor and a > > cut/paste approach which is very different from the > > DOM. I find the Pyxie approach more natural than > > the DOM approach. > > But why couldn't a Pyxie cursor move around a DOM? It seems that > cursor.down is just syntactic sugar for > > cursor.current_node=cursor.current_node.childNode[0] > > If the former is more natural for you, then you can just write a small > object that implements it. > > > 2)Pyxie blends the ease of use of tree-oriented processing > > with the memory efficiency of event-oriented processing > > using a sparse-tree facility. This is no such facility > > in industry APIs (that I know of). > > No, but it can be added to industry APIs. My new pulldom (inspired in > large part by Pyxie) has only three methods: > > parse() > getNode() > expandNode() (builds a tree) > > I believe that those functions are essentially analogous to Pyxie's > equivalent but after I build the tree I can apply one of the two Python > XPath implementations and other DOM packages. > > > 3)Pyxie allows you to mix logical navigation with > > parsing and content insertion in a way I find > > very useful in my day to day work. This sort of thing:- > > These strike me as still being features of a cursor object which could > work on any tree data structure. > > > 4) Pyxie is unashamedly focused on the logical > > model of XML documents. It does not concern itself > > with general entity references, DTD info etc. etc. > > Many DOM implementations also restrict themselves in this way. 
> > -- > Paul Prescod - Not encumbered by corporate consensus > The calculus and the rich body of mathematical analysis to which it > gave rise made modern science possible, but it was the algorithm that > made the modern world possible. > - The Advent of the Algorithm (pending), by David Berlinski > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Tue Jul 4 02:05:30 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 20:05:30 -0500 Subject: [XML-SIG] Reconsidering the DOM AP References: <3.0.6.32.20000629223040.009eda80@www.digitome.com> <3961087B.E668C375@prescod.net> <20000703175722.Z29590@lyra.org> Message-ID: <3961385A.D5E5C143@prescod.net> Greg Stein wrote: > > ... > > None of that invalidates Pyxie, which is what it appears you are trying to > do here. I don't see Pyxie as invalid at all. Someone asked us to put it in the standard Python distribution. I think it would be great to put the best ideas from the Pyxie distribution into Python 1.7 but I don't think we are going to put in a tree API that is DOM-like but a little different, and an event API that is SAX-like but a little different. Maybe others will disagree with me on that issue. It's precisely because I think that Pyxie has a lot of smart ideas that I'm trying to figure out what's the same and what's different between Pyxie and DOM. Sean may or may not be interested in merging the concepts or in getting Pyxie (in whole or in part) into Python 1.7. I don't know. I think it's a good route to explore. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. 
- The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Tue Jul 4 02:05:44 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 20:05:44 -0500 Subject: [XML-SIG] Reconsidering the DOM AP References: <3.0.6.32.20000629223040.009eda80@www.digitome.com> <3961087B.E668C375@prescod.net> <20000703175722.Z29590@lyra.org> Message-ID: <39613868.33E23C25@prescod.net> Greg Stein wrote: > > ... > > None of that invalidates Pyxie, which is what it appears you are trying to > do here. I don't see Pyxie as invalid at all. Someone asked us to put it in the standard Python distribution. I think it would be great to put the best ideas from the Pyxie distribution into Python 1.7 but I don't think we are going to put in a tree API that is DOM-like but a little different, and an event API that is SAX-like but a little different. Maybe others will disagree with me on that issue. It's precisely because I think that Pyxie has a lot of smart ideas that I'm trying to figure out what's the same and what's different between Pyxie and DOM. Sean may or may not be interested in merging the concepts or in getting Pyxie (in whole or in part) into Python 1.7. I don't know. He hasn't said. I think it's a good route to explore. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From gstein@lyra.org Tue Jul 4 02:10:46 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 18:10:46 -0700 Subject: [XML-SIG] SAX Namespaces In-Reply-To: <3960F894.2BBA026B@prescod.net>; from paul@prescod.net on Mon, Jul 03, 2000 at 03:33:24PM -0500 References: <3960F894.2BBA026B@prescod.net> Message-ID: <20000703181046.A29590@lyra.org> On Mon, Jul 03, 2000 at 03:33:24PM -0500, Paul Prescod wrote: > [still catching up] > > > ... 
> > I think this is very wrong, and that we shouldn't do it. I've looked
> > through the SAX code in the Python CVS tree and this is the only thing
> > I'm really unhappy about.
> >
> > Let me enumerate the reasons:
> >
> > - the rawname is not really part of the element name, so it does not
> >   feel right to have it in the tuple like that
>
> The rawname *is* the element type name.

No, the element name is the (namespace, element-name) tuple. The rawname is
actually quite hard to use. We had talked about preserving the namespace
*prefix*. The third item in the provided data should be the bare prefix.

> > - it makes name comparison (the most common operation on names!)
> >   really ugly:
> >
> >   if name[0] == xslt_ns and \
> >      name[1] == 'template':
> >       # do something useful
>
> #1. I don't fundamentally believe that people will be doing this at the
> SAX level.

I do it all the time (although I don't use SAX). These kinds of comparisons
are quite common, and I am certainly not going to use some magic tools that
simplify it for me.

To enable comparisons like the one above, the best output format would be
something like: ((URI, element-name), prefix). This enables people to use
name[0] for the distinguished name (primary key).

> SAX is pretty painful for this sort of thing and I would like
> to see people move to something with a stack and tree mode (whether
> pulldom, Pyxie, whatever) soon.

Don't tell me, or others, where to move our code, our designs, or our
algorithms. It is not your place to legislate.

> #2.
> The clean way to do what you describe is:
>
> def startElement( self, (URI, name, rawname), attrs ):
>     if (URI, name)==xslt_ns:
>         # do something useful
>
> or:
>
> def startElement( self, name, attrs ):
>     (URI, localname, rawname)=name
>     if (URI, localname)==xslt_ns:
>         # do something useful

or:

  def startElement(self, name, attrs):
      if name == xslt_ns:
          # do something useful

or:

  def startElement(self, ((URI, name), prefix), attrs):
      if URI != 'DAV:':
          return
      # do something useful

Both of your alternatives constructed a tuple at *runtime*. I'd rather avoid
the tuple construction and keep the (IMO) cleaner model.

>...
> > - it makes it harder to make people understand that the prefix used
> >   in the XML document is not part of the element name
>
> Namespaces are complicated and nasty. The old SAX API did not change
> that.

They don't have to be. I've never had a problem with them in my qp_xml
module or the users of that module.

> > - I don't see anything that is made easier or faster by this
> >   representation
>
> The DOM, XPath and XSLT-based APIs will need the URI, localname, rawname
> triple (or at least URI/rawname).

Not the rawname... the prefix. Combining the prefix/name means that somebody
must then yank it back apart to deal with that prefix. When they arrive
separate, then the developer has an easy choice for putting them back
together.

> It would be nice to pass the same
> tuple from Pyexpat->SAX->... with no rebundling. In fact, I hope to
> optimize the SAX layer away altogether. (by making PyExpat a SAX parser
> and minidom et al. SAX consumers).

Please do not add SAX to the front of pyexpat.c. I always want to have
access to the raw handlers. Expat is known as the *fastest* non-validating
parser around. I don't want to see a bunch of stuff gummed onto the Python
version which kills that speed.

> > - we already discussed this and decided for something else; I think
> >   we should not change our minds without good reason
>
> Agreed.
We can go back to the way things were easily at this point. My You should not be making API changes unilaterally. That is totally against the spirit here. Great, you have commit access. I do, too. Should I go in and start making API changes because they suit me? >... > > Hoping this is not too late... > > No, it isn't too late and I consider SAX your domain. I just did what > seemed to me to be the best thing for performance when it comes to names > and I haven't touched AttributeList yet. I think this discussion needs to be reset, and a new consensus needs to be reached. There are obviously divergent opinions here. Can somebody enumerate each of the options here so that we can restart the discussion? Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Tue Jul 4 02:12:54 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 18:12:54 -0700 Subject: [XML-SIG] SAX Namespaces In-Reply-To: <200007032240.QAA13690@localhost.localdomain>; from uogbuji@fourthought.com on Mon, Jul 03, 2000 at 04:40:46PM -0600 References: <200007032240.QAA13690@localhost.localdomain> Message-ID: <20000703181253.B29590@lyra.org> On Mon, Jul 03, 2000 at 04:40:46PM -0600, Uche Ogbuji wrote: >... > > No, it isn't too late and I consider SAX your domain. I just did what > > seemed to me to be the best thing for performance when it comes to names > > and I haven't touched AttributeList yet. > > Until further discussion I'd vote for changing it back to Lars's original API > for q-names. Apart from all the other arguments, Lars did put up his API and > invite us all to hash it out (and there _was_ some discussion of the matter). > IMO it's a bad idea to suddenly change it all now. Not only bad, but unilateral decisions for change are against the spirit of the game... 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Tue Jul 4 02:17:49 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 18:17:49 -0700 Subject: [XML-SIG] Pyx In-Reply-To: <3960EC6E.5B07D888@prescod.net>; from paul@prescod.net on Mon, Jul 03, 2000 at 02:41:34PM -0500 References: <3960EC6E.5B07D888@prescod.net> Message-ID: <20000703181749.C29590@lyra.org> On Mon, Jul 03, 2000 at 02:41:34PM -0500, Paul Prescod wrote: >... > #2. Let's ignore that and pretend it is not possible. It is entirely > possible to use XML as the interchange format between databases and > applications and so forth and just use Pyx when it is necessary to make > the information available as line-oriented information. Translation to > Pyx can be just the end result of a chain of filters. Therefore you do > not need ODBC->PYX and HTML->PYX and ...->PYX. You need *->XML and XML-> > Pyx (which you already have). If you start making Pyx "drivers" for > every data source in the world then you are duplicating all of the work > that has already been done for XML! What is the problem with that? If somebody chooses to do that work, then why is that a problem with you? It is their choice, after all. We all work on the things that we believe in, and that we enjoy working on. If Sean is working on PYX, and you don't like the format or its implications, then who says you must work on it? > > Why are you so hostile to it? > > I'm not hostile to Pyx. I am hostile to what I see as a very fuzzy > description of what Pyx does and does not do. From this standpoint, you seem hostile to the concept. As if it is infringing on territory that only XML should occupy. >... > #2. You claim that we should make pyx generators for ODBC and various > apps. I claim that the combination of XML and XML->Pyx gives us defacto > such generators. Therefore we should push for *xml generators* first, > because they have a much broader utility than Pyx generators. 
People will work on what they want to work on. There is no way that you can (or should!) tell somebody "don't work on that!" Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Tue Jul 4 02:38:45 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 18:38:45 -0700 Subject: [XML-SIG] help installing PyXML on Linux In-Reply-To: <3960E8C1.C773B2C3@nwaf.com>; from scurvey@nwaf.com on Mon, Jul 03, 2000 at 03:25:53PM -0400 References: <3960E8C1.C773B2C3@nwaf.com> Message-ID: <20000703183845.E29590@lyra.org> On Mon, Jul 03, 2000 at 03:25:53PM -0400, Chris Curvey wrote: > Hi folks, > > I'm trying to install the xml modules for Python, and I'm running into > some issues. Any help would be appreciated. > > After unpacking the source stuff, I run > > make -f Makefile.pre.in Makefile VERSION=1.5 installdir=/usr/local from > the "extensions" directory. (This seems to work OK) > > Then I run "make" which compiles the modules. Fine. > > Then I run "make install" and get "no rule to make the target > install". Hmm. > > System stuff: running Red Hat 6.1, Python 1.5.2 and XML library 0.5.4 Hey Chris! Small world, eh? :-) The link on www.python.org/topics/xml/download.html is out of date. Please grab the latest distribution at: http://www.python.org/sigx/xml-sig/files/PyXML-0.5.5.1.tar.gz The README has directions for building and installing. I'm trying it right now as we speak... (also on RH 6.1, Py 1.5.2) It worked fine for me. 
(the "install" step should be done as root, but the build can be anybody) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Tue Jul 4 02:39:30 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 18:39:30 -0700 Subject: [XML-SIG] change topics page (was: help installing PyXML on Linux) In-Reply-To: <3960E8C1.C773B2C3@nwaf.com>; from scurvey@nwaf.com on Mon, Jul 03, 2000 at 03:25:53PM -0400 References: <3960E8C1.C773B2C3@nwaf.com> Message-ID: <20000703183930.F29590@lyra.org> Could somebody change the link on the XML topic's download page? Thx, -g On Mon, Jul 03, 2000 at 03:25:53PM -0400, Chris Curvey wrote: > Hi folks, > > I'm trying to install the xml modules for Python, and I'm running into > some issues. Any help would be appreciated. > > After unpacking the source stuff, I run > > make -f Makefile.pre.in Makefile VERSION=1.5 installdir=/usr/local from > the "extensions" directory. (This seems to work OK) > > Then I run "make" which compiles the modules. Fine. > > Then I run "make install" and get "no rule to make the target > install". Hmm. > > System stuff: running Red Hat 6.1, Python 1.5.2 and XML library 0.5.4 > > Thanks in advance! > > -Chris > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Tue Jul 4 03:02:48 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 21:02:48 -0500 Subject: [XML-SIG] SAX Namespaces References: <200007032240.QAA13690@localhost.localdomain> <20000703181253.B29590@lyra.org> Message-ID: <396145C8.AC7BF9E9@prescod.net> Greg Stein wrote: > > ... > > Not only bad, but unilateral decisions for change are against the spirit of > the game... You would do better to reserve your flames until you know the circumstances under which the change was made. 
-- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From paul@prescod.net Tue Jul 4 03:08:39 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 21:08:39 -0500 Subject: [XML-SIG] SAX Namespaces References: <3960F894.2BBA026B@prescod.net> <20000703181046.A29590@lyra.org> Message-ID: <39614727.C188CA85@prescod.net> Greg Stein wrote: > > > The rawname *is* the element type name. > > No, the element name is the (namespace, element-name) tuple. You won't find that in any XML specification. > The rawname is > actually quite hard to use. We had talked about preserving the namespace > *prefix*. The third item in the provided data should be the bare prefix. Arguable. I could go either way on that issue. The parser has both the prefix and the rawname hanging around. The existing code said "rawname" so I left "rawname". SAX/Java also uses rawname. > > #1. I don't fundamentally believe that people will be doing this at the > > SAX level. > > I do it all the time (although I don't use SAX). These kinds of comparisons > are quite common, and I am certainly not going to use some magic tools that > simplify it for me. What are you talking about? I didn't propose any magic tools. > To enable comparisions like above, the best output format would be something > like: ((URI, element-name), prefix). This enables people to use name[0] for > the distinguished name (primary key). I could live with it, but it's another level of tuple-building and ripping apart. You are assuming a particular use mode now. I'm trying to hand the three parts to the user or application and let them do with them what they want. 
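A minimal sketch of the two name layouts being argued over here, written for
present-day Python (which no longer accepts tuple parameters in a def, so the
unpacking is explicit). The function names and the XSLT_TEMPLATE constant are
illustrative assumptions, not part of any SAX release:

```python
# Illustrative only: two ways a parser could hand an element name to a handler.
XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
XSLT_TEMPLATE = (XSLT_NS, "template")  # (uri, localname) key, built once at import


def is_template_flat(name):
    """Flat layout: name is (uri, localname, rawname)."""
    uri, localname, rawname = name
    # Comparing against the key means building a new 2-tuple on every call.
    return (uri, localname) == XSLT_TEMPLATE


def is_template_nested(name):
    """Nested layout: name is ((uri, localname), prefix)."""
    # name[0] already is the (uri, localname) key, so no tuple is built here.
    return name[0] == XSLT_TEMPLATE


flat = (XSLT_NS, "template", "xsl:template")
nested = ((XSLT_NS, "template"), "xsl")
print(is_template_flat(flat), is_template_nested(nested))
```

Either layout carries the same information; the difference is only in who
does the splitting and when.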
> > SAX is pretty painful for this sort of thing and I would like
> > to see people move to something with a stack and tree mode (whether
> > pulldom, Pyxie, whatever) soon.
>
> Don't tell me, or others, where to move our code, our designs, or our
> algorithms. It is not your place to legislate.

If you don't decide what your API is for, you can't design it. You make
certain assumptions about how people will be doing name comparisons
with this particular API and I make other ones.

> or:
>   def startElement(self, name, attrs):
>       if name == xslt_ns:
>           # do something useful

That isn't an option. We need the prefix or qname. Plus, I'm not even
sure why you would be comparing a whole name to a namespace.

> or:
>   def startElement(self, ((URI, name), prefix), attrs):
>       if URI != 'DAV:':
>           return
>       # do something useful

That is an option.

> Both of your alternatives constructed a tuple at *runtime*. I'd rather avoid
> the tuple construction and keep the (IMO) cleaner model.

How can you avoid constructing a tuple at runtime? We can't construct
them before the program starts!!!

> > Namespaces are complicated and nasty. The old SAX API did not change
> > that.
>
> They don't have to be. I've never had a problem with them in my qp_xml
> module or the users of that module.

Namespaces are nasty long before you get to an API. You don't even need
to be sitting at a computer. Just read the spec! By the way, qp_xml's
handling of the xml: namespace is incorrect.

> Not the rawname... the prefix. Combining the prefix/name means that somebody
> must then yank it back apart to deal with that prefix. When they arrive
> separate, then the developer has an easy choice for putting them back
> together.

Sure, but if the only reason you are going to ever use the prefix is to
glue it back on to the localname, then why bother separating it? What
use case do you see for wanting the prefix alone? I couldn't think of
very many.
Wanting the whole rawname is more common and is what is required by DOM
and XPath and the various specs written on top of them.

> > In fact, I hope to
> > optimize the SAX layer away altogether. (by making PyExpat a SAX parser
> > and minidom et al. SAX consumers).
>
> Please do not add SAX to the front of pyexpat.c. I always want to have
> access to the raw handlers.

That's what I just said. Read again from "I hope to..." You'll note that
the SAX API I am describing is almost a renaming of PyExpat's API (when
PyExpat namespaces are turned off).

> Expat is known as the *fastest* non-validating
> parser around. I don't want to see a bunch of stuff gummed onto the Python
> version which kills that speed.

That's what we're talking about.

> You should not be making API changes unilaterally. That is totally against
> the spirit here. Great, you have commit access. I do, too. Should I go in
> and start making API changes because they suit me?

Chill out. I was given the job of putting SAX into Python 1.6 overnight
(literally!). This API change made it easier to get it into the beta.
One major virtue of the proposal is that the

  def startElement( self, name, attrs ):
      ...

signature is the same as SAX 1, which made integration with minidom and
other handlers easier in the time available. Pyexpat and minidom were
both expecting two-argument handlers. Changing SAX was easier than
changing them. Both need to change before Python beta 2 anyhow so this
isn't a major issue anymore.

I thought it was pretty cool that Python's dynamic typing allowed us to
go from a namespace-unaware to a namespace-aware mode without inventing
a new kind of handler. That's why I posted about it on the weekend.

--
 Paul Prescod - Not encumbered by corporate consensus
The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made the modern world possible.
 - The Advent of the Algorithm (pending), by David Berlinski

From paul@prescod.net Tue Jul 4 03:14:38 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 03 Jul 2000 21:14:38 -0500
Subject: [XML-SIG] SAX Namespaces
Message-ID: <3961488E.68099597@prescod.net>

Greg:
> Can somebody enumerate each of the options here so that we can restart the
> discussion?

These are the ones I can keep track of:

#1. def startElement( self, (uri, name), qname, attrs ):
        ....

Question 1: what should uri, name and qname get when namespace
processing is off?
Question 2: qname or prefix

#2. def startElement( self, (uri,localname,qname), attrs ):
        ....

Same questions

#3. def startElement( self, ((uri, localname), qname), attrs ):
        ....

Same questions.

#4. def startElement( self, name, attrs ):
        ....

Depending on whether you have turned on namespace processing, "name" is
either "string" or (uri,localname,qname)

Question: qname or prefix

#5. def startElement( self, name, attrs ):
        ....

Same description and questions as above.

----

At this point, I don't care enough to argue anymore.

I think that #4. is nicely backwards compatible, doesn't put namespace
stuff in people's faces unless they ask for it and will be a little more
efficient in the common case where SAX is used as a means to an end like
DOM, Pyxie, qp_xml or whatever. I think that having a tuple-format is
even more useful (and efficient) when you are trying to pass attribute
lists around.

--
 Paul Prescod - Not encumbered by corporate consensus
The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made the modern world possible.
 - The Advent of the Algorithm (pending), by David Berlinski

From fdrake@beopen.com Tue Jul 4 03:29:06 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Mon, 3 Jul 2000 22:29:06 -0400 (EDT) Subject: [XML-SIG] SAX Namespaces In-Reply-To: <39614727.C188CA85@prescod.net> References: <3960F894.2BBA026B@prescod.net> <20000703181046.A29590@lyra.org> <39614727.C188CA85@prescod.net> Message-ID: <14689.19442.805699.406444@cj42289-a.reston1.va.home.com> Paul Prescod writes: > Chill out. I was given the job of putting SAX into Python 1.6 overnight > (literally!). This API change made it easier to get it into the beta. > One major virtue of the proposal is that the Should I start feeling powerful? ;) I like option #4, for the reasons you outline, but will live with whatever the group decides. Once I've finished reviewing a manuscript I'm in the middle of, I'll start thinking about the documentation for the XML stuff in the standard library. Andrew, would you mind if I borrow heavily from your start? -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Tue Jul 4 03:29:11 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 Jul 2000 21:29:11 -0500 Subject: [XML-SIG] Pyx References: <3960EC6E.5B07D888@prescod.net> <20000703181749.C29590@lyra.org> Message-ID: <39614BF7.66FBA33@prescod.net> Greg Stein wrote: > > ... > > People will work on what they want to work on. There is no way that you > can (or should!) tell somebody "don't work on that!" Somebody asked to put Pyxie into the Python standard library. There are things I think are great about Pyxie (e.g. innovative navigation model, sparse trees) and things that I think are not great (dependence on Pyx). I have never said a word against Pyxie or Pyx and in fact I taught a few people Pyxie at the last Python conference when Sean couldn't make it. I think it is unfortunate that we haven't been able to merge Sean's best ideas with ours but if Sean isn't interested in that then that's his business. We were asked point blank why Pyxie is not in Python 1.6. 
The reasons I expressed are the truthful reasons I have never proposed that and would not propose it for Python 1.7. Would you prefer if I were silent to avoid hurting Sean's feelings? That was my first inclination, but what happens when the issue comes up again and again? -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski From fdrake@beopen.com Tue Jul 4 03:31:15 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Mon, 3 Jul 2000 22:31:15 -0400 (EDT) Subject: [XML-SIG] change topics page (was: help installing PyXML on Linux) In-Reply-To: <20000703183930.F29590@lyra.org> References: <3960E8C1.C773B2C3@nwaf.com> <20000703183930.F29590@lyra.org> Message-ID: <14689.19571.487179.349021@cj42289-a.reston1.va.home.com> Greg Stein writes: > Could somebody change the link on the XML topic's download page? Andrew, can you handle this? (Don't know if you have access.) There are some bugs with our access to python.org at the moment. ;( -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From mclay@nist.gov Tue Jul 4 06:00:27 2000 From: mclay@nist.gov (Michael McLay) Date: Tue, 4 Jul 2000 01:00:27 -0400 (EDT) Subject: [XML-SIG] Pyx In-Reply-To: <20000703181749.C29590@lyra.org> References: <3960EC6E.5B07D888@prescod.net> <20000703181749.C29590@lyra.org> Message-ID: <14689.28523.413097.94348@fermi.eeel.nist.gov> Greg Stein writes: > On Mon, Jul 03, 2000 at 02:41:34PM -0500, Paul Prescod wrote: > >... > > #2. Let's ignore that and pretend it is not possible. It is entirely > > possible to use XML as the interchange format between databases and > > applications and so forth and just use Pyx when it is necessary to make > > the information available as line-oriented information. 
> > Translation to
> > Pyx can be just the end result of a chain of filters. Therefore you do
> > not need ODBC->PYX and HTML->PYX and ...->PYX. You need *->XML and XML->
> > Pyx (which you already have). If you start making Pyx "drivers" for
> > every data source in the world then you are duplicating all of the work
> > that has already been done for XML!
>
> What is the problem with that? If somebody chooses to do that work, then why
> is that a problem with you? It is their choice, after all.
>
> We all work on the things that we believe in, and that we enjoy working on.
> If Sean is working on PYX, and you don't like the format or its
> implications, then who says you must work on it?

I agree with Greg. Also, since there is a book out on using pyxie with
Python it should be included in the standard Python distribution so
that the package works out of the box for someone who picks up the
book and tries to use the examples. (Batteries Included)

It doesn't harm end users to have an alternate tool to use. Any
potential performance concerns could be documented in the description
of the package. The description could also point to the book so that
people would understand why the package is included in the
distribution.

How about including something like the following as part of the
introduction to the xml package?

Package xml:

The xml package contains all python modules used to process the xml
syntax. Some of these modules are based on well-known industry API
standards, such as SAX and DOM. Other modules are here because books
have been written based on the module. Still others are included
because they provide easy-to-use APIs for a specific type of data
processing activity. None of the modules should be considered more
correct than the others. Read the documentation on each module and
select the module that best matches the problem to be solved.

A release may contain experimental xml modules or packages that are
labeled as such.
Please test them and give feedback to the authors,
but don't expect them to remain the same, or even remain in the
permanent library. If it is marked "EXPERIMENTAL" that is what it
means. Don't complain later when it changes.

From gstein@lyra.org Tue Jul 4 04:18:23 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 3 Jul 2000 20:18:23 -0700
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: <39614727.C188CA85@prescod.net>; from paul@prescod.net on Mon, Jul 03, 2000 at 09:08:39PM -0500
References: <3960F894.2BBA026B@prescod.net> <20000703181046.A29590@lyra.org> <39614727.C188CA85@prescod.net>
Message-ID: <20000703201823.I29590@lyra.org>

Minor mistake on my part:

On Mon, Jul 03, 2000 at 09:08:39PM -0500, Paul Prescod wrote:
> Greg Stein wrote:
>...
> >
> > > The rawname *is* the element type name.
> >
> > No, the element name is the (namespace, element-name) tuple.
>
> You won't find that in any XML specification.

The XML Namespaces spec states that the URI and the localname form the
complete, unique element name.

>...
> > > #1. I don't fundamentally believe that people will be doing this at the
> > > SAX level.
> >
> > I do it all the time (although I don't use SAX). These kinds of comparisons
> > are quite common, and I am certainly not going to use some magic tools that
> > simplify it for me.
>
> What are you talking about? I didn't propose any magic tools.

If I'm not making the URI/name comparisons myself, then something else is.
Just what were you thinking to do this? If not an "if" statement, then I
presumed some handy-dandy black box.

>...
> > or:
> >   def startElement(self, name, attrs):
> >       if name == xslt_ns:
> >           # do something useful
>
> That isn't an option. We need the prefix or qname. Plus, I'm not even
> sure why you would be comparing a whole name to a namespace.

Sorry, bunged this one up.
I meant:

  def startElement(self, name, attrs):
      if name[0] == xslt_ns:
          # do something useful

The calling convention for this is the same as my next option:

> > or:
> >   def startElement(self, ((URI, name), prefix), attrs):
> >       if URI != 'DAV:':
> >           return
> >       # do something useful
>
> That is an option.
>
> > Both of your alternatives constructed a tuple at *runtime*. I'd rather avoid
> > the tuple construction and keep the (IMO) cleaner model.
>
> How can you avoid constructing a tuple at runtime? We can't construct
> them before the program starts!!!

Sure you can :-)

  xslt_ns = (whatever_uri, whatever_name)
  ...
  def startElement(...):
      ... xslt_ns ...

The tuple is constructed at import time. If those whatever_* values are
strings, then it is even possible to marshal a tuple constant into the .pyc
file (via peephole optimization). With the stored constant, then you even
avoid the tuple construction at import time!

[ yah yah, nobody cares about saving a few cycles at import time, but it is
  a neat effect :-) ]

> > > Namespaces are complicated and nasty. The old SAX API did not change
> > > that.
> >
> > They don't have to be. I've never had a problem with them in my qp_xml
> > module or the users of that module.
>
> Namespaces are nasty long before you get to an API. You don't even need
> to be sitting at a computer. Just read the spec! By the way, qp_xml's
> handling of the xml: namespace is incorrect.

In what way? Please specify.

I suspect that your thinking here is simply that a spec does not (yet) exist
for the qp_xml API. Specifically, it issues namespace/localname tuple pairs:

  ('DAV:', 'response')

When you have no namespace, you get:

  ('', 'some-element')

When the xml: prefix is found, it returns:

  ('', 'xml:foobar')

This is done because qp_xml does not preserve prefixes. Thus, an application
is required to manufacture prefixes when dumping the QP tree back out.
The particular tuple format for xml: prefix does two things: 1) '' signifies no namespace, so a prefix is not generated for the name 2) 'xml:' in the elem name ensure that the xml: prefix gets dumped out If we returned the data in another way, the application might accidentally replace the prefix with something other than xml:. Final note on the QP name values: since '' is used to signify *NO* namespace, it is important to note how defaults are handled. QP will always resolve defaults before returning values. When re-generating content, default namespaces should not be used, unless you really know what you're doing w.r.t. the no-namespace signifier and the xml: prefix. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Tue Jul 4 04:24:58 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Jul 2000 20:24:58 -0700 Subject: [XML-SIG] SAX Namespaces In-Reply-To: <3961488E.68099597@prescod.net>; from paul@prescod.net on Mon, Jul 03, 2000 at 09:14:38PM -0500 References: <3961488E.68099597@prescod.net> Message-ID: <20000703202458.J29590@lyra.org> Each of these presumes that the namespace processing can be turned off. Is that an important/required feature? IMO, the NS processing can/should always occur; they are an integral part of XML processing today. The presence of namespaces doesn't impact older documents either. [ well, there is the case of somebody using ':' in an XML element name but NOT using namespaces. but holy smokes... I don't think we should introduce variations in our APIs based on this edge case. ] [ nit: how is #5 different from #4? ] I would take option (1) or (3). qname would be the prefix used. If NS processing *can* be disabled, then uri==None and name==qname. Cheers, -g On Mon, Jul 03, 2000 at 09:14:38PM -0500, Paul Prescod wrote: > Greg: > > Can somebody enumerate each of the options here so that we can restart the > > discussion? > > These are the ones I can keep track of: > > #1. 
>     def startElement( self, (uri, name), qname, attrs ):
>         ....
>
> Question 1: what should uri, name and qname get when namespace
> processing is off?
> Question 2: qname or prefix
>
> #2. def startElement( self, (uri,localname,qname), attrs ):
>         ....
>
> Same questions
>
> #3. def startElement( self, ((uri, localname), qname), attrs ):
>         ....
>
> Same questions.
>
> #4. def startElement( self, name, attrs ):
>         ....
>
> Depending on whether you have turned on namespace processing, "name" is
> either "string" or (uri,localname,qname)
>
> Question: qname or prefix
>
> #5. def startElement( self, name, attrs ):
>         ....
>
> Same description and questions as above.
>
> ----
>
> At this point, I don't care enough to argue anymore.
>
> I think that #4. is nicely backwards compatible, doesn't put namespace
> stuff in people's faces unless they ask for it and will be a little more
> efficient in the common case where SAX is used as a means to an end like
> DOM, Pyxie, qp_xml or whatever. I think that having a tuple-format is
> even more useful (and efficient) when you are trying to pass attribute
> lists around.
>
> --
>  Paul Prescod - Not encumbered by corporate consensus
> The calculus and the rich body of mathematical analysis to which it
> gave rise made modern science possible, but it was the algorithm that
> made the modern world possible.
>   - The Advent of the Algorithm (pending), by David Berlinski
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Greg Stein, http://www.lyra.org/

From gstein@lyra.org Tue Jul 4 04:32:50 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 3 Jul 2000 20:32:50 -0700
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: <20000703202458.J29590@lyra.org>; from gstein@lyra.org on Mon, Jul 03, 2000 at 08:24:58PM -0700
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org>
Message-ID: <20000703203250.K29590@lyra.org>

Just realized that I wanted to state why the other forms aren't as nice...

On Mon, Jul 03, 2000 at 08:24:58PM -0700, Greg Stein wrote:
>...
> I would take option (1) or (3). qname would be the prefix used. If NS
> processing *can* be disabled, then uri==None and name==qname.
>...
> On Mon, Jul 03, 2000 at 09:14:38PM -0500, Paul Prescod wrote:
>...
> > #2. def startElement( self, (uri,localname,qname), attrs ):

This form makes it a bit more difficult to work with the uri/localname pair
when doing processing.

>...
> > #4. def startElement( self, name, attrs ):
> >         ....
> >
> > Depending on whether you have turned on namespace processing, "name" is
> > either "string" or (uri,localname,qname)

This introduces a "mode" into the API. Depending on some flag, you get
entirely different data. Further, the presence of qname in there means that
"name" is useless unless pulled apart (otherwise A:name and B:name are
distinct). This means that we have some functions that do:

    def startElement_mode1(self, name, attrs):
        if name == 'elem':
            ...

    def startElement_mode2(self, (uri, localname, qname), attrs):
        if (uri, localname) == LOOKING_FOR:
            ...

In essence the variant structure of the return value does a disservice to
creating a standard API. Depending on how the event generator is set up, you
could get entirely different data.
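[ Editorial sketch, not part of the original message. The remark that "name"
  is useless unless pulled apart can be made concrete as below. This is only
  an illustration for a current Python, where tuple parameters in a def are
  no longer available, so the triple is unpacked inside the function. The
  names LOOKING_FOR, is_wanted, via_a and via_b are hypothetical, and the
  (uri, localname, qname) layout is assumed from option #2/#4 above. ]

```python
# Hypothetical names throughout; this is not code from the thread.
LOOKING_FOR = ("DAV:", "response")


def is_wanted(name, wanted=LOOKING_FOR):
    """Return True when a (uri, localname, qname) triple names `wanted`."""
    uri, localname, _qname = name  # the qname plays no part in identity
    return (uri, localname) == wanted


# The same element, written in two documents with different prefixes.
via_a = ("DAV:", "response", "A:response")
via_b = ("DAV:", "response", "B:response")

# Comparing whole triples treats them as different names ...
print(via_a == via_b)
# ... while comparing only the (uri, localname) pair treats them as equal.
print(is_wanted(via_a), is_wanted(via_b))
```

[ With the pair delivered as its own value, as in option #1 or #3, the
  comparison could be made directly against a module-level constant. ]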
[ and we won't even go into a handler that must do an isinstance() to check
  to see which form was passed... ]

>...
> > #5. def startElement( self, name, attrs ):
> >         ....
> >
> > Same description and questions as above.

Unknown. Unless something funny is going on in here, this might simply be a
variant of the tuple-unpacking-declaration of the functions above. In other
words, the following two signatures are the same:

    def startElement(self, name, attrs):
    def startElement(self, (uri, localname, qname), attrs):

It is just that the second does a bit of automatic tuple-unpacking on entry
to the function. (and the latter will raise an error if it isn't passed a
3-item tuple)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From paul@prescod.net Tue Jul 4 04:38:06 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 03 Jul 2000 22:38:06 -0500
Subject: [XML-SIG] Namespaces in SAX
Message-ID: <39615C1E.50B942C@prescod.net>

I've just checked in changes that separate out the qname parameters from
the uri, localname ones. AttributeList was never changed and thus does
not need to be changed back.

--
 Paul Prescod - Not encumbered by corporate consensus
The great era of mathematical physics is now over; the 300-year effort
to represent the material world in mathematical terms has exhausted
itself. The understanding it was to provide is infinitely closer than
it was when Isaac Newton wrote in the late seventeenth century, but
it is still infinitely far away.
  - The Advent of the Algorithm, by David Berlinski

From paul@prescod.net Tue Jul 4 04:40:51 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 03 Jul 2000 22:40:51 -0500
Subject: [XML-SIG] SAX Namespaces
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org>
Message-ID: <39615CC3.77913838@prescod.net>

Greg Stein wrote:
>
> Each of these presumes that the namespace processing can be turned off. Is
> that an important/required feature?
> IMO, the NS processing can/should always
> occur; they are an integral part of XML processing today.

I do a lot of XML processing that doesn't use namespaces. So, I guess, do
all of the SAX 1.0 and Pyxie users. I think that a lot of beginning XML
users do not want to care about namespaces until they have to, which is
why option 4 made namespace handling "invisible" until they turned it on.

> [ nit: how is #5 different from #4? ]

It was supposed to be the tuple-in-tuple version with the dynamic object
type.

--
 Paul Prescod - Not encumbered by corporate consensus
The great era of mathematical physics is now over; the 300-year effort
to represent the material world in mathematical terms has exhausted
itself. The understanding it was to provide is infinitely closer than
it was when Isaac Newton wrote in the late seventeenth century, but
it is still infinitely far away.
  - The Advent of the Algorithm, by David Berlinski

From paul@prescod.net Tue Jul 4 04:46:28 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 03 Jul 2000 22:46:28 -0500
Subject: [XML-SIG] Pyx
References: <3960EC6E.5B07D888@prescod.net> <20000703181749.C29590@lyra.org> <14689.28523.413097.94348@fermi.eeel.nist.gov>
Message-ID: <39615E14.753D7C78@prescod.net>

Michael McLay wrote:
>
> ...
>
> I agree with Greg. Also, since there is a book out on using pyxie
> with Python it should be included in the standard Python distribution
> so that the package works out of the box for someone who picks up the
> book and tries to use the examples. (Batteries Included)

Are you talking about the XML distribution (where I think Pyxie would be
a good fit) or Python 1.6???? Assuming the latter, I disagree. If we go
to Guido with that proposal I think that he will tell us that the XML
distribution is the place for experimental packages. But I'm not as good
at channelling him as some others...

Your proposed text goes even farther, implying that we've got "all"
Python-related XML modules in there. That would never fly...
--
 Paul Prescod - Not encumbered by corporate consensus
The great era of mathematical physics is now over; the 300-year effort
to represent the material world in mathematical terms has exhausted
itself. The understanding it was to provide is infinitely closer than
it was when Isaac Newton wrote in the late seventeenth century, but
it is still infinitely far away.
  - The Advent of the Algorithm, by David Berlinski

From uogbuji@fourthought.com Tue Jul 4 07:29:24 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 04 Jul 2000 00:29:24 -0600
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: Message from Greg Stein of "Mon, 03 Jul 2000 20:18:23 PDT." <20000703201823.I29590@lyra.org>
Message-ID: <200007040629.AAA15318@localhost.localdomain>

> > > Paul Prescod:
> > > > The rawname *is* the element type name.
> > Greg Stein:
> > > No, the element name is the (namespace, element-name) tuple.
> >
> Paul Prescod:
> > You won't find that in any XML specification.
>
> Greg Stein:
> The XML Namespaces spec states that the URI and the localname form the
> complete, unique element name.

I think we have a case of duelling specs here. What exactly Namespaces 1.0
(XML/NS) is up to has been the subject of many a flame war that I hope we
don't have to rehash.

If I read Paul correctly, he carefully used the term "element type name",
which is a high quiddity of XML (and SGML before it) that I don't think
XML/NS tries to muck with. It seems to me from reading XML/NS that they are
most concerned with disambiguating *generic identifiers*, which are,
arguably, what most programmers expect to be dealing with when they plug
into SAX.

At any rate, all the data is available in both approaches, so does it really
matter?

> > Namespaces are nasty long before you get to an API. You don't even need
> > to be sitting at a computer. Just read the spec! By the way, qp_xml's
> > handling of the xml: namespace is incorrect.
>
> In what way? Please specify.
>
> I suspect that your thinking here is simply that a spec does not (yet) exist
> for the qp_xml API. Specifically, it issues namespace/localname tuple pairs:
>
>     ('DAV:', 'response')
>
> When you have no namespace, you get:
>
>     ('', 'some-element')
>
> When the xml: prefix is found, it returns:
>
>     ('', 'xml:foobar')
>
> This is done because qp_xml does not preserve prefixes. Thus, an application
> is required to manufacture prefixes when dumping the QP tree back out. The
> particular tuple format for xml: prefix does two things:
>   1) '' signifies no namespace, so a prefix is not generated for the name
>   2) 'xml:' in the elem name ensure that the xml: prefix gets dumped out
>
> If we returned the data in another way, the application might accidentally
> replace the prefix with something other than xml:.

You lost me somewhere. It is simply impossible in an XML/NS-compliant
application for the "xml" prefix to be associated with no namespace. From
your own description I think you have a bug. When the xml prefix is found,
you should return

    ('http://www.w3.org/XML/1998/namespace', 'xml:foobar')

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +01 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uogbuji@fourthought.com Tue Jul 4 07:37:03 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 04 Jul 2000 00:37:03 -0600
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: Message from Greg Stein of "Mon, 03 Jul 2000 20:24:58 PDT." <20000703202458.J29590@lyra.org>
Message-ID: <200007040637.AAA15358@localhost.localdomain>

> Each of these presumes that the namespace processing can be turned off. Is
> that an important/required feature? IMO, the NS processing can/should always
> occur; they are an integral part of XML processing today. The presence of
> namespaces doesn't impact older documents either.
>
> [ well, there is the case of somebody using ':' in an XML element name but
>   NOT using namespaces. but holy smokes... I don't think we should introduce
>   variations in our APIs based on this edge case. ]

Actually, XML 1.0 deprecates this, so I don't think we should lose _any_
sleep over this edge case.

> [ nit: how is #5 different from #4? ]
>
> I would take option (1) or (3). qname would be the prefix used. If NS
> processing *can* be disabled, then uri==None and name==qname.

I'll admit that Paul's point that the API changes least from SAX 1 to SAX 2
in option #4 is pretty attractive (I must have missed this point earlier).
However, as Paul says, I don't think this is even worth much more argument.
All the data is available either way.

#1 is also attractive because it's what we all murmured assent to earlier.
Not that change is forbidden, but this one has proven controversial and
comes at the 11th hour.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +01 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From Mark.Favas@per.dem.csiro.au Tue Jul 4 07:51:39 2000
From: Mark.Favas@per.dem.csiro.au (Favas, Mark (EM, Floreat))
Date: Tue, 4 Jul 2000 14:51:39 +0800
Subject: [XML-SIG] FW: pyexpat compilation errors - Python 2.0b1
Message-ID:

I reported the following to the xml-sig some time ago, but it's still there
in the current (July 4) CVS version of Python 2.0b1 - must have got lost in
the noise.

Platform: DEC Alpha, Tru64 Unix V4.0F, Compaq C V6.1-110

cc -I/home/gonzo1/mark/PyXML-0.5.5.1/extensions/expat/xmlparse -O -Olimit 1500 -I./../Include -I.. -DHAVE_CONFIG_H -c ./pyexpat.c
cc: Error: ./pyexpat.c, line 69: The static declaration of "handler_info" is a tentative definition and specifies an incomplete type.
(incompstat)
staticforward struct HandlerInfo handler_info[];
---------------------------------^

If I replace this definition by

    staticforward struct HandlerInfo handler_info[64];

pyexpat then compiles (with the following warnings):
(64 was just chosen at random - needs to be larger than 1, though...)
and the warnings can be gotten rid of by explicitly casting the
my_XXXHandler values to void *

cc -I/home/gonzo1/mark/PyXML-0.5.5.1/extensions/expat/xmlparse -O -Olimit 1500 -I./../Include -I.. -DHAVE_CONFIG_H -c ./pyexpat.c
cc: Warning: ./pyexpat.c, line 987: In the initializer for handler_info[0].handler, the referenced type of the pointer value "my_StartElementHandler" is "function (pointer to void, pointer to const char, pointer to pointer to const char) returning void", which is not compatible with "void". (ptrmismatch)
        my_StartElementHandler},
--------^
cc: Warning: ./pyexpat.c, line 990: In the initializer for handler_info[1].handler, the referenced type of the pointer value "my_EndElementHandler" is "function (pointer to void, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch)
        my_EndElementHandler},
--------^
cc: Warning: ./pyexpat.c, line 993: In the initializer for handler_info[2].handler, the referenced type of the pointer value "my_ProcessingInstructionHandler" is "function (pointer to void, pointer to const char, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch)
        my_ProcessingInstructionHandler},
--------^
cc: Warning: ./pyexpat.c, line 996: In the initializer for handler_info[3].handler, the referenced type of the pointer value "my_CharacterDataHandler" is "function (pointer to void, pointer to const char, int) returning void", which is not compatible with "void".
(ptrmismatch)
        my_CharacterDataHandler},
--------^
cc: Warning: ./pyexpat.c, line 999: In the initializer for handler_info[4].handler, the referenced type of the pointer value "my_UnparsedEntityDeclHandler" is "function (pointer to void, pointer to const char, pointer to const char, pointer to const char, pointer to const char, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch)
        my_UnparsedEntityDeclHandler },
--------^
cc: Warning: ./pyexpat.c, line 1002: In the initializer for handler_info[5].handler, the referenced type of the pointer value "my_NotationDeclHandler" is "function (pointer to void, pointer to const char, pointer to const char, pointer to const char, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch)
        my_NotationDeclHandler },
--------^
cc: Warning: ./pyexpat.c, line 1005: In the initializer for handler_info[6].handler, the referenced type of the pointer value "my_StartNamespaceDeclHandler" is "function (pointer to void, pointer to const char, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch)
        my_StartNamespaceDeclHandler },
--------^
cc: Warning: ./pyexpat.c, line 1008: In the initializer for handler_info[7].handler, the referenced type of the pointer value "my_EndNamespaceDeclHandler" is "function (pointer to void, pointer to const char) returning void", which is not compatible with "void". (ptrmismatch)
        my_EndNamespaceDeclHandler },
--------^
cc: Warning: ./pyexpat.c, line 1011: In the initializer for handler_info[8].handler, the referenced type of the pointer value "my_CommentHandler" is "function (pointer to void, pointer to const char) returning void", which is not compatible with "void".
(ptrmismatch)
        my_CommentHandler},
--------^
cc: Warning: ./pyexpat.c, line 1014: In the initializer for handler_info[9].handler, the referenced type of the pointer value "my_StartCdataSectionHandler" is "function (pointer to void) returning void", which is not compatible with "void". (ptrmismatch)
        my_StartCdataSectionHandler},
--------^
cc: Warning: ./pyexpat.c, line 1017: In the initializer for handler_info[10].handler, the referenced type of the pointer value "my_EndCdataSectionHandler" is "function (pointer to void) returning void", which is not compatible with "void". (ptrmismatch)
        my_EndCdataSectionHandler},
--------^
cc: Warning: ./pyexpat.c, line 1020: In the initializer for handler_info[11].handler, the referenced type of the pointer value "my_DefaultHandler" is "function (pointer to void, pointer to const char, int) returning void", which is not compatible with "void". (ptrmismatch)
        my_DefaultHandler},
--------^
cc: Warning: ./pyexpat.c, line 1023: In the initializer for handler_info[12].handler, the referenced type of the pointer value "my_DefaultHandlerExpandHandler" is "function (pointer to void, pointer to const char, int) returning void", which is not compatible with "void". (ptrmismatch)
        my_DefaultHandlerExpandHandler},
--------^
cc: Warning: ./pyexpat.c, line 1026: In the initializer for handler_info[13].handler, the referenced type of the pointer value "my_NotStandaloneHandler" is "function (pointer to void) returning int", which is not compatible with "void". (ptrmismatch)
        my_NotStandaloneHandler},
--------^
cc: Warning: ./pyexpat.c, line 1029: In the initializer for handler_info[14].handler, the referenced type of the pointer value "my_ExternalEntityRefHandler" is "function (pointer to void, pointer to const char, pointer to const char, pointer to const char, pointer to const char) returning int", which is not compatible with "void".
(ptrmismatch)
        my_ExternalEntityRefHandler },
--------^

--
Email  - m.favas@per.dem.csiro.au       Mark C Favas
Phone  - +61 8 9333 6268, 0418 926 074  CSIRO Exploration & Mining
Fax    - +61 8 9383 9891                Private Bag No 5, Wembley
WGS84  - 31.95 S, 115.80 E              Western Australia 6913

From larsga@garshol.priv.no Tue Jul 4 10:01:41 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Jul 2000 11:01:41 +0200
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: <20000703181253.B29590@lyra.org>
References: <200007032240.QAA13690@localhost.localdomain> <20000703181253.B29590@lyra.org>
Message-ID:

* Uche Ogbuji
|
| Until further discussion I'd vote for changing it back to Lars's
| original API for q-names. Apart from all the other arguments, Lars
| did put up his API and invite us all to hash it out (and there _was_
| some discussion of the matter). IMO it's a bad idea to suddenly
| change it all now.

* Greg Stein
|
| Not only bad, but unilateral decisions for change are against the
| spirit of the game...

Greg, I think you should let this rest. I don't see this as Paul
making a unilateral decision, but rather as him making a quick hack to
get the XML modules in Python 1.6b1 to work together. Having done so
he posted to the XML-SIG to discuss whether that change was to be
considered the final solution.

I don't like his proposed solution, but I have no problems with how it
happened.

--Lars M.

From larsga@garshol.priv.no Tue Jul 4 10:08:50 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Jul 2000 11:08:50 +0200
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: <39610097.7932DD7E@prescod.net>
References: <39610097.7932DD7E@prescod.net>
Message-ID:

* Lars Marius Garshol
|
| I know, but it's much better to simply modify the output from expat
| (preferably in C source) than to implement namespaces in Python.

* Paul Prescod
|
| I'm not clear what route you are advocating:
|
| [...]
What I mean is that expat already has namespace handling, but with an
interface that is tuned for C and not for Python. I think that rather
than just directly translating the C interface into Python we should
try to make it more convenient.

At the moment expat represents namespace names as 'uri localname'.
What I would want pyexpat.c to do is to turn those cooked strings into
('uri', 'localname') tuples. So it's not a matter of reimplementing
anything, but just of improving the interface.

--Lars M.

From gstein@lyra.org Tue Jul 4 10:17:02 2000
From: gstein@lyra.org (Greg Stein)
Date: Tue, 4 Jul 2000 02:17:02 -0700
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: ; from larsga@garshol.priv.no on Tue, Jul 04, 2000 at 11:08:50AM +0200
References: <39610097.7932DD7E@prescod.net>
Message-ID: <20000704021702.U29590@lyra.org>

On Tue, Jul 04, 2000 at 11:08:50AM +0200, Lars Marius Garshol wrote:
>...
> What I mean is that expat already has namespace handling, but with an
> interface that is tuned for C and not for Python. I think that rather
> than just directly translating the C interface into Python we should
> try to make it more convenient.
>
> At the moment expat represents namespace names as 'uri localname'.
> What I would want pyexpat.c to do is to turn those cooked strings into
> ('uri', 'localname') tuples. So it's not a matter of reimplementing
> anything, but just of improving the interface.

Simple code:

    const char *space = strchr(name, ' ');
    PyObject *obNS = PyString_FromStringAndSize(name, space - name);
    PyObject *obName = PyString_FromString(space + 1);

Of course, I don't know how to get the original prefix/rawname...
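[ Editorial sketch, not part of the original message. For comparison, the
  same split can be tried from the Python side. This is a minimal sketch
  for a current Python, using the standard xml.parsers.expat module with
  namespace_separator=" " so that expat reports names in the
  'uri localname' form described above. The helper names split_expat_name
  and element_names are hypothetical, and using None for "no namespace" is
  an assumption taken from the uri==None suggestion earlier in the thread;
  qp_xml uses '' instead. ]

```python
import xml.parsers.expat


def split_expat_name(name, sep=" "):
    """Turn expat's 'uri localname' string into a (uri, localname) tuple.

    Names that carry no namespace arrive without the separator, so they
    are returned as (None, name). That choice is an assumption.
    """
    if sep in name:
        uri, localname = name.split(sep, 1)
        return (uri, localname)
    return (None, name)


def element_names(data):
    """Parse `data` and collect the split name of every start tag."""
    names = []
    # namespace_separator switches on expat's own namespace processing.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = (
        lambda name, attrs: names.append(split_expat_name(name))
    )
    parser.Parse(data, True)
    return names


if __name__ == "__main__":
    doc = '<D:response xmlns:D="DAV:"><plain/></D:response>'
    print(element_names(doc))
```

[ As in the C version, the original prefix is not recovered here: only the
  URI and the local name come back from the parser in this mode. ]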
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From larsga@garshol.priv.no Tue Jul 4 10:31:52 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Jul 2000 11:31:52 +0200
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: <20000703202458.J29590@lyra.org>
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org>
Message-ID:

* Greg Stein
|
| Each of these presumes that the namespace processing can be turned
| off. Is that an important/required feature?

I think so, yes. It makes XML much more approachable for novices, and
many programs also don't need namespace processing. The cost of
namespace processing is also relatively high for the pure-Python
parsers.

| IMO, the NS processing can/should always occur; they are an integral
| part of XML processing today.

It is the default setting, and parsers are required to support it.
They are not required to be able to support turning it off.

| If NS processing *can* be disabled, then uri==None and name==qname.

I would prefer replacing the tuple with the qname. Any code that
looks at the internal structure of names for (uri, localname) will
assume namespace processing anyway, methinks. If anyone can think of
convincing use cases that are made awkward by the string
representation I will reconsider.

--Lars M.

From larsga@garshol.priv.no Tue Jul 4 10:41:55 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Jul 2000 11:41:55 +0200
Subject: [XML-SIG] SAX namespaces discussion status
Message-ID:

I feel a need to summarize where the discussion stands and what needs
to be done, hence this posting.

Basically, we have a disagreement on how namespace names should be
represented in SAX 2.0. My feeling is that since the organization of
the API is changing anyway because of the incorporation into Python
1.6/2.0 we should make sure we have at least rough consensus now
before moving on.

Paul listed four alternatives (the fifth seems to be identical with #4).
Here is my, slightly modified, version of that list. The qname or
prefix discussion we can leave for later, since it is really
orthogonal to the name representation issue.

#1. def startElement( self, (uri, name), qname, attrs ):
    When namespace processing is off, (uri, name) is just the raw
    name instead.

#2. def startElement( self, (uri,localname, qname), attrs ):

#3. def startElement( self, ((uri, localname), qname), attrs ):

#4. def startElement( self, name, attrs ):
    Depending on whether you have turned on namespace processing,
    "name" is either "string" or (uri,localname,qname)

#1 is here the current SAX 2.0 interface and #2 is what Paul
implemented for Python 2.0.

As near as I can tell, current positions are:

 - me: #1
 - Paul: #2
 - Greg: #1 or #3
 - Uche: #1, pending further discussion

The reasons I prefer #1 are that

 - it collects the logical name (in both the namespace view and the
   XML 1.0 view) into a single value, which seems like The Right Thing
   to me

 - it is easier for novices to understand how to use this API correctly

 - it is easier for programmers who use the SAX 2.0 interface
   directly. I do this all the time, and I believe others will do the
   same, so for me this is an important consideration.

As near as I can tell, these are Paul's arguments against it:

 - it breaks backwards compatibility

 - SAX convenience is not important

 - performance for higher layers

Below are my responses to his arguments:

I don't think the backwards compatibility argument carries much
weight. Names have changed anyway, and in rewriting the code adapting
the startElement / endElement methods is very little work. At least it
was for me, and I've rewritten heaps of example code for my book for
just this.

I think SAX convenience matters, but I agree that convenience
arguments carry less weight. However, to me this is also a matter of
rightness. In the namespace view, element names consist of two parts:
URI and local name.
The #1 representation reflects that very clearly, while #2 obscures it.

Performance does of course matter, but I don't see how #2 improves it.
The necessary information is available in both #1 and #2, and access
to it is more or less identical. If the problem is that extracting
the information from the Attributes interface is too slow, then let us
look into what is needed and see how we can best provide that.

Hoping to settle this issue once and for all,

--Lars M.

From larsga@garshol.priv.no Tue Jul 4 10:44:37 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Jul 2000 11:44:37 +0200
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: <20000704021702.U29590@lyra.org>
References: <39610097.7932DD7E@prescod.net> <20000704021702.U29590@lyra.org>
Message-ID:

* Lars Marius Garshol
|
| At the moment expat represents namespace names as 'uri localname'.
| What I would want pyexpat.c to do is to turn those cooked strings into
| ('uri', 'localname') tuples. So it's not a matter of reimplementing
| anything, but just of improving the interface.

* Greg Stein
|
| Simple code:
|
|     const char *space = strchr(name, ' ');
|     PyObject *obNS = PyString_FromStringAndSize(name, space - name);
|     PyObject *obName = PyString_FromString(space + 1);

Bingo.

| Of course, I don't know how to get the original prefix/rawname...

As far as I can tell expat does not make it available. However, SAX
does not require this, so that's OK.

--Lars M.

From gstein@lyra.org Tue Jul 4 10:56:55 2000
From: gstein@lyra.org (Greg Stein)
Date: Tue, 4 Jul 2000 02:56:55 -0700
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: ; from larsga@garshol.priv.no on Tue, Jul 04, 2000 at 11:31:52AM +0200
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org>
Message-ID: <20000704025655.X29590@lyra.org>

On Tue, Jul 04, 2000 at 11:31:52AM +0200, Lars Marius Garshol wrote:
> * Greg Stein
>...
> | If NS processing *can* be disabled, then uri==None and name==qname.
>
> I would prefer replacing the tuple with the qname. Any code that
> looks at the internal structure of names for (uri, localname) will
> assume namespace processing anyway, methinks. If anyone can think of
> convincing use cases that are made awkward by the string
> representation I will reconsider.

As long as you're saying it is ((uri, localname), qname) or (qname, qname),
then I'm fine with that. In either case value[0] is the "name" of the item.

[ btw, I use tuple representation above, but the argument also applies when
  the two pieces are separate arguments (e.g. option #1) ]

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From uogbuji@fourthought.com Tue Jul 4 14:47:15 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 04 Jul 2000 07:47:15 -0600
Subject: [XML-SIG] SAX namespaces discussion status
In-Reply-To: Message from Lars Marius Garshol of "04 Jul 2000 11:41:55 +0200."
Message-ID: <200007041347.HAA16154@localhost.localdomain>

LMG:

> Paul listed four alternatives (the fifth seems to be identical with
> #4). Here is my, slightly modified, version of that list. The qname or
> prefix discussion we can leave for later, since it is really
> orthogonal to the name representation issue.
>
> #1. def startElement( self, (uri, name), qname, attrs ):
>     When namespace processing is off, (uri, name) is just the raw
>     name instead.
>
> #2. def startElement( self, (uri,localname, qname), attrs ):
>
> #3. def startElement( self, ((uri, localname), qname), attrs ):
>
> #4. def startElement( self, name, attrs ):
>     Depending on whether you have turned on namespace processing,
>     "name" is either "string" or (uri,localname,qname)

There's one axis you left out: qname versus prefix. IOW, there are another
four options:

#5. def startElement( self, (uri, name), prefix, attrs ):

#6. def startElement( self, (uri,localname, prefix), attrs ):

#7. def startElement( self, ((uri, localname), prefix), attrs ):

#8.
    def startElement( self, name, attrs ):
    with modes where name = "name" or (uri,localname,prefix)

I tend to side more with Greg on this matter: I'd rather have the prefix
split out for me. 4XPath and 4XSLT are absolutely littered with
SplitQName() calls that would be somewhat reduced in this case.

So deciding all over again, 5 and 8 both look attractive. As Greg says, 8's
modes can make genericizing SAX handlers (say for filters) tricky. But on
the other hand, there would have to be a raft of conditionals for processing
5 generically.

In the end, though, my leaning would be towards 5.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +01 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From paul@prescod.net Tue Jul 4 14:50:39 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 04 Jul 2000 08:50:39 -0500
Subject: [XML-SIG] SAX Namespaces
References: <200007040637.AAA15358@localhost.localdomain>
Message-ID: <3961EBAF.C3F5DF6F@prescod.net>

Uche Ogbuji wrote:
>
> ...
>
> I'll admit that Paul's point that the API changes least from SAX 1 to SAX 2 in
> option #4 is pretty attractive (I must have missed this point earlier).

I may not have made it. Choosing the best way to handle this was a long
thought process that was invariably muddy because it involved thinking
about elements and attributes in parallel and thinking about various use
cases for attributes.

Which brings us back to attributes. These are the more important detail
in terms of performance. Several people got their hackles up when I
started describing what sorts of applications we should or should not be
aiming SAX at. I wasn't thinking. I shouldn't have sent the conversation
in that direction at all. It's just a distraction. I'll try to rewrite
the technical parts.

We know that many people are going to use SAX in many different ways.
We know that a large group is going to use it as a headless driver.
Another group is going to program domain-specific apps (at least at
first). These groups have different needs but they are not SO different
that they should be in competition.

Right now every driver needs code like this in it:

    attrs=MyAttrDataStructure() (perhaps a dict or list)
    attrs=AttributeList( attrs )

The generated object is inefficient to loop over and of course the
instance construction has cost. For a lot of apps, you need to copy each
attribute out of the object through a level of method calls.

My proposal is to move that last line from the driver into the
application and let the application do it only if it wants to. That
gives them the choice of the right object wrapper (if any), depending on
whether they want to iterate, or index by rawnames or index by
localname/rawname pairs. Of course we will provide at least one object
wrapper. So to get back the old behavior all you need to do is say:

    attrs=AttributeList( attrs )

as the first line of your handler. I don't see this as an onerous
requirement making SAX unusable. Rather it makes it more widely usable.

--
 Paul Prescod - Not encumbered by corporate consensus
The great era of mathematical physics is now over; the 300-year effort
to represent the material world in mathematical terms has exhausted
itself. The understanding it was to provide is infinitely closer than
it was when Isaac Newton wrote in the late seventeenth century, but
it is still infinitely far away.
- The Advent of the Algorithm, by David Berlinski From akuchlin@mems-exchange.org Tue Jul 4 15:24:21 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 4 Jul 2000 10:24:21 -0400 Subject: [XML-SIG] SAX Namespaces In-Reply-To: <14689.19442.805699.406444@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Mon, Jul 03, 2000 at 10:29:06PM -0400 References: <3960F894.2BBA026B@prescod.net> <20000703181046.A29590@lyra.org> <39614727.C188CA85@prescod.net> <14689.19442.805699.406444@cj42289-a.reston1.va.home.com> Message-ID: <20000704102421.C14382@newcnri.cnri.reston.va.us> On Mon, Jul 03, 2000 at 10:29:06PM -0400, Fred L. Drake, Jr. wrote: >standard library. Andrew, would you mind if I borrow heavily from >your start? I assume you mean the HOWTO? (libpyexpat.tex is already checked in.) Please do; let me know if you need the LaTeX source. --amk From larsga@garshol.priv.no Tue Jul 4 15:33:17 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 04 Jul 2000 16:33:17 +0200 Subject: [XML-SIG] SAX namespaces discussion status In-Reply-To: <200007041347.HAA16154@localhost.localdomain> References: <200007041347.HAA16154@localhost.localdomain> Message-ID: * Lars Marius Garshol | | Paul listed four alternatives (the fifth seems to be identical with | #4). Here is my, slightly modified, version of that list. The qname or | prefix discussion we can leave for later, since it is really | orthogonal to the name representation issue. | | [...list snipped...] * Uche Ogbuji | | There's one axis you left out: qname versus prefix. Yes. If you read my last sentence above you will see why. :-) | I tend to side more with Greg on this matter: I'd rather have the | prefix split out for me. 4XPath and 4XSLT are absolutely littered | with SplitQName() calls that would be somewhat reduced in this case. OK. I am 100% agnostic on this one, and chose qname because that was what Java SAX did. 
I agree that what is usually wanted is the prefix, so I wouldn't be
against changing it. However, I would like to check the arguments from
the xml-dev discussions first. Will try to do so.

| So deciding all over again, 5 and 8 both look attractive. As Greg
| says, 8's modes can make genericizing SAX handlers (say for filters)
| tricky. But on the other hand, there would have to be a raft of
| conditionals for processing 5 generically.

What are you thinking about when you say 'a raft of conditionals'?

--Lars M.



From paul@prescod.net  Tue Jul  4 06:49:48 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 04 Jul 2000 00:49:48 -0500
Subject: [XML-SIG] SAX Namespaces
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000703203250.K29590@lyra.org>
Message-ID: <39617AFC.22EFB40F@prescod.net>

Greg Stein wrote:
>
> ...
>
> > #2. def startElement( self, (uri,localname,qname), attrs ):
>
> This form is a bit more difficult to work with the uri/localname pair when
> doing processing.

True, but easier to work with the triple when passing from module to
module -- which becomes a bigger deal when we start to talk about
attributes.

> This introduces a "mode" into the API. Depending on some flag, you get
> entirely different data.

Right. Depending on some flag *set by you*.

> In essence the variant structure of the return value does a disservice to
> creating a standard API. Depending on how the event generator is set up, you
> could get entirely different data.

That's a virtue, not a flaw. If you ask for namespace information you
get it. If you aren't interested, you don't see it.

-- 
 Paul Prescod - Not encumbered by corporate consensus
The great era of mathematical physics is now over; the
300-year effort to represent the material world in
mathematical terms has exhausted itself. The understanding
it was to provide is infinitely closer than it was when
Isaac Newton wrote in the late seventeenth century, but
it is still infinitely far away.
- The Advent of the Algorithm, by David Berlinski From paul@prescod.net Tue Jul 4 15:06:02 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 04 Jul 2000 09:06:02 -0500 Subject: [XML-SIG] SAX namespaces discussion status References: Message-ID: <3961EF4A.D6615406@prescod.net> Lars Marius Garshol wrote: > > ... > > As near as I can tell, these are Paul's arguments against it: > > - it breaks backwards compatibility > > - SAX convenience is not important > > - performance for higher layers The other one, that I was not unclear about, is that a representation that bundles the three parts gives us a clear and consistent path to a low-level representation for attributes: [(name, value), ...] That's where the performance comes in. > I don't think the backwards compatibility argument carries much > weight. Names have changed anyway, and in rewriting the code adapting > the startElement / endElement methods is very little work. At least > it was for me, and I've rewritten heaps of example code for my book > for just this. Oh geez are we going to break another book! Anyhow, the more interesting backwards compatibility is between the namespaces and no-namespaces mode. You say: > [non-namespace processing] makes XML much more approachable for novices, and > I would prefer replacing the tuple with the qname. Any code that > looks at the internal structure of names for (uri, localname) will > assume namespace processing anyway, methinks. Now you've got all these handlers in novice mode like this: def startElement( self, name, qname, attrs ): ... Where name is always equal to qname! That strikes me as confusing and unhelpful. If we are making a namespaces-off mode then you shouldn't have to think about namespaces. 
I think that the most tenable compromise is working out to: def startElement( self, name, attrs ): def startElement( self, ((uri, localname,), qname), attrs ): where "qname" could be "qname" or "prefix" -- Paul Prescod - Not encumbered by corporate consensus The great era of mathematical physics is now over; the 300-year effort to represent the material world in mathematical terms has exhausted itself. The understanding it was to provide is infinitely closer than it was when Isaac Newton wrote in the late seventeenth centruy, but it is still infinitely far away. - The Advent of the Algorithm, by David Berlinski From paul@prescod.net Tue Jul 4 15:08:42 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 04 Jul 2000 09:08:42 -0500 Subject: [XML-SIG] SAX Namespaces References: <39610097.7932DD7E@prescod.net> <20000704021702.U29590@lyra.org> Message-ID: <3961EFEA.BE4C096C@prescod.net> Lars Marius Garshol wrote: > > | Of course, I don't know how to get the original prefix/rawname... > > As far as I can tell expat does not make it available. However, SAX > does not require this, so that's OK. In what sense does SAX not require it? Do you think that our main parser should pass in a None value for the rawname parameter? Many W3C specs require the rawname! -- Paul Prescod - Not encumbered by corporate consensus The great era of mathematical physics is now over; the 300-year effort to represent the material world in mathematical terms has exhausted itself. The understanding it was to provide is infinitely closer than it was when Isaac Newton wrote in the late seventeenth centruy, but it is still infinitely far away. - The Advent of the Algorithm, by David Berlinski From paul@prescod.net Tue Jul 4 15:54:45 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 04 Jul 2000 09:54:45 -0500 Subject: [XML-SIG] FW: pyexpat compilation errors - Python 2.0b1 References: Message-ID: <3961FAB5.83EA0D16@prescod.net> "Favas, Mark (EM, Floreat)" wrote: > > ... 
> > $ cc ./pyexpat.c > cc: Error: ./pyexpat.c, line 69: > The static declaration of "handler_info" is a tentative definition > and specifies an incomplete type. (incompstat) > > staticforward struct HandlerInfo handler_info[]; > ---------------------------------^ I don't know why cc needs to know the size of the array at that point! > If I replace this definition by > staticforward struct HandlerInfo handler_info[64]; > pyexpat then compiles (with the following warnings): > > (64 was just chosen at random - needs to be larger than 1, though...) That's a hack but I can't think of anything it would break. I'd hate to trade a compile error for a mysterious crash, though. I'll check with Python-dev and xml-sig. For the record, here's the structure we are predeclaring. statichere struct HandlerInfo handler_info[]= {{"StartElementHandler", pyxml_SetStartElementHandler, (void*)my_StartElementHandler}, {"EndElementHandler", pyxml_SetEndElementHandler, (void*)my_EndElementHandler}, {"ProcessingInstructionHandler", (xmlhandlersetter)XML_SetProcessingInstructionHandler, (void*)my_ProcessingInstructionHandler}, {"CharacterDataHandler", (xmlhandlersetter)XML_SetCharacterDataHandler, (void*)my_CharacterDataHandler}, {"UnparsedEntityDeclHandler", (xmlhandlersetter)XML_SetUnparsedEntityDeclHandler, (void*)my_UnparsedEntityDeclHandler }, {"NotationDeclHandler", (xmlhandlersetter)XML_SetNotationDeclHandler, (void*)my_NotationDeclHandler }, {"StartNamespaceDeclHandler", pyxml_SetStartNamespaceDeclHandler, (void*)my_StartNamespaceDeclHandler }, {"EndNamespaceDeclHandler", pyxml_SetEndNamespaceDeclHandler, (void*)my_EndNamespaceDeclHandler }, {"CommentHandler", (xmlhandlersetter)XML_SetCommentHandler, (void*)my_CommentHandler}, {"StartCdataSectionHandler", pyxml_SetStartCdataSection, (void*)my_StartCdataSectionHandler}, {"EndCdataSectionHandler", pyxml_SetEndCdataSection, (void*)my_EndCdataSectionHandler}, {"DefaultHandler", (xmlhandlersetter)XML_SetDefaultHandler, 
(void*)my_DefaultHandler}, {"DefaultHandlerExpand", (xmlhandlersetter)XML_SetDefaultHandlerExpand, (void*)my_DefaultHandlerExpandHandler}, {"NotStandaloneHandler", (xmlhandlersetter)XML_SetNotStandaloneHandler, (void*)my_NotStandaloneHandler}, {"ExternalEntityRefHandler", (xmlhandlersetter)XML_SetExternalEntityRefHandler, (void*)my_ExternalEntityRefHandler }, {NULL, NULL, NULL } /* sentinel */ }; -- Paul Prescod - Not encumbered by corporate consensus The great era of mathematical physics is now over; the 300-year effort to represent the material world in mathematical terms has exhausted itself. The understanding it was to provide is infinitely closer than it was when Isaac Newton wrote in the late seventeenth centruy, but it is still infinitely far away. - The Advent of the Algorithm, by David Berlinski From larsga@garshol.priv.no Tue Jul 4 16:12:56 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 04 Jul 2000 17:12:56 +0200 Subject: [XML-SIG] SAX Namespaces In-Reply-To: <3961EFEA.BE4C096C@prescod.net> References: <39610097.7932DD7E@prescod.net> <20000704021702.U29590@lyra.org> <3961EFEA.BE4C096C@prescod.net> Message-ID: * Paul Prescod | | [qname information] | | In what sense does SAX not require it? In the sense that this is what the Java SAX documentation (which I have so far followed) states that it is not required. See the table in section 1.3 of the document below. Then combine that with this statement. All XML readers are required to recognize the "http://xml.org/sax/features/namespaces" and the "http://xml.org/sax/features/namespace-prefixes" features (see below), and to support a true value for the namespaces property and a false value for the namespace-prefixes property | Do you think that our main parser should pass in a None value for | the rawname parameter? No. However, I don't see any way around that short of modifying the expat core code. Does anyone else see any solutions for that? --Lars M. 
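
To make the point about expat and prefixes concrete, here is a minimal
sketch using the pyexpat binding as it ships in a current Python
standard library (an assumption: the 2000-era CVS module may differ in
details). The handler name on_start and the sample document are made up
for illustration. With a namespace separator set, each element name
arrives as the URI and local name joined by that separator, and the
prefix used in the document is not part of it:

```python
import xml.parsers.expat

seen = []

def on_start(name, attrs):
    # In namespace mode expat reports "uri<sep>localname"; the prefix
    # written in the document is not included in the name.
    seen.append(name)

# A single space is used as the separator here (hypothetical choice).
parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
parser.StartElementHandler = on_start
parser.Parse('<p:doc xmlns:p="urn:example"><p:item/></p:doc>', True)

for name in seen:
    uri, localname = name.split(" ", 1)
    print((uri, localname))
```

A driver built on this can hand out (uri, localname) pairs directly, but
it has nothing to put in a qname or prefix slot unless it tracks the
namespace declarations itself.
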
From Juergen Hermann"
Message-ID: <200007041537.RAA05494@statistik.cinetic.de>

On Tue, 04 Jul 2000 09:54:45 -0500, Paul Prescod wrote:
>For the record, here's the structure we are predeclaring.
>
>statichere struct HandlerInfo handler_info[]=
>{{"StartElementHandler",
> pyxml_SetStartElementHandler,
> (void*)my_StartElementHandler},
...
>
>{NULL, NULL, NULL } /* sentinel */
>};

Wouldn't it be best to forward declare the functions (which poses no
problem), THEN define the static array as above, then define the
functions. It's certainly more work than the current way (forward
declare the static array), but also less exotic. The "64" there bugs me
a lot, and creates new portability problems I bet.

Another way that imposes less work would be this:

statichere struct HandlerInfo* handler_info = 0;

...

statichere struct HandlerInfo handler_info_array[]=
{{"StartElementHandler",
...
};

void
initpyexpat(){
    handler_info = handler_info_array;
    ...
}

That also resembles the code that is used to backfill the PyType_Type
pointers:

    Xmlparsetype.ob_type = &PyType_Type;

Ciao, Jürgen

--
Jürgen Hermann (jhe@webde-ag.de)
WEB.DE AG, Amalienbadstr.41, D-76227 Karlsruhe
Tel.: 0721/94329-0, Fax: 0721/94329-22



From uogbuji@fourthought.com  Tue Jul  4 17:20:40 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 04 Jul 2000 10:20:40 -0600
Subject: [XML-SIG] SAX namespaces discussion status
In-Reply-To: Message from Lars Marius Garshol of "04 Jul 2000 16:33:17 +0200."
Message-ID: <200007041620.KAA16440@localhost.localdomain>

>
> * Lars Marius Garshol
> |
> | Paul listed four alternatives (the fifth seems to be identical with
> | #4). Here is my, slightly modified, version of that list. The qname or
> | prefix discussion we can leave for later, since it is really
> | orthogonal to the name representation issue.
> |
> | [...list snipped...]
>
> * Uche Ogbuji
> |
> | There's one axis you left out: qname versus prefix.
>
> Yes.
> If you read my last sentence above you will see why. :-)

Whoa! I guess you hit my blind spot. Sorry.

> | So deciding all over again, 5 and 8 both look attractive. As Greg
> | says, 8's modes can make genericizing SAX handlers (say for filters)
> | tricky. But on the other hand, there would have to be a raft of
> | conditionals for processing 5 generically.
>
> What are you thinking about when you say 'a raft of conditionals'?

Well, now that you hold me to the test, I guess it's not really a "raft".

Basically, if someone were writing a generic app with different actions in
namespace and non-namespace mode, they would have to have a conditional such
as:

def startElement( self, name, qname, attrs ):
    if type(name) == type(()):
        uri, lname = name
        #namespace processing
    else:
        #non-namespace processing

Not the end of the world, of course, but we must remember that there are
applications, filters, for example, that would have to deal with either mode.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +01 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python



From tpassin@home.com  Tue Jul  4 17:46:32 2000
From: tpassin@home.com (tpassin@home.com)
Date: Tue, 4 Jul 2000 12:46:32 -0400
Subject: [XML-SIG] SAX Namespaces
References: <3961488E.68099597@prescod.net>
Message-ID: <004201bfe5d7$68c16f60$7cac1218@reston1.va.home.com>

Paul Prescod summarized five possibilities for passing element names around.

I agree with Greg Stein that you should not return a different kind of beast
depending on whether a processor option is on or off (NS support, in this
case). So a name should not be a tuple one time, and a string another time.

I also agree with him that you should group together the two key pieces so
they can be used as a single unit, which is probably how they will normally
be used.
This means (uri, localname) belong together, and (uri,localname,qname)
do not. ((uri,localname),qname) would work. So would
((uri,localname),prefix).

I also prefer using the prefix over the entire rawname.

Cheers,

Tom Passin

Paul wrote:
> Greg:
> > Can somebody enumerate each of the options here so that we can restart the
> > discussion?
>
> These are the ones I can keep track of:
>
> #1. def startElement( self, (uri, name), qname, attrs ):
> ....
>
> Question 1: what should uri, name and qname get when namespace
> processing is off?
> Question 2: qname or prefix
>
> #2. def startElement( self, (uri,localname,qname), attrs ):
> ....
>
> Same questions
>
> #3. def startElement( self, ((uri, localname), qname), attrs ):
> ....
>
> Same questions.
>
> #4. def startElement( self, name, attrs ):
> ....
>
> Depending on whether you have turned on namespace processing, "name" is
> either "string" or (uri,localname,qname)
>
> Question: qname or prefix
>
> #5. def startElement( self, name, attrs ):
> ....
>
> Same description and questions as above.
>
> ----
>



From paul@prescod.net  Tue Jul  4 18:48:14 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 04 Jul 2000 12:48:14 -0500
Subject: [XML-SIG] FW: pyexpat compilation errors - Python 2.0b1
References: <200007041537.RAA05474@statistik.cinetic.de>
Message-ID: <3962235E.758F10F8@prescod.net>

I like this solution. Work for you Mark?

Juergen Hermann wrote:
>
> statichere struct HandlerInfo* handler_info = 0;
>
> ...
>
> statichere struct HandlerInfo handler_info_array[]=
> {{"StartElementHandler",
> ...
> };
>
> void
> initpyexpat(){
> handler_info = handler_info_array;
> ...
> } -- Paul Prescod - Not encumbered by corporate consensus The distinction between the real twentieth century (1914-1999) and the calenderical one (1900-2000) is based on the convincing idea that the century's bouts of unprecented violence, both within nations and between them, possess a definite historical coherence -- that they constitute, to put it simply, a single story. - The Unfinished Twentieth Century, Jonathan Schell Harper's Magazine, January 2000 From fdrake@beopen.com Tue Jul 4 19:06:27 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 4 Jul 2000 14:06:27 -0400 (EDT) Subject: [XML-SIG] SAX Namespaces In-Reply-To: <20000704102421.C14382@newcnri.cnri.reston.va.us> References: <3960F894.2BBA026B@prescod.net> <20000703181046.A29590@lyra.org> <39614727.C188CA85@prescod.net> <14689.19442.805699.406444@cj42289-a.reston1.va.home.com> <20000704102421.C14382@newcnri.cnri.reston.va.us> Message-ID: <14690.10147.737639.416770@cj42289-a.reston1.va.home.com> Andrew Kuchling writes: > I assume you mean the HOWTO? (libpyexpat.tex is already checked in.) > Please do; let me know if you need the LaTeX source. The HOWTO and reference; I have the pyexpat doc. Yes, I'll need the LaTeX. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From andy@reportlab.com Wed Jul 5 06:25:54 2000 From: andy@reportlab.com (Andy Robinson) Date: Wed, 5 Jul 2000 06:25:54 +0100 Subject: [XML-SIG] Wanted - Heroes Message-ID: (copying to xml-sig as this requires some parsing knowledge) I presume most of you have heard of the Software Carpentry competition (www.software-carpentry.com). If you have not, it is a design competition which will fund the winners to develop some really important Open Source software. Greg Wilson, who runs Software Carpentry, has asked if we can help them produce a bound set of PDF versions of their documents for the Open Source Conference in Monterey, July 17th. 
Technically I am sure we can do it, but the ReportLab core team is working flat out to get the documents ready for "Release 1.0". Does anyone have a few evenings to spare? This is a project up to two or three people could work on. We cannot offer money, but the glory will all be yours... The plan is as follows: 1. write a basic HTML-to-Flowables filter which initially handles a very simple set of tags. Use the very latest ReportLab snapshot (ftp.reportlab.com/current.tgz). It must parse HTML or XHTML and return a Platypus story (i.e. a list of flowables) as follows:

...

,

,

 will each correspond to a Platypus
paragraph with the formatting given in a stylesheet.  We already
support  and .

We will provide a very rudimentary doctemplate to process this
story, so ReportLab expertise is not necessary.

3. Make a CGI script on www.reportlab.com (we can provide access)
that lets someone submit their HTML inside a form and either
makes a PDF, or tells them the first tag it cannot handle and the
line it occurred on.  We get this running really early -
preferably as soon as one or two tags can be handled.

4. Add more tags.  We need images (between, not within
paragraphs), numbered and bulleted lists, and very very basic
tables (,,

>>> print node.firstChild.parentNode


>>> node2 = node.cloneNode(1)

>>> print node2.toxml()


>>> print node2.firstChild.parentNode
None

I just want to say 'node.parent.removeChild( node )' -- ie. remove a node
from a cloned branch, without knowing what the parent is.  Is there some
alternative I can use?

Thanks
Adam
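
For comparison, a minimal sketch of the same steps against
xml.dom.minidom from a current Python standard library. This is an
assumption for illustration only: it is not the PyDOM from the XML-SIG
package used above, and the sample document is made up. Note that the
DOM attribute is parentNode rather than parent:

```python
from xml.dom.minidom import parseString

doc = parseString("<root><branch><leaf/></branch></root>")
branch = doc.documentElement.firstChild

clone = branch.cloneNode(True)   # deep copy, detached from the tree
leaf = clone.firstChild

# The clone itself has no parent, but its children point back at it.
print(clone.parentNode)
print(leaf.parentNode is clone)

# Remove a node from the cloned branch without naming its parent.
leaf.parentNode.removeChild(leaf)
print(clone.toxml())
```

In this implementation the cloned children keep a parent, so the
node.parentNode.removeChild(node) idiom works on a cloned branch.
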



From Brad Chapman   Sat Jul 15 00:57:42 2000
From: Brad Chapman  (Brad Chapman)
Date: Fri, 14 Jul 2000 19:57:42 -0400
Subject: [XML-SIG] No parents in cloned node?
Message-ID: <200007142357.TAA133682@archa10.cc.uga.edu>

Hi Adam;

> Please let me know if this is the wrong list for this.

Nope, this is the right list!

> So you clone a node, it has no parent.  That makes sense.  But the 
cloned
> node's children also have no parents.  Is there a reason for this?  

[....snip... problems with DOM...]

    From your example, it looks like you are using the DOM that comes 
with the pyXML XML-SIG package, right? This is probably actually the 
biggest problem -- that DOM is officially deprecated (I don't believe anyone 
is maintaining it any longer) and the 4Suite DOM 
(http://fourthought.com/4Suite/4DOM/) is becoming the new DOM that 
will be included in the XML-SIG package. Things are in a transition 
state right now, so it is kind of a mess. The most recent 4DOMs do 
have the xml/dom namespace, so you'll need to install the XML-SIG 
package, and then rm -rf the dom directory from this (the dom you've 
been using) then install 4DOM (4DOM also has rpms without xml/dom on 
their site to make it easier, I believe).
    4DOM is a very nice DOM implementation and is actively being 
developed. The developers are also very responsive to bug reports and 
questions. I think it would be very worthwhile to try this out -- you 
might find it'll work better for you.

Brad





From uogbuji@fourthought.com  Sun Jul 16 18:27:23 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 16 Jul 2000 11:27:23 -0600
Subject: [XML-SIG] No parents in cloned node?
In-Reply-To: Message from Brad Chapman 
 of "Fri, 14 Jul 2000 19:57:42 EDT." <200007142357.TAA133682@archa10.cc.uga.edu>
Message-ID: <200007161727.LAA01293@localhost.localdomain>

Brad Chapman:

>     From your example, it looks like you are using the DOM that comes 
> with the pyXML XML-SIG package, right? This is probably actually the 
> biggest problem -- that DOM is officially deprecated (I don't believe anyone 
> is maintaining it any longer) and the 4Suite DOM 
> (http://fourthought.com/4Suite/4DOM/) is becoming the new DOM that 
> will be included in the XML-SIG package. Things are in a transition 
> state right now, so it is kind of a mess. The most recent 4DOMs do 
> have the xml/dom namespace, so you'll need to install the XML-SIG 
> package, and then rm -rf the dom directory from this (the dom you've 
> been using) then install 4DOM (4DOM also has rpms without xml/dom on 
> their site to make it easier, I believe).
>     4DOM is a very nice DOM implementation and is actively being 
> developed. The developers are also very responsive to bug reports and 
> questions. I think it would be very worthwhile to try this out -- you 
> might find it'll work better for you.

Thanks for the kind words, Brad, but we do really need to sort out the current 
package mess.

I think 4DOM is all integrated into the PyXML CVS.  It's at least one tag 
behind our internal version, but we'll update it before our next 4Suite 
release (RSN).  I think this means that CVS users do have 4DOM.

However, please correct me if I'm wrong or let me know if anyone has had 
problems with the package.

4XPath and 4XSLT have been a rather more complex matter.  We agreed to add 
them to the package as xml/xpath and xml/xslt.  I started this process last 
week, but a key problem immediately occurred to me.

These packages do require C compilation and they are not integrated into the 
current distutils installer for PyXML.  Internally, we build them as follows:

BisonGen source -> lex/yacc -> C -> .dll/.so

BisonGen is our special tool to take a grammar specified in XML and create the 
lex, yacc and makefiles required to build a Python parser.

We usually do step one before releasing source, just to spare people from 
having to download BisonGen (even though we think it's pretty nifty: all you 
need to know is python and XML to make a pretty efficient parser: all C is 
auto-generated).

So how do we integrate this stuff into PyXML?  I've noticed that the C parts 
are separated from the Python parts and Windows .dll/.pyds are checked in 
wholly.  We could do the same thing, but this would make the PyXML tree look 
quite different from our internal tree.  Not hard to get around with a simple 
script, but worth noting.

We're probably going to incorporate PyXML into our next 4Suite release, and 
that might be a good integration test.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +01 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python




From akuchlin@mems-exchange.org  Sun Jul 16 19:03:14 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Sun, 16 Jul 2000 14:03:14 -0400
Subject: [XML-SIG] No parents in cloned node?
In-Reply-To: <200007161727.LAA01293@localhost.localdomain>; from uogbuji@fourthought.com on Sun, Jul 16, 2000 at 11:27:23AM -0600
References:  <200007161727.LAA01293@localhost.localdomain>
Message-ID: <20000716140314.A6681@newcnri.cnri.reston.va.us>

On Sun, Jul 16, 2000 at 11:27:23AM -0600, Uche Ogbuji wrote:
>Thanks for the kind words, Brad, but we do really need to sort out the current
>package mess.

For fixing the 4DOM/PyDOM problem, I think we need to cut a new
packaged release of the current CVS tree.  *However*, before doing
that, I think we want to get a copy of Sean's book and make sure none
of its examples are broken (or, if they are, add the minimum of glue
code to keep them working).

For the lex/yacc stuff, the setup.py script will have to be modified
to build the .c files and compile them; worthwhile, since it'll also
test Distutils support for such generated files.

--amk


From dieter@handshake.de  Sun Jul 16 17:54:53 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 16 Jul 2000 18:54:53 +0200 (CEST)
Subject: [XML-SIG] Proposed XPath API
In-Reply-To: <396BA154.EB948981@prescod.net>
References: <396B7164.D8CA9111@prescod.net>
 <396BA154.EB948981@prescod.net>
Message-ID: <14705.59561.299237.250493@lindm.dm>

Paul Prescod writes:
 > Mike Olson wrote:
 > > 
 > >...
 > >
 > > I think these will have to be 2 different objects.  In XPath the
 > > difference is Path vs expression.  It's a matter of where you start in
 > > the EBNF.  
 > >
 > > A path is used to select, and an expression is used to
 > > match.
 > 
 > Does it require two APIs, though? You can compile anything as an
 > expression, right? So let's say you do that. Then you could throw an
 > exception in select() if the expression doesn't return a nodelist. Or
 > else you could just return the evaluated result and not worry about it.
Evaluating for a boolean rather than a node list may make a big
efficiency gain.


Dieter
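
A rough sketch of that difference, using the limited XPath subset in the
standard library's xml.etree.ElementTree rather than 4XPath (an
assumption made only so the example is self-contained). The function
names select and matches are hypothetical, borrowed from the wording of
this thread:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc>" + "<item/>" * 1000 + "</doc>")

def select(node, path):
    # Node-list evaluation: collects every match into a list.
    return node.findall(path)

def matches(node, path):
    # Boolean evaluation: asks only for the first match, so the
    # search can stop there instead of building the whole list.
    return next(node.iterfind(path), None) is not None

print(len(select(root, "item")))
print(matches(root, "item"))
print(matches(root, "missing"))
```

How much this saves depends on the expression and the document; the
sketch only shows why a separate boolean entry point can do less work
than select() followed by a truth test.
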


From stephen.boyle@gbst.com  Mon Jul 17 01:50:29 2000
From: stephen.boyle@gbst.com (Stephen Boyle)
Date: Mon, 17 Jul 2000 10:50:29 +1000
Subject: [XML-SIG] Parsing Control Characters
Message-ID: <1CDB101F0CB6D311882F0000F80639240150C321@aquarius.bne.star.com.au>

Hi,

I am trying to send keystrokes in the body of an XML Message, but the parser
falls over whenever I send a control key, such as .  

I am using Python v1.5.2.  I call saxexts.make_parser() which returns a
pyexpat Parser.  I have also tried using xml.dom.builder to generate the
actual XML code.

Can anyone point me in the right direction?

Thanks
Steve Boyle



From paul@prescod.net  Mon Jul 17 02:11:54 2000
From: paul@prescod.net (Paul Prescod)
Date: Sun, 16 Jul 2000 20:11:54 -0500
Subject: [XML-SIG] Parsing Control Characters
References: <1CDB101F0CB6D311882F0000F80639240150C321@aquarius.bne.star.com.au>
Message-ID: <39725D5A.900B9175@prescod.net>

Stephen Boyle wrote:
> 
> Hi,
> 
> I am trying to send keystrokes in the body of an XML Message, but the parser
> falls over whenever I send a control key, such as .

Most control characters are simply disallowed in XML. Send them as two
characters:

["^","F"], ["^","G"] and so forth

-- 
 Paul Prescod - Not encumbered by corporate consensus
It's difficult to extract sense from strings, but they're the only
communication coin we can count on. 
	- http://www.cs.yale.edu/~perlis-alan/quotes.html
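
A minimal sketch of that two-character encoding, assuming a current
Python and its bundled expat (not the 1.5.2 setup described above). The
function name caret_encode and the element name keys are made up for
illustration:

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

def caret_encode(text):
    """Replace C0 control characters that XML 1.0 forbids with ^X pairs."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 and ch not in "\t\n\r":
            out.append("^" + chr(code + 0x40))   # 0x06 becomes "^F"
        else:
            out.append(ch)
    return "".join(out)

raw = "abc\x06\x07"

# The raw control characters make the document ill-formed.
try:
    parseString("<keys>%s</keys>" % raw)
    raw_parsed = True
except ExpatError:
    raw_parsed = False

# The caret-encoded form parses normally.
safe = caret_encode(raw)
doc = parseString("<keys>%s</keys>" % safe)
print(raw_parsed, doc.documentElement.firstChild.data)
```

A literal "^" in the input would be ambiguous after this encoding, so a
real protocol would also need to escape the caret itself.
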


From jerome@IDEALX.com  Mon Jul 17 13:17:29 2000
From: jerome@IDEALX.com (Jérôme Marant)
Date: 17 Jul 2000 14:17:29 +0200
Subject: [XML-SIG] Changes from 0.5.1 to 0.5.5
Message-ID: <641z0t6jdy.fsf@amboise.ird.idealx.com>

Hi,

I recently took over the Debian packaging of PyXml. I didn't
find any "CHANGES" file in the tarball, so I had a look at
the directory tree and noticed several significant
differences: some documentation has disappeared and I don't
know whether that is intentional. I certainly won't package
anything that lacks documentation, considering that it
was present in the earlier versions.

Here are the results of my investigations. Could someone
give me some details about all of these files?

Thanks.

----------------------------

python-xml-0.5.1/arch:
total 56
drwxr-xr-x    2 1000     1000         4096 avr 11  1999 .
drwxr-xr-x   17 1000     1000         4096 jun  2 11:17 ..
-rw-r--r--    1 1000     1000           25 mar 27  1999 __init__.py
-rw-r--r--    1 1000     1000        42040 mar 26  1999 xmlarch.py

python-xml-0.5.1/demo/arch:
total 68
drwxr-xr-x    2 1000     1000         4096 avr 11  1999 .
drwxr-xr-x   10 1000     1000         4096 avr 11  1999 ..
-rw-r--r--    1 1000     1000         1116 oct 17  1998 README
-rw-r--r--    1 1000     1000         2681 mar 26  1999 archtest.py
-rw-r--r--    1 1000     1000          244 mar 26  1999 biblio1.out
-rw-r--r--    1 1000     1000          254 oct 17  1998 biblio2.out
-rw-r--r--    1 1000     1000          925 mar 26  1999 complex.py
-rw-r--r--    1 1000     1000         1469 oct 17  1998 complex.xml
-rw-r--r--    1 1000     1000          354 mar 26  1999 gi.xml
-rw-r--r--    1 1000     1000          349 oct 17  1998 html.out
-rw-r--r--    1 1000     1000         1763 mar 26  1999 minitest.py
-rwxr-xr-x    1 1000     1000         1684 mar 26  1999 original.py
-rw-r--r--    1 1000     1000          803 mar 26  1999 rename.xml
-rw-r--r--    1 1000     1000          561 oct 29  1998 simple.py
-rw-r--r--    1 1000     1000          296 oct 17  1998 simple.xml
-rw-r--r--    1 1000     1000          385 sep 16  1998 twoforms.xml
-rw-r--r--    1 1000     1000          660 sep 16  1998 xsademo.xml

python-xml-0.5.1/demo/dom:
total 20
-rwxr-xr-x    1 1000     1000         1903 nov 18  1998 html2html

python-xml-0.5.1/demo/quotes:
total 36
-rw-r--r--    1 1000     1000        13419 déc 16  1998 qtfmt.py
-rw-r--r--    1 1000     1000         1015 sep 15  1998 quotations.dtd

python-xml-0.5.1/demo/unicode:
total 28
drwxr-xr-x    2 1000     1000         4096 avr 11  1999 .
drwxr-xr-x   10 1000     1000         4096 avr 11  1999 ..
-rw-r--r--    1 1000     1000         2889 nov 16  1998 README.unicode
-rw-r--r--    1 1000     1000          668 nov 16  1998 test1.de.po
-rw-r--r--    1 1000     1000          658 nov 16  1998 test1.po

python-xml-0.5.1/demo/xbel:
total 52
drwxr-xr-x    2 1000     1000         4096 avr 11  1999 doc
-rw-r--r--    1 1000     1000         2727 oct 30  1998 xbel.dtd

python-xml-0.5.1/demo/xbel/doc:
total 44
drwxr-xr-x    2 1000     1000         4096 avr 11  1999 .
drwxr-xr-x    3 1000     1000         4096 jun  2 11:17 ..
-rw-r--r--    1 1000     1000         3587 nov  5  1998 xbel.bib
-rw-r--r--    1 1000     1000        30617 déc  3  1998 xbel.tex

python-xml-0.5.1/demo/xmlproc:
total 52
-rw-r--r--    1 1000     1000          558 fév 10  1999 catalog.soc
drwxr-xr-x    2 1000     1000         4096 avr 11  1999 dtds

python-xml-0.5.1/demo/xmlproc/dtds:
total 16
drwxr-xr-x    2 1000     1000         4096 avr 11  1999 .
drwxr-xr-x    3 1000     1000         4096 avr 11  1999 ..
-rw-r--r--    1 1000     1000         2727 fév 10  1999 xbel-1.0.dtd
-rw-r--r--    1 1000     1000          627 fév 10  1999 xsa.dtd

python-xml-0.5.1/doc:
total 248
drwxr-xr-x    3 1000     1000         4096 avr 11  1999 .
drwxr-xr-x   17 1000     1000         4096 jun  2 11:17 ..
-rw-r--r--    1 1000     1000         1108 oct 17  1998 index.html
-rw-r--r--    1 1000     1000         3265 oct 17  1998 indices.html
-rw-r--r--    1 1000     1000          853 fév 10  1999 pythondoc.css
-rw-r--r--    1 1000     1000        11220 mar 26  1999 xml.arch.xmlarch.ArchDocHandler.html
-rw-r--r--    1 1000     1000         3387 mar 26  1999 xml.arch.xmlarch.ArchException.html
-rw-r--r--    1 1000     1000         5966 mar 26  1999 xml.arch.xmlarch.ArchParseState.html
-rw-r--r--    1 1000     1000         9383 mar 26  1999 xml.arch.xmlarch.Architecture.html
-rw-r--r--    1 1000     1000         1904 mar 26  1999 xml.arch.xmlarch.AttributeParser.html
-rw-r--r--    1 1000     1000         2352 mar 26  1999 xml.arch.xmlarch.EventTracker.html
-rw-r--r--    1 1000     1000         7966 mar 26  1999 xml.arch.xmlarch.Prettifier.html
-rw-r--r--    1 1000     1000         3700 mar 26  1999 xml.arch.xmlarch.html
-rw-r--r--    1 1000     1000        38467 mar 26  1999 xml.arch.xmlarch.xml

python-xml-0.5.1/dom:
total 124
-rw-r--r--    1 1000     1000         1481 aoû  1  1998 README

python-xml-0.5.1/parsers:
total 64
-rw-r--r--    1 1000     1000        25594 déc 16  1998 xmllib.py

python-xml-0.5.1/pyexpat:
total 44
drwxr-xr-x    2 1000     1000         4096 avr 11  1999 .
drwxr-xr-x   17 1000     1000         4096 jun  2 11:17 ..
-rw-r--r--    1 1000     1000          411 aoû  1  1998 _checkversion.py
-rw-r--r--    1 1000     1000           12 aoû  1  1998 pyexpat.prj.exp
-rw-r--r--    1 1000     1000        26645 aoû  1  1998 pyexpat.prj.hqx

python-xml-0.5.1/test:
total 88
-rwxr-xr-x    1 1000     1000         2841 déc  3  1998 test_arch.py

python-xml-0.5.1/test/output:
total 60
-rw-r--r--    1 1000     1000          903 déc  3  1998 test_arch

----------------------------------

-- 
Jérôme Marant 

 -----------------------------------------------------------
| IDEALX - Open Source Engineering / Ingénierie Open Source |
| http://IDEALX.com                                         |
 -----------------------------------------------------------


From rob@hooft.net  Tue Jul 18 07:47:20 2000
From: rob@hooft.net (Rob W. W. Hooft)
Date: Tue, 18 Jul 2000 08:47:20 +0200 (CEST)
Subject: [XML-SIG] CVS update: error messages from "setup.py install"
Message-ID: <14707.64888.363170.649965@temoleh.chem.uu.nl>

I just did a cvs update, build, install cycle (previous was very long
ago), and noticed the following:

--------------------------
  File "/usr/local/nonius/lib/python2.0/site-packages/xml/parsers/xmlproc/catalog.py", line 4
    """
An SGML Open catalog file parser.
$Id: catalog.py,v 1.8 2000/05/12 18:39:58 lars Exp $
"""
       
                                 
                                                    
   ^
SyntaxError: invalid syntax
byte-compiling /usr/local/nonius/lib/python2.0/site-packages/xml/parsers/xmlproc/charconv.py to charconv.pyc
  File "/usr/local/nonius/lib/python2.0/site-packages/xml/parsers/xmlproc/charconv.py", line 1
    
    ^
SyntaxError: invalid syntax
--------------------------
[and so on, a few other ones.]

I haven't tried to find out what is happening here, but I did notice
that these files have "DOS" line endings (cr/lf).

Regards,
-- 
=====   rob@hooft.net          http://www.hooft.net/people/rob/  =====
=====   R&D, Nonius BV, Delft  http://www.nonius.nl/             =====
===== PGPid 0xFA19277D ========================== Use Linux! =========


From alexandre.fayolle@free.fr  Tue Jul 18 09:12:58 2000
From: alexandre.fayolle@free.fr (Alexandre Fayolle)
Date: Tue, 18 Jul 2000 10:12:58 +0200
Subject: [XML-SIG] CVS update: error messages from "setup.py install"
In-Reply-To: <14707.64888.363170.649965@temoleh.chem.uu.nl>
References: <14707.64888.363170.649965@temoleh.chem.uu.nl>
Message-ID: <963907978.3974118ad9f1f@imp.free.fr>

Hello everyone, 

I'm new to the list, so please excuse me if there are some implicit rules that
I'm not aware of...

Here are the modifications I had to make in order to get things working (a bit)
when I cvs checkout'ed the xml repository a couple of days ago :

apply dos2unix on the following files (a pain now, 'cos it fills up cvs diffs
with junk modifications) :

xml/demo/xmlproc/dtd2schema.py
xml/demo/xmlproc/dtdcmd.py
xml/demo/xmlproc/outputters.py
xml/demo/xmlproc/wxValidator.py
xml/xml/parsers/xmlproc/catalog.py
xml/xml/parsers/xmlproc/charconv.py
xml/xml/parsers/xmlproc/dtdparser.py
xml/xml/parsers/xmlproc/errors.py
xml/xml/parsers/xmlproc/namespace.py
xml/xml/parsers/xmlproc/utils.py
xml/xml/parsers/xmlproc/xcatalog.py
xml/xml/parsers/xmlproc/xmlapp.py
xml/xml/parsers/xmlproc/xmldtd.py
xml/xml/parsers/xmlproc/xmlproc.py
xml/xml/parsers/xmlproc/xmlutils.py
xml/xml/parsers/xmlproc/xmlval.py
xml/xml/sax/sax2exts.py
xml/xml/sax/saxexts.py
xml/xml/sax/saxlib.py
xml/xml/sax/saxutils.py
xml/xml/sax/drivers/drv_htmllib.py
xml/xml/sax/drivers/drv_pyexpat.py
xml/xml/sax/drivers/drv_sgmllib.py
xml/xml/sax/drivers/drv_sgmlop.py
xml/xml/sax/drivers/drv_xmldc.py
xml/xml/sax/drivers/drv_xmllib.py
xml/xml/sax/drivers/drv_xmlproc.py
xml/xml/sax/drivers/drv_xmlproc_val.py
xml/xml/sax/drivers/drv_xmltoolkit.py
xml/xml/sax/drivers/pylibs.py
xml/xml/sax/drivers2/drv_pyexpat.py
xml/xml/sax/drivers2/drv_xmlproc.py
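
For readers without dos2unix at hand, the conversion applied to the files above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `crlf_to_lf` and `convert_tree` are made up for this note, and the sketch rewrites files in place, so it belongs on a scratch checkout rather than on a tree with uncommitted work.

```python
import os


def crlf_to_lf(path):
    """Rewrite path in place with LF line endings; return True if it changed."""
    with open(path, "rb") as handle:
        data = handle.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed == data:
        return False  # already LF-only, leave the file untouched
    with open(path, "wb") as handle:
        handle.write(fixed)
    return True


def convert_tree(top, suffix=".py"):
    """Convert every file under top whose name ends with suffix."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in sorted(filenames):
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                if crlf_to_lf(path):
                    changed.append(path)
    return changed
```

Calling `convert_tree("xml")` from the checkout root would return the list of files it rewrote; a second call should return an empty list.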



I also made a few changes in 4DOM; it looks like someone has been writing too
much C code :o). Anyway, here is what cvs diff tells me I did (mind the line
wrapping of my mail client) :
Index: xml/xml/dom/TreeWalker.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/dom/TreeWalker.py,v
retrieving revision 1.2
diff -r1.2 TreeWalker.py
201c201
<             return __dict__['__filter'].acceptNode(node)
---
>             return self.__dict__['__filter'].acceptNode(node)
219c219
<                             _readOnlyAttrs + _readComputedAttrs.keys())
---
>                             Node._readOnlyAttrs + _readComputedAttrs.keys())
Index: xml/xml/dom/html/HTMLSelectElement.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/dom/html/HTMLSelectElement.py,v
retrieving revision 1.3
diff -r1.3 HTMLSelectElement.py
88c88
<         for ctr in range(len(options))
---
>         for ctr in range(len(options)) :
125c125
<         return = implementation._4dom_createHTMLCollection(children)
---
>         return implementation._4dom_createHTMLCollection(children)
Index: xml/xml/dom/html/HTMLTableElement.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/dom/html/HTMLTableElement.py,v
retrieving revision 1.3
diff -r1.3 HTMLTableElement.py
223c223
<                 if child.tagName = 'TFOOT':
---
>                 if child.tagName == 'TFOOT':
225c225
<                 elif child.tagName = 'TBODY':
---
>                 elif child.tagName == 'TBODY':
Index: xml/xml/dom/test_suite/newtest_node.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/dom/test_suite/newtest_node.py,v
retrieving revision 1.1
diff -r1.1 newtest_node.py
12c12
<           doc = implementation.createDocument(None,'ROOT',dt);
---
>         doc = implementation.createDocument(None,'ROOT',dt)
32c32
<         if self.pNode.nodeValue != 'NODE_VALUE'
---
>         if self.pNode.nodeValue != 'NODE_VALUE':
39,44c39
<     def test
< 
< 
< 
< 
< 
---
>     def test(self):


Happy committing !

By the way, I noticed a number of 'pass' statements in 4DOM source files. I'm by
no means a Python guru, and could not imagine why they are there. Could someone
enlighten me ?

Alexandre Fayolle


From paul@prescod.net  Tue Jul 18 15:38:30 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 18 Jul 2000 09:38:30 -0500
Subject: [XML-SIG] CVS update: error messages from "setup.py install"
References: <14707.64888.363170.649965@temoleh.chem.uu.nl> <963907978.3974118ad9f1f@imp.free.fr>
Message-ID: <39746BE6.D1AA4EC@prescod.net>

Alexandre Fayolle wrote:
> 
> ...
> 
> Happy commiting !
> 
> By the way, I noticed a number of 'pass' statements in 4DOM source files. I'm by
> no way a python guru, and could not imagine why they are here. Could someone
> enlighten me ?

A pre-processor stripped out debug statements and replaced them with
"pass".

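Why "pass" rather than plain deletion can be sketched as follows. This is a hypothetical illustration, not the actual 4DOM pre-processor: the `debug(...)` call name, the regular expression, and the `strip_debug` helper are all assumptions. The point is that when a debug call is the only statement in a block, removing the line leaves the block empty, which is a syntax error, while "pass" keeps the source compilable.

```python
import re

# Hypothetical pattern: a line consisting only of a debug(...) call.
DEBUG_CALL = re.compile(r"^(\s*)debug\(.*\)\s*$")


def strip_debug(source):
    """Replace each one-line debug(...) call with 'pass' at the same indentation."""
    out = []
    for line in source.splitlines():
        match = DEBUG_CALL.match(line)
        out.append(match.group(1) + "pass" if match else line)
    return "\n".join(out) + "\n"


# The debug call is the whole body of the 'if', so it cannot simply be deleted.
SAMPLE = '''\
def visit(node):
    if node is None:
        debug("empty node")
    return node
'''
```

A smarter pre-processor, as mentioned later in the thread, could drop the line outright whenever other statements remain in the same block.
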
-- 
 Paul Prescod - Not encumbered by corporate consensus
Just how compassionate can a Republican get before he has to leave the 
GOP and join Vegans for Global Justice? ... One moment, George W. Bush
is holding a get-to-know-you meeting with a bunch of gay Republicans.
The next he is holding forth on education or the environment ... It is
enough to make a red-blooded conservative choke on his spotted-owl
drumstick.     - April 29th, Economist


From alexandre.fayolle@free.fr  Tue Jul 18 16:19:11 2000
From: alexandre.fayolle@free.fr (Alexandre Fayolle)
Date: Tue, 18 Jul 2000 17:19:11 +0200
Subject: [XML-SIG] 4DOM: Weird importNode behaviour
Message-ID: <963933551.3974756f439b8@imp.free.fr>

Hello,

I've come across a weird behaviour using importNode (I'm using the latest
version of the xml package from the SIG CVS repository, with the patches I
mentioned in my previous mail) : when importing a node, I lose all attributes
but one. The following code exhibits the behaviour :

-------------------8<------------------------------------------------------

from xml.dom.ext.reader import Sax2
from xml.dom.ext.Printer import PrintVisitor

tree1 = """"""

tree2 = """"""

if __name__ == '__main__':
    doc1 = Sax2.FromXml(tree1,None,0,1)
    doc2 = Sax2.FromXml(tree2,None,0,1)

    child = doc2.documentElement.firstChild

    doc1.documentElement.appendChild(doc1.importNode(child,1))
    pv=PrintVisitor()
    print pv.visitDocument(doc2)
    print
    print pv.visitDocument(doc1)
    
------------------------------8<-------------------------------------------

I'd expect to get the same line twice (and this line to be the same as tree2,
with the possible exception of the ordering of child's attributes); however,
what I get is :
[alf@leo alf]$ python domimport.py

 
   

Am I missing something ?

Thanks in advance for your help.                                     


Alexandre Fayolle


From alexandre.fayolle@free.fr  Tue Jul 18 17:48:46 2000
From: alexandre.fayolle@free.fr (Alexandre Fayolle)
Date: Tue, 18 Jul 2000 18:48:46 +0200
Subject: [XML-SIG] 4DOM: Problem cloning attributes
Message-ID: <963938926.39748a6e5ccdb@imp.free.fr>

Hello,

[I'm not sure a previous mail reporting a weird problem with importNode has made
it to the xml-sig list, since I have not received it myself. If so, this is a
narrowing of this problem]

I'm using 4Dom from the XML-SIG CVS repository.

I think I've found a namespace issue when cloning attribute nodes. I have a
fix, but am not sure that it will work in all cases.

The problem is the following :

----------------------8<-------------------------------------------
from xml.dom.ext.reader import Sax2

tree = """"""

if __name__ == '__main__':
    doc = Sax2.FromXml(tree,None,0,1)

    child = doc.documentElement.firstChild
    print child
    clone = child.cloneNode(1)
    print clone
----------------------8<-------------------------------------------

The output of the script is :
[alf@leo alf]$ python domimport.py

                                                               

I've lost an attribute during cloning.

By editing Attr.__repr__(), I've been able to notice the following changes
between the original attribute and the clone :
cloning attr 
cloned attr       

The prefix is changed from None to empty string, and the local name from 'bar'
to empty string.

This comes from Attr.cloneNode, where ownerDocument.createAttribute(self.name)
is used, instead of createAttributeNS(self.namespaceURI,self.name)

Therefore, I propose the following implementation for Attr.cloneNode (inspired
by what is done in Element.cloneNode) :

    def cloneNode(self, deep, node=None, newOwner=None):
        if node is None:
            if newOwner is None:
                if self.ownerDocument._4dom_isNsAware:
                    node = self.ownerDocument.createAttributeNS(
                        self.namespaceURI, self.name)
                else:
                    node = self.ownerDocument.createAttribute(self.name)
            else:
                if self.ownerDocument._4dom_isNsAware:
                    node = newOwner.createAttributeNS(
                        self.namespaceURI, self.name)
                else:
                    node = newOwner.createAttribute(self.name)
        # Clone from our ancestors
        Node.cloneNode(self, deep, node)
        return node
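
The behaviour expected from cloning can be written down as a small regression check. The sketch below is an assumption-laden stand-in: it uses the standard library's xml.dom.minidom rather than 4DOM, and the namespace URI, the sample document, and the helper names are invented for illustration, since the original test document is not preserved in this message. It only shows what "no attribute lost, namespace kept" means in DOM terms.

```python
from xml.dom.minidom import parseString

NS = "http://example.com/foo"  # hypothetical namespace URI
TREE = '<root xmlns:foo="%s"><child foo:bar="1" baz="2"/></root>' % NS


def clone_first_child(xml_text):
    """Parse xml_text and return (original, deep clone) of the first child."""
    doc = parseString(xml_text)
    child = doc.documentElement.firstChild
    return child, child.cloneNode(True)


def attribute_names(element):
    """Return the sorted (namespaceURI, localName) pairs of an element."""
    attrs = element.attributes
    return sorted(
        (attrs.item(i).namespaceURI or "", attrs.item(i).localName)
        for i in range(attrs.length)
    )
```

Comparing `attribute_names(child)` with `attribute_names(clone)` catches both symptoms reported above: a missing attribute changes the length of the list, and a clone built without its namespace changes the first element of a pair.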
               

Alexandre Fayolle


From uogbuji@fourthought.com  Tue Jul 18 21:10:14 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 18 Jul 2000 14:10:14 -0600
Subject: [XML-SIG] CVS update: error messages from "setup.py install"
In-Reply-To: Message from Paul Prescod 
 of "Tue, 18 Jul 2000 09:38:30 CDT." <39746BE6.D1AA4EC@prescod.net>
Message-ID: <200007182010.OAA10555@localhost.localdomain>

> > By the way, I noticed a number of 'pass' statements in 4DOM source files. I'm by
> > no way a python guru, and could not imagine why they are here. Could someone
> > enlighten me ?
> 
> A pre-processor stripped out debug statements and replaced them with
> "pass".

Not after the next release.  We made the pre-processor a bit smarter.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python




From jeremy.kloth@fourthought.com  Wed Jul 19 16:51:51 2000
From: jeremy.kloth@fourthought.com (Jeremy J Kloth)
Date: Wed, 19 Jul 2000 09:51:51 -0600
Subject: [XML-SIG] 4DOM: Problem cloning attributes
References: <963938926.39748a6e5ccdb@imp.free.fr>
Message-ID: <00b401bff199$417d8a60$1b01a8c0@fourthought.com>

> Hello,
>
> [I'm not sure a previous mail reporting a weird problem with importNode has
> made it to the xml-sig list, since I have not received it myself. If so, this
> is a narrowing of this problem]
>
> I'm using 4Dom from the XML-SIG CVS repository.
>
> I think I've found an issue on namespace when cloning attribute nodes. I have
> a fix, but am not sure that it will work in all cases.
>
> The problem is the following :
>
> ----------------------8<-------------------------------------------
> from xml.dom.ext.reader import Sax2
>
> tree = """"""
>
> if __name__ == '__main__':
>     doc = Sax2.FromXml(tree,None,0,1)
>
>     child = doc.documentElement.firstChild
>     print child
>     clone = child.cloneNode(1)
>     print clone
> ----------------------8<-------------------------------------------
>
> The output of the script is :
> [alf@leo alf]$ python domimport.py
> 
>  children>
>
> I've lost an attribute during cloning.
> [... snip ...]
> Alexandre Fayolle

The problem that you encountered has been fixed in our upcoming release (due in
a day or two). We will be updating the CVS repository at that time.  Thank you
for pointing that out to us.  It slipped through our test net. However, we
have done a complete rewrite of the clone and import sections, so the fix
is not usable, but it is very much appreciated.

--
Jeremy Kloth                        Consultant
jeremy.kloth@fourthought.com        (303)583-9900 x 102
Fourthought, Inc.                   http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python






From uche.ogbuji@fourthought.com  Fri Jul 21 02:31:04 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 20 Jul 2000 19:31:04 -0600
Subject: [XML-SIG] Great Reason for 4Suite Release delay
Message-ID: <3977A7D8.8C1137F4@fourthought.com>

We were expecting to have 4XSLT 0.9.2, etc. out late last week, but as
always, our hacking instincts got in the way of the "release early,
release often" philosophy.  We started optimizing and couldn't stop.

As of now, on several of our benchmark documents, 4XSLT is now 100 times
faster than release 0.9.1.  Yes, I meant 100.  One 200K document that
used to take 5 minutes to render now takes 3 seconds.  Not all documents
show such dramatic improvement but the smallest speedup we've seen for
any somewhat normal stylesheets and documents is well over ten-fold.

We're not even yet done with the optimizations, but it's not
inconceivable that 4XSLT is as fast as any of the Java processors.  It's
typically within 10% of saxon.

At any rate, we hope to soon get on with the packaging so our users can
get the benefits.

Thanks.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From morton@dennisinter.com  Fri Jul 21 17:07:21 2000
From: morton@dennisinter.com (Damien Morton)
Date: Fri, 21 Jul 2000 12:07:21 -0400
Subject: [XML-SIG] RE: [4suite] Great Reason for 4Suite Release delay
In-Reply-To: <3977A7D8.8C1137F4@fourthought.com>
Message-ID: 

Wow! Congratulations on such an amazing speedup.

I'd love to hear a summary of the kinds of things done to achieve this
speedup.

> -----Original Message-----
> From: 4suite-admin@dollar.fourthought.com
> [mailto:4suite-admin@dollar.fourthought.com]On Behalf Of Uche Ogbuji
> Sent: Thursday, July 20, 2000 9:31 PM
> To: 4suite@fourthought.com; xml-sig@python.org
> Subject: [4suite] Great Reason for 4Suite Release delay
>
>
> We were expecting to have 4XSLT 0.9.2, etc. out late last week, but as
> always, our hacking instincts got in the way of the "release early,
> release often" philosophy.  We started optimizing and couldn't stop.
>
> As of now, on several of our benchmark documents, 4XSLT is now 100 times
> faster than release 0.9.1.  Yes, I meant 100.  One 200K document that
> used to take 5 minutes to render now takes 3 seconds.  Not all documents
> show such dramatic improvement but the smallest speedup we've seen for
> any somewhat normal stylesheets and documents is well over ten-fold.
>
> We're not even yet done with the optimizations, but it's not
> inconceivable that 4XSLT is as fast as any of the Java processors.  It's
> typically within 10% of saxon.
>
> At any rate, we hope to soon get on with the packaging so our users can
> get the benefits.
>
> Thanks.
>
> --
> Uche Ogbuji                               Principal Consultant
> uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
> Fourthought, Inc.                         http://Fourthought.com
> 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
> Software-engineering, knowledge-management, XML, CORBA, Linux, Python
> _______________________________________________
> 4suite mailing list
> 4suite@lists.fourthought.com
> http://lists.fourthought.com/mailman/listinfo/4suite
>



From paul@prescod.net  Fri Jul 21 17:26:32 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 21 Jul 2000 11:26:32 -0500
Subject: [XML-SIG] Great Reason for 4Suite Release delay
References: <3977A7D8.8C1137F4@fourthought.com>
Message-ID: <397879B8.4D7FA534@prescod.net>

Uche Ogbuji wrote:
> 
> We were expecting to have 4XSLT 0.9.2, etc. out late last week, but as
> always, our hacking instincts got in the way of the "release early,
> release often" philosophy.  We started optimizing and couldn't stop.
> 
> As of now, on several of our benchmark documents, 4XSLT is now 100 times
> faster than release 0.9.1.  Yes, I meant 100.  One 200K document that
> used to take 5 minutes to render now takes 3 seconds.  
> ...

That's amazing progress. Dare I hope that some of that is in your DOM
and XPath which are useful even beyond XSLT?

-- 
 Paul Prescod - Not encumbered by corporate consensus
"Hardly anything more unwelcome can befall a scientific writer than 
having the foundations of his edifice shaken after the work is 
finished.  I have been placed in this position by a letter from 
Mr. Bertrand Russell..." 
 - Frege, Appendix of Basic Laws of Arithmetic (of Russell's Paradox)


From Mike.Olson@fourthought.com  Fri Jul 21 17:46:59 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Fri, 21 Jul 2000 10:46:59 -0600
Subject: [XML-SIG] Great Reason for 4Suite Release delay
References: <3977A7D8.8C1137F4@fourthought.com> <397879B8.4D7FA534@prescod.net>
Message-ID: <39787E83.AAFCF08C@FourThought.com>

Paul Prescod wrote:
> 
> Uche Ogbuji wrote:
> >
> > ...
> 
> That's amazing progress. Dare I hope that some of that is in your DOM
> and XPath which are useful even beyond XSLT?

Paul,
    A lot of the changes were in XPath as well.  One example is the order
in which we evaluated Steps.  We used to select the axis and then apply the
node test to each node in the axis.  This was fine for child::, but
horribly slow for descendant::.  So now we do both at the same time.
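
The difference between the two evaluation orders can be sketched like this. It is not 4XPath code: it uses the standard library's minidom as a stand-in tree, the node test is reduced to a tag-name comparison, and the function names are made up for this note.

```python
from xml.dom.minidom import parseString


def descendants(node):
    """Yield every descendant of node in document order."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)


def select_then_test(node, name):
    """Old order: materialise the whole descendant axis, then filter it."""
    axis = list(descendants(node))
    return [n for n in axis
            if n.nodeType == n.ELEMENT_NODE and n.tagName == name]


def select_and_test(node, name):
    """New order: apply the node test while walking, never storing the axis."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == name:
            yield child
        yield from select_and_test(child, name)
```

Both functions return the same nodes in the same order. The second avoids building an intermediate list of every descendant, which matters little for child:: but grows with the size of the subtree for descendant::.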

Mike

> 
> --
>  Paul Prescod - Not encumbered by corporate consensus
> "Hardly anything more unwelcome can befall a scientific writer than
> having the foundations of his edifice shaken after the work is
> finished.  I have been placed in this position by a letter from
> Mr. Bertrand Russell..."
>  - Frege, Appendix of Basic Laws of Arithmetic (of Russell's Paradox)
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Fri Jul 21 18:37:05 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 21 Jul 2000 11:37:05 -0600
Subject: [XML-SIG] Re: [4suite] Great Reason for 4Suite Release delay
References: 
Message-ID: <39788A41.D70B943D@fourthought.com>

Damien Morton wrote:
> 
> Wow! Congratulations on such an amazing speedup.
> 
> I'd love to hear a summary of the kinds of things done to achieve this
> speedup.

An honest statement of what we've done is kill the lion's share of
performance *bugs*.  This mostly involved redundant loops and tree
descents in XPath, with some clean-up of the interface between XPath and
XSLT.

The only design optimization we put in was to develop a micro-dom for
XPath that we call "Domlette".  This is a scaled-back, read-only DOM
subset specialized for XPath (along the spirit if not implementation of
Xalan's DTM).  This was perhaps a fifth of the performance gains we had,
with most of the other gains coming from fixing performance bugs.

We still haven't done many of the performance design improvements we're
planning such as better indexing and expression optimization.  These
will mostly benefit documents in the megabytes.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tpassin@home.com  Fri Jul 21 19:24:57 2000
From: tpassin@home.com (tpassin@home.com)
Date: Fri, 21 Jul 2000 14:24:57 -0400
Subject: [XML-SIG] Great Reason for 4Suite Release delay
References: <3977A7D8.8C1137F4@fourthought.com>
Message-ID: <005201bff340$f95336a0$7cac1218@reston1.va.home.com>

Great work, people! Congratulations.  Thanks.

"Uche Ogbuji" amazed us by saying -

> We were expecting to have 4XSLT 0.9.2, etc. out late last week, but as
> always, our hacking instincts got in the way of the "release early,
> release often" philosophy.  We started optimizing and couldn't stop.
> 
> As of now, on several of our benchmark documents, 4XSLT is now 100 times
> faster than release 0.9.1.  Yes, I meant 100.  One 200K document that
> used to take 5 minutes to render now takes 3 seconds.  Not all documents
> show such dramatic improvement but the smallest speedup we've seen for
> any somewhat normal stylesheets and documents is well over ten-fold.
> 
> We're not even yet done with the optimizations, but it's not
> inconceivable that 4XSLT is as fast as any of the Java processors.  It's
> typically within 10% of saxon.
> 
> At any rate, we hope to soon get on with the packaging so our users can
> get the benefits.
> 
> Thanks.
> 




From fdrake@beopen.com  Mon Jul 24 21:12:31 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Mon, 24 Jul 2000 16:12:31 -0400 (EDT)
Subject: [XML-SIG] packaging question
Message-ID: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com>

  A few weeks ago we discussed the relationship between the xml
package added to the standard library and PyXML.  One person stated
that the biggest requirement was that both be able to use the "xml"
name for the top-level package.  There were at least two proposals
about how to achieve that.
  Before I dig into the mechanics, I want to be sure people agree that
this is the "right thing" to do.  What are the objections to using two
different names ("xml" and "xmlextra") instead of one?


  -Fred

-- 
Fred L. Drake, Jr.  
BeOpen PythonLabs Team Member



From bjorn@roguewave.com  Mon Jul 24 21:41:21 2000
From: bjorn@roguewave.com (Bjorn Pettersen)
Date: Mon, 24 Jul 2000 14:41:21 -0600
Subject: [XML-SIG] packaging question
References: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com>
Message-ID: <397CA9F1.D6DE0330@roguewave.com>

"Fred L. Drake, Jr." wrote:
> 
>   A few weeks ago we discussed the relationship between the xml
> package added to the standard library and PyXML.  One person stated
> that the biggest requirement was that both be able to use the "xml"
> name for the top-level package.  There were at least two proposals
> about how to achieve that.
>   Before I dig into the mechanics, I want to be sure people agree that
> this is the "right thing" to do.  What are the objections to using two
> different names ("xml" and "xmlextra") instead of one?

I think it was mostly a convenience argument. When an item moves from
xmlextra to xml no user code would have to change if they had the same
name. (In addition, everyone who is currently using PyXML from xml.---
wouldn't have to change their code.)

-- bjorn


From gstein@lyra.org  Mon Jul 24 21:46:46 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 24 Jul 2000 13:46:46 -0700
Subject: [XML-SIG] packaging question
In-Reply-To: <397CA9F1.D6DE0330@roguewave.com>; from bjorn@roguewave.com on Mon, Jul 24, 2000 at 02:41:21PM -0600
References: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com> <397CA9F1.D6DE0330@roguewave.com>
Message-ID: <20000724134646.Z10898@lyra.org>

On Mon, Jul 24, 2000 at 02:41:21PM -0600, Bjorn Pettersen wrote:
> "Fred L. Drake, Jr." wrote:
> > 
> >   A few weeks ago we discussed the relationship between the xml
> > package added to the standard library and PyXML.  One person stated
> > that the biggest requirement was that both be able to use the "xml"
> > name for the top-level package.  There were at least two proposals
> > about how to achieve that.
> >   Before I dig into the mechanics, I want to be sure people agree that
> > this is the "right thing" to do.  What are the objections to using two
> > different names ("xml" and "xmlextra") instead of one?
> 
> I think it was mostly a convenience argument. When an item moves from
> xmlextra to xml no user code would have to change if they had the same
> name. (In addition, everyone who is currently using PyXML from xml.---
> wouldn't have to change their code.)

I agree with Bjorn.

I've posted a description of how to accomplish this "melding" of the
packages in a flexible manner. It allows us to ship the "xml" package in the
Python distro with whatever content we choose, and then allow PyXML to place
stuff into the xml namespace at its discretion. i.e. we can independently
update/release PyXML and its xml-insertion.

It seems there was some partial consensus on the approach that I outlined.
There wasn't "enough" in my mind, though, so I haven't attempted to code up
a concrete implementation.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From andy@reportlab.com  Mon Jul 24 21:42:52 2000
From: andy@reportlab.com (Andy Robinson)
Date: Mon, 24 Jul 2000 13:42:52 -0700
Subject: [XML-SIG] packaging question
In-Reply-To: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com>
Message-ID: 


> 
>   A few weeks ago we discussed the relationship between the xml
> package added to the standard library and PyXML.  One person stated
> that the biggest requirement was that both be able to use the "xml"
> name for the top-level package.  There were at least two proposals
> about how to achieve that.
>   Before I dig into the mechanics, I want to be sure people 
> agree that
> this is the "right thing" to do.  What are the objections 
> to using two
> different names ("xml" and "xmlextra") instead of one?
> 
Two names makes more sense to me.  Frankly, for production
projects I want to use what is in the Python core and not
liable to further change.  Putting something in "xmlextra"
is great because there is a guarantee of at least one
future change (even if it is a grep to delete the string 'extra')
so it clearly signals to the user base what is going on.

- Andy Robinson
(in 120' heat in Phoenix, so flame away, I won't even feel it)


From fdrake@beopen.com  Mon Jul 24 21:54:57 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Mon, 24 Jul 2000 16:54:57 -0400 (EDT)
Subject: [XML-SIG] packaging question
In-Reply-To: <20000724134646.Z10898@lyra.org>
References: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com>
 <397CA9F1.D6DE0330@roguewave.com>
 <20000724134646.Z10898@lyra.org>
Message-ID: <14716.44321.419016.611268@cj42289-a.reston1.va.home.com>

Greg Stein writes:
 > I agree with Bjorn.

  That it's a convenience issue or that it's the right thing to do?

 > I've posted a description of how to accomplish this "melding" of the
 > packages in a flexible manner. It allows us to ship the "xml" package in the

  Yes; I've re-read your proposal and Martin's patch.  I don't want to
worry about *how* until it's clear *what* the right result is.


  -Fred

-- 
Fred L. Drake, Jr.  
BeOpen PythonLabs Team Member



From gstein@lyra.org  Mon Jul 24 23:41:51 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 24 Jul 2000 15:41:51 -0700
Subject: [XML-SIG] packaging question
In-Reply-To: <14716.44321.419016.611268@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Mon, Jul 24, 2000 at 04:54:57PM -0400
References: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com> <397CA9F1.D6DE0330@roguewave.com> <20000724134646.Z10898@lyra.org> <14716.44321.419016.611268@cj42289-a.reston1.va.home.com>
Message-ID: <20000724154151.B10898@lyra.org>

On Mon, Jul 24, 2000 at 04:54:57PM -0400, Fred L. Drake, Jr. wrote:
> 
> Greg Stein writes:
>  > I agree with Bjorn.
> 
>   That it's a convenience issue or that it's the right thing to do?

Both.

Having the core distro and PyXML both use the "xml" namespace means that we
can migrate stuff from PyXML into the core as those items become stable. This is
convenient for programmers (don't worry where it comes from or whether it
has moved; just use "xml"), and is the right thing (it all "just works" and
provides a mechanism for future changes).

>  > I've posted a description of how to accomplish this "melding" of the
>  > packages in a flexible manner. It allows us to ship the "xml" package in the
> 
>   Yes; I've re-read your proposal and Martin's patch.  I don't want to
> worry about *how* until it's clear *what* the right result is.

You know... sometimes it is important to just *DO* something rather than
talk endlessly about whether it is perfect or not. How long has this subject
been "on the table"? Too long.

There have been a lot of people talking, trying to be heard about this or
that. It would be nice to actually see people doing something other than
typing text into their email clients.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From uche.ogbuji@fourthought.com  Mon Jul 24 23:40:27 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 24 Jul 2000 16:40:27 -0600
Subject: [XML-SIG] ANN: 4DOM 0.10.2
Message-ID: <200007242240.QAA11915@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                             4DOM 0.10.2
                      -----------------------
                An XML/HTML Python library using the
                  Document Object Model interface

4DOM is a Python library for XML and HTML processing and manipulation
using the W3C's Document Object Model for interface.  4DOM implements
DOM Core level 2, HTML level 2 and Level 2 Document Traversal.

4DOM should work on all platforms supported by Python.  If you have
any problems with a particular platform, please e-mail the authors.

4DOM is designed to allow developers to rapidly design applications
that read, write or manipulate HTML and XML.

News
----

 - Support wide range of output encodings via wstring
 - Updated conformance to 20000510 DOM CR
 - Changed internals to use Node as the clone manager, using a pickle-
   style interface.
 - Changed many classes to be generated in the HTML Extension
 - Other bug-fixes

More info and Obtaining 4DOM
----------------------------

Please see

	http://Fourthought.com/4Suite/4DOM

Or you can download 4DOM from

	ftp://Fourthought.com/pub/4Suite

There are Windows Packages available at

	ftp://Fourthought.com/pub/4Suite/binaries/windows/

And Linux RPMs available at

	ftp://Fourthought.com/pub/4Suite/binaries/redhat/

4DOM is distributed under a license similar to that of Python.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +01 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python




From uche.ogbuji@fourthought.com  Mon Jul 24 23:41:35 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 24 Jul 2000 16:41:35 -0600
Subject: [XML-SIG] ANN: 4XPath 0.9.2 and 4XSLT 0.9.2
Message-ID: <200007242241.QAA11967@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                      4XSLT and 4XPath 0.9.2
                      ----------------------
                      A python implementation
                     of the W3C's XSLT language


4XSLT is an XML transformation processor based on the W3C's specification
for the XSLT transform language.  4XPath implements the W3C XPath language
for indicating and selecting XML document components.

http://www.w3.org/TR/xpath
http://www.w3.org/TR/xslt

4XPath is a complete implementation of the XPath 1.0 Recommendation.

4XSLT is a complete implementation of the XSLT 1.0 Recommendation.

Note: 4XSLT and 4XPath cannot work with JPython.

News
----

 - Now XSLT 1.0 feature-complete
 - Implemented ft:set-mode and ft:write-file extension elements
 - Implemented exclude-result-prefixes
 - Implemented full range of encoding support (really done in 4DOM)
 - Implemented extension elements and fallback
 - Improved documentation
 - Optimizations like there's no tomorrow (100X speedup in some cases)
     o Fixed many XPath performance bugs: redundant loops and
       tree-descents
     o Cleaned up XSLT/XPath interface
     o Implemented Domlette: a specialized pseudo-DOM for
       XPath and result-tree fragments
     o Cleaned up white-space stripping and document-order
       sorting/indexing
 - Cleaned up Processor API
 - Cleaned up XPath API
 - Restructured for cleanliness of stylesheet objects
 - simplify and document extension functions
 - Fixed function-available
 - BaseUri support
 - Fixes to modes
 - Fixes to xsl:import
 - Fixes to position in Pattern matches
 - Better exception handling from XPatternParser
 - Numerous bug-fixes

More info and Obtaining 4XPath and 4XSLT
----------------------------------------

Please see

	http://Fourthought.com/4Suite/4XPath
	http://Fourthought.com/4Suite/4XSLT

Or you can download 4XSLT from

	ftp://Fourthought.com/pub/4Suite/

Please see ftp://Fourthought.com/pub/4Suite/INSTALL for explanations
of the various available packages.

There are Windows binaries at

	ftp://Fourthought.com/pub/4Suite/binaries/windows

And Linux RPMs available at 

        ftp://Fourthought.com/pub/4Suite/binaries/redhat/

4XPath and 4XSLT are distributed under a license similar to that of
Python.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +01 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python




From fdrake@beopen.com  Mon Jul 24 23:49:43 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Mon, 24 Jul 2000 18:49:43 -0400 (EDT)
Subject: [XML-SIG] packaging question
In-Reply-To: <20000724154151.B10898@lyra.org>
References: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com>
 <397CA9F1.D6DE0330@roguewave.com>
 <20000724134646.Z10898@lyra.org>
 <14716.44321.419016.611268@cj42289-a.reston1.va.home.com>
 <20000724154151.B10898@lyra.org>
Message-ID: <14716.51207.615956.725189@cj42289-a.reston1.va.home.com>

Greg Stein writes:
 > Both.

  So noted.

 > You know... sometimes it is important to just *DO* something rather than
 > talk endlessly about whether it is perfect or not. How long has this subject
 > been "on the table"? Too long.

  As I recall, only *one* person said they thought this was the right
thing.  If there's only one, that's not enough.  It wasn't clear that
anyone else really thought it was the right thing, and *that's* what I
want a little more feedback on.  I know what I think, and how I intend
to approach solving the problem, but not until I'm sure that I'm
attacking the real problem.


  -Fred

-- 
Fred L. Drake, Jr.  
BeOpen PythonLabs Team Member



From bjorn@roguewave.com  Tue Jul 25 00:20:51 2000
From: bjorn@roguewave.com (Bjorn Pettersen)
Date: Mon, 24 Jul 2000 17:20:51 -0600
Subject: [XML-SIG] packaging question
References: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com>
 <397CA9F1.D6DE0330@roguewave.com>
 <20000724134646.Z10898@lyra.org>
 <14716.44321.419016.611268@cj42289-a.reston1.va.home.com>
 <20000724154151.B10898@lyra.org> <14716.51207.615956.725189@cj42289-a.reston1.va.home.com>
Message-ID: <397CCF53.D2FFB03@roguewave.com>

"Fred L. Drake, Jr." wrote:
> 
> Greg Stein writes:
>  > Both.
> 
>   So noted.
> 
>  > You know... sometimes it is important to just *DO* something rather than
>  > talk endlessly about whether it is perfect or not. How long has this subject
>  > been "on the table"? Too long.
> 
>   As I recall, only *one* person said they thought this was the right
> thing.  If there's only one, that's not enough.  It wasn't clear that
> anyone else really thought it was the right thing, and *that's* what I
> want a little more feedback on.  I know what I think, and how I intend
> to approach solving the problem, but not until I'm sure that I'm
> attacking the real problem.

I think it's the right thing to do too. (yes, I realize there are issues
with this solution, but I think the convenience of it outweighs all of
them :-)

-- bjorn


From Anthony Baxter   Tue Jul 25 00:41:09 2000
From: Anthony Baxter  (Anthony Baxter)
Date: Tue, 25 Jul 2000 09:41:09 +1000
Subject: [XML-SIG] packaging question
In-Reply-To: Message from "Fred L. Drake, Jr." 
 of "Mon, 24 Jul 2000 18:49:43 -0400." <14716.51207.615956.725189@cj42289-a.reston1.va.home.com>
Message-ID: <200007242341.JAA01051@mbuna.arbhome.com.au>

For what it's worth, having a single xml package name definitely
appeals to me. Yes, it's a bit of monkeying around with import magic,
but against the pain of working with multiple XML packages, and stuff
migrating between them...

Anthony
-- 
Anthony Baxter        
It's never too late to have a happy childhood.


From Juergen Hermann" 
Message-ID: <200007250622.IAA12169@statistik.cinetic.de>

On Tue, 25 Jul 2000 09:41:09 +1000, Anthony Baxter wrote:

>For what it's worth, having a single xml package name definitely
>appeals to me. 

+1 for me too.


Ciao, Jürgen

--
Jürgen Hermann (jhe@webde-ag.de)
WEB.DE AG, Amalienbadstr.41, D-76227 Karlsruhe
Tel.: 0721/94329-0, Fax: 0721/94329-22




From paul@prescod.net  Tue Jul 25 08:12:15 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 25 Jul 2000 02:12:15 -0500
Subject: [XML-SIG] packaging question
References: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com> <397CA9F1.D6DE0330@roguewave.com> <20000724134646.Z10898@lyra.org> <14716.44321.419016.611268@cj42289-a.reston1.va.home.com> <20000724154151.B10898@lyra.org>
Message-ID: <397D3DCF.B0354AA2@prescod.net>

Greg Stein wrote:
> 
> > ....
> >
> >   Yes; I've re-read your proposal and Martin's patch.  I don't want to
> > worry about *how* until it's clear *what* the right result is.
> 
> You know... sometimes it is important to just *DO* something rather than
> talk endlessly about whether it is perfect or not. 

As I understood it, you had already done it and were waiting to check it
in pending enough + votes. Now Fred is gathering the + votes!

As far as I can see, everything is working as it should, thus far!

> How long has this subject been "on the table"? Too long.
> 
> There have been a lot of people talking, trying to be heard about this or
> that. It would be nice to actually see people doing something other than
> typing text into their email clients.

Collaborative design requires people to use email. I think you've
misdiagnosed our problem. The problem is simply that Fred and Andrew
make all final decisions and they can only give xml-sig at best a
fraction of their attention. Plus, they tend not to rule by fiat. So
threads tend to die out without anyone "ruling" one way or another.
Nevertheless, we still get where we are going.

-- 
 Paul Prescod - Not encumbered by corporate consensus
New from Computer Associates: "Software that can 'think', sold by 
marketers who choose not to."


From larsga@garshol.priv.no  Tue Jul 25 09:01:02 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 25 Jul 2000 10:01:02 +0200
Subject: [XML-SIG] packaging question
In-Reply-To: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com>
References: <14716.41775.982350.547334@cj42289-a.reston1.va.home.com>
Message-ID: 

* Fred L. Drake, Jr.
|
| Before I dig into the mechanics, I want to be sure people agree that
| this is the "right thing" to do.

+1 from me.

| What are the objections to using two different names ("xml" and
| "xmlextra") instead of one?

The only thing I can think of is that it might be confusing to people
when they use some software that depends on PyXML and it attempts to
import things from the xml package that are not there. It may not be
obvious what is wrong in that case. Proper documentation should take
care of that, though.

--Lars M.



From jerome@IDEALX.com  Tue Jul 25 09:11:22 2000
From: jerome@IDEALX.com (Jérôme Marant)
Date: 25 Jul 2000 10:11:22 +0200
Subject: [XML-SIG] Questions about 0.5.5
Message-ID: <7zvgxu1vf9.fsf@amboise.ird.idealx.com>

Hi,

Here is what I can read from the README file in PyXML 0.5.5:

[...]
Software versions and credits:
	DOM 	 		Stefane Fermigier, hacked by A.M. Kuchling
	PyExpat 		Jack Jansen
	saxlib-1.0		Lars Marius Garshol
	sgmlop-981008		Fredrik Lundh
	xmlarch 0.25		Geir Ove Gronmo
        ^^^^^^^^^^^^

I cannot see xmlarch within this version of PyXML, was it removed
or is it a mistake? (The README file still refers to xmlarch in the CVS)

Thanks.

Regards,

-- 
Jérôme Marant 

 -----------------------------------------------------------
| IDEALX - Open Source Engineering / Ingénierie Open Source |
| http://IDEALX.com                                         |
 -----------------------------------------------------------


From larsga@garshol.priv.no  Tue Jul 25 10:04:32 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 25 Jul 2000 11:04:32 +0200
Subject: [XML-SIG] Questions about 0.5.5
In-Reply-To: <7zvgxu1vf9.fsf@amboise.ird.idealx.com>
References: <7zvgxu1vf9.fsf@amboise.ird.idealx.com>
Message-ID: 

* Jérôme Marant
| 
| Here is what I can read from the README file in PyXML 0.5.5:
| 
| [...]
| Software versions and credits:
| 	DOM 	 		Stefane Fermigier, hacked by A.M. Kuchling
| 	PyExpat 		Jack Jansen
| 	saxlib-1.0		Lars Marius Garshol
| 	sgmlop-981008		Fredrik Lundh
| 	xmlarch 0.25		Geir Ove Gronmo
|         ^^^^^^^^^^^^
| 
| I can not see xmlarch within this version of PyXML, was it removed
| or is it a mistake ? 

It was removed, so the README is in error.  The xmlproc version number
is also wrong, it should be 0.70.

--Lars M.



From Juergen Hermann" 
Message-ID: <200007250915.LAA16816@statistik.cinetic.de>

On 25 Jul 2000 10:01:02 +0200, Lars Marius Garshol wrote:

>The only thing I can think of is that it might be confusing to people
>when they use some software that depends on PyXML and it attempts to
>import things from the xml package that are not there. It may not be
>obvious what is wrong in that case. Proper documentation should take
>care of that, though.

Pmw contains code that allows a _client_ module to set a minimal needed
version. Couldn't we add a similar thing?

import xml
xml.assertFeature("4xslt", "1.1") # throws exception when not installed
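
A minimal sketch of what such a check might look like. Everything here is
hypothetical: no assertFeature function is known to exist in the xml
package, and the registry, the FeatureError class and the version strings
are assumptions made only for illustration.

```python
class FeatureError(ImportError):
    """Raised when a required feature is missing or too old."""


# Hypothetical registry: feature name -> installed version string.
INSTALLED_FEATURES = {"4xslt": "0.9.2", "4dom": "0.10.2"}


def parse_version(text):
    """Turn a dotted version string such as '0.9.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def assert_feature(name, minimum, installed=INSTALLED_FEATURES):
    """Return the installed version, or raise FeatureError if it is
    missing or older than the requested minimum."""
    if name not in installed:
        raise FeatureError("feature %r is not installed" % name)
    if parse_version(installed[name]) < parse_version(minimum):
        raise FeatureError(
            "feature %r is version %s, but %s or later is required"
            % (name, installed[name], minimum)
        )
    return installed[name]
```

With this registry, assert_feature("4xslt", "0.9") succeeds, while
assert_feature("4xslt", "1.1") raises FeatureError, which a client module
can catch like any other ImportError.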



Ciao, Jürgen

--
Jürgen Hermann (jhe@webde-ag.de)
WEB.DE AG, Amalienbadstr.41, D-76227 Karlsruhe
Tel.: 0721/94329-0, Fax: 0721/94329-22




From jerome@IDEALX.com  Tue Jul 25 10:28:54 2000
From: jerome@IDEALX.com (Jérôme Marant)
Date: 25 Jul 2000 11:28:54 +0200
Subject: [XML-SIG] Questions about 0.5.5
In-Reply-To: Lars Marius Garshol's message of "25 Jul 2000 11:04:32 +0200"
References: <7zvgxu1vf9.fsf@amboise.ird.idealx.com> 
Message-ID: <7zn1j61ru1.fsf@amboise.ird.idealx.com>

Lars Marius Garshol  writes:
> It was removed, so the README is in error.  The xmlproc version number
> is also wrong, it should be 0.70.

Ok.

So, when is the next release of PyXML due?

-- 
Jérôme Marant 

 -----------------------------------------------------------
| IDEALX - Open Source Engineering / Ingénierie Open Source |
| http://IDEALX.com                                         |
 -----------------------------------------------------------


From dumas@centre-cired.fr  Tue Jul 25 14:19:57 2000
From: dumas@centre-cired.fr (Dumas Patrice)
Date: Tue, 25 Jul 2000 15:19:57 +0200
Subject: [XML-SIG] compilation error during instalation of PyXML
Message-ID: <397D93FD.169E5148@centre-cired.fr>

Hi,
While issuing
    python setup.py build
I got (after other things) :

....
running build_ext
warning: build_ext: old-style (ext_name, build_info) tuple found in
ext_modules for extension 'sgmlop'-- please convert to Extension
instance
warning: build_ext: old-style (ext_name, build_info) tuple found in
ext_modules for extension 'xml.unicode.wstrop'-- please convert to
Extension instance
warning: build_ext: old-style (ext_name, build_info) tuple found in
ext_modules for extension 'xml.parsers.pyexpat'-- please convert to
Extension instance
skipping 'sgmlop' extension (up-to-date)
skipping 'xml.unicode.wstrop' extension (up-to-date)
building 'xml.parsers.pyexpat' extension
gcc -O3 -fomit-frame-pointer -fno-exceptions -fno-rtti -pipe -s
-mpentium -mcpu=pentium -march=pentium -ffast-math
-fexpensive-optimizations -fPIC -Iextensions/expat/xmltok
-Iextensions/expat/xmlparse -I/usr/include/python1.5 -c
extensions/pyexpat.c -o build/temp.linux-i386/extensions/pyexpat.o
extensions/pyexpat.c: In function `newxmlparseobject':
extensions/pyexpat.c:474: parse error before `xmlparseobject'
error: command 'gcc' failed with exit status 1




From uche.ogbuji@fourthought.com  Tue Jul 25 17:06:02 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 25 Jul 2000 10:06:02 -0600
Subject: [XML-SIG] 4XSLT Test Suite
Message-ID: <397DBAEA.2EEA1935@fourthought.com>

I've been meaning to post this.  I've made available our core test-suite
for 4XSLT at

ftp://fourthought.com/pub/etc/4XSLT-test-suite-2000-07-25.tar.gz

Note that this *excludes* any files that were sent to us for
bug-fix/testing that we were asked to keep private.  It also, for volume
reasons, excludes our large-file tests.

However, it may be of use in learning XSLT and 4XSLT, for constructing
your own tests and bug-reports, or for any other reasons.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Tue Jul 25 18:52:19 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 25 Jul 2000 11:52:19 -0600
Subject: [XML-SIG] [Fwd: Re: [4suite] 4DOM bug in the new release]
Message-ID: <397DD3D3.B31192C5@fourthought.com>


-------- Original Message --------
Subject: Re: [4suite] 4DOM bug in the new release
Date: Tue, 25 Jul 2000 11:34:46 -0600
From: "Jeremy J Kloth" 
To: "4suite mailing list" <4suite@dollar.fourthought.com>,"Alexandre
Fayolle" 
References: <964526371.397d812328311@imp.free.fr>

> Hello,
>
> I think I've found a bug in 4DOM 0.10.2 : the ownerDocument is not set
> properly on Attr when importing a node :
>
> from xml.dom.ext.reader import Sax2
> tree = """"""
> if __name__ == '__main__':
>     doc1 = Sax2.FromXml(tree,None,0,1)
>     doc2 = Sax2.FromXml(tree,None,0,1)
>     child = doc1.documentElement
>     print "doc1 = " + str(doc1)
>     print "doc2 = " + str(doc2)
>     child2 = doc1.importNode(doc2.documentElement,1)
>     print "child2 owner = " + str(child2.ownerDocument)
>     print "child2 attr owner = " + \
> str(child2.firstChild.getAttributeNodeNS('','foo').ownerDocument)
>
>
> When I run the script, I get the following output :
> doc1 = 
> doc2 = 
> child2 owner = 
> child2 attr owner = 
>
> I would have expected the attribute to have the same owner as the
> element ;o).
> Problem is, this breaks a *lot* of things. Try cloning child2, for
> instance...
>
>
> Alexandre Fayolle
> http://alexandre.fayolle.free.fr

Thanks for pointing that out to us.  Sorry for letting that one slip
through our test net.  This will definitely be added to the tests.  The
patch follows.

----------- start of patch --------------
diff -ur dom/Element.py patched/Element.py
--- dom/Element.py      Mon Jul 24 15:06:08 2000
+++ patched/Element.py  Tue Jul 25 11:29:09 2000
@@ -194,7 +194,7 @@
         for attr in attributes:
             # Attribute children are the value, so they're cloned
             # when the attribute is cloned, no need to go deep
-            newAttr = attr.cloneNode(0)
+            newAttr = attr.cloneNode(0, newOwner=self.ownerDocument)
             if self.ownerDocument._4dom_isNsAware:
                 self.attributes.setNamedItemNS(newAttr)
             else:
diff -ur dom/Node.py patched/Node.py
--- dom/Node.py Mon Jul 24 15:06:08 2000
+++ patched/Node.py     Tue Jul 25 11:28:38 2000
@@ -278,16 +278,16 @@
         else:
             state = {}

+        # Set when clone is used for import
+        if newOwner:
+            newNode._4dom_setOwnerDocument(newOwner)
+
         # Assign the current state to the copy
         setstate = getattr(newNode, '__setstate__', None)
         if setstate:
             setstate(state)
         else:
             newNode.__dict__.update(state)
-
-        # Set when clone is used for import
-        if newOwner:
-            newNode._4dom_setOwnerDocument(newOwner)

         # Copy the child nodes if deep
         if deep or self.nodeType == Node.ATTRIBUTE_NODE:
----------- end of patch --------------
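
The invariant reported above (an imported element and its attributes should
all belong to the importing document) can be expressed as a small check. The
sketch below uses the standard library's xml.dom.minidom rather than 4DOM,
and the function name and sample documents are assumptions chosen for
illustration; it is not the test that was added to the 4DOM suite.

```python
from xml.dom import minidom


def owners_after_import(source_xml, target_xml):
    """Import the source document element into the target document and
    collect the ownerDocument of every element and attribute in the copy."""
    source = minidom.parseString(source_xml)
    target = minidom.parseString(target_xml)
    imported = target.importNode(source.documentElement, True)  # deep import

    owners = []
    stack = [imported]
    while stack:
        node = stack.pop()
        if node.nodeType == node.ELEMENT_NODE:
            owners.append(node.ownerDocument)
            attrs = node.attributes
            for index in range(attrs.length):
                owners.append(attrs.item(index).ownerDocument)
            stack.extend(node.childNodes)
    return target, owners
```

A test would then assert that every collected owner is the target document,
for example with all(owner is target for owner in owners).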

--
Jeremy Kloth                        Consultant
jeremy.kloth@fourthought.com        (303)583-9900 x 102
Fourthought, Inc.                   http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python




_______________________________________________
4suite mailing list
4suite@lists.fourthought.com
http://lists.fourthought.com/mailman/listinfo/4suite


From gwillis@mail.com  Tue Jul 25 23:49:16 2000
From: gwillis@mail.com (george willis)
Date: Tue, 25 Jul 2000 18:49:16 -0400 (EDT)
Subject: [XML-SIG] Help Wanted Advertisement - Zope/python/XML
Message-ID: <384231266.964565356968.JavaMail.root@web303-mc.mail.com>

I hope that I do not offend anyone by posting this in what has to date been a technology discussion. I assume that we all like 
billable hours associated with our efforts, so without further 
discussion...


Title: Developer/Technologist
Start Date: ASAP
Contact: gwillis@mail.com
Pay Rate: TBD
Location: On-site in Rabun County Georgia, the #1 place to retire.
Skills Desired: python, Zope, XML, XSLT, Linux, hardware, networking

Description: Looking for a person who eats technology. Should have familiarity with PC hardware and networking enough to troubleshoot and repair systems. Should be able to install and configure Linux and various applications such as Zope/Apache/etc. Should be familiar with OOP/OOAD including any of the following -- CRC cards, UML, python, java, c++, smalltalk, eiffel. Familiarity with XML technologies is a plus. Must have excellent soft skills and be a team player. If names like Steve McConnell, Steven Covey, and Bertrand Meyer are familiar to you, we want to hear from you.

______________________________________________
FREE Personalized Email at Mail.com
Sign up at http://www.mail.com/?sr=signup



From ibarg@as.arizona.edu  Wed Jul 26 17:17:07 2000
From: ibarg@as.arizona.edu (Irene Barg)
Date: Wed, 26 Jul 2000 09:17:07 -0700
Subject: [XML-SIG] build fails on SunOS 5.6
Message-ID: <397F0F03.37A892BB@as.arizona.edu>

Hi,

The system admin installed Python-1.5.2 on our system in
"/opt/phthon".  The platform is SunOS 5.6. I want to install the
PyXML-0.5.5 package to my home directory, but the build fails
with the following msg:

building 'xml.parsers.pyexpat' extension
cc -c -Iextensions/expat/xmltok -Iextensions/expat/xmlparse
-I/l/o/python/include/python1.5 -O extensions/pyexpat.c -o
build/temp.sunos5-sun4u/extensions/pyexpat.o
"extensions/pyexpat.c", line 474: syntax error before or at:
xmlparseobject
"extensions/pyexpat.c", line 823: warning: initialization type mismatch
....same msg repeated for 15 diff lines in pyexpat.c
cc: acomp failed for extensions/pyexpat.c
error: command 'cc' failed with exit status 2

Are there any special flags I need to set for SunOS 5.6?
Thanks,
-- irene

------------------------------------------------------------------
Irene Barg			Email:	ibarg@as.arizona.edu
Steward Observatory		Phone:  520-621-2602
933 N. Cherry Ave.
University of Arizona		FAX:    520-621-1891
Tucson, AZ  85721		http://nickel.as.arizona.edu/~barg
------------------------------------------------------------------


From fdrake@beopen.com  Thu Jul 27 06:31:22 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Thu, 27 Jul 2000 01:31:22 -0400 (EDT)
Subject: [XML-SIG] Extending the xml package
Message-ID: <14719.51498.429119.458996@cj42289-a.reston1.va.home.com>

  As promised, I brought up the package extension issue at today's
PythonLabs meeting.  We decided that there are two interesting cases
for package importing involved here.
  The first is package extension -- allowing one package to extend
another.  We basically agreed that the Java model got this right, with
the issue of multiple __init__ modules being a serious problem for
Python (it's not clear what the right way to deal with multiple
__init__ modules is; you want to execute all of them, and the current
implementation doesn't lend itself to this).  This is the approach
we've discussed here before.
  Another possibility is providing an extended replacement for the
standard package.  This doesn't sound like it makes sense given that
using the same name creates order dependencies for sys.path, and the
current setup would be wrong for overriding a standard package with a
package installed in site-packages or a user's or application's
private library.
  However, this actually appears to be the most reasonable if we want
to be able to include bug fixes in the "enhanced" version of the
package, and doesn't require weird hacking on distutils.  It does
require that the package that can be overridden in this way be written
to support this.
  Here's how to do it:
  Deploy the "xml" package in the standard library.  Create an
"_xmlplus" package (PyXML) which provides all of the facilities from
the standard library and any extensions.  The "_xmlplus" package can
be treated as any other package with distutils.
  In the __init__.py for the "xml" package, include the following
code:

------------------------------------------------------------
if __name__ == "xml":
    try:
        import _xmlplus
    except ImportError:
        pass
    else:
        import sys
        sys.modules[__name__] = _xmlplus
------------------------------------------------------------

  Yes, this works.
  The leading test for __name__ is useful to allow the same file to be
used for both the xml and _xmlplus packages.
  The PyXML package (providing _xmlplus) could continue to be the
leading-edge development package with all the bells and whistles, and
portions adopted into the standard library could be updated before a
Python release.
  Guido also suggested looking at the version-handling code from Pmw,
but I don't know how valuable that would be.
  Comments?
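
  The replacement trick above can be tried without touching the real xml
package. The sketch below is only an illustration under stated assumptions:
the package names demo_core and _demo_plus, the flavour attribute and the
helper functions are all hypothetical stand-ins, written into a temporary
directory so that nothing installed is affected.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Stand-in for the standard package's __init__.py, using the same pattern.
CORE_INIT = '''
    flavour = "core"
    if __name__ == "demo_core":
        try:
            import _demo_plus
        except ImportError:
            pass
        else:
            import sys
            sys.modules[__name__] = _demo_plus
'''


def make_package(root, name, init_source):
    """Create a package directory containing only an __init__.py."""
    package_dir = os.path.join(root, name)
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write(textwrap.dedent(init_source))


def load_flavour(with_plus):
    """Import demo_core, optionally with _demo_plus installed next to it,
    and report which package the import statement actually returned."""
    names = ("demo_core", "_demo_plus")
    with tempfile.TemporaryDirectory() as root:
        make_package(root, "demo_core", CORE_INIT)
        if with_plus:
            make_package(root, "_demo_plus", 'flavour = "plus"\n')
        for name in names:
            sys.modules.pop(name, None)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            import demo_core
            return demo_core.flavour
        finally:
            # Leave sys.path and sys.modules as they were found.
            sys.path.remove(root)
            for name in names:
                sys.modules.pop(name, None)
```

  Calling load_flavour(False) reports the plain package, and
load_flavour(True) reports the extended one, because the import machinery
hands back whatever object is stored in sys.modules once the package's
__init__ has run.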


  -Fred

-- 
Fred L. Drake, Jr.  
BeOpen PythonLabs Team Member



From Juergen Hermann" 
Message-ID: <200007270837.KAA15671@statistik.cinetic.de>

On Thu, 27 Jul 2000 01:31:22 -0400 (EDT), Fred L. Drake, Jr. wrote:

>  Comments?

I like it, as long as I can detect (or require) what version of "xml" I
import (i.e. plain or extended). I think version handling of some sort
is essential to make this fly reliably (otherwise, every developer
using the non-release xml package jumps willingly into a support
nightmare).


Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/




From webmaster@marktkreuz.de  Thu Jul 27 18:39:20 2000
From: webmaster@marktkreuz.de (Manfred Weber)
Date: Thu, 27 Jul 2000 19:39:20 +0200
Subject: [XML-SIG] Beginner XML-Question
Message-ID: <001901bff7f1$98eb4e00$8866fea9@crafics>

This is a multi-part message in MIME format.

------=_NextPart_000_0016_01BFF802.5BEDD700
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi,
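I am an absolute beginner... to Python.
Could anyone just send me a simple Python script using the xmlLib?
Following the XML Starting out did not do it.

Let's say I use this xml-example:

<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>

A very simple script that prints out any element (e.g. the writer) would
do it!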

Thanx...
M.Weber
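
One way such a script might look, as a minimal sketch using the standard
library's xml.dom.minidom. It assumes the sample is the
collection/comic/writer/penciller document from this message; the function
name element_texts is made up for the example.

```python
from xml.dom import minidom

SAMPLE = """<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""


def element_texts(xml_text, tag_name):
    """Return the text content of every element with the given tag name."""
    document = minidom.parseString(xml_text)
    texts = []
    for element in document.getElementsByTagName(tag_name):
        # Join the element's direct text children and trim whitespace.
        text = "".join(
            child.data
            for child in element.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        texts.append(text.strip())
    return texts


if __name__ == "__main__":
    for name in element_texts(SAMPLE, "writer"):
        print(name)
```

Passing "penciller" instead of "writer" lists both pencillers in document
order.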

From wbrian2@uswest.net  Thu Jul 27 12:43:08 2000
From: wbrian2@uswest.net (Brian Wisti)
Date: Thu, 27 Jul 2000 11:43:08 +0000
Subject: [XML-SIG] Beginner XML-Question
References: <001901bff7f1$98eb4e00$8866fea9@crafics>
Message-ID: <3980204C.B57176CD@uswest.net>

Hi Manfred,

I'm still figuring out both Python and XML, but the script you're
looking for is already in the XML-HOWTO:

    http://www.python.org/doc/howto/xml/node9.html

All I've done myself basically uses this sample as a template (adapting
it to build a PencillerFinder class, for example). I've just purchased
the book "XML Processing With Python," so maybe I'll become more useful
(and helpful) soon.

Good Luck!
Brian Wisti

Manfred Weber wrote:

> Hi, I am absolut beginner...to Python..could anyone just send me a
> simple PythonScript using the xmlLib...Following the XML Starting out
> did not do it. Lets say I use this xml-example:
>
> <collection>
>   <comic title="Sandman" number='62'>
>     <writer>Neil Gaiman</writer>
>     <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
>     <penciller pages="10-17">Charles Vess</penciller>
>   </comic>
> </collection>
>
> A very simple script that prints out any element (f.e. the writer)
> would do it! Thanx...M.Weber

From jeremy.kloth@fourthought.com  Thu Jul 27 21:17:57 2000
From: jeremy.kloth@fourthought.com (Jeremy J Kloth)
Date: Thu, 27 Jul 2000 14:17:57 -0600
Subject: [XML-SIG] 4Suite Bugfixes
Message-ID: <00dc01bff807$c10edee0$1b01a8c0@fourthought.com>

There is now a patch file that fixes the bugs that have cropped up since
the latest release. Get it at
ftp://ftp.fourthought.com/pub/4Suite/bugfixes-20000727.patch

4DOM
----
  Fixed attribute cloning
  Fixed cloning of HTML elements

4XSLT
-----
  Fixed xsl:sort
  Fixed problem with multiple apply-templates in a single template
  Fixed template priorities

--
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 102
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tpassin@home.com  Fri Jul 28 04:46:31 2000
From: tpassin@home.com (tpassin@home.com)
Date: Thu, 27 Jul 2000 23:46:31 -0400
Subject: [XML-SIG] Beginner XML-Question
References: <001901bff7f1$98eb4e00$8866fea9@crafics>
Message-ID: <00e401bff846$6ff6aee0$7cac1218@reston1.va.home.com>

"Manfred Weber" asked

>I am absolut beginner...to Python..
>could anyone just send me a simple PythonScript using the xmlLib...

Is this simple enough?

1) Create a file "test.xml" containing your xml example.
2) Run xmllib against it (this is on Win95) (python must be on your
   path or be run from a batch file):

D:>python "c:\Program Files\Python\Lib\xmllib.py" test.xml

The output:

start tag:
data: '\012 '
start tag:
data: '\012 '
start tag:
data: 'Neil Gaiman'
end tag:
data: '\012 '
start tag:
data: 'Glyn Dillon'
end tag:
data: '\012 '
start tag:
data: 'Charles Vess'
end tag:
data: '\012 '
end tag:
data: '\012'
end tag:
data: '\012 \012'

Here is another simple program (not using your data):

"""
bare.py - Demonstrate handling specifically-named elements using xmllib
"""
import xmllib

class bareBones(xmllib.XMLParser):
    def __init__(self):
        xmllib.XMLParser.__init__(self)

    # Report start of element
    def start_specialTag(self, attrs):
        print "Start specialTag"
        print "element attributes:", attrs
        self.handle_data=self.do_data  # use our data handler to handle the content

    # Report reaching the end of the element
    def end_specialTag(self):
        print "End specialTag"
        self.handle_data=self.null_data  # reset the data handler

    # A minimal data handler
    def do_data(self,data):
        print "===============\n",data,"\n==============="

    def null_data(self,data):pass

doc=\
"""
This element won't be reported
This one will
"""

if __name__=="__main__":
    parser=bareBones()
    parser.feed(doc)
    parser.close()

I'm running version 0.2 of xmllib, which came with my Python 1.5.2
distribution.

Cheers,

Tom Passin

>Lets say I use this xml-example:
>
><collection>
>  <comic title="Sandman" number='62'>
>    <writer>Neil Gaiman</writer>
>    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
>    <penciller pages="10-17">Charles Vess</penciller>
>  </comic>
></collection>
>
>A very simple script that prints out any element (f.e. the writer)
>would do it!


From robin@jessikat.fsnet.co.uk  Fri Jul 28 09:02:02 2000
From: robin@jessikat.fsnet.co.uk (Robin Becker)
Date: Fri, 28 Jul 2000 09:02:02 +0100
Subject: [XML-SIG] current status of the xml-sig
Message-ID:

Can someone explain what the status of the XML-sig is. All of the sigs
at www.python.org are reported as terminating in June 2000. The SAX
tutorial is at 0.5 Jan 2000 and neither it nor the other docs mention
minidom or pulldom which are apparently going into 1.6/2.0. Is there a
definitive python xml way now and if so where is it?
--
Robin Becker


From Juergen Hermann"
Message-ID: <200007280838.KAA18327@statistik.cinetic.de>

On Thu, 27 Jul 2000 14:37:19 -0400 (EDT), Fred L. Drake, Jr. wrote:

[I post this back to the list too and hope you don't care; actually I
failed to click on "reply to all" once again with my original reply ;) ]

>  You can check the value of xml.__name__; it will be "xml" for the
>standard module and "_xmlplus" for the extended version.

OK, that is a start. :)

>  That would work, but could easily create increased dependence on
>getting out a fresh PyXML package so that the version number will be
>available for people that rely on the extended package.
>  Is this what you mean?

No, I did not mean to *internally* check the version in the xml
package's __init__ itself. I think *client* code should be able to do
so. Since we are planning to (more or less regularly) update the
core/stable/distributed xml package, we should support this by enabling
client code to say what version of the extended stuff it relies(!) on.

If you say "import re" on a Python 1.5.2 system, you know what to expect
(unless you have a wicked installation). This would NOT be the case for
"import xml".

Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/


From abingham@lips.utexas.edu  Fri Jul 28 17:51:45 2000
From: abingham@lips.utexas.edu (Austin Bingham)
Date: Fri, 28 Jul 2000 11:51:45 -0500
Subject: [XML-SIG] Breeze toolkit replacement
Message-ID:

Hi everyone. I am new to this list, but I have been mucking with PyXML
for the past week or so. Great work :)

I wanted to gauge interest in an XML related project I've been
considering for some time. www.breezefactor.com produces a product
called (surprise) Breeze Studio which has proven to be a very powerful
tool where I work. Basically, it provides a GUI which lets you design
what is effectively a DTD. Based on this layout, Breeze produces Java
classes which parse/write XML files and produce a set of objects
containing the XML data. These objects provide extremely simple,
reliable access to XML files. While this sounds perilously close to DOM,
the presentation, simplicity, and usability of this tool are astounding;
Breeze seems to simply repackage the work done by DOM (or SAX...who
knows).

Of course, there are problems. Breeze is pricey, at best, and
prohibitively expensive at worst (esp. if you are an academic
organization like us). Also, Breeze currently only produces Java
classes. The potential for a tool like this to provide interoperable
classes in many different languages is so obvious that I am amazed that
they don't seem to consider it.

So, I have been considering writing a free analogue to Breeze. Python
seems like an ideal language to do this work in, and the wonderful XML
support you guys provide makes it that much easier. I could certainly
use some help on this. If there's no real interest, that's cool...sorry
for taking up some of your bandwidth. If there is, I'd love to discuss
the possibilities with anyone who might want to help or is just
interested in the project.

Thanks, and keep up the good work!

Austin Bingham
Laboratory for Intelligent Processes and Systems
University of Texas at Austin
abingham@lips.utexas.edu


From walter@livinglogic.de  Fri Jul 28 17:54:41 2000
From: walter@livinglogic.de ("Walter Dörwald")
Date: Fri, 28 Jul 2000 18:54:41 +0200
Subject: [XML-SIG] Another bug in sgmlop?
Message-ID: <200007281854410031.007D179E@mail.tmt.de>

Hello all!

I think I discovered another bug in sgmlop.

-----------------------------------------------------------
import sgmlop

class Handler:
    def finish_starttag(self,name,attrs):
        print name,attrs

parser = sgmlop.SGMLParser()
parser.register(Handler())
parser.parse('')
-----------------------------------------------------------

what this script prints is the following:

page [('test', 'test'), ('test', 'test'), ('nohome', 'nohome')]
Segmentation fault.

(I installed Fredrik's patch from the 5th of July.)

Bye,
   Walter Dörwald

--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de


From Lisa_Iarkowski@prenhall.com  Sat Jul 29 21:22:56 2000
From: Lisa_Iarkowski@prenhall.com (Lisa_Iarkowski@prenhall.com)
Date: 29 Jul 2000 16:22:56 -0400
Subject: [XML-SIG] Python XML package, Permission Request for XML Handbook, 3rd Ed., Charles Goldfarb
Message-ID: <"/GUID:Q/2tt9ORk1BG5EwAQS4zudw*/G=Lisa/S=Iarkowski/OU=exchange/O=pearsontc/PRMD=pearson/ADMD=telemail/C=us/"@MHS>

Charles F. Goldfarb and Paul Prescod, authors of the forthcoming
Prentice Hall book, "The XML Handbook, Third Edition," would like to
include the documentation and code for Python XML package on the CD-ROM
accompanying the book.

The agreement will appear on the CD-ROM. We would very much appreciate
hearing from you on this matter via return e-mail prior to August 18,
our due date for including software on the CD-ROM. Please direct your
response to Peter S. Snell at peter_snell@prenhall.com.

If you are not the appropriate person to consider this request, I would
be grateful if you would direct this message to that person and notify
me so that I can update our records. Thank you in advance.

____________________________________________________________

TO: Charles F. Goldfarb and Paul Prescod
c/o Peter S. Snell - Prentice Hall Publishers

I(we) grant you permission to include documentation and code for Python
XML package in the book "The XML Handbook, Third Edition" to be
published by Prentice Hall publishers in the Charles F. Goldfarb Series
on Open Information Management, and grant you non-exclusive worldwide
distribution rights.

I(we) also grant you permission to include documentation and code for
Python XML package on any future book in the Charles F. Goldfarb Series
on Open Information Management published by Prentice Hall Publishers,
and grant you non-exclusive worldwide distribution rights.

Peter S. Snell
peter_snell@prenhall.com
PTR Production
Pearson Education
1 Lake Street
Upper Saddle River, NJ 07458


From larsga@garshol.priv.no  Sun Jul 30 16:38:31 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 30 Jul 2000 17:38:31 +0200
Subject: [XML-SIG] SAX 2.0 resolution?
Message-ID:

This is my proposal for the solution of the namespace-related SAX 2.0
design problems.

The startElement and endElement methods are split into
startElement(name, attrs) / startElementNS(name, qname, attrs) and
endElement(name, attrs) / endElementNS(name, qname, attrs).

I feel that this solution is better than the two alternatives, because
it avoids dummy arguments, weird argument encoding and also because it
makes the dual mode operation of SAX clearer than do the alternatives.
Another benefit is that it is consistent with the DOM and it also seems
easier to explain to people.

The only disadvantages I see are that this may cost an extra method call
per callback for some generic filters and that it does not make it clear
that it is not allowed to mix the namespace and non-namespace methods in
a single document.

The Attributes object is retained, but extended with a new method
getNamespaceItems() that returns a ((uri, lname), (qname, value)) list.
This should solve the efficiency problems for the DOM implementations
and at the same time provide us with the greatest implementation
flexibility. It also makes it much less awkward to provide support for
type information for attributes and makes it easier to extend this
interface later.

Note that drivers are allowed to recycle Attributes instances between
method calls.

As for the qname / prefix discussion I have no more of a standpoint in
this debate than I did before. Prefixes seemed to be the most popular
alternative when we last discussed this, but I have since noted that
both the minidom and the 4DOM DOM builders use DOM methods to build DOM
trees, and the DOM wants qname arguments. This seems to me to indicate
that we should go for qnames.

If anyone still has opinions on both the qname/prefix issue and the
other two issues I would like to hear those. Once this has been sorted
out I will start work on SAX 2.0 again.

--Lars M.


From jawarren@trentu.ca  Mon Jul 31 01:45:27 2000
From: jawarren@trentu.ca (jawarren@trentu.ca)
Date: Mon, 31 Jul 2000 00:45:27 +0000 (GMT)
Subject: [XML-SIG] build IOError: no LIBPL
Message-ID: <01JSE3V31NKQ00067D@trentu.ca>

When attempting to build ('python setup.py build') on a Debian 2.1
system (Python 1.5.1, GCC 2.7.2.3) I get the following error:

-----
make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile', needed by `sedscript'. Stop.
make[1]: Leaving directory `/usr/src/PyXML-0.5.4/extensions'
make: *** [boot] Error 2
make: *** No targets. Stop.
Executing 'build' action...
Running command: make -f Makefile.pre.in boot
Running command: make
Traceback (innermost last):
  File "setup.py", line 185, in ?
    func()
  File "setup.py", line 155, in build_unix
    shutil.copy('extensions/' + filename, 'build/xml/parsers/')
  File "/usr/lib/python1.5/shutil.py", line 51, in copy
    copyfile(src, dst)
  File "/usr/lib/python1.5/shutil.py", line 16, in copyfile
    fsrc = open(src, 'rb')
IOError: (2, 'No such file or directory')
-----

I believe the problem is that there *is* no /usr/lib/python1.5/config/
so I would assume I was missing some package except that the README
clearly states: "The only requirements for installing the package are
Python 1.5 or later, and a C compiler."

Thanks for your help,
~ Jared Warren


From ken@bitsko.slc.ut.us  Mon Jul 31 22:01:42 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 31 Jul 2000 16:01:42 -0500
Subject: [XML-SIG] Python XML package, Permission Request for XML Handbook, 3rd Ed., Charles Goldfarb
In-Reply-To: Lisa_Iarkowski@prenhall.com's message of "29 Jul 2000 16:22:56 -0400"
References: <"/GUID:Q/2tt9ORk1BG5EwAQS4zudw*/G=Lisa/S=Iarkowski/OU=exchange/O=pearsontc/PRMD=pearson/ADMD=telemail/C=us/"@MHS>
Message-ID:

TO: Charles F. Goldfarb and Paul Prescod
c/o Peter S. Snell - Prentice Hall Publishers

I(we) grant you permission to include documentation and code for Python
XML package in the book "The XML Handbook, Third Edition" to be
published by Prentice Hall publishers in the Charles F. Goldfarb Series
on Open Information Management, and grant you non-exclusive worldwide
distribution rights.

I(we) also grant you permission to include documentation and code for
Python XML package on any future book in the Charles F. Goldfarb Series
on Open Information Management published by Prentice Hall Publishers,
and grant you non-exclusive worldwide distribution rights.

  -- Ken MacLeod


From ken@bitsko.slc.ut.us  Mon Jul 31 22:07:56 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 31 Jul 2000 16:07:56 -0500
Subject: [XML-SIG] SAX 2.0 resolution?
In-Reply-To: Lars Marius Garshol's message of "30 Jul 2000 17:38:31 +0200"
References:
Message-ID:

Lars Marius Garshol writes:

> This is my proposal for the solution of the namespace-related SAX 2.0
> design problems.
>
>
> The startElement and endElement methods are split into
> startElement(name, attrs) / startElementNS(name, qname, attrs) and
> endElement(name, attrs) / endElementNS(name, qname, attrs).
>
> I feel that this solution is better than the two alternatives,
> because it avoids dummy arguments, weird argument encoding and also
> because it makes the dual mode operation of SAX clearer than do the
> alternatives. Another benefit is that it is consistent with the DOM
> and it also seems easier to explain to people.
>
> The only disadvantages I see are that this may cost an extra method
> call per callback for some generic filters and that it does not make
> it clear that it is not allowed to mix the namespace and
> non-namespace methods in a single document.

How does this affect down-line filters? What happens when a
non-NS-using filter precedes an NS-using filter or handler?

With the startElement(namespaceURI, localName, qName, attrs) model,
you'd expect upline filters to pass all the parameters whether or not
they themselves used it. (And, of course, you wouldn't be expected to
mix SAX1 and SAX2 filters.)

  -- Ken

, initially with no formatting, column or row spanning). Maintain a
basic page saying what tags you can work with.

5. Work on the formatting script and doc template with the SC people
until it looks great. There will be a couple of options: "format my
proposal now" and "bind all entries together".

6. Finally, the HTML to PDF filter will go in our standard library, with
your name on it - maybe even next week!

I would expect such a tool to get a lot of attention at the conference
and afterwards, especially if it is web based and the output is good
looking. If more people want to pitch in and add more features, we could
perhaps have a lot of conference attendees using it to generate great
documents. And it will be a fantastic example of the potential of
ReportLab.

Please reply to this list; if several people express interest, see if
you can help each other and work together.

Best Regards,

Andy Robinson
CEO/Chief Architect, ReportLab Inc.


From andy@reportlab.com  Wed Jul 5 06:39:24 2000
From: andy@reportlab.com (Andy Robinson)
Date: Wed, 5 Jul 2000 06:39:24 +0100
Subject: [XML-SIG] RE: Wanted - Heroes
In-Reply-To:
Message-ID:

> -----Original Message-----
> From: Andy Robinson [mailto:andy@reportlab.com]
>
>
> 3. Make a CGI script on www.reportlab.com (we can
> provide access) that lets someone submit their HTML
> inside a form and either makes a PDF, or tells them
> the first tag it cannot handle and the line it
> occurred on. We get this running really early -
> preferably as soon as one or two tags can be handled.

Better idea - we'll make this a subproject of reportlab on SourceForge.
We can now grant rights to subprojects, and the machine should be able
to handle the CGI initially. We'd just mirror the finished script to our
main site which is much faster before the conference.

- Andy Robinson


From larsga@garshol.priv.no  Wed Jul 5 08:40:54 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Jul 2000 09:40:54 +0200
Subject: [XML-SIG] SAX namespaces discussion status
In-Reply-To: <200007041620.KAA16440@localhost.localdomain>
References: <200007041620.KAA16440@localhost.localdomain>
Message-ID:

* Uche Ogbuji
|
| Basically, if someone were writing a generic app with different
| actions in namespace and non-namespace mode, they would have to have
| a conditional such as:
|
| def startElement( self, name, qname, attrs ):
|     if type(name) == type(()):
|         uri, lname = name
|         #namespace processing
|     else:
|         #non-namespace processing
|
| Not the end of the world, of course, but we must remember that there
| are applications, filters, for example, that would have to deal with
| either mode.

I agree that this is awkward, but I wonder how common it will be. It
seems to me that in most of these cases the name would be used for
comparisons and for dictionary lookups. These can all be done with no
knowledge of the mode or any special handling for the different modes.

The typical case, I guess, is filters. I think their services will
usually fall into the following categories:

 - names are only compared with configuration values and other names
   already passed from the parser: no problem

 - names are not examined at all: no problem

 - names are compared with fixed element names built-in to the filter,
   such as in the case of filters that implement XInclude or XBase. In
   these cases namespace processing is required, so there is no need
   for mode awareness.

In the rare case that mode awareness is needed with respect to literals
the filter can use the startDocument callback to query the parser with
regards to the namespace processing setting and set its internal
literals accordingly.

So I can't really think of any common cases where this kind of action is
necessary. If anyone can I would be interested to hear of them.

--Lars M.

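
The point that names used only for comparison and dictionary lookup need
no mode awareness can be illustrated with a small handler. This is a
sketch under assumptions, not the design being debated in this message:
it uses the xml.sax module as shipped with current Python 3, where the
namespace and non-namespace callbacks are separate methods (startElement
and startElementNS), and the names NameCounter and count_names are made
up for the example.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCounter(ContentHandler):
    """Count element names without caring which mode the parser is in."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def _count(self, name):
        # name is either a plain string (namespaces off) or a
        # (uri, localname) tuple (namespaces on); both work as dict keys.
        self.counts[name] = self.counts.get(name, 0) + 1

    def startElement(self, name, attrs):
        self._count(name)

    def startElementNS(self, name, qname, attrs):
        self._count(name)


def count_names(text, namespaces):
    """Parse text with namespace processing on or off; return the counts."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespaces)
    handler = NameCounter()
    parser.setContentHandler(handler)
    parser.feed(text)
    parser.close()
    return handler.counts


DOC = '<a xmlns="urn:example"><b/><b/></a>'

print(count_names(DOC, False))
print(count_names(DOC, True))
```

The counting code never inspects the structure of a name; only code that
needs the URI or the local part separately has to know the mode.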
From larsga@garshol.priv.no  Wed Jul 5 16:37:26 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Jul 2000 17:37:26 +0200
Subject: [XML-SIG] SAX namespaces discussion status
In-Reply-To: <3961EF4A.D6615406@prescod.net>
References: <3961EF4A.D6615406@prescod.net>
Message-ID:

* Lars Marius Garshol
|
| I don't think the backwards compatibility argument carries much
| weight. Names have changed anyway, and in rewriting the code
| adapting the startElement / endElement methods is very little work.
| At least it was for me, and I've rewritten heaps of example code for
| my book for just this.

* Paul Prescod
|
| Oh geez are we going to break another book!

Calm down, Paul, and read the paragraph again. The book is already
broken twice, and I'm not trying to make it an issue in this discussion.

| Anyhow, the more interesting backwards compatibility is between the
| namespaces and no-namespaces mode. You say:
|
| > [non-namespace processing] makes XML much more approachable for novices,
|
| and
|
| > I would prefer replacing the tuple with the qname. Any code that
| > looks at the internal structure of names for (uri, localname) will
| > assume namespace processing anyway, methinks.
|
| Now you've got all these handlers in novice mode like this:
|
| def startElement( self, name, qname, attrs ):
|     ...
|
| Where name is always equal to qname! That strikes me as confusing and
| unhelpful. If we are making a namespaces-off mode then you shouldn't
| have to think about namespaces.

Good point.

| I think that the most tenable compromise is working out to:
|
| def startElement( self, name, attrs ):
| def startElement( self, ((uri, localname,), qname), attrs ):
|
| where "qname" could be "qname" or "prefix"

I feel that this solution is better in non-namespace mode and worse in
namespace mode. It's acceptable to me, but I feel we should look a
little closer at the attributes issue.

Which operations is it that DOM/XPath/whatnot require efficient
implementations of in the Attributes interface? I'm asking not just
because I'd like to require that attrs above be an Attributes instance,
but also because I think the current Attributes design could well be
improved.

--Lars M.


From walter@livinglogic.de  Wed Jul 5 16:49:46 2000
From: walter@livinglogic.de (Walter Doerwald)
Date: Wed, 05 Jul 2000 17:49:46 +0200
Subject: [XML-SIG] Bug in sgmlop? (_ in names)
In-Reply-To: <20000623121037.C4805@amarok.cnri.reston.va.us>
References: <4.3.1.0.20000623135801.00b0ae10@mail.tmt.de> <4.3.1.0.20000623135801.00b0ae10@mail.tmt.de>
Message-ID: <4.3.1.0.20000705174932.00b4cd40@mail.tmt.de>

At 18:10 23.06.00, you wrote:
>On Fri, Jun 23, 2000 at 02:00:26PM +0200, Walter Doerwald wrote:
> >I think I found a bug in sgmlop (from PyXML 0.5.5.1). It doesn't
> >recognize _ in element names. The following code:
>
>I believe this bug was fixed in the 2000/05/28 update of sgmlop, which
>is what's currently in the CVS tree. (And recent releases of 0.5.x
>should have the updated version, therefore.)

I finally found the time to try it with the new version from
http://www.pythonware.com/products/xml/index.htm
(sgmlop-000528.zip (Jun 28) (?))
and the bug is still there.

And I think I found a new one:

import sgmlop

class Handler:
    def finish_starttag(self,name,attrs):
        print name,attrs

p = sgmlop.SGMLParser()
p.register(Handler())
p.parse("")

results in

foo {'_bar' : '_bar', 'baz': '', 'bar': 'bar'}

So the attributes bar and baz seem to be treated differently.

If using XMLParser instead of SGMLParser, the result is:

foo_bar {'baz': '', 'bar': 'bar'}

Bye,
   Walter Dörwald

--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de


From larsga@garshol.priv.no  Wed Jul 5 16:52:35 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Jul 2000 17:52:35 +0200
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: <20000704025655.X29590@lyra.org>
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org>
Message-ID:

* Lars Marius Garshol
|
| I would prefer replacing the tuple with the qname. Any code that
| looks at the internal structure of names for (uri, localname) will
| assume namespace processing anyway, methinks. If anyone can think of
| convincing use cases that are made awkward by the string
| representation I will reconsider.

* Greg Stein
|
| As long as you're saying it is ((uri, localname), qname) or (qname,
| qname), then I'm fine with that.

Actually, I was thinking (uri, localname) or qname. (I was talking
about #1, not #2, using Paul's numbers.)

| In either case value[0] is the "name" of the item.

This is a good point. If we go for Paul's #2 we have two choices (let's
call them #2a and #2b):

  #2a:
    namespaces on:  startElement(self, ((uri, local), qname), attrs)
    namespaces off: startElement(self, (qname, qname), attrs)

  #2b:
    namespaces on:  startElement(self, ((uri, local), qname), attrs)
    namespaces off: startElement(self, qname, attrs)

#2a is troublesome because it's ugly in non-namespace mode and has the
exact same problem as my #1 in non-namespace mode: it forces namespace
awareness on the poor novice.

#2b is troublesome because it means that you have to act differently to
get the true name depending on whether namespaces are on or off. To me
this makes #2b very close to unacceptable.

#2a seems to me to be inferior to #1 at first glance.
The final qname in the outer tuple seems better off as a separate parameter than stuck inside that tuple. Of course, this leaves the question of how we represent attributes. In my opinion we could go for #1 and still use some variant of Paul's attribute lists. The question is just which one:

  i)   [((URI, localname, qname), value), ...]
  ii)  [(((URI, localname), qname), value), ...]
  iii) [((URI, localname), qname, value), ...]
  iv)  {(URI, localname) : (qname, value), ...}
  v)   ({(URI, localname) : value, ...}, {qname : (URI, localname), ...})

More ideas? Opinions? Does it really matter what we choose here, if we are going to have a convenience wrapper class anyway?

--Lars M.

From Fredrik Lundh" <4.3.1.0.20000623135801.00b0ae10@mail.tmt.de> <4.3.1.0.20000705174932.00b4cd40@mail.tmt.de>
Message-ID: <002e01bfe6a0$0831cae0$f2a6b5d4@hagrid>

walter wrote:
> I finally found the time to try it with the new version from
> http://www.pythonware.com/products/xml/index.htm
> (sgmlop-000528.zip (Jun 28) (?))
> and the bug is still there.
>
> And I think I found a new one:
> import sgmlop
>
> class Handler:
>     def finish_starttag(self,name,attrs):
>         print name,attrs
>
> p = sgmlop.SGMLParser()
> p.register(Handler())
> p.parse("")
>
> results in
>
> foo {'_bar' : '_bar', 'baz': '', 'bar': 'bar'}
>
> So the attributes bar and baz seem to be treated differently.

the SGMLParser is designed to be compatible with sgmllib, and sgmllib doesn't accept underscores in tag names.

> If using XMLParser instead of SGMLParser, the result is:
>
> foo_bar {'baz': '', 'bar': 'bar'}

this looks like a bug in attrparse (it probably works if you add a space before the closing slash). I'll post a patch as soon as I have one...

cheers /F

From Fredrik Lundh"
I wrote:
> > foo_bar {'baz': '', 'bar': 'bar'}
>
> this looks like a bug in attrparse (it probably works if you add
> a space before the closing slash). I'll post a patch as soon as
> I have one...

here's one:

...
diff -u sgmlop.c~ sgmlop.c
--- sgmlop.c~	Sun May 28 20:14:01 2000
+++ sgmlop.c	Wed Jul 05 22:09:01 2000
@@ -23,6 +23,7 @@
  * 2000-05-28 fl  Added temporary workaround for unicode problem (@SGMLOP2)
  * 2000-05-28 fl  Removed optional close argument (@SGMLOP3)
  * 2000-05-28 fl  Raise exception on recursive feed (@SGMLOP4)
+ * 2000-07-05 fl  Fixed attribute handling in empty tags (@SGMLOP6)
  *
  * Copyright (c) 1998-2000 by Secret Labs AB
  * Copyright (c) 1998-2000 by Fredrik Lundh
@@ -972,7 +973,7 @@
     if (!self->xml)
         while (ISALNUM(*p) || *p == '-' || *p == '.' ||
                *p == ':' || *p == '?') {
-            *p = TOLOWER(*p);
+            *p = (CHAR_T) TOLOWER(*p);
             if (++p >= end)
                 goto eol;
         }
@@ -1199,7 +1200,7 @@
         CHAR_T *p;
         ch = 0;
         for (p = b; p < e; p++)
-            ch = ch*10 + *p - '0';
+            ch = (CHAR_T) (ch*10 + *p - '0');
         res = PyObject_CallFunction(self->handle_data, "s#", &ch,
                                     sizeof(CHAR_T));
     }
@@ -1278,18 +1279,18 @@
         if (key == NULL)
             goto err;
 
+        if (xml)
+            value = Py_None;
+        else
+            value = key; /* in SGML mode, default is same as key */
+
         while (p < end && ISSPACE(*p))
             p++;
 
-        if (p < end && *p != '=') {
-
-            /* attribute value not specified: set value to name */
-            value = key;
-            Py_INCREF(value);
-
-        } else {
+        if (p < end && *p == '=') {
 
             /* attribute value found */
+            Py_DECREF(value);
 
             if (p < end)
                 p++;

...

From gstein@lyra.org Thu Jul 6 11:01:04 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 6 Jul 2000 03:01:04 -0700
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: ; from larsga@garshol.priv.no on Wed, Jul 05, 2000 at 05:52:35PM +0200
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org>
Message-ID: <20000706030104.E29590@lyra.org>

On Wed, Jul 05, 2000 at 05:52:35PM +0200, Lars Marius Garshol wrote:
>...
> #2a seems to me to be inferior to #1 at first glance. The final qname
> in the outer tuple seems better off as a separate parameter than stuck
> inside that tuple.
Of course, this leaves the question of how we > represent attributes. In my opinion we could go for #1 and still use > some variant of Paul's attribute lists. The question is just which > one: > > i) [((URI, localname, qname), value), ...] > ii) [(((URI, localname), qname), value), ...] > iii) [((URI, localname), qname, value), ...] > iv) {(URI, localname) : (qname, value), ...} > v) ({(URI, localname) : value, ...}, {qname : (URI, localname), ...}) > > More ideas? Opinions? Does it really matter what we choose here, if we > are going to have a convenience wrapper class anyway? I say (iv) [and qname is the prefix that was used] Using (iv) means that the passed attribute dictionary is immediately usable. The other forms require some initial processing, yet provide no value-add. Please do not assume that a convenience class will always be used. Cheers, -g -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Thu Jul 6 17:04:00 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 06 Jul 2000 11:04:00 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> Message-ID: <3964ADF0.8148FF9B@prescod.net> Greg Stein wrote: > > iv) {(URI, localname) : (qname, value), ...} > > Using (iv) means that the passed attribute dictionary is immediately usable. > The other forms require some initial processing, yet provide no value-add. It is immediately usable as a dictionary, but it must be converted to a list for apps that want to iterate over attributes. Examples include canonizers, tree builders, pretty printers and so forth. Here's the first line of qp_xml dealing with attributes: > for name, value in attrs.items(): Minidom uses the same first line, and so do a bunch of our other sample programs. Here are my reasons for preferring a list to a dictionary, from most important to least: 1. Many (most?) apps turn the dictionary into a list immediately. 
2. Those that want "lookup" capability might want (URI,name)-based lookup, qname-based lookup, or both. The AttributesList interface provided both. 3. Dictionary building and populating is more expensive than list building. 4. Attribute lists are typically so small (two or three items) that it is debatable whether a hashtable is the right index structure for them anyhow. Maybe linear search is better for a lot of apps. Maybe "lazy" indexing is better. I'd rather leave it up to the app. 5. Pyexpat delivers the attributes as a list. Python 1.7 might just wrap the pyexpat data structure as a sequence rather than copying the attributes out (admittedly, more research is needed...!) 6. Bundling the qname with the value is not that intuitive. I vote for: > ii) [(((URI, localname), qname), value), ...] -- Paul Prescod - Not encumbered by corporate consensus The distinction between the real twentieth century (1914-1999) and the calenderical one (1900-2000) is based on the convincing idea that the century's bouts of unprecented violence, both within nations and between them, possess a definite historical coherence -- that they constitute, to put it simply, a single story. - The Unfinished Twentieth Century, Jonathan Schell Harper's Magazine, January 2000 From paul@prescod.net Thu Jul 6 17:08:31 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 06 Jul 2000 11:08:31 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> Message-ID: <3964AEFF.5CABCBD1@prescod.net> Back to startElement. I'm not entirely happy with any of the choices before us. The status quo is probably not much better or worse than anything else so we might as well leave it. That means: def startElement( self, (URI, localname), rawname, value ): .... I propose a design goal: "Code that is namespace-unaware should work the same whether namespaces are turned on or not." 
This allows an incremental upgrade from non-namespace mode to namespace mode. If we accept that goal, then rawname should be rawname, not prefix. Namespace-oblivious code should work with the rawname and ignore the "namespace name" parameter altogether. Making Expat live up to this expectation could be a headache, but I think that it is useful.

The documentation should describe the rawname parameter as the appropriate one for people uninterested in namespaces.

If you know about namespaces, and want to make code that works the same in namespace mode or non-namespace mode, then you can match on the first parameter, which would be either ("URI", "localname") or just "name". If you don't care about namespace mode then you can just configure your parser to always use namespaces and presume that the value is a tuple.

--
 Paul Prescod - Not encumbered by corporate consensus
The distinction between the real twentieth century (1914-1999) and the calendrical one (1900-2000) is based on the convincing idea that the century's bouts of unprecedented violence, both within nations and between them, possess a definite historical coherence -- that they constitute, to put it simply, a single story.
	- The Unfinished Twentieth Century, Jonathan Schell
	Harper's Magazine, January 2000

From paul@prescod.net Thu Jul 6 17:18:33 2000
From: paul@prescod.net (Paul Prescod)
Date: Thu, 06 Jul 2000 11:18:33 -0500
Subject: [XML-SIG] SAX namespaces discussion status
References: <200007041347.HAA16154@localhost.localdomain>
Message-ID: <3964B159.D0C73EC6@prescod.net>

Uche Ogbuji wrote:
>
> 4XPath and 4XSLT are absolutely littered with SplitQName() calls
> that would be somewhat reduced in this case.

I claim that it is not SAX's fault that you need to do all of these SplitQName's. SAX can only force you to code two SplitQName calls: one for elements and one for attributes. Once you have the prefix you never lose it unless you throw it away!
The DOM does require you to do SplitQName() calls in createElementNS, but SAX can't help you avoid those. XSLT will also force you to do SplitQName when you see prefixes in XPath select attribute values (not names). Once again, SAX won't help there.

Let's also consider the argument that gluing is easier than splitting. The code for gluing looks something like:

if prefix:
    rawname=prefix+":"+localname
else:
    rawname=localname

The code for splitting is (something like):

pair=rawname.split( ":" )
if len( pair )>1:
    prefix=pair[0]
else:
    prefix=""

Not a big difference in my mind.

--
 Paul Prescod - Not encumbered by corporate consensus

From fdrake@beopen.com Thu Jul 6 17:45:22 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Thu, 6 Jul 2000 12:45:22 -0400 (EDT)
Subject: [XML-SIG] SAX Namespaces
In-Reply-To: <3964ADF0.8148FF9B@prescod.net>
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net>
Message-ID: <14692.47010.265123.429300@cj42289-a.reston1.va.home.com>

Greg Stein wrote:
> iv) {(URI, localname) : (qname, value), ...}
>
> Using (iv) means that the passed attribute dictionary is immediately usable.
> The other forms require some initial processing, yet provide no value-add.

Paul Prescod writes:
> It is immediately usable as a dictionary, but it must be converted to a
> list for apps that want to iterate over attributes. Examples include
...
> 1. Many (most?) apps turn the dictionary into a list immediately.

An unexpected observation! When we were working on Grail, the lists of attributes returned by sgmllib/htmllib were a substantial nuisance, and we *really* wanted dictionaries.
The problem of looping over the attributes to get the ones we wanted was sufficient to fork the modules from the standard library and create the code that's in the later versions of Grail (see the grail/src/sgml directory in the Grail CVS tree at SourceForge), which was *much* easier to work with. Perhaps there's a split here between general tools that work on arbitrary XML and "applications" that don't care about the XML but only need to extract the information to solve some specific problem? That actually seems fairly likely to me, on first thought. > 3. Dictionary building and populating is more expensive than list > building. But still trivial compared to actually doing anything interesting with the attribute values. > 4. Attribute lists are typically so small (two or three items) that it > is debatable whether a hashtable is the right index structure for them > anyhow. Maybe linear search is better for a lot of apps. Maybe "lazy" > indexing is better. I'd rather leave it up to the app. Whether this is the most efficient structure is only part of it -- the usage pattern we observed in Grail was that we'd set up default values in locals, loop over the attributes list to set up locals, and then use the locals while doing whatever we needed to do. It was a real pain if we needed to branch on one attribute and then only use some others in one branch or another; we still had to loop and extract first, and then do the application work. We couldn't branch on the one that mattered, and then get the others only as needed. Unless we looped more than once, which is heinous. > 5. Pyexpat delivers the attributes as a list. Python 1.7 might just wrap > the pyexpat data structure as a sequence rather than copying the > attributes out (admittedly, more research is needed...!) 
If this really is an efficiency problem, then perhaps creating a highly efficient AttributeList implementation in C is worth the effort, otherwise, something that allows random access to attributes by name (such as the AttributeList) in Python is fine. Lists of attributes seem really hard to work with. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Thu Jul 6 20:53:17 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 06 Jul 2000 14:53:17 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> Message-ID: <3964E3AD.65779914@prescod.net> "Fred L. Drake, Jr." wrote: > > Greg Stein wrote: > > iv) {(URI, localname) : (qname, value), ...} > > > > Using (iv) means that the passed attribute dictionary is immediately usable. > > The other forms require some initial processing, yet provide no value-add. > > Paul Prescod writes: > > It is immediately usable as a dictionary, but it must be converted to a > > list for apps that want to iterate over attributes. Examples include > ... > > 1. Many (most?) apps turn the dictionary into a list immediately. > > An unexpected observation! When we were working on Grail, the lists > of attributes returned by sgmllib/htmllib were a substantial > nuissance, and we *really* wanted dictionaries. Agreed. I've always said that dictionary-based lookup is important and must be provided. To get that, you would use code like this: def startElement( self, name, attrs ): attrs=sax.AttributeList( attrs ) a=attrs["abc"] b=attrs["def"] The question is not which mode should be available, but which should be default. 
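As a rough illustration of the wrapper idea, here is a minimal sketch. It is not the actual sax.AttributeList (whose interface is not shown in this thread): the class name LazyAttributes and its method names are made up, it assumes the parser hands over attributes in list form (ii), and it is written for a current Python. Each index is built only the first time that kind of lookup is requested, so code that just iterates pays nothing for indexing.

```python
class LazyAttributes:
    """Hypothetical wrapper over [(((uri, localname), qname), value), ...]."""

    def __init__(self, attrs):
        self._attrs = attrs      # the list exactly as delivered, order preserved
        self._by_name = None     # (uri, localname) -> value, built on first use
        self._by_qname = None    # qname -> value, built on first use

    def items(self):
        # Plain iteration needs no index at all.
        return list(self._attrs)

    def get(self, uri, localname, default=None):
        # Build the (uri, localname) index only when someone asks for it.
        if self._by_name is None:
            self._by_name = {name: value
                             for (name, _qname), value in self._attrs}
        return self._by_name.get((uri, localname), default)

    def get_by_qname(self, qname, default=None):
        # Build the qname index only when someone asks for it.
        if self._by_qname is None:
            self._by_qname = {q: value
                              for (_name, q), value in self._attrs}
        return self._by_qname.get(qname, default)


# Sample data; the namespace URI is a placeholder.
attrs = LazyAttributes([
    ((("http://example.org/ns", "abc"), "x:abc"), "1"),
    (((None, "def"), "def"), "2"),
])
print(attrs.get("http://example.org/ns", "abc"))
print(attrs.get_by_qname("def"))
```

An application that never calls get_by_qname() never builds the qname index, which is the trade-off being argued about: the list stays the cheap default and each application decides which lookups it is willing to pay for.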
> The problem of > looping over the attributes to get the ones we wanted was sufficient > to fork the modules from the standard library and create the code > that's in the later versions of Grail (see the grail/src/sgml > directory in the Grail CVS tree at SourceForge), which was *much* > easier to work with. I would have suggested you use a wrapper approach rather than forking the codebase! > Perhaps there's a split here between general tools that work on > arbitrary XML and "applications" that don't care about the XML but > only need to extract the information to solve some specific problem? > That actually seems fairly likely to me, on first thought. I agree. We need both list-indexing and name-based indexing. The question is which should be default and which should require a method call or object construction. In the pre-namespace world I was so in-favor of name-based indexing that I actually changed PyExpat to use dictionaries myself. But in the post-namespace world, it isn't clear what to index upon, because it depends on what the application is interested in. A lot will care about localname/URI pairs. A lot will care about rawnames. A few (e.g. search engines) may want lists of attributes with a particular namespace. > The usage pattern we observed in Grail was that we'd set up default > values in locals, loop over the attributes list to set up locals, and > then use the locals while doing whatever we needed to do. It was a > real pain if we needed to branch on one attribute and then only use > some others in one branch or another; we still had to loop and extract > first, and then do the application work. We couldn't branch on the > one that mattered, and then get the others only as needed. Unless we > looped more than once, which is heinous. I would encapsulate this behavior in the wrapper class. That's what I meant by "lazy indexing." You ask for one attribute and an internal dictionary "remembers" where it found it. 
You ask for another and it remembers where it found that. If you only ask by URN/localname pair then you don't incur the cost of indexing by qname and if you only ask by qname then you don't incur the cost of indexing by pair. I haven't benchmarked this or any strategy. It depends on the application. If you find that attributes are slow in your application, you could benchmark and replace the AttributeList class with something that is more appropriate for it. > Lists of > attributes seem really hard to work with. It depends on what you are trying to do. There are vast classes of applications that I would expect to use the AttributeList class. I'm just trying to allow applications to NOT use it if they don't want it. Here is one way of looking at it. Let's say that there are four popular APIs out there: SAX DOM Pyxie QP_xml Let's say an equal number of people use all four. Finally, let's say that all but SAX are built on SAX. If this is the case, then 75% of all Python SAX users are going to be using APIs that immediately copy attributes into a list (which all of the non-SAX APIs do today) and 25% are going to be using the dictionary-structure directly. And even some subset of the 25% may find that the dictionary is sub-optimal because it is indexed based on the wrong property or properties. -- Paul Prescod - Not encumbered by corporate consensus The distinction between the real twentieth century (1914-1999) and the calenderical one (1900-2000) is based on the convincing idea that the century's bouts of unprecented violence, both within nations and between them, possess a definite historical coherence -- that they constitute, to put it simply, a single story. - The Unfinished Twentieth Century, Jonathan Schell Harper's Magazine, January 2000 From fdrake@beopen.com Thu Jul 6 21:25:41 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) 
Date: Thu, 6 Jul 2000 16:25:41 -0400 (EDT) Subject: [XML-SIG] SAX Namespaces In-Reply-To: <3964E3AD.65779914@prescod.net> References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> Message-ID: <14692.60229.384642.526502@cj42289-a.reston1.va.home.com> Paul Prescod writes: > Agreed. I've always said that dictionary-based lookup is important and > must be provided. The unexpected part was that you'd *ever* want to iterate over a list in "normal" applications! Unless the order of the attributes in the source instance is important, I don't see why. The more I think about it, the more I think a dict-like approach is the only useful way. > The question is not which mode should be available, but which should be > default. Agreed. > I would have suggested you use a wrapper approach rather than forking > the codebase! Yes, and we'd have said "Have you ever waited for Grail?" Building wrappers would have been really bad; Grail was never exactly a speed demon. ;) > But in the post-namespace world, it isn't clear what to index upon, > because it depends on what the application is interested in. A lot will > care about localname/URI pairs. A lot will care about rawnames. A few > (e.g. search engines) may want lists of attributes with a particular > namespace. Understood; I'm not going to argue that the dictionary syntax is particularly desirable, since there's no one key type that makes sense; methods are fine for the interface to sets of attributes. (And it's not that having sequence behavior is something I see as bad for whatever that object is.) I just think that the thing that handles all of this should be the default. 
I described the Grail experience: > The usage pattern we observed in Grail was that we'd set up default > values in locals, loop over the attributes list to set up locals, and > then use the locals while doing whatever we needed to do. It was a > real pain if we needed to branch on one attribute and then only use [...] And Paul said: > I would encapsulate this behavior in the wrapper class. That's what I > meant by "lazy indexing." You ask for one attribute and an internal > dictionary "remembers" where it found it. You ask for another and it > remembers where it found that. If you only ask by URN/localname pair > then you don't incur the cost of indexing by qname and if you only ask > by qname then you don't incur the cost of indexing by pair. This makes a lot of sense. And it points out why we should have a really efficient implementation of this; I can imagine a C implementation that does all the work and maintains all the appropriate caches, and the parser would use one of them, just like the Java flavors. A copy() method would be used to make a copy of the object as needed, and the parser could just call the clear() method at the start of each start tag. But I still think this should be the default type for the attributes set. > I haven't benchmarked this or any strategy. It depends on the > application. If you find that attributes are slow in your application, > you could benchmark and replace the AttributeList class with something > that is more appropriate for it. And if there's only one really good C implementation, everyone is happy with what comes out of the box. It makes a nice "battery" to include. ;) > It depends on what you are trying to do. There are vast classes of > applications that I would expect to use the AttributeList class. I'm > just trying to allow applications to NOT use it if they don't want it. Is the point of not using an efficiency issue? > Here is one way of looking at it. 
Let's say that there are four popular
> APIs out there:

Ok:

  SAX    -- efficient version is sufficient
  DOM    -- all my DOM code requests attributes by name, so
            lookup approach works; can be copied to a list on
            demand, or the efficient C AttributeList can
            provide this internally
  Pyxie  -- not sure
  QP_xml -- exposes a dictionary interface, so something
            dict-like should work nicely as long as the
            interface & efficiency are right.

> dictionary-structure directly. And even some subset of the 25% may find
> that the dictionary is sub-optimal because it is indexed based on the
> wrong property or properties.

Again, I agree that this is an issue, and using a plain dict is not the right solution. But methods that do name lookup make a lot of sense, while a list interface doesn't.

I don't think we really radically disagree; we just need an AttributeList implementation that meets the performance & sequence criteria. Nothing that a little time & C code can't fix. ;)

Should I pursue that possibility, or am I missing something really substantial somewhere? (Probably several things, but... related to this?)

-Fred

--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member

From paul@prescod.net Thu Jul 6 22:42:13 2000
From: paul@prescod.net (Paul Prescod)
Date: Thu, 06 Jul 2000 16:42:13 -0500
Subject: [XML-SIG] SAX Namespaces
References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> <14692.60229.384642.526502@cj42289-a.reston1.va.home.com>
Message-ID: <3964FD35.31B2AFDF@prescod.net>

"Fred L. Drake, Jr." wrote:
>
> ...
>
> The unexpected part was that you'd *ever* want to iterate over a
> list in "normal" applications!

What's normal? There is a lot of code out there that is unconcerned with a particular vocabulary.
> Unless the order of the attributes in > the source instance is important, I don't see why. It isn't at the top of my list, but maintaining attribute order is nice. Think about "diff". and other line-oriented applications like "grep". > Ok: > > SAX -- efficient version is sufficient > DOM -- all my DOM code requests attributes by name, so > lookup approach works; can be copied to a list on > demand, or the efficient C AttributeList can > provide this internally > Pyxie -- not sure > QP_xml -- exposes a dictionary interface, so something > dict-like should work nicely as long as the > interface & efficiency are right. Okay, you are presuming that this object would be used by all APIs. I was presuming that each API sets up its own data structures. A shared structure can only work if the APIs all expose the same interface or if the APIs that wanted non-standard access "wrapped" the standard object. This would not be too bad if the "standard object" is itself as efficient as the custom-data structures would be or at least as efficient as the list would be so that the APIs have the option of copying data out. > Should I persue that possibility, or am I missing something really > substantial somewhere? (Probably several things, but... related to > this?) No, if you have time to work on it and can work out an API and implementation that performs roughly comparably to built-in Python data structures, I would buy it. But we're a little short of time! -- Paul Prescod - Not encumbered by corporate consensus The distinction between the real twentieth century (1914-1999) and the calenderical one (1900-2000) is based on the convincing idea that the century's bouts of unprecented violence, both within nations and between them, possess a definite historical coherence -- that they constitute, to put it simply, a single story. 
- The Unfinished Twentieth Century, Jonathan Schell Harper's Magazine, January 2000 From gstein@lyra.org Thu Jul 6 23:21:55 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 6 Jul 2000 15:21:55 -0700 Subject: [XML-SIG] SAX Namespaces In-Reply-To: <3964ADF0.8148FF9B@prescod.net>; from paul@prescod.net on Thu, Jul 06, 2000 at 11:04:00AM -0500 References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> Message-ID: <20000706152155.N29590@lyra.org> On Thu, Jul 06, 2000 at 11:04:00AM -0500, Paul Prescod wrote: > Greg Stein wrote: > > > > iv) {(URI, localname) : (qname, value), ...} > > > > Using (iv) means that the passed attribute dictionary is immediately usable. > > The other forms require some initial processing, yet provide no value-add. > > It is immediately usable as a dictionary, but it must be converted to a > list for apps that want to iterate over attributes. Oh, no. An app must say dict.items(). No biggy for that app. > Examples include > canonizers, tree builders, pretty printers and so forth. Here's the > first line of qp_xml dealing with attributes: > > > for name, value in attrs.items(): > > Minidom uses the same first line, and so do a bunch of our other sample > programs. That is an improper basis for your claim. I do the .items() because qp_xml does the namespace processing itself. If Expat does the processing, then I would no longer need a lot of the work in qp_xml.Parser.start. And I certainly would never do the .items() any more. It is easy to transform a dictionary to a list. The other direction is much harder. > Here are my reasons for preferring a list to a dictionary, from most > important to least: > > 1. Many (most?) apps turn the dictionary into a list immediately. > > 2. Those that want "lookup" capability might want (URI,name)-based > lookup, qname-based lookup, or both. The AttributesList interface > provided both. 
How could anybody do a lookup based on a qname? There is no way to know the prefix. If you're talking about the "xml:" prefix, then you also know the URI, so the lookup on a (URI, name) is a cakewalk. > 3. Dictionary building and populating is more expensive than list > building. Eh? How is that? And we are talking mostly about convenience for the Python programmer here. Shaving a few cycles of C code is moot w.r.t. what the Python result is. > 4. Attribute lists are typically so small (two or three items) that it > is debatable whether a hashtable is the right index structure for them > anyhow. Maybe linear search is better for a lot of apps. Maybe "lazy" > indexing is better. I'd rather leave it up to the app. The app can use either input. This is a no-op. > 5. Pyexpat delivers the attributes as a list. Python 1.7 might just wrap > the pyexpat data structure as a sequence rather than copying the > attributes out (admittedly, more research is needed...!) We're talking about delivering the right semantic to the Python user. Expat doesn't have dictionaries, so it must deliver them that way. We are under no requirement to match it exactly. > 6. Bundling the qname with the value is not that intuitive. It should be the prefix, not the qname. But yes: it isn't as intuitive as it could be. But the (URI, name) key is definitely intuitive. It also stresses the simple fact that you can only have one key/value for a particular attribute. The semantics are a much better match. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Jul 6 23:25:36 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 6 Jul 2000 15:25:36 -0700 Subject: [XML-SIG] efficiency? 
(was: SAX Namespaces) In-Reply-To: <14692.47010.265123.429300@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Thu, Jul 06, 2000 at 12:45:22PM -0400 References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> Message-ID: <20000706152536.O29590@lyra.org> On Thu, Jul 06, 2000 at 12:45:22PM -0400, Fred L. Drake, Jr. wrote: >... > > 5. Pyexpat delivers the attributes as a list. Python 1.7 might just wrap > > the pyexpat data structure as a sequence rather than copying the > > attributes out (admittedly, more research is needed...!) > > If this really is an efficiency problem, then perhaps creating a > highly efficient AttributeList implementation in C is worth the > effort, otherwise, something that allows random access to attributes > by name (such as the AttributeList) in Python is fine. Lists of > attributes seem really hard to work with. I don't believe that an efficiency problem exists in there. We are simply mapping some Expat data over to Python data. Straight-forward. I certainly would not want to see a C-based attribute handling thingy. Python has rich enough data structures -- this stuff can go right over the wall into Python without the need for wrapping it. 
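For illustration, forms (ii) and (iv) from this thread can be written as plain Python data and converted in either direction with one expression each. The helper names and the sample URI below are made up for this sketch, and it is written for a current Python.

```python
def dict_to_list(attrs):
    """Form (iv) {(uri, local): (qname, value)} to form (ii)
    [(((uri, local), qname), value), ...]."""
    return [((name, qname), value) for name, (qname, value) in attrs.items()]


def list_to_dict(attrs):
    """Form (ii) to form (iv). If a (uri, local) key repeats,
    the last value seen wins."""
    return {name: (qname, value) for (name, qname), value in attrs}


form_iv = {
    ("http://example.org/ns", "href"): ("x:href", "a.html"),
    (None, "id"): ("id", "n1"),
}
form_ii = dict_to_list(form_iv)

# Round trip: converting to a list and back loses nothing.
assert list_to_dict(form_ii) == form_iv
print(form_ii)
```

Either shape is made of ordinary tuples, lists and dicts, so a callback can pass it to application code without any wrapper type in between.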
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Jul 6 23:30:09 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 6 Jul 2000 15:30:09 -0700 Subject: [XML-SIG] SAX Namespaces In-Reply-To: <3964E3AD.65779914@prescod.net>; from paul@prescod.net on Thu, Jul 06, 2000 at 02:53:17PM -0500 References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> Message-ID: <20000706153009.P29590@lyra.org> On Thu, Jul 06, 2000 at 02:53:17PM -0500, Paul Prescod wrote: > "Fred L. Drake, Jr." wrote: >... > > The usage pattern we observed in Grail was that we'd set up default > > values in locals, loop over the attributes list to set up locals, and > > then use the locals while doing whatever we needed to do. It was a > > real pain if we needed to branch on one attribute and then only use > > some others in one branch or another; we still had to loop and extract > > first, and then do the application work. We couldn't branch on the > > one that mattered, and then get the others only as needed. Unless we > > looped more than once, which is heinous. > > I would encapsulate this behavior in the wrapper class. That's what I > meant by "lazy indexing." You ask for one attribute and an internal > dictionary "remembers" where it found it. You ask for another and it > remembers where it found that. If you only ask by URN/localname pair > then you don't incur the cost of indexing by qname and if you only ask > by qname then you don't incur the cost of indexing by pair. Holy smokes. That is awfully complicated. Why the "lazy indexing?" A dictionary works quite well here. Stuff them in a dictionary and you're done. >... > Here is one way of looking at it. 
Let's say that there are four popular > APIs out there: > > SAX > DOM > Pyxie > QP_xml > > Let's say an equal number of people use all four. Finally, let's say > that all but SAX are built on SAX. > > If this is the case, then 75% of all Python SAX users are going to be > using APIs that immediately copy attributes into a list (which all of > the non-SAX APIs do today) and 25% are going to be using the > dictionary-structure directly. qp_xml will also use the dictionary. As I said in the other note, it doesn't today because it is doing the NS processing itself. Let Expat do that, and all the pain goes away. > And even some subset of the 25% may find > that the dictionary is sub-optimal because it is indexed based on the > wrong property or properties. How can (URI, name) be the wrong index? That is the name of the dumb thing. qname certainly isn't because you can't know what prefix was used. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Jul 6 23:34:04 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 6 Jul 2000 15:34:04 -0700 Subject: [XML-SIG] SAX Namespaces In-Reply-To: <14692.60229.384642.526502@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Thu, Jul 06, 2000 at 04:25:41PM -0400 References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> <14692.60229.384642.526502@cj42289-a.reston1.va.home.com> Message-ID: <20000706153404.Q29590@lyra.org> On Thu, Jul 06, 2000 at 04:25:41PM -0400, Fred L. Drake, Jr. wrote: >... > > Here is one way of looking at it. 
Let's say that there are four popular > > APIs out there: > > Ok: > > SAX -- efficient version is sufficient > DOM -- all my DOM code requests attributes by name, so > lookup approach works; can be copied to a list on > demand, or the efficient C AttributeList can > provide this internally > Pyxie -- not sure > QP_xml -- exposes a dictionary interface, so something > dict-like should work nicely as long as the > interface & efficiency are right. The efficiency is fine. There are two options: 1) in C code, build a list of nested tuples 2) in C code, build a dict of tuple keys and tuple values These are both about the same speed. Usability of the end result wins out over any minor speed issue. It isn't even worth worrying about. > > > dictionary-structure directly. And even some subset of the 25% may find > > that the dictionary is sub-optimal because it is indexed based on the > > wrong property or properties. > > Again, I agree that this is an issue, and using a plain dict is not > the right solution. But methods that do name lookup make a lot of > sense, while a list interface doesn't. Agreed. > I don't think we really radically disagree; we just need an > AttributeList implementation that meets the performance & sequence > criteria. Nothing that a little time & C code can't fix. ;) > Should I persue that possibility, or am I missing something really > substantial somewhere? (Probably several things, but... related to > this?) qp_xml doesn't need an AttributeList implementation. That will just get in the way. In fact, qp_xml will use *exactly* what is returned in the callback. Maybe I'll strip the prefix out of the attr values, but probably not. 
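The two options above can be sketched in plain Python 3 as follows. This only illustrates the shapes being compared; the sample attribute data, the use of an empty string for "no namespace" and "no prefix", and the helper names are assumptions, not anything pyexpat actually returns.

```python
# Option 1: a list of nested tuples, ((uri, localname), (prefix, value)).
# An empty string stands in for "no namespace" / "no prefix" in this sketch.
AS_LIST = [
    (("http://www.w3.org/1999/xlink", "href"), ("xlink", "a.html")),
    (("", "id"), ("", "n1")),
]


def to_dict(pairs):
    """Option 2: a dict with tuple keys and tuple values."""
    return dict(pairs)


def to_pairs(mapping):
    """Going back from the dict to a list of pairs is also a single call."""
    return list(mapping.items())
```

With the dict form, `to_dict(AS_LIST)[("", "id")]` looks an attribute up directly by its (URI, localname) key; with the list form the caller has to scan.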
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Jul 6 23:37:23 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 6 Jul 2000 15:37:23 -0700 Subject: [XML-SIG] jumping the gun (was: SAX Namespaces) In-Reply-To: <3964FD35.31B2AFDF@prescod.net>; from paul@prescod.net on Thu, Jul 06, 2000 at 04:42:13PM -0500 References: <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> <14692.60229.384642.526502@cj42289-a.reston1.va.home.com> <3964FD35.31B2AFDF@prescod.net> Message-ID: <20000706153723.R29590@lyra.org> On Thu, Jul 06, 2000 at 04:42:13PM -0500, Paul Prescod wrote: > "Fred L. Drake, Jr." wrote: >... > > Should I persue that possibility, or am I missing something really > > substantial somewhere? (Probably several things, but... related to > > this?) > > No, if you have time to work on it and can work out an API and > implementation that performs roughly comparably to built-in Python data > structures, I would buy it. But we're a little short of time! This is totally jumping the gun. "roughly comparably to built-in Python data structures." Does that tell you something? ... just use the Python data structures! Why are we getting complicated here? The parser and SAX should just return some simple Python data structs. If an app wants an AttributeList, then let it build one. Now... if you're suggesting building this little utility as a separate and disjoint project, then okay. But that is exactly that: separate. It doesn't apply to the discussion of what type is used in the event handler callback. 
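As a rough illustration of "if an app wants an AttributeList, then let it build one", here is a sketch of an application-side wrapper over the plain dict. The class name `AttributeView` and its methods are hypothetical and are not part of `xml.sax`; the input is assumed to be a `{(uri, localname): (prefix, value)}` mapping, with an empty string meaning "no prefix". The qname index is only built the first time it is asked for, in the spirit of the "lazy indexing" mentioned earlier in the thread.

```python
class AttributeView:
    """Application-level wrapper; the parser itself only hands over a dict."""

    def __init__(self, attrs):
        # (uri, localname) -> (prefix, value)
        self._by_name = dict(attrs)
        # qname -> value, built on first use only
        self._by_qname = None

    def get_value(self, uri, localname, default=None):
        entry = self._by_name.get((uri, localname))
        return default if entry is None else entry[1]

    def get_value_by_qname(self, qname, default=None):
        if self._by_qname is None:
            self._by_qname = {}
            for (uri, local), (prefix, value) in self._by_name.items():
                key = prefix + ":" + local if prefix else local
                self._by_qname[key] = value
        return self._by_qname.get(qname, default)
```

Code that only ever asks by (URI, localname) never pays for the second index.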
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Jul 6 23:44:03 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 6 Jul 2000 15:44:03 -0700 Subject: [XML-SIG] operating modes (was: SAX Namespaces) In-Reply-To: <3964AEFF.5CABCBD1@prescod.net>; from paul@prescod.net on Thu, Jul 06, 2000 at 11:08:31AM -0500 References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <3964AEFF.5CABCBD1@prescod.net> Message-ID: <20000706154403.S29590@lyra.org> On Thu, Jul 06, 2000 at 11:08:31AM -0500, Paul Prescod wrote: > Back to startElement. I'm not entirely happy with any of the choices > before us. The status quo is probably not much better or worse than > anything else so we might as well leave it. That means: > > def startElement( self, (URI, localname), rawname, value ): > .... > > I propose a design goal: "Code that is namespace-unaware should work the > same whether namespaces are turned on or not." This allows an > incremental upgrade from namespace mode to non-namespace mode. Seems reasonable. I'm okay with this. > If we accept that goal, then rawname should be rawname, not prefix. > Namespace-oblivious code should work with the rawname and ignore the > "namespace name" parameter altogether. Making Expat live up to this > expectation could be a headache, but I think that it is useful. The > documentation should describe the rawname parameter as the appropriate > one for people uninterested in namespaces. > > If you know about namespaces, and want to make code that works the same > in namespace mode or non-namespace mode, then you can match on the first > parameter, which would be either ("URI", "localname") or just "name". If > you don't care about namespace mode then you can just configure your > parser to always use namespaces and presume that the value is a tuple. These two paragraphs seem wrong. 
Given:

    def startElement(self, name, something, value):

The code should simply use "name" for all name references. That works in both non-namespace and namespace mode. *If* the user is building namespace-aware code, *then* they look at the "something" parameter. IMO, that second parameter is a prefix, but that does not negate standard operating procedure: key all your work off the first parameter.

I don't see how it would be possible to write any code using a rawname when in NS-mode. That prefix could be anything.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From gstein@lyra.org Thu Jul 6 23:47:55 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 6 Jul 2000 15:47:55 -0700
Subject: [XML-SIG] combine vs split (was: SAX namespaces discussion status)
In-Reply-To: <3964B159.D0C73EC6@prescod.net>; from paul@prescod.net on Thu, Jul 06, 2000 at 11:18:33AM -0500
References: <200007041347.HAA16154@localhost.localdomain> <3964B159.D0C73EC6@prescod.net>
Message-ID: <20000706154755.T29590@lyra.org>

But if the data arrives already-split, then the application is ahead of the game. It may never need to recombine it. If the data arrives joined, then the app is probably going to have to split the thing.

In your example, I would always go with the simple string concatenations over the split() invocation. Consider that the output is probably going to look like this:

    f.write("<%s:%s>data</%s:%s>" % (prefix, localname, prefix, localname))

Very easy and straightforward. It is much more intuitive than the code needed to split up the prefix/localname.

Cheers,
-g

On Thu, Jul 06, 2000 at 11:18:33AM -0500, Paul Prescod wrote:
> Uche Ogbuji wrote:
> >
> > 4XPath and 4XSLT are absolutely littered with SplitQName() calls
> > that would be somewhat reduced in this case.
>
> I claim that it is not SAX's fault that you need to do all of these
> SplitQName's.
>
> SAX can only force you to code two SplitQName calls: one for elements
> and one for attributes.
Once you have the prefix you never lose it
> unless you throw it away!
>
> The DOM does require you to do SplitQName() calls in createElementNS,
> but SAX can't help you avoid those.
>
> XSLT will also force you to do SplitQName when you see prefixes in
> XPath select attribute values (not names). Once again, SAX won't
> help there.
>
> Let's also consider the argument that gluing is easier than splitting.
> The code for gluing looks something like:
>
>     if prefix:
>         rawname = prefix + ":" + localname
>     else:
>         rawname = localname
>
> The code for splitting is (something like):
>
>     pair = rawname.split(":", 1)
>     if len(pair) > 1:
>         prefix, localname = pair
>     else:
>         prefix, localname = "", rawname
>
> Not a big difference in my mind.
>
> --
> Paul Prescod - Not encumbered by corporate consensus
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Greg Stein, http://www.lyra.org/

From wunder@ultraseek.com Fri Jul 7 00:03:40 2000
From: wunder@ultraseek.com (Walter Underwood)
Date: Thu, 06 Jul 2000 16:03:40 -0700
Subject: [XML-SIG] FW: pyexpat compilation errors - Python 2.0b1
In-Reply-To: <3962235E.758F10F8@prescod.net>
Message-ID: <826745907.962899420@serrano.infoseek.com>

I'd prefer seeing forward declarations for the functions, then declaring the array in one place. Then there is no need for null termination, because you can get the length with this idiom:

    sizeof(handler_info_array)/sizeof(handler_info_array[0])

It's a compile-time constant, so the optimizer has maximum fun.

wunder

--On Tuesday, July 04, 2000 12:48 PM -0500 Paul Prescod wrote:

> I like this solution. Work for you Mark?
>
> Juergen Hermann wrote:
>>
>> statichere struct HandlerInfo* handler_info = 0;
>>
>> ...
>>
>> statichere struct HandlerInfo handler_info_array[]=
>> {{"StartElementHandler",
>> ...
>> };
>>
>> void
>> initpyexpat(){
>> handler_info = handler_info_array;
>> ...
>> } > > -- > Paul Prescod - Not encumbered by corporate consensus > The distinction between the real twentieth century (1914-1999) and the > calenderical one (1900-2000) is based on the convincing idea that the > century's bouts of unprecented violence, both within nations and > between them, possess a definite historical coherence -- that they > constitute, to > put it simply, a single story. > - The Unfinished Twentieth Century, Jonathan Schell > Harper's Magazine, January 2000 > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig > -- Walter R. Underwood Senior Staff Engineer, Ultraseek Corp. http://www.ultraseek.com/ From tpassin@home.com Fri Jul 7 02:31:29 2000 From: tpassin@home.com (tpassin@home.com) Date: Thu, 6 Jul 2000 21:31:29 -0400 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net> <14692.60229.384642.526502@cj42289-a.reston1.va.home.com> Message-ID: <005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> Fred L. Drake, Jr wrote - > > The unexpected part was that you'd *ever* want to iterate over a > list in "normal" applications! Unless the order of the attributes in > the source instance is important, I don't see why. > The more I think about it, the more I think a dict-like approach is > the only useful way. > Well, if you don't know what attributes to expect - like you don't have a DTD and you want to display an more-or-less unknown document - you want to iterate over the attributes. If you know what attributes you want, especially if you know they are in there, you'd like a dictionary. As Paul and Greg have mentioned, it's easy to get a list if you have a dictionary. 
I've changed my mind on this one - I used to think a list was the right idea. But now I say a dictionary would be good. I don't see it as being bad either way, though. Regards, Tom Passin From fdrake@beopen.com Fri Jul 7 02:39:22 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 6 Jul 2000 21:39:22 -0400 (EDT) Subject: [XML-SIG] SAX Namespaces In-Reply-To: <005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> <14692.60229.384642.526502@cj42289-a.reston1.va.home.com> <005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> Message-ID: <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> tpassin@home.com writes: > Well, if you don't know what attributes to expect - like you don't have a > DTD and you want to display an more-or-less unknown document - you want to > iterate over the attributes. If you know what attributes you want, When iterating over these unknown attributes, do you care that you see them in their original order? If so, then a sequence is *required*, and anything else can be derived (somewhat annoying but possible). If you don't care about the *original* order, .items() on a mapping is sufficient. Paul: Are there reasonable cases where you need the original order? (You seem to be the strongest proponent of this.) Would this be something where modality in the interface isn't so bad? You get sequences if you ask for them, but mappings by default? > As Paul and Greg have mentioned, it's easy to get a list if you have a > dictionary. I've changed my mind on this one - I used to think a list was > the right idea. But now I say a dictionary would be good. I don't see it > as being bad either way, though. 
I think there are a couple of issues: is it reasonable to require the original order of atttributes, and, if not, what's the right dictionary key? (I'd say (URI,localname) in namespace mode, or rawname otherwise, with all other information available in the value.) xml.sax.AttributeList can provide all the appropriate query methods according to the SAX spec at that point. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Fri Jul 7 02:42:33 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 06 Jul 2000 20:42:33 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <20000706152155.N29590@lyra.org> Message-ID: <39653589.E559E202@prescod.net> Greg Stein wrote: > > ... > > > Minidom uses the same first line, and so do a bunch of our other sample > > programs. > > That is an improper basis for your claim. I do the .items() because qp_xml > does the namespace processing itself. If Expat does the processing, then I > would no longer need a lot of the work in qp_xml.Parser.start. And I > certainly would never do the .items() any more. If the interface you want to expose to your users is ((uri,localname)->(rawname,value)) then qp_xml is rather unique in that regard. Attribute-using code in such an environment is likely to be pretty ugly. The DOM says that the referent value is either a node or a string, not a rawname/value or prefix/value tuple. XPath, XPointer, XSLT say the same. > It is easy to transform a dictionary to a list. The other direction is much > harder. Either is easy. It's just a question of which will happen more. My guess is that most of the time we will transform Expat's list into a dictionary and then convert that to a list (to iterate over it) and then convert that back to a DOM-specific, Pyxie-specific etc. data structure. 
Fred wants to prove me wrong by making a data structure we'll all like. You just want to say that the provided data structure is "good enough" when it simply isn't enough for most APIs. I feel pretty confident that ((uri,localname):(rawname,value)) is not going to be a popular representation in higher level APIs, and perhaps even among SAX programmers. > How could anybody do a lookup based on a qname? There is no way to know the > prefix. I have megabytes of documents where I know the locations of every line-feed. Prefixes are not that mysterious. The W3C has decided that it is appropriate in specs "above XML" to query and navigate based on the prefix even if namespace processing is turned on. Even if we decided that that decision is questionable here, there is nothing we can do about it. Minidom (for one) indexes on both qname and uri/localname pair. The user may use this facility to blow their feet off but they might also have good reason for doing so. > If you're talking about the "xml:" prefix, then you also know the > URI, so the lookup on a (URI, name) is a cakewalk. Actually, it didn't occur to me until you mentioned it, but that isn't true. The string you mention is not by definition bound to the xml prefix, on the other hand the xml:* attributes are defined based on their rawnames, not their URI. > > 3. Dictionary building and populating is more expensive than list > > building. > > Eh? How is that? That was what my tests from Python code showed but on further testing I see that minor variations in the code can shift it around. In particular, lists were slower if you use "append" instead of precomputing the length of the list. > And we are talking mostly about convenience for the Python programmer here. > Shaving a few cycles of C code is moot w.r.t. what the Python result is. There is no convenient built-in data structure. I certainly don't think having "values" of (prefix,value) is convenient. 
Actually, the first version of minidom code did something like that but the code was pretty ugly. My point is that if we stick to Python's primitive data types then copying the attributes out will be the rule, not the exception. > We're talking about delivering the right semantic to the Python user. Expat > doesn't have dictionaries, so it must deliver them that way. > We are under no requirement to match it exactly. I didn't claim we were. I said that among other things, one benefit of doing it this way is mirroring Expat. > It should be the prefix, not the qname. But yes: it isn't as intuitive as it > could be. But the (URI, name) key is definitely intuitive. It also stresses > the simple fact that you can only have one key/value for a particular > attribute. The semantics are a much better match. Several specs in the XML family say that the right semantic is double indexing. I need to support all of the specs in the family. Even ignoring that, I don't believe that there is a single existing API that uses the mapping structure you propose. We would have to copy the values out or "wrap" if only to be backwards-compatible. It wouldn't be the end of the world if I had to do a .items() for every element, but I would be annoyed to find six months from now that most apps are doing the items() in which case the list should have been the data structure in the first case. -- Paul Prescod - Not encumbered by corporate consensus Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin. 
- http://www.nj.com/page1/ledger/e2efc7.html From tpassin@home.com Fri Jul 7 03:00:00 2000 From: tpassin@home.com (tpassin@home.com) Date: Thu, 6 Jul 2000 22:00:00 -0400 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> Message-ID: <008401bfe7b7$0f192200$7cac1218@reston1.va.home.com> Fred L. Drake, Jr. wrote - > > tpassin@home.com writes: > > Well, if you don't know what attributes to expect - like you don't have a > > DTD and you want to display an more-or-less unknown document - you want to > > iterate over the attributes. If you know what attributes you want, > > When iterating over these unknown attributes, do you care that you > see them in their original order? If so, then a sequence is > *required*, and anything else can be derived (somewhat annoying but > possible). If you don't care about the *original* order, .items() on > a mapping is sufficient. Well, the XML Rec says that attribute order is not significant. That makes me think that no one should count on it. But there are all these variant processing methods on special subsets of XML where it could maybe be useful. The trouble is, if someone else wants to process your XML, and doesn't use your tools, he can't count on getting the order. So I don't favor coding information into the order. This is different from writing line-oriented XML for grepping or whatever. Someone with a full parser would still get the same results as you with your line-oriented tools. For that matter, does expat even preserve attribute order anyway? Maybe it's gone by the time Python gets the attributes. 
(I don't know enough about expat to know). The upshot - don't let attribute order be a significant issue. Regards, Tom Passin From paul@prescod.net Fri Jul 7 02:56:23 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 06 Jul 2000 20:56:23 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> <14692.60229.384642.526502@cj42289-a.reston1.va.home.com> <005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> Message-ID: <396538C7.8FDA1104@prescod.net> "Fred L. Drake, Jr." wrote: > > ... > > When iterating over these unknown attributes, do you care that you > see them in their original order? If so, then a sequence is > *required*, and anything else can be derived (somewhat annoying but > possible). If you don't care about the *original* order, .items() on > a mapping is sufficient. Original order is not a requirement. If it happens to work, it would be a pleasent side effect. My main deal is that I see most of my code converting PyExpat's list to a dictionary (for SAX) and then converting the dictionary back to a list (for iterating) and then (maybe) converting the list back to a pair of dictionaries or a dictionary with different index and values. list->dict->list->(dict,dict) I proposed to reduce it to: list->list->(dict,dict) You proposed to make it list->FredsDataStructure which I would use directly. > I think there are a couple of issues: is it reasonable to require > the original order of atttributes, and, if not, what's the right > dictionary key? No, that's not a requirement. > (I'd say (URI,localname) in namespace mode, or > rawname otherwise, with all other information available in the value.) So is value always a tuple? Always the same size? 
Or is sometimes a string and sometimes a tuple? > xml.sax.AttributeList can provide all the appropriate query methods > according to the SAX spec at that point Above you seem to talk about Python's built-in dictionary type. Here you seem to be talking about a new type or class. -- Paul Prescod - Not encumbered by corporate consensus Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin. - http://www.nj.com/page1/ledger/e2efc7.html From tpassin@home.com Fri Jul 7 03:15:32 2000 From: tpassin@home.com (tpassin@home.com) Date: Thu, 6 Jul 2000 22:15:32 -0400 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> Message-ID: <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> Fred L. Drake, Jr. wrote - > > > I think there are a couple of issues: is it reasonable to require > the original order of atttributes, and, if not, what's the right > dictionary key? (I'd say (URI,localname) in namespace mode, or > rawname otherwise, with all other information available in the value.) > xml.sax.AttributeList can provide all the appropriate query methods > according to the SAX spec at that point. > > The prefix is not supposed to have absolute significance. You could use a prefix of "ZMBW" and bind it to the namespace for "html" and get the same results, according to the recs. You could bind both prefixes in the same document, too, and still get the same results (leaving aside the question about whether all current tools really work that way, I guess). 
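The point that the prefix carries no absolute significance can be checked with today's `xml.sax` (Python 3) in namespace mode. This is a sketch, not code from the thread: the handler class `Names` and the helper `expanded_names` are illustrative names. In namespace mode the standard `startElementNS` callback receives the element name as a `(uri, localname)` tuple.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class Names(ContentHandler):
    """Collect the (uri, localname) pair of every start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) tuple when namespace processing is on.
        self.names.append(name)


def expanded_names(xml_text):
    handler = Names()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.names
```

Two documents that differ only in the prefix they bind, for example `<h:p xmlns:h="urn:x"/>` and `<ZMBW:p xmlns:ZMBW="urn:x"/>`, produce the same list of pairs.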
So if you are working with namespaces, the only count-on-able thing is the NS plus the local name. But if you want to recreate a document, you need the prefix. Sigh. Well, there's no requirement in the recs to be able to round-trip a document and end up with the same prefixes. I conclude that - * We should concentrate on having NS/localname be easy to understand and use. (like (NS,localname)). * Passing along the prefix could be very useful for some people, but it is basically a convenience. It shouldn't be the driver. BTW, if you are not in NS mode, but the name includes a prefix anyway, should the rawname equal prefix:localname? Or should localname=rawname=the whole thing? I **think** the latter is best. Anyone else have a thought here? Cheers, Tom Passin From paul@prescod.net Fri Jul 7 06:24:50 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 07 Jul 2000 00:24:50 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> Message-ID: <396569A2.E503DBF5@prescod.net> tpassin@home.com wrote: > > ... > > * Passing along the prefix could be very useful for some people, but it is > basically a convenience. **** I want to stress the following point. **** The prefix is necessary for the DOM and for XPath as well as for all specs that use XPath: XPointer, XLink, XSLT and Schematron. Let me say it another way: it is the right of a person writing an application based on the DOM, XPath or XSLT to work entirely based on rawnames or even just prefixes. 
It is an exaggeration to say that there is "no way to know the prefix." It is relatively easy to hard-code prefixes in your DTDs, and thus enforce consistent usage of them in documents.

--
Paul Prescod - Not encumbered by corporate consensus
Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin.
- http://www.nj.com/page1/ledger/e2efc7.html

From tan@pobox.com Sat Jul 8 23:20:34 2000
From: tan@pobox.com (tan@pobox.com)
Date: Sat, 8 Jul 2000 15:20:34 -0700
Subject: [XML-SIG] curses xml browser/editor released
Message-ID: <20000708152034.G9825@tan.powerandlove.com>

folks,

I've put a Python version of my (formerly Perl) XML browser/editor at http://www.powerandlove.com/software/xml_browser

I wrote this because I wanted a character-cell (text user interface) XML browser (with node editor) that worked like Lynx in its arrow key navigation mode. It uses PyNcurses and 4DOM-XML-Sig (which I believe is the same as 4DOM 0.9.3).

constraints on the xml files this browser works with

This browser will display, and allow the user to browse all of, any XML file which contains only element, attribute and text nodes. (The Perl version displays the attributes - I'll soon add that feature to the Python version.)

Example: the browser will display the following XML document (minus the annotations):

living things
    animate objects <-- the title of a branch
      plants <-- the title of a branch
    • trees
    • <-- leaf node
      animals
    • dogs
    • cats
as: living things (black text) animate objects (red text) If the user presses the right arrow s/he'll see: * animate objects (black text) plants (red text - currently selected link) animals (blue text - unselected link) If s/he presses the down arrow once , s/he will see: * animate objects plants (blue text - unselected link) animals (red text - currently selected link) If the user presses the right arrow s/he'll see: * animals (black text - the name of the parent) * dogs (black text - the value of a text node) * cats (black text - the value of a text node) To go up to the parent (animals), the user presses the left arrow. I.e., you navigate using your arrow keys, as in Lynx: left go up one level (to the current node's parent) up move the cursor to the next higher link down move the cursor to the next lower link right go to the node the current link points to paging p previous page n next page Pressing 'e' at any level will allow you to edit a node in that level. branch nodes - the browser uses the value of a branch's text node as the branch's title if it exists, else it uses the branch's tag name tags - one tag is the same as the next to the browser - e.g., the value of a text node of an element whose tag is 'title' will not be treated as a title I'd love to have help with this project. For instance, it'd be nice to have a version for MS-DOS. (I know nothing about DOS character-cell programming.) Tom -- Tom Newman tan@pobox.com From paul@prescod.net Sun Jul 9 17:45:08 2000 From: paul@prescod.net (Paul Prescod) Date: Sun, 09 Jul 2000 11:45:08 -0500 Subject: [XML-SIG] XPath in Python 2 Message-ID: <3968AC14.E961B271@prescod.net> Python is delayed and we don't know how long it will be so. It would not hurt to look at the feasibility of adding XPath support while we wait. A decent XPath implementation should work on any DOM implementation and probably would not require changes to the implementation. 
In other words, an XPath implementation should not interfere with our readiness to build Python 1.6 (or 2.0) at the drop of a hat. If we finish it and test it before the beta period starts, then we could put it in.

Why XPath?

XPath is the W3C-provided mechanism for navigating XML documents in a declarative way. That means that rather than specifying an exact path to a node, you describe the relationship between the node you are on and the node you want to get to. This makes the creation of complex applications much easier and allows for more efficiency "under the hood" of the XPath implementation.

The two basic features of an XPath implementation are selecting and matching. Selecting takes a node or nodelist and an XPath and returns a nodelist of related nodes. Matching takes a node and an XPath and returns a boolean saying whether the node "matches" the XPath (i.e. whether there is some node in the document from which selecting with that XPath would return the node).

We already have two XPath implementations: PyXPath and 4XPath. PyXPath's code is very complicated and I have had trouble in the past optimizing it and using it for matching (as opposed to selecting). I don't know if anyone is interested enough to go in and add those features.

4XPath is cleaner from a user's point of view, but it requires quite a bit of C/lex code for parsing the XPaths. I don't know if we would have to go back to the BDFL to get permission for that code to go into Python.

We also have the option of creating a new XPath implementation. The primary virtue of doing so would be the opportunity to implement a tiny subset of XPath in a much smaller amount of code. The two existing implementations probably have more code than the rest of the Python 1.6 XML package. And in 4XPath's case, a lot of that is C code.

---

My feeling is that implementing 10% of XPath in 10% of the code would get us 80% of the benefit. Those who need the rest can download 4XPath. I also think that 4XPath should be part of the pyxml distribution.
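To give a feel for how small such a subset could be, here is a sketch in Python 3 over `xml.dom.minidom`. It is not PyXPath or 4XPath code, and it is not an implementation anyone in this thread wrote: the function names `select` and `matches` are illustrative, only relative paths made of child steps, `//` and `..` are handled, and a leading `/` is not treated as "start from the document root" as real XPath would.

```python
from xml.dom import Node, minidom


def _children(node, name):
    """Child elements of node with the given tag name."""
    return [c for c in node.childNodes
            if c.nodeType == Node.ELEMENT_NODE and c.tagName == name]


def _descendants(node, name):
    """Descendant elements of node with the given tag name, document order."""
    found = []
    for c in node.childNodes:
        if c.nodeType == Node.ELEMENT_NODE:
            if c.tagName == name:
                found.append(c)
            found.extend(_descendants(c, name))
    return found


def select(context, path):
    """Evaluate a tiny path subset (a/b, a//b, ..) relative to context."""
    nodes = [context]
    deep = False                      # True right after an empty step ("//")
    for step in path.split("/"):
        if step == "":
            deep = True
            continue
        result = []
        for n in nodes:
            if step == "..":
                hits = [n.parentNode] if n.parentNode is not None else []
            elif deep:
                hits = _descendants(n, step)
            else:
                hits = _children(n, step)
            for h in hits:
                # Keep each node once, even if reached from two contexts.
                if not any(h is r for r in result):
                    result.append(h)
        nodes = result
        deep = False
    return nodes


def matches(node, context, path):
    """True if node is among the nodes selected by path from context."""
    return any(node is n for n in select(context, path))
```

For example, with `root = minidom.parseString("<r><a><b/><c><b/></c></a><b/></r>").documentElement`, `select(root, "a/b")` finds only the direct child, while `select(root, "a//b")` also finds the one nested under `c`.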
The 10% that is most interesting:

* a/b/c
* a//b
* ../

Actually, that's probably not even 10% and it can be "parsed" mostly with a "string.split" on "/". Things like positional predicates can be implemented with Python sequence syntax. Attribute access can use DOM syntax. All in all, this looks like an afternoon's work, if we agree that it should go into Python.

-- 
Paul Prescod - Not encumbered by corporate consensus
"Computer Associates is expected to come in with better than expected earnings." Bob O'Brien, quoted in
- http://www.fool.com/news/2000/foth000316.htm

From paul@prescod.net Sun Jul 9 17:45:28 2000
From: paul@prescod.net (Paul Prescod)
Date: Sun, 09 Jul 2000 11:45:28 -0500
Subject: [XML-SIG] Attribute handling
Message-ID: <3968AC28.472F8984@prescod.net>

I think that we have enough feedback to allow Lars to choose the right data structures for callbacks, if he is willing. Here is a summary of the attribute choices that have been discussed:

a) {(uri,localname):(rawname/prefix,value),...}
Pro: in some cases, it is directly usable for "application-level" code.
Con: in many cases, it is not, and must be immediately converted to another data structure -- which would be more efficient if it were a sequence, not a dict

b) [(((uri,localname),rawname/prefix),value),...]
Pro: easier (and more efficient) to directly copy into a more convenient data structure
Con: less often directly useful for application-level code. Requires a wrapper or conversion more often.

c) (some optimized attribute-specific datatype)
Pro: optimized for what we need and does not require copying
Con: who writes it? who maintains it? complicates things too much?

d) AttributeList
Pro: already implemented
Con: least efficient because it is written in Python

It is possible to migrate from any of the solutions to c) without breaking code but other migrations are more tricky.
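
To make choices a) and b) concrete, here is a small sketch in present-day Python 3. It reads a) as a dict keyed by (uri, localname) and b) as a list of (((uri, localname), rawname), value) entries; the attribute values, the example URI and the helper names by_expanded_name() and by_rawname() are all made up for illustration and are not part of any SAX driver.

```python
XHTML = "http://www.w3.org/1999/xhtml"  # example namespace URI only

# Choice a): mapping keyed by (uri, localname).
attrs_a = {
    (XHTML, "href"): ("h:href", "index.html"),
    (None, "id"): ("id", "top"),
}

# Choice b): flat sequence of (((uri, localname), rawname), value).
attrs_b = [
    (((XHTML, "href"), "h:href"), "index.html"),
    (((None, "id"), "id"), "top"),
]


def by_expanded_name(seq):
    """Copy a choice-b) sequence into a choice-a) style dict."""
    return {name: (rawname, value) for (name, rawname), value in seq}


def by_rawname(seq):
    """Copy a choice-b) sequence into a plain rawname -> value dict."""
    return {rawname: value for (name, rawname), value in seq}
```

The sequence form costs one pass to turn into whichever index the application wants, while the dict form is ready to use only when the application happens to want exactly that index.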
-- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From tpassin@home.com Mon Jul 10 02:52:02 2000 From: tpassin@home.com (tpassin@home.com) Date: Sun, 9 Jul 2000 21:52:02 -0400 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> <396569A2.E503DBF5@prescod.net> Message-ID: <001e01bfea11$7608e020$7cac1218@reston1.va.home.com> Paul Prescod wrote - > tpassin@home.com wrote: > > * Passing along the prefix could be very useful for some people, but it is > > basically a convenience. > > **** I want to stress the following point. **** > > The prefix is necessary for the DOM and for XPath as well as for all > specs that use XPath: XPointer, XLink, XSLT and Schematron. > > Let me say it another way: it is the right of a person writing an > application based on the DOM, XPath or XSLT to work entirely based on > rawnames or even just prefixes. It is an exaggeration to say that there > is "no way to know the prefix." It is relatively easy to hard-code > prefixes in your DTDs, and thus enforce consistent usage of them in > documents. > I'm not really disputing that current apps may use the prefix as the preferred NS designator. I am pointing out that the Recs call for URI/localname pairs as the key identifiers. 
From the Candidate Rec for Level 2 DOM: "On the contrary, the DOM Level 2 methods related to namespaces, identify attribute nodes by their namespace URI and localName." From the XPath CR: "Some types of node also have an expanded-name, which is a pair consisting of a local part and a namespace URI. The local part is a string. The namespace URI is either null or a string. ...Two expanded-names are equal if they have the same local part, and either both have a null namespace URI or both have non-null namespace URIs that are equal." From the Namespace Rec: "Note that the prefix functions only as a placeholder for a namespace name. Applications should use the namespace name, not the prefix, in constructing names whose scope extends beyond the containing document." The XLink CR does not mention prefixes or namespaces at all, except with regard to the meaning of the xlink: namespace. From the XSLT rec: "If it has a prefix, then the prefix is expanded into a URI reference using the namespace declarations in effect on the attribute in which the name occurs. The expanded-name consisting of the local part of the name and the possibly null URI reference is used as the name of the object." From the XPointer CR: "For example, if there is an XML document containing an element ex:y that is in the scope of a namespace declaration xmlns:ex="http://example.com/foo", then the following XPointer will work properly if it appears in the scope of this declaration: xpointer(//ex:y) If this XPointer is moved or copied to an XML document where such a namespace declaration is not in force or to a non-XML document, it can still address the desired element properly if it is transformed to the following form: xpointer(//*[local-name()='y' and namespace-uri()='http://example.com/bar'])" This example shows that a prefix **can** be used in a path expression, but it doesn't **have** to be used. The URI can evidently always be used. 
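
A short sketch of the expanded-name rule quoted above, assuming present-day Python 3 and xml.dom.minidom (which parses with namespace processing on). The two documents and the helper expanded_name() are invented for the example; the point is only that different prefixes bound to the same URI give equal (URI, localname) pairs while the rawnames differ.

```python
from xml.dom import minidom

# Same namespace URI, two different prefixes.
doc_a = minidom.parseString('<ex:y xmlns:ex="http://example.com/foo"/>')
doc_b = minidom.parseString('<other:y xmlns:other="http://example.com/foo"/>')


def expanded_name(element):
    """Return the (namespace URI, local part) pair for a DOM element."""
    return (element.namespaceURI, element.localName)


a = doc_a.documentElement
b = doc_b.documentElement

same_expanded_name = expanded_name(a) == expanded_name(b)
same_rawname = a.tagName == b.tagName

# Lookup by URI and local name works whatever prefix the document chose.
found = doc_b.getElementsByTagNameNS("http://example.com/foo", "y")
```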
So if we want to directly support the DOM (level 2, at least), XPATH, XSLT, and XPointer the way these recs say they are supposed to work, the primary emphasis will be on (URI,localname). I can't speak for Schematron. If our current apps don't work like this, maybe we want to look at them again. Please note, I'm only addressing tuning the apps to directly support the current recs/CRs. I'm not saying they shouldn't be able to provide or use prefixes. Cheers, Tom Passin From paul@prescod.net Mon Jul 10 04:38:27 2000 From: paul@prescod.net (Paul Prescod) Date: Sun, 09 Jul 2000 22:38:27 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> <396569A2.E503DBF5@prescod.net> <001e01bfea11$7608e020$7cac1218@reston1.va.home.com> Message-ID: <39694533.40D76403@prescod.net> tpassin@home.com wrote: > > ... > > I'm not really disputing that current apps may use the prefix as the > preferred NS designator. I am pointing out that the Recs call for > URI/localname pairs as the key identifiers. Current RECs call for URI/localname pairs as identifiers. They also call for rawnames as identifiers. > >From the Candidate Rec for Level 2 DOM: > > "On the contrary, the DOM Level 2 methods related to namespaces, identify > attribute nodes by their namespace URI and localName." Right. And the methods unrelated to namespaces identify them by rawname. > >From the XPath CR: > "Some types of node also have an expanded-name, which is a pair consisting > of a local part and a namespace URI. The local part is a string. 
The > namespace URI is either null or a string. ...Two expanded-names are equal if > they have the same local part, and either both have a null namespace URI or > both have non-null namespace URIs that are equal." That's the definition of expanded name. There is also a definition of "name" which is equivalent to "rawname". We need both. XSLT, XPointer, XLink, etc. inherit the behavior from XPath. > Please note, I'm only addressing tuning the apps to directly support the > current recs/CRs. I'm not saying they shouldn't be able to provide or use > prefixes. I'm not sure what you are suggesting concretely. You and I agree that most of the current applications allow you to work based on the rawname or the URI/localname pair. Therefore we need three pieces of information. How do you suggest we should represent them? -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From tpassin@home.com Mon Jul 10 05:35:53 2000 From: tpassin@home.com (tpassin@home.com) Date: Mon, 10 Jul 2000 00:35:53 -0400 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> <396569A2.E503DBF5@prescod.net> <001e01bfea11$7608e020$7cac1218@reston1.va.home.com> <39694533.40D76403@prescod.net> Message-ID: <00f401bfea28$55382ec0$7cac1218@reston1.va.home.com> Paul Prescod said - (does this look like a put-up-or-shut-up??? :) ) > I'm not sure what you are suggesting concretely. 
You and I agree that > most of the current applications allow you to work based on the rawname > or the URI/localname pair. Therefore we need three pieces of > information. How do you suggest we should represent them? > -- OK, I'll take a shot at it. The DOM level 2 NS-specific calls want to see URI and localname. We should use a tuple (uri,localname). Some of the NS calls also want to see a prefix or a rawname (same as 'qualified name', as I understand it), and sometimes both. I propose that the rawname always be supplied, and the prefix always be computed. This seems to fit best the pattern of the DOM calls, and also using DOM 1 calls with no namespaces. Or in other words, I'm guessing :) the prefix won't be needed as often as the rawname, so it is just as well to compute it if needed. So the name could be A) A tuple, ((uri,localname),rawname), B) an object with attributes for each of these things, or C) a dictionary. We've heard support for just about everything. An object could be attractive, since it could have methods to do all the mix-and match we want. But it seems to me that the tuple - approach A - is the simplest. The main thing is to agree whether the prefix (or rawname, or neither) should be computed rather than included explicitly in the name. From the point of view of no redundance, the prefix should be included and the rawname should be omitted. But I think, as Paul suggested, that the usefulness of the rawname says that it should be included instead of the prefix. So my vote is for A). Paul, is this concrete enough? :-) Have at it, blast away. Cheers, Tom Passin From paul@prescod.net Mon Jul 10 07:30:39 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 01:30:39 -0500 Subject: [XML-SIG] Tiny XPath implementation Message-ID: <39696D8F.AB7DD943@prescod.net> Okay, I took my own challenge and implemented a small subset of xpath. I call it tinyxpath (minixpath was taken :) ). 
It does the 10% of XPath that people use 90% of the time and leaves out many features that are easier done in Python code rather than directly in XPath. It is less than 250 lines of code.

Here is an example of tinyxpath code, using Hamlet. Pretty much all of the useful features are shown in these examples. Conspicuously absent are qualifiers and attributes but I propose that Python features should be used for those (as shown)

tinyxpath.select( dom, "/PLAY/ACT/TITLE/text()" )
# get a list of text within all ACT TITLEs in Hamlet

tinyxpath.select( dom, ".//TITLE/text()" )
# text within all title nodes in Hamlet (no matter how deep)

tinyxpath.select( dom, ".//TITLE/text()" )[1]
# get first text node in title in Hamlet (no matter how deep)

tinyxpath.select( dom, ".//TITLE" )[1].getAttribute( "french_trans" )
# french_trans attr of first title in Hamlet (no matter how deep)

tinyxpath.select( dom, ".//TITLE/.." )
# Find titles and then get a list of their parent nodes

tinyxpath.select( dom, "PLAY/*/TITLE" )
# all titles once removed from play

tinyxpath.select( dom, "PLAY/*/*" )
# all elements once removed from play

tinyxpath.select( dom, "PLAY/*/node()" )
# all nodes once removed from play

tinyxpath.select( dom, "/" )
# the root node

tinyxpath.select( dom, "/PLAY" )
# the root element

node = tinyxpath.select( dom, "PLAY/ACT/TITLE" )
# set up variable for the rest

assert tinyxpath.match( node, "TITLE" )==1
assert tinyxpath.match( node, "ACT/TITLE" )==1
assert tinyxpath.match( node, "PLAY//TITLE" )==1
assert tinyxpath.match( node, "/PLAY/ACT/TITLE" )==1
assert tinyxpath.match( node, "PLAY/*/TITLE" )==1
assert tinyxpath.match( node, "PLAY/FOO/TITLE" )==0
assert tinyxpath.match( node, "//TITLE" )==1

-- 
Paul Prescod - Not encumbered by corporate consensus
"Computer Associates is expected to come in with better than expected earnings."
Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From larsga@garshol.priv.no Mon Jul 10 10:01:16 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 10 Jul 2000 11:01:16 +0200 Subject: [XML-SIG] Attribute handling In-Reply-To: <3968AC28.472F8984@prescod.net> References: <3968AC28.472F8984@prescod.net> Message-ID: * Paul Prescod | | I think that we have enough feedback to allow Lars to choose the | right data structures for callbacks, if he is willing. I agree, and I am willing. I'm working on getting other things out of the way so that I can concentrate on this for a while. Hopefully I can start tomorrow. The summary looks good and corresponds pretty well to my own perception of the debate. --Lars M. From ht@cogsci.ed.ac.uk Mon Jul 10 14:37:14 2000 From: ht@cogsci.ed.ac.uk (Henry S. Thompson) Date: 10 Jul 2000 14:37:14 +0100 Subject: [XML-SIG] XPath in Python 2 In-Reply-To: Paul Prescod's message of "Sun, 09 Jul 2000 11:45:08 -0500" References: <3968AC14.E961B271@prescod.net> Message-ID: Here's an existing implementation of your 10%, plus a bit more (..../@foo). It's open source [1], available as part of the XSV XML Schema validator [2]. It's currently operating against an XML substrate provided by our PyLTXML embedding in Python [3] of our LT XML API [4]. PyLTXML will itself become open source in the next few weeks. Enjoy. 
ht

# Copyright (C) 2000 LTG -- See accompanying COPYRIGHT and COPYING files

import string
import types
import XML

class XPath:

  def __init__(self,str):
    self.str=str
    self.pats=self.parse(str)

  def parse(self,str):
    disjuncts=map(lambda s:string.split(s,'/'),string.split(str,'|')) # weird result for //
    return map(lambda d,ss=self:map(lambda p,s=ss:s.patBit(p), d), disjuncts)

  def patBit(self,part):
    # TODO: handle namespaces
    if part=='': # // in string
      return None
    elif part=='.':
      return lambda e:[e]
    elif part[0]=='@':
      return lambda e,y=None,s=self,a=part[1:]:s.attrs(e,a,y)
    else:
      b=string.find(part,'[')
      if b>-1:
        f=string.find(part,']')
        return lambda e,y=None,s=self,n=part[0:b],m=self.patBit(part[b+1:f]):s.children(e,n,y,m)
      else:
        return lambda e,y=None,s=self,n=part:s.children(e,n,y)

  def find(self,element):
    res=[]
    for pat in self.pats:
      sub=self.process(element,pat)
      if sub:
        res=res+sub
    if res:
      return res
    else:
      return None

  def find1(self,nodelist,pat):
    res=[]
    for e in nodelist:
      sub=self.process(e,pat)
      if sub:
        res=res+sub
    if res:
      return res
    else:
      return None

  def process(self,element,pat):
    pe=pat[0]
    if pe:
      res=pe(element)
    else: # None means descendant, side effect of split is two Nones in first place
      if pat[1]:
        pat=pat[1:]
      else:
        pat=pat[2:]
      res=pat[0](element,1)
    if not res:
      return None
    if len(pat)>1:
      return self.find1(res,pat[1:])
    else:
      return res

  def attrs(self,element,aname,anywhere): # assume this is the end of the line
    if element.attrs.has_key(aname):
      res=[element.attrs[aname].value]
    else:
      res=None
    if anywhere:
      for c in element.children:
        if isinstance(c,XML.Element):
          sr=self.attrs(c,aname,1)
          if sr:
            if res:
              res=res+sr
            else:
              res=sr
    return res

  def children(self,element,cname,anywhere,subPat=None): # trickier, we need to stay in control
    # TODO: handle namespaces!!!
    res=[]
    for c in element.children:
      if isinstance(c,XML.Element):
        if c.local==cname:
          if (not subPat) or subPat(c):
            res.append(c)
        if anywhere:
          sr=self.children(c,cname,1,subPat)
          if sr:
            if res:
              res=res+sr
            else:
              res=sr
    if res:
      return res
    else:
      return None

ht

[1] http://dev.w3.org/cvsweb/xmlschema/xpath.py
[2] http://www.ltg.ed.ac.uk/~ht/xsv-status.html
[3] ftp://ftp.cogsci.ed.ac.uk/pub/ht/PyLTXML12.EXE
[4] http://www.ltg.ed.ac.uk/software/xml/
-- 
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
W3C Fellow 1999--2001, part-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/

From faassen@vet.uu.nl Mon Jul 10 15:49:44 2000
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Mon, 10 Jul 2000 16:49:44 +0200
Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'?
In-Reply-To: <00a701bfdfc7$7fcb9e80$7cac1218@reston1.va.home.com>
References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> <3957A920.C8A6CB76@FourThought.com> <3957B720.9C6768D6@digicool.com> <00a701bfdfc7$7fcb9e80$7cac1218@reston1.va.home.com>
Message-ID: <20000710164944.A1262@vet.uu.nl>

Hi folks,

Sorry for the late followup, but since I'm interested in DOM in Zope, and am implementing a DOM (on top of MetaKit) as well, I think I can jump in.

tpassin@home.com wrote:
[snip]
> > > Are you proposing all access through functions?
> > Yes.
>
> I second this.

All access through functions would be nice for me too. Dealing with attributes in the implementation is a major pain. I always wondered why 4DOM took the attribute approach when doing my implementation.
The caching attributes argument in my case does not at all apply; all data is in fact stored in MetaKit tables; the Python DOM objects come and go. I also avoid circular references this way. Regards, Martijn From faassen@vet.uu.nl Mon Jul 10 15:55:44 2000 From: faassen@vet.uu.nl (Martijn Faassen) Date: Mon, 10 Jul 2000 16:55:44 +0200 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <39589C3F.79599D64@prescod.net> References: <3953A717.5289DCC8@digicool.com> <14675.49532.375238.979659@mailhost.beopen.com> <3953E20C.6239D946@prescod.net> <39576435.36751D54@digicool.com> <395788A6.8C815B57@FourThought.com> <395798C5.682588A@digicool.com> <3957A920.C8A6CB76@FourThought.com> <3957B720.9C6768D6@digicool.com> <00a701bfdfc7$7fcb9e80$7cac1218@reston1.va.home.com> <39589C3F.79599D64@prescod.net> Message-ID: <20000710165544.B1262@vet.uu.nl> Paul Prescod wrote: Answers from my perspective (MetaKit DOM, also Zope): > Attributes: > * arguably more Pythonic (=easier to use) Agreed, though possibly harder to deal with security-wise. > * faster for non-computed attributes In MetaKit DOM, all attributes are stored in the MetaKit backend. So for my particular DOM implementation this argument doesn't hold. > * slower for computed attributes > * more like Javascript, VB and COM-like languages (C# :) ) > > Methods: > * slower for non-computed attributes In the case of MetaKit dom this argument does not apply. > * faster for computed attributes > * harder to implement I really don't understand this. Methods are really far easier to implement than computed attributes! > * more like Java * Easier to supply existing objects with DOM interface. > There are no killer arguments here, just different weights applied to > the various features. I don't think that we are going to agree to break > code today. Maybe later we'll see that there are more DOM implementors > than clients and their ease of implementation will take precedence. Here's another one. 
:) Anyway, I already implemented computed attributes through __getattr__() (roughly following 4DOM, though I read in this thread they changed this later). In my case it's not a huge problem, as my python objects are basically proxies on top of the underlying database (so I don't have to integrate with existing objects, like you'd want to do in Zope). Regards, Martijn From faassen@vet.uu.nl Mon Jul 10 15:58:56 2000 From: faassen@vet.uu.nl (Martijn Faassen) Date: Mon, 10 Jul 2000 16:58:56 +0200 Subject: [XML-SIG] Ugh! Why are DOM access methods spelled with a leading '_'? In-Reply-To: <14680.64297.110297.762192@lindm.dm> References: <3953A717.5289DCC8@digicool.com> <3957B720.9C6768D6@digicool.com> <14680.64297.110297.762192@lindm.dm> Message-ID: <20000710165856.C1262@vet.uu.nl> Dieter Maurer wrote: > Jim Fulton writes: > > We seem to be arguing two issues: > > > > - Whether to expose DOM attributes as Python attributes or > > accessor functions, and > > > > - How to spell the accessor functions. > > > > If we go with accessor functions, which I think would be > > a good idea, then the accessor functions should be > > names in a way that is consistent with Python practice. > > Python, unlike Zope, does *not* treat *methods/attributes* > with leading '_' specially. > Only objects in modules with names starting with a '_' > are in some way treated as private. While the Python implementation doesn't treat _ methods and attributes specially, the Python lore definitely does. At least, I use _ in Python code to indicate 'treat this as private'. I certainly was puzzled by the _get_ and _set_ in the DOM when I first saw it (I hadn't heard the IDL explanation). 
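
For readers who have not seen the two styles side by side, here is a minimal sketch of computed attributes layered over _get_ accessor methods via __getattr__(). Everything in it is hypothetical: the ProxyNode class and the dict standing in for the database tables are invented for illustration, the code is present-day Python 3, and it is not the MetaKit DOM or 4DOM implementation.

```python
class ProxyNode:
    """Hypothetical DOM-like node whose data lives in an external table."""

    def __init__(self, table, row_id):
        self._table = table
        self._row_id = row_id

    # Accessor methods, spelled with the _get_ prefix discussed in the thread.
    def _get_nodeName(self):
        return self._table[self._row_id]["name"]

    def _get_firstChild(self):
        children = self._table[self._row_id]["children"]
        return ProxyNode(self._table, children[0]) if children else None

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so every
        # DOM attribute read pays for a failed lookup plus this dispatch.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            accessor = getattr(type(self), "_get_" + name)
        except AttributeError:
            raise AttributeError(name) from None
        return accessor(self)


# A plain dict stands in for the storage backend in this sketch.
table = {
    0: {"name": "animals", "children": [1, 2]},
    1: {"name": "dogs", "children": []},
    2: {"name": "cats", "children": []},
}
root = ProxyNode(table, 0)
via_attribute = root.firstChild.nodeName
via_method = root._get_firstChild()._get_nodeName()
```

Both spellings end up in the same accessor; the attribute form simply adds the __getattr__ hop, which is the cost being argued about when nothing is stored on the Python object itself.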
Regards, Martijn From faassen@vet.uu.nl Mon Jul 10 16:06:02 2000 From: faassen@vet.uu.nl (Martijn Faassen) Date: Mon, 10 Jul 2000 17:06:02 +0200 Subject: [XML-SIG] The '_' thingy In-Reply-To: <3958BA2C.92E85930@digicool.com> References: <39578DBA.8FB64449@FourThought.com> <3957DBDF.40792117@digicool.com> <3957F375.B8A79E@FourThought.com> <3958BA2C.92E85930@digicool.com> Message-ID: <20000710170602.D1262@vet.uu.nl> Jim Fulton wrote: > Mike Olson wrote: [snip] > > Jim, I don't see your arguments. > > > > How is n.firstChild less efficient than n.get_firstChild() ? > > It's not if you constrain the implementation to store the first > child. If an implementation chooses not to store the first > child independent of the children, then the implementation must > implement __getattr__. It's worse for (the few) settable attributes, > because the implementation *must* implement the attributes as stored > attributes or implement __setattr__. Agreed here from the MetaKit DOM perspective. As I pointed out elsewhere, the actual data is stored in tables in my DOM implementation, and therefore if there are attributes, they all have to be computed. And as said elsewhere, I'm for accessor methods. Regards, Martijn From Mike.Olson@fourthought.com Mon Jul 10 17:33:39 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 10 Jul 2000 10:33:39 -0600 Subject: [XML-SIG] SAX namespaces discussion status References: Message-ID: <3969FAE3.F871AA4@FourThought.com> Lars Marius Garshol wrote: +1 for option #1 for me. Mike > > I feel a need to summarize where the discussion stands and what needs > to be done, hence this posting. Basically, we have a disagreement on > how namespace names should be represented in SAX 2.0. My feeling is > that since the organization of the API is changing anyway because of > the incorporation into Python 1.6/2.0 we should make sure we have at > least rough consensus now before moving on.
> > Paul listed four alternatives (the fifth seems to be identical with > #4). Here is my, slightly modified, version of that list. The qname or > prefix discussion we can leave for later, since it is really > orthogonal to the name representation issue. > > #1. def startElement( self, (uri, name), qname, attrs ): > When namespace processing is off, (uri, name) is just the raw > name instead. > > #2. def startElement( self, (uri,localname, qname), attrs ): > > #3. def startElement( self, ((uri, localname), qname), atrs ): > > #4. def startElement( self, name, attrs ): > Depending on whether you have turned on namespace processing, > "name" is # either "string" or (uri,localname,qname) > > #1 is here the current SAX 2.0 interface and #2 is what Paul > implemented for Python 2.0. As near as I can tell, current positions > are: > > - me: #1 > - Paul: #2 > - Greg: #1 or #3 > - Uche: #1, pending further discussion > > The reasons I prefer #1 are that > > - it collects the logical name (in both the namespace view and the > XML 1.0 view) into a single value, which seems like The Right Thing > to me > > - it is easier to understand how to use this API correctly for > novices > > - it is easier for programmers who use the SAX 2.0 interface directly. > I do this all the time, and I believe others will do the same, so > for me this is an important consideration. > > As near as I can tell, these are Paul's arguments against it: > > - it breaks backwards compatibility > > - SAX convenience is not important > > - performance for higher layers > > Below are my responses to his arguments: > > I don't think the backwards compatibility argument carries much > weight. Names have changed anyway, and in rewriting the code adapting > the startElement / endElement methods is very little work. At least > it was for me, and I've rewritten heaps of example code for my book > for just this. > > I think SAX convenience matters, but I agree that convenience > arguments carry less weight. 
However, to me this is also a matter of > rightness. In the namespace view, element names consist of two parts: > URI and local name. The #1 representation reflects that very clearly, > while #2 obscures it. > > Performance does of course matter, but I don't see how #2 improves it. > The necessary information is available in both #1 and #2, and access > to it is more or less identical. If the problem is that extracting > the information from the Attributes interface is too slow, then let us > look into what is needed and see how we can best provide that. > > Hoping to settle this issue once and for all, > > --Lars M. > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Mon Jul 10 17:38:00 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 10 Jul 2000 10:38:00 -0600 Subject: [XML-SIG] SAX namespaces discussion status References: <200007041347.HAA16154@localhost.localdomain> Message-ID: <3969FBE8.7909DC22@FourThought.com> Uche Ogbuji wrote: > > > I tend to side more with Greg on this matter: I'd rather have the prefix split > out for me. 4XPath and 4XSLT are absolutely littered with SplitQName() calls > that would be somewhat reduced in this case. However, with Dom we would just need to re-create the qname when we need it. If we have it available, why not pass it along as well. #interface #100 def startElement( self, (uri, name), (prefix,qname), attrs ): or some variation of this.... Mike > > So deciding all over again, 5 and 8 both look attractive. As Greg says, 8's > modes can make genericizing SAX handlers (say for filters) tricky. But on the > other hand, there would have to be a raft of conditionals for processing 5 > generically. 
> > In the end, though, my leaning would be towards 5. > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +01 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From ken@bitsko.slc.ut.us Mon Jul 10 17:28:18 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 10 Jul 2000 11:28:18 -0500 Subject: [XML-SIG] SAX Namespaces In-Reply-To: 's message of "Mon, 10 Jul 2000 00:35:53 -0400" References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> <14692.60229.384642.526502@cj42289-a.reston1.va.home.com> <005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> <396569A2.E503DBF5@prescod.net> <001e01bfea11$7608e020$7cac1218@reston1.va.home.com> <39694533.40D76403@prescod.net> <00f401bfea28$55382ec0$7cac1218@reston1.va.home.com> Message-ID: writes: > Paul Prescod said - (does this look like a put-up-or-shut-up??? :) ) > > > I'm not sure what you are suggesting concretely. You and I agree > > that most of the current applications allow you to work based on > > the rawname or the URI/localname pair. Therefore we need three > > pieces of information. How do you suggest we should represent > > them? > The DOM level 2 NS-specific calls want to see URI and localname. 
We > should use a tuple (uri,localname). Some of the NS calls also want > to see a prefix or a rawname (same as 'qualified name', as I > understand it), and sometimes both. I propose that the rawname > always be supplied, and the prefix always be computed. This seems > to fit best the pattern of the DOM calls, and also using DOM 1 calls > with no namespaces. Or in other words, I'm guessing :) the prefix > won't be needed as often as the rawname, so it is just as well to > compute it if needed. > > So the name could be > > A) A tuple, ((uri,localname),rawname), > > B) an object with attributes for each of these things, or > > C) a dictionary. > > We've heard support for just about everything. An object could be > attractive, since it could have methods to do all the mix-and-match we want. > But it seems to me that the tuple - approach A - is the simplest. > > The main thing is to agree whether the prefix (or rawname, or neither) > should be computed rather than included explicitly in the name. From the > point of view of no redundance, the prefix should be included and the > rawname should be omitted. But I think, as Paul suggested, that the > usefulness of the rawname says that it should be included instead of the > prefix. > > So my vote is for A). Another option that hasn't been mentioned in a while is for SAX events to pass DOM objects (a la EventDOM [nee EasySAX] and pulldom). DOM objects have a very natural place for all the parts of the namespace issue (raw name/qname, namespace URI, localname) for both elements and attributes. I would vote for the DOM-passing interface as the primary promoted interface, as long as I knew there were "performance interfaces" available for those that really need the performance.
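
As a rough illustration of the DOM-passing style, here is a sketch using xml.dom.pulldom as it ships with present-day Python 3 (an assumption; the module under discussion in this thread may have differed). The sample document and the helper element_names() are made up. Each start-element event hands over a DOM node that already carries the namespace URI, the local name and the rawname.

```python
from xml.dom import pulldom

XML_TEXT = (
    '<doc xmlns:ex="http://example.com/foo">'
    '<ex:item ex:kind="a">text</ex:item>'
    '</doc>'
)


def element_names(xml_text):
    """Collect (namespace URI, local name, rawname) for each start tag."""
    names = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            names.append((node.namespaceURI, node.localName, node.tagName))
    return names


names = element_names(XML_TEXT)
```

The handler never has to agree on a tuple layout for names, since all three pieces are ordinary attributes of the node; the price is building a node object for every event.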
-- Ken From dieter@handshake.de Sun Jul 9 22:35:15 2000 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 9 Jul 2000 23:35:15 +0200 (CEST) Subject: [XML-SIG] XPath in Python 2 In-Reply-To: <3968AC14.E961B271@prescod.net> References: <3968AC14.E961B271@prescod.net> Message-ID: <14696.61320.959668.155239@lindm.dm> Paul Prescod writes: > My feeling is that implementing 10% of XPath in 10% of the code would > get us 80% of the benefit. Those that need the rest can download 4XPath. > I also think that 4XPath should be part of the pyxml distribution. > > The 10% that is most interesting: > > * a/b/c > * a//b > * ../ > > Actually, that's probably not even 10% and it can be "parsed" mostly > with a "string.split" on "/". Things like positional predicates can be > implemented with Python sequence syntax. Attribute access can use DOM > syntax. All in all, this looks like an afternoon's work, if we agree > that it should go into Python. I do not share your preference for tiny little things. I would not go for such a thing, but directly use 4XPath. Dieter From Mike.Olson@fourthought.com Mon Jul 10 18:45:12 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 10 Jul 2000 11:45:12 -0600 Subject: [XML-SIG] XPath in Python 2 References: <3968AC14.E961B271@prescod.net> Message-ID: <396A0BA8.EB2950F0@FourThought.com> Paul Prescod wrote: > > > 4XPath is cleaner from a user's point of view, but it requires a lot bit > of C/lex code for parsing the XPaths. I don't know if we would have to > go back to the BDFL to get permission for that code to go into Python. Or we could re write the parser in C/python. There has been some talk of doing it with SRE.... > > We also have the option of creating a new XPath implementation also. The > primary virtue of doing so would be the opportunity to implement a tiny > subset of XPath in a much smaller amount of code. The two existing > implementations probably have more code than the rest of the Python 1.6 > XML package. 
And in 4XPath's case, a lot of that is C code. Actually, very little is in C. The generated C is quite large (you've all seen bison output before), but the "other" c is a very small amount. Almost all of the functionality is written in python.... Mike > > --- > > My feeling is that implementing 10% of XPath in 10% of the code would > get us 80% of the benefit. Those that need the rest can download 4XPath. > I also think that 4XPath should be part of the pyxml distribution. > > The 10% that is most interesting: > > * a/b/c > * a//b > * ../ > > Actually, that's probably not even 10% and it can be "parsed" mostly > with a "string.split" on "/". Things like positional predicates can be > implemented with Python sequence syntax. Attribute access can use DOM > syntax. All in all, this looks like an afternoon's work, if we agree > that it should go into Python. > > -- > Paul Prescod - Not encumbered by corporate consensus > "Computer Associates is expected to come in with better than expected > earnings." Bob O'Brien, quoted in > - http://www.fool.com/news/2000/foth000316.htm > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Mon Jul 10 19:02:59 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 10 Jul 2000 12:02:59 -0600 Subject: [XML-SIG] SAX Namespaces In-Reply-To: Message from Paul Prescod of "Thu, 06 Jul 2000 20:42:33 CDT." <39653589.E559E202@prescod.net> Message-ID: <200007101802.MAA02031@localhost.localdomain> Paul Prescod: > The W3C has decided that it is appropriate in specs "above XML" to query > and navigate based on the prefix even if namespace processing is turned > on. 
Even if we decided that that decision is questionable here, there is > nothing we can do about it. Minidom (for one) indexes on both qname and > uri/localname pair. The user may use this facility to blow their feet > off but they might also have good reason for doing so. Hmm. I'd be careful with this statement. I hardly think the W3C has a coherent idea how to use prefixes: take, for example, the fact that qnames are expanded differently in XSLT and XPath, which were developed by the very same WG. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Mon Jul 10 19:14:00 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 10 Jul 2000 12:14:00 -0600 Subject: [XML-SIG] XPath in Python 2 In-Reply-To: Message from Paul Prescod of "Sun, 09 Jul 2000 11:45:08 CDT." <3968AC14.E961B271@prescod.net> Message-ID: <200007101814.MAA02054@localhost.localdomain> > Python is delayed and we don't know how long it will be so. Bloody hell! That's what happens when gurus get married. Note: I hate smileys, but I guess I'd better throw one in, just in case: %^) > Why XPath? XPath is the W3C-provided mechanism for navigating XML > documents in a declarative way. That means that rather than specifying > an exact path to a node, you describe the relationship between the node > you are on and the node you want to get to. This makes the creation of > complex applications much easier and allows for more efficiency "under > the hood" of the XPath implementation. I agree that XPath is a big nice-to-have. Microsoft's GetByQuery (or something like that) is a very popular addition to their DOM and IBM et al are being forced to imitate, even in advance of DOM Level 3, which might address query. 
> 4XPath is cleaner from a user's point of view, but it requires a lot bit > of C/lex code for parsing the XPaths. I don't know if we would have to > go back to the BDFL to get permission for that code to go into Python. Would it be good enough for us just to check in ANSI C code from FLEX/Bison? > We also have the option of creating a new XPath implementation also. The > primary virtue of doing so would be the opportunity to implement a tiny > subset of XPath in a much smaller amount of code. The two existing > implementations probably have more code than the rest of the Python 1.6 > XML package. And in 4XPath's case, a lot of that is C code. > > --- > > My feeling is that implementing 10% of XPath in 10% of the code would > get us 80% of the benefit. Those that need the rest can download 4XPath. OK. > I also think that 4XPath should be part of the pyxml distribution. Already in motion. > The 10% that is most interesting: > > * a/b/c > * a//b > * ../ > > Actually, that's probably not even 10% and it can be "parsed" mostly > with a "string.split" on "/". Things like positional predicates can be > implemented with Python sequence syntax. Attribute access can use DOM > syntax. All in all, this looks like an afternoon's work, if we agree > that it should go into Python. I don't know. I agree that most people don't need XPath's zoo of axes, but I think predicates would be sorely missed very quickly. I should note that other benefits of the mini-xpath -> 4XPath migration would be indexing and extension functions. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paul@prescod.net Mon Jul 10 19:54:25 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 13:54:25 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> <396569A2.E503DBF5@prescod.net> <001e01bfea11$7608e020$7cac1218@reston1.va.home.com> <39694533.40D76403@prescod.net> <00f401bfea28$55382ec0$7cac1218@reston1.va.home.com> Message-ID: <396A1BE1.2A7176B6@prescod.net> tpassin@home.com wrote: > > Paul Prescod said - (does this look like a put-up-or-shut-up??? :) ) I meant it more as "what are you really saying?" :) > Paul, is this concrete enough? :-) Have at it, blast away. Could you describe the signature for a SAX startElement under you plan? -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." 
Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From paul@prescod.net Mon Jul 10 19:54:31 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 13:54:31 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> <396569A2.E503DBF5@prescod.net> <001e01bfea11$7608e020$7cac1218@reston1.va.home.com> <39694533.40D76403@prescod.net> <00f401bfea28$55382ec0$7cac1218@reston1.va.home.com> Message-ID: <396A1BE7.BA20A6A1@prescod.net> tpassin@home.com wrote: > > Paul Prescod said - (does this look like a put-up-or-shut-up??? :) ) I meant it more as "what are you really saying?" :) > Paul, is this concrete enough? :-) Have at it, blast away. Could you describe the signature for a SAX startElement under your plan? -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From andy@reportlab.com Mon Jul 10 20:03:43 2000 From: andy@reportlab.com (Andy Robinson) Date: Mon, 10 Jul 2000 20:03:43 +0100 Subject: [XML-SIG] Wanted - Heroes In-Reply-To: <39451E810014742A@mail.ngi.de> (added by postmaster@mail.ngi.de) Message-ID: > On Wed, 5 Jul 2000 06:25:54 +0100, Andy Robinson wrote: > > >1. write a basic HTML-to-Flowables filter... > > filter == a python script, or a XSLT stylesheet? Python script. WE don't have an XML import format yet; input is minimal HTML, output is a list of reportlab "Flowable" objects (mostly paragraphs). > >6. 
Finally, the HTML to PDF filter will go in our standard > >library, with your name on it - maybe even next week! > > Do you know of any project using ReportLab to write a FO processor? (FO > == XSL flow objects) > > > From what I see, ReportLab is very nice and would be an ideal base for > a pythonic FO processor, which in turn would be a nice showcase for > Python (the only serious stuff in FO processing is either Java or C++). No, not yet. We're only just finishing documenting our paragraph API this week, so it is a bit early. I personally have my doubts about when FO will stabilise and how practical an object model it is, and a proper implementation involves modelling rectangles down to the character level which is not Python's forte. > > Sorry for not using the list, but I currently get it only at work. You > may quote this email there, but please include a CC to this address. Sorry too - I got so busy last week I ignored the XML-sig and only just saw this. - Andy Robinson From paul@prescod.net Mon Jul 10 22:17:36 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 16:17:36 -0500 Subject: [XML-SIG] XPath in Python 2 References: <3968AC14.E961B271@prescod.net> <396A0BA8.EB2950F0@FourThought.com> Message-ID: <396A3D70.B6A3E513@prescod.net> Mike Olson wrote: > > ... > > Or we could rewrite the parser in C/python. There has been some talk > of doing it with SRE.... I would personally benefit from having a full XPath engine in Python. Let me play the devil's advocate and point out the potential problems. * building: can we add a FLEX/Bison dependency to the Python source? * maintenance: if the Fourthought guys retire to Tahiti after selling out for a million to Red Hat, will someone else be willing to maintain all of that code? In general, Guido seems skittish about taking on large modules/packages. I can't blame him because as BDFL he agrees to keep that code in sync with the rest of Python forever.
Expat is only sort of "halfway" in there (in binary, not source distributions) and that took some arm twisting. Plus, ~150K is not a lot of disk space, but if you aren't using XPath at all, you might resent it... I wrote my 250 line tinyxpath to get around these problems. My philosophy was that limited XPath support is better than none. In fact, it was the same motivation for minidom! If I was being unnecessarily conservative, then we should put in the full 4DOM and 4XPath. Let me suggest a parallel approach. We can agree on an API so that whether we put in a tiny version or a full version, the API doesn't change. Meanwhile, we can try to groom 4XPath for inclusion (in 1.6 or 1.7) if the powers that be agree. My personal feeling is that the Flex/Bison dependency is a show stopper. I wouldn't propose it for core status while it is dependent (but I encourage anyone else to do it, if they feel differently). -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From paul@prescod.net Mon Jul 10 22:26:53 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 16:26:53 -0500 Subject: [XML-SIG] XPath in Python 2 References: <3968AC14.E961B271@prescod.net> Message-ID: <396A3F9D.23DA1562@prescod.net> "Henry S. Thompson" wrote: > > Here's an existing implementation of your 10%, plus a bit more > (..../@foo). Cool. Our messages passed each other in the night so that we now have two mini-implementations. I will port some of your ideas into mine (which already works with the DOM). In particular, it seems not to have taken much code to implement qualifiers and attributes. I'll probably do so also. -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." 
Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From paul@prescod.net Mon Jul 10 22:27:40 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 16:27:40 -0500 Subject: [XML-SIG] XPath in Python 2 References: <200007101814.MAA02054@localhost.localdomain> Message-ID: <396A3FCC.280E610@prescod.net> Uche Ogbuji wrote: > > .. > > > 4XPath is cleaner from a user's point of view, but it requires a lot bit > > of C/lex code for parsing the XPaths. I don't know if we would have to > > go back to the BDFL to get permission for that code to go into Python. > > Would it be good enough for us just to check in ANSI C code from FLEX/Bison? I'm not sure that Guido would go for that because it means that everyone must go back to you for changes, right? > I don't know. I agree that most people don't need XPath's zoo of axes, but I > think predicates would be sorely missed very quickly. It depends on the target audience. You seldom miss what you've never used and most Python users have never used XPath or XSLT. Anyhow, the more important point is that much of what predicates do could be done in Python code. For you and I, that would grate, but if you compare it to what they've got now (even in DOM, Pyxie, qp_xml, etc.) it's not like they are losing something. Positional and attribute existence qualifiers are easy. I could implement those. I just don't want to get dragged into implementing the full expression language! > I should note that other benefits of the mini-xpath -> 4XPath migration would > be indexing and extension functions. Totally. I would expect to use 4XPath for my personal work and in fact I've designed the module so that it can detect and use other XPath engines, especially those that may be tightly integrated with a particular DOM implementation. -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." 
Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From paul@prescod.net Mon Jul 10 23:01:10 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 17:01:10 -0500 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net> <20000703202458.J29590@lyra.org> <20000704025655.X29590@lyra.org> <20000706030104.E29590@lyra.org> <3964ADF0.8148FF9B@prescod.net> <14692.47010.265123.429300@cj42289-a.reston1.va.home.com> <3964E3AD.65779914@prescod.net> <14692.60229.384642.526502@cj42289-a.reston1.va.home.com> <005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> <396569A2.E503DBF5@prescod.net> <001e01bfea11$7608e020$7cac1218@reston1.va.home.com> <39694533.40D76403@prescod.net> <00f401bfea28$55382ec0$7cac1218@reston1.va.home.com> Message-ID: <396A47A6.F79ABA26@prescod.net> Ken MacLeod wrote: > > ... > > I would vote for the DOM-passing interface as the primary promoted > interface, as long as I knew there were "performance interfaces" > available for those that really need the performance. I see this as a future direction also. I tried to get pulldom into the distribution for this reason but haven't got much feedback one way or another. For now, pulldom remains an undocumented implementation detail of minidom. Nevertheless, we can't avoid the job of designing the performance interfaces and that's what we're trying to do with all of this SAX debate. You reminded me though, that at one time I had a plan to build DOM objects right in a C library like PyExpat and bypass SAX altogether. I think I'd rather not do that if SAX can be made fast enough. -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings."
Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From paul@prescod.net Mon Jul 10 22:59:59 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 16:59:59 -0500 Subject: [XML-SIG] Attribute handling References: <3968AC28.472F8984@prescod.net> Message-ID: <396A475F.EADBC54B@prescod.net> Lars Marius Garshol wrote: > > ... > > I agree, and I am willing. I'm working on getting other things out of > the way so that I can concentrate on this for a while. Hopefully I > can start tomorrow. > > The summary looks good and corresponds pretty well to my own > perception of the debate. Is there another option? We could have completely separate startElement and startElementNS methods? That might be cleanest after all. -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From Mike.Olson@fourthought.com Mon Jul 10 23:57:20 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 10 Jul 2000 16:57:20 -0600 Subject: [XML-SIG] XPath in Python 2 References: <3968AC14.E961B271@prescod.net> <396A0BA8.EB2950F0@FourThought.com> <396A3D70.B6A3E513@prescod.net> Message-ID: <396A54D0.3B93DD11@FourThought.com> Paul Prescod wrote: > > Mike Olson wrote: > > > > ... > > > > Or we could rewrite the parser in C/python. There has been some talk > > of doing it with SRE.... > > I would personally benefit from having a full XPath engine in Python. > Let me play the devil's advocate and point out the potential problems. > > * building: can we add a FLEX/Bison dependency to the Python source? We can always include bison and flex generated code so there is no dependency on these tools. > * maintenance: if the Fourthought guys retire to Tahiti after selling > out for a million to Red Hat, will someone else be willing to maintain > all of that code?
Hey, I thought those talks were confidential :) I vote for supporting both. Two different drivers. If you want Bison/flex it will work, if not you can do it all in python.... I think Bison and Flex will have better performance, at least for a while... > > In general, Guido seems skittish about taking on large modules/packages. > I can't blame him because as BDFL he agrees to keep that code in sync > with the rest of Python forever. Expat is only sort of "halfway" in > there (in binary, not source distributions) and that took some arm > twisting. > > > Let me suggest a parallel approach. We can agree on an API so that > whether we put in a tiny version or a full version, the API doesn't > change. Meanwhile, we can try to groom 4XPath for inclusion (in 1.6 or > 1.7) if the powers that be agree. My personal feeling is that the > Flex/Bison dependency is a show stopper. I wouldn't propose it for core > status while it is dependent (but I encourage anyone else to do it, if > they feel differently). Then maybe we need to work on the pure python version. That is what I originally started out with, but it was dog slow...... I'll play with it again in my spare time :) I'm not too picky about API, as it's quite small. I picture something regexish.... Mike > > -- > Paul Prescod - Not encumbered by corporate consensus > "Computer Associates is expected to come in with better than expected > earnings." Bob O'Brien, quoted in > - http://www.fool.com/news/2000/foth000316.htm > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc.
http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Tue Jul 11 00:05:24 2000 From: tpassin@home.com (tpassin@home.com) Date: Mon, 10 Jul 2000 19:05:24 -0400 Subject: [XML-SIG] Attribute handling References: <3968AC28.472F8984@prescod.net> <396A475F.EADBC54B@prescod.net> Message-ID: <005201bfeac3$54809b20$7cac1218@reston1.va.home.com> Paul Prescod asked - > Lars Marius Garshol wrote: > > > > ... > > > > I agree, and I am willing. I'm working on getting other things out of > > the way so that I can concentrate on this for a while. Hopefully I > > can start tomorrow. > > > > The summary looks good and corresponds pretty well to my own > > perception of the debate. > > Is there another option? We could have completely separate startElement > and startElementNS methods? That might be cleanest after all. > That would match the DOM level 2 approach. I don't recall whether SAX2 ever did it, though. Tom Passin From Mike.Olson@fourthought.com Tue Jul 11 00:04:47 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 10 Jul 2000 17:04:47 -0600 Subject: [XML-SIG] XPath in Python 2 References: <200007101814.MAA02054@localhost.localdomain> <396A3FCC.280E610@prescod.net> Message-ID: <396A568F.9DAFB572@FourThought.com> Paul Prescod wrote: > > Uche Ogbuji wrote: > > > > .. > > > > > I don't know. I agree that most people don't need XPath's zoo of axes, but I > > think predicates would be sorely missed very quickly. > > It depends on the target audience. You seldom miss what you've never > used and most Python users have never used XPath or XSLT. Anyhow, the > more important point is that much of what predicates do could be done in > Python code. For you and I, that would grate, but if you compare it to > what they've got now (even in DOM, Pyxie, qp_xml, etc.) it's not like > they are losing something. Sure, but you could also do all of it in python. 
Some of the things you can do with predicates will be much more efficient in XPath/XPattern than doing a bit in XPath and a bit in python. However, you wouldn't have access to the internal steps. Imagine /EMPLOYEES/EMPLOYEE[@position="Manager"]/SPOUSE[size(CHILDREN) > 3]/CHILDREN[@age > 18] Without predicates, you would get back a list of children, then you need to see which of those has a parent who's married to a manager, then remove all children who are from families with less than 3 siblings, and then all children with an age less than 18. Somewhat of a contrived example, but hopefully you see my point Mike > -- > Paul Prescod - Not encumbered by corporate consensus > "Computer Associates is expected to come in with better than expected > earnings." Bob O'Brien, quoted in > - http://www.fool.com/news/2000/foth000316.htm > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From stuart.hungerford@zveno.com Tue Jul 11 00:59:05 2000 From: stuart.hungerford@zveno.com (Stuart Hungerford) Date: Tue, 11 Jul 2000 09:59:05 +1000 Subject: [XML-SIG] Re: XPath in Python 2 References: <20000710160113.06A981CF31@dinsdale.python.org> Message-ID: <396A6349.59D49FF5@zveno.com> xml-sig-request@python.org wrote: > Message: 1 > Date: Sun, 09 Jul 2000 11:45:08 -0500 > From: Paul Prescod > To: "xml-sig@python.org" > Subject: [XML-SIG] XPath in Python 2 > > [...] > > Why XPath? XPath is the W3C-provided mechanism for navigating XML > documents in a declarative way. That means that rather than specifying > an exact path to a node, you describe the relationship between the node > you are on and the node you want to get to.
This makes the creation of > complex applications much easier and allows for more efficiency "under > the hood" of the XPath implementation. Has anyone looked at Matt Seargent's "XPathScript" (sp?) -- part of the Apache XML delivery stuff? I can see it as a much easier alternative to using pure XSLT except it's all based on Perl. Could something similar be done for the Python community? Maybe it could be based on 4XSLT? Stu From fdrake@beopen.com Tue Jul 11 01:54:39 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Mon, 10 Jul 2000 20:54:39 -0400 (EDT) Subject: [XML-SIG] XPath in Python 2 In-Reply-To: <396A54D0.3B93DD11@FourThought.com> References: <3968AC14.E961B271@prescod.net> <396A0BA8.EB2950F0@FourThought.com> <396A3D70.B6A3E513@prescod.net> <396A54D0.3B93DD11@FourThought.com> Message-ID: <14698.28751.421768.304942@cj42289-a.reston1.va.home.com> Mike Olson writes: > We can always include bison and flex generated code so there is no > dependnecy on these tools. Probably the right thing is to check in both the generated code and the flex/bison specs. This is what we do for the configure script and some other autoconf/autoheader related things. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From paul@prescod.net Tue Jul 11 03:39:50 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 21:39:50 -0500 Subject: [XML-SIG] XPath in Python 2 References: <3968AC14.E961B271@prescod.net> <396A0BA8.EB2950F0@FourThought.com> <396A3D70.B6A3E513@prescod.net> <396A54D0.3B93DD11@FourThought.com> <14698.28751.421768.304942@cj42289-a.reston1.va.home.com> Message-ID: <396A88F6.8BFDCBE0@prescod.net> "Fred L. Drake, Jr." wrote: > > Mike Olson writes: > > We can always include bison and flex generated code so there is no > > dependnecy on these tools. > > Probably the right thing is to check in both the generated code and > the flex/bison specs. This is what we do for the configure script and > some other autoconf/autoheader related things. 
Okay, but is it doable? Do you want to ask the BDFL? I would be overjoyed to have a full XPath and even a full DOM in there, if we have that much code "allowance." -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From paul@prescod.net Tue Jul 11 03:43:31 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 Jul 2000 21:43:31 -0500 Subject: [XML-SIG] Re: XPath in Python 2 References: <20000710160113.06A981CF31@dinsdale.python.org> <396A6349.59D49FF5@zveno.com> Message-ID: <396A89D3.593977D2@prescod.net> Stuart Hungerford wrote: > > xml-sig-request@python.org wrote: > ... > > I can see it as a much easier alternative to using pure XSLT except it's > all based on Perl. Could something similar be done for the Python > community? Maybe it could be based on 4XSLT? Yes, that would be relatively easy. I think 4XSLT is specifically designed to allow that sort of usage. My "EventDOM" is a similar concept but it has the benefit of allowing you to state which parts of the tree should be fully expanded instead of presuming you want the whole thing. That's temporarily on hold while we get Python 2 worked out. Little by little the pieces are need are migrating into Python 1.6 so it becomes a trivial "glue" job to implement the API itself. -- Paul Prescod - Not encumbered by corporate consensus "Computer Associates is expected to come in with better than expected earnings." Bob O'Brien, quoted in - http://www.fool.com/news/2000/foth000316.htm From gstein@lyra.org Tue Jul 11 04:03:27 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Jul 2000 20:03:27 -0700 Subject: [XML-SIG] auto-support for PyXML Message-ID: <20000710200326.S29590@lyra.org> I just posted to the Patch Manager regarding patch #100705 ("Supporting PyXML"). I think that messing with __path__ is the wrong approach and can lead to problems down the road. 
A couple weeks ago, I posted an alternative. You can view that at: http://www.python.org/pipermail/xml-sig/2000-June/004512.html Can we get closure/consensus on this and get it implemented? Cheers, -g -- Greg Stein, http://www.lyra.org/ From tpassin@home.com Tue Jul 11 04:13:57 2000 From: tpassin@home.com (tpassin@home.com) Date: Mon, 10 Jul 2000 23:13:57 -0400 Subject: [XML-SIG] SAX Namespaces References: <3961488E.68099597@prescod.net><20000703202458.J29590@lyra.org><20000704025655.X29590@lyra.org><20000706030104.E29590@lyra.org><3964ADF0.8148FF9B@prescod.net><14692.47010.265123.429300@cj42289-a.reston1.va.home.com><3964E3AD.65779914@prescod.net><14692.60229.384642.526502@cj42289-a.reston1.va.home.com><005e01bfe7b3$1449b400$7cac1218@reston1.va.home.com> <14693.13514.654867.361086@cj42289-a.reston1.va.home.com> <008c01bfe7b9$3bd760c0$7cac1218@reston1.va.home.com> <396569A2.E503DBF5@prescod.net> <001e01bfea11$7608e020$7cac1218@reston1.va.home.com> <39694533.40D76403@prescod.net> <00f401bfea28$55382ec0$7cac1218@reston1.va.home.com> <396A1BE1.2A7176B6@prescod.net> Message-ID: <01bc01bfeae6$0e7b4300$7cac1218@reston1.va.home.com> Paul Prescod wrote - > tpassin@home.com wrote: > > Paul, is this concrete enough? :-) Have at it, blast away. > > Could you describe the signature for a SAX startElement under you plan? > -- OK, see below. There was also this post: > Paul listed four alternatives (the fifth seems to be identical with > #4). Here is my, slightly modified, version of that list. The qname or > prefix discussion we can leave for later, since it is really > orthogonal to the name representation issue. > > #1. def startElement( self, (uri, name), qname, attrs ): > When namespace processing is off, (uri, name) is just the raw > name instead. > My suggestion was really the same as this #1 (qname =rawname=prefix:localname). I was just trying to give more support for it. But what should be the value of "uri" when there is no prefix (but namespaces are in use)? 
It could be either None or an empty string. I recommend "None" because the XPath Rec says that the uri is null if there is no prefix. Cheers, Tom Passin From Mike.Olson@fourthought.com Tue Jul 11 05:00:39 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 10 Jul 2000 22:00:39 -0600 Subject: [XML-SIG] XPath in Python 2 References: <3968AC14.E961B271@prescod.net> <396A0BA8.EB2950F0@FourThought.com> <396A3D70.B6A3E513@prescod.net> <396A54D0.3B93DD11@FourThought.com> <14698.28751.421768.304942@cj42289-a.reston1.va.home.com> <396A88F6.8BFDCBE0@prescod.net> Message-ID: <396A9BE7.BAF84871@FourThought.com> Paul Prescod wrote: > > "Fred L. Drake, Jr." wrote: > > > > Mike Olson writes: > > Okay, but is it doable? Do you want to ask the BDFL? I would be > overjoyed to have a full XPath and even a full DOM in there, if we have > that much code "allowance." Gotta ask, whats a BDFL? Mike > > -- > Paul Prescod - Not encumbered by corporate consensus > "Computer Associates is expected to come in with better than expected > earnings." Bob O'Brien, quoted in > - http://www.fool.com/news/2000/foth000316.htm > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. 
http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From gstein@lyra.org Tue Jul 11 10:17:26 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 11 Jul 2000 02:17:26 -0700 Subject: [XML-SIG] XPath in Python 2 In-Reply-To: <396A9BE7.BAF84871@FourThought.com>; from Mike.Olson@fourthought.com on Mon, Jul 10, 2000 at 10:00:39PM -0600 References: <3968AC14.E961B271@prescod.net> <396A0BA8.EB2950F0@FourThought.com> <396A3D70.B6A3E513@prescod.net> <396A54D0.3B93DD11@FourThought.com> <14698.28751.421768.304942@cj42289-a.reston1.va.home.com> <396A88F6.8BFDCBE0@prescod.net> <396A9BE7.BAF84871@FourThought.com> Message-ID: <20000711021726.Y29590@lyra.org> On Mon, Jul 10, 2000 at 10:00:39PM -0600, Mike Olson wrote: > Paul Prescod wrote: > > > > "Fred L. Drake, Jr." wrote: > > > > > > Mike Olson writes: > > > > Okay, but is it doable? Do you want to ask the BDFL? I would be > > overjoyed to have a full XPath and even a full DOM in there, if we have > > that much code "allowance." > > Gotta ask, whats a BDFL? Benevolent Dictator For Life. aka Guido. :-) -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Tue Jul 11 18:20:33 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 11 Jul 2000 12:20:33 -0500 Subject: [XML-SIG] XPath in Python 2 References: <200007101814.MAA02054@localhost.localdomain> <396A3FCC.280E610@prescod.net> <396A568F.9DAFB572@FourThought.com> Message-ID: <396B5761.7E43E6E7@prescod.net> Mike Olson wrote: > > ... Imagine > > /EMPLOYEES/EMPLOYEE[@position="Manager"]/SPOUSE[size(CHILDREN) > > 3]/CHILDREN[@age > 18] > > Without predicates, you would get back a list of children, then you need > to see which of those has a parent whose married to a manager, then > remove all children who are from families with less then 3 siblings, and > then all children with an age less then 18. > > Somewhat of a contrived example, but hopefully you see my point Yes, but it is basically an issue of cost/benefit. 
Sometimes it would be cool to have the full power of SQL in Python also,
but Gadfly isn't part of the main distribution. I still maintain that
the first 10% of XPath gives you a greater than proportional benefit. If
we can't figure out how to get the full 100% in, then I think we should
go with 10%.

--
Paul Prescod - Not encumbered by corporate consensus
"Computer Associates is expected to come in with better than expected
earnings." Bob O'Brien, quoted in
- http://www.fool.com/news/2000/foth000316.htm

From paul@prescod.net Tue Jul 11 20:11:32 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 11 Jul 2000 14:11:32 -0500
Subject: [XML-SIG] Proposed XPath API
Message-ID: <396B7164.D8CA9111@prescod.net>

Here's a start....

import xpath

obj=xpath.compile( "....", nsbindings={prefix:uri,prefix:uri}, **flags )

obj.select( node, **flags ) #returns a nodelist
obj.select( nodelist, **flags ) #also returns a nodelist

obj.match( node, **flags ) #returns a boolean

xpath.select( "...", node, nsbindings={prefix:uri,prefix:uri}, **flags )
#convenience method
xpath.match( "...", node, nsbindings={prefix:uri,prefix:uri}, **flags )
#convenience method

All methods take a **flags parameter as an extension mechanism. For
instance 4XPath would use that to pass in extension functions.

A particular DOM can also provide an optimized XPath engine. The xpath
module will look for "_xpath_compile", "_xpath_select" and
"_xpath_match" methods on node objects and delegate calls those methods
if they are available. When they are not available, it does the query
"externally".

--
Paul Prescod - Not encumbered by corporate consensus
Simplicity does not precede complexity, but follows it.
- http://www.cs.yale.edu/~perlis-alan/quotes.html

From Mike.Olson@fourthought.com Tue Jul 11 23:08:18 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Tue, 11 Jul 2000 16:08:18 -0600
Subject: [XML-SIG] Proposed XPath API
References: <396B7164.D8CA9111@prescod.net>
Message-ID: <396B9AD2.7BA52804@FourThought.com>

Paul Prescod wrote:
>
> Here's a start....
>
> import xpath
>
> obj=xpath.compile( "....", nsbindings={prefix:uri,prefix:uri}, **flags )
>
> obj.select( node, **flags ) #returns a nodelist
> obj.select( nodelist, **flags ) #also returns a nodelist
>
> obj.match( node, **flags ) #returns a boolean

I think these will have to be 2 different objects. In XPath the
difference is Path vs expression. Its a matter of where you start in
the EBNF.

A path is used to select, and an expression is used to
match.

What about"

obj = xpath.compilePath(....)
obj.select(...)

and

obj = xpath.compileExpression(....)
obj.match(....)

>
> xpath.select( "...", node, nsbindings={prefix:uri,prefix:uri}, **flags )
> #convenience method
> xpath.match( "...", node, nsbindings={prefix:uri,prefix:uri}, **flags )
> #convenience method
>
> All methods take a **flags parameter as an extension mechanism. For
> instance 4XPath would use that to pass in extension functions.
>
> A particular DOM can also provide an optimized XPath engine. The xpath
> module will look for "_xpath_compile", "_xpath_select" and
> "_xpath_match" methods on node objects and delegate calls those methods
> if they are available. When they are not available, it does the query
> "externally".

Do we want to do this, or hold off a bit till DOM III and things like
PAX are finalized?

Mike

>
> --
> Paul Prescod - Not encumbered by corporate consensus
> Simplicity does not precede complexity, but follows it.
> - http://www.cs.yale.edu/~perlis-alan/quotes.html
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From paul@prescod.net Tue Jul 11 23:36:04 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 11 Jul 2000 17:36:04 -0500
Subject: [XML-SIG] Proposed XPath API
References: <396B7164.D8CA9111@prescod.net> <396B9AD2.7BA52804@FourThought.com>
Message-ID: <396BA154.EB948981@prescod.net>

Mike Olson wrote:
>
>...
>
> I think these will have to be 2 different objects. In XPath the
> difference is Path vs expression. Its a matter of where you start in
> the EBNF.
>
> A path is used to select, and an expression is used to
> match.

Does it require two APIs, though? You can compile anything as an
expression, right? So let's say you do that. Then you could throw an
exception in select() if the expression doesn't return a nodelist. Or
else you could just return the evaluated result and not worry about it.

> Do we want to do this, or hold off a bit till DOM III and things like
> PAX are finalized?

I'd rather implement today and deprecate tomorrow. There is very little
cost and doing so and a larger cost in waiting and falling behind.
Microsoft's DOM has had this for a couple of years now. We can implement
DOM III or PAX when and if they are even finalized.

--
Paul Prescod - Not encumbered by corporate consensus
Simplicity does not precede complexity, but follows it.
- http://www.cs.yale.edu/~perlis-alan/quotes.html

From dieter@handshake.de Tue Jul 11 18:49:07 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Tue, 11 Jul 2000 19:49:07 +0200 (CEST)
Subject: [XML-SIG] XPath in Python 2
In-Reply-To: <396A3FCC.280E610@prescod.net>
References: <200007101814.MAA02054@localhost.localdomain> <396A3FCC.280E610@prescod.net>
Message-ID: <14699.23900.547367.772602@lindm.dm>

Paul Prescod writes:
> Uche Ogbuji wrote:
> > > 4XPath is cleaner from a user's point of view, but it requires a lot bit
> > > of C/lex code for parsing the XPaths. I don't know if we would have to
> > > go back to the BDFL to get permission for that code to go into Python.
> >
> > Would it be good enough for us just to check in ANSI C code from FLEX/Bison?
>
> I'm not sure that Guido would go for that because it means that everyone
> must go back to you for changes, right?

The usual approach is to have generated C sources *and* the flex/bison
sources.

Thus, if someone wants to change at that level (few will), they will
need a C development system. With that, they can easily build flex and
bison as well.

Dieter

From fdrake@beopen.com Wed Jul 12 02:46:57 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Tue, 11 Jul 2000 21:46:57 -0400 (EDT)
Subject: [XML-SIG] Proposed XPath API
In-Reply-To: <396BA154.EB948981@prescod.net>
References: <396B7164.D8CA9111@prescod.net> <396B9AD2.7BA52804@FourThought.com> <396BA154.EB948981@prescod.net>
Message-ID: <14699.52753.24012.915966@cj42289-a.reston1.va.home.com>

Paul Prescod writes:
> I'd rather implement today and deprecate tomorrow. There is very little
> cost and doing so and a larger cost in waiting and falling behind.

I certainly favor this for the PyXML package; we can decide later for
the standard library -- I need to catch up on the meatier XML-SIG
traffic before passing any technical judgements.

-Fred

--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member

From rob@hooft.net Wed Jul 12 07:13:41 2000
From: rob@hooft.net (Rob W. W. Hooft)
Date: Wed, 12 Jul 2000 08:13:41 +0200 (CEST)
Subject: [XML-SIG] auto-support for PyXML
In-Reply-To: <20000710200326.S29590@lyra.org>
References: <20000710200326.S29590@lyra.org>
Message-ID: <14700.3221.99024.869737@temoleh.chem.uu.nl>

>>>>> "GS" == Greg Stein writes:

GS> I just posted to the Patch Manager regarding patch #100705
GS> ("Supporting PyXML"). I think that messing with __path__ is the
GS> wrong approach and can lead to problems down the road.

GS> A couple weeks ago, I posted an alternative. You can view that
GS> at:

GS> http://www.python.org/pipermail/xml-sig/2000-June/004512.html

I was very surprised that this did not get any response. I am very
positive about your idea. It looks very clean, since it leaves all the
real work to the "extra" package (which can be more easily replaced).

I don't think the __init__ file should import many real modules: we
could think of an approach with stub modules that auto-load themselves
on invocation.

Rob

--
===== rob@hooft.net http://www.hooft.net/people/rob/ =====
===== R&D, Nonius BV, Delft http://www.nonius.nl/ =====
===== PGPid 0xFA19277D ========================== Use Linux! =========

From paul@prescod.net Wed Jul 12 10:34:37 2000
From: paul@prescod.net (Paul Prescod)
Date: Wed, 12 Jul 2000 04:34:37 -0500
Subject: [XML-SIG] auto-support for PyXML
References: <20000710200326.S29590@lyra.org>
Message-ID: <396C3BAD.5BCED691@prescod.net>

Greg Stein wrote:
>
> I just posted to the Patch Manager regarding patch #100705 ("Supporting
> PyXML"). I think that messing with __path__ is the wrong approach and can
> lead to problems down the road.
>
> A couple weeks ago, I posted an alternative. You can view that at:
>
> http://www.python.org/pipermail/xml-sig/2000-June/004512.html
>
> Can we get closure/consensus on this and get it implemented?

I like it. Fred and Andrew have the final say of course.
--
Paul Prescod - Not encumbered by corporate consensus
Simplicity does not precede complexity, but follows it.
- http://www.cs.yale.edu/~perlis-alan/quotes.html

From m.favas@per.dem.csiro.au Thu Jul 13 22:09:18 2000
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Fri, 14 Jul 2000 05:09:18 +0800
Subject: [XML-SIG] FW: pyexpat compilation errors - Python 2.0b1
References: <200007041537.RAA05474@statistik.cinetic.de> <3962235E.758F10F8@prescod.net>
Message-ID: <396E2FFE.EC17FC5B@per.dem.csiro.au>

Sorry for the delay in replying, Paul - work intervened , and some
changes to Python killed the build on my machine.

Anyway - no, this doesn't work for me, because we're left with the same
problem. In initpyexpat, the compiler still needs to know the type of
handler_info_array, and it doesn't at this time...

cc: Error: ./pyexpat.c, line 857: In this statement, "handler_info_array" is not declared. (undeclared)
    handler_info = handler_info_array;
-------------------^

The only solution to this is to move the initpyexpat() function to the
end of the file (moving the definition of handler_info_array to before
initpyexpat doesn't work, because the definition of handler_info_array
requires some declarations that appear after initpyexpat()...

So this approach would work if the code were re-arranged.

BTW, the current Python 2.0b1 CVS version of pyexpat.c will not compile
on my machine without the patch below:

The problem is (basically):

cc: Error: ./pyexpat.c, line 427: An unexpected newline character is present in a string literal. (nlstring)
"Parse(data[, isfinal])
cc: Error: ./pyexpat.c, line 428: An unexpected newline character is present in a character constant. (nlchar)
Parse XML data. `isfinal' should be true at end of input.";
-------------------------^
cc: Warning: ./pyexpat.c, line 428: A character constant value requires more than sizeof(int) bytes of storage. (charoverfl)
Parse XML data. `isfinal' should be true at end of input.";
-------------------------^ ^

Same around lines 503-504, 558-559 and 576-577

Patch is:

*** pyexpat.c.orig Thu Jul 13 19:02:16 2000
--- pyexpat.c.compiles Thu Jul 13 04:30:23 2000
***************
*** 424,430 ****
  /* ---------------------------------------------------------------- */

  static char xmlparse_Parse__doc__[] =
! "Parse(data[, isfinal])
  Parse XML data. `isfinal' should be true at end of input.";

  static PyObject *
--- 424,430 ----
  /* ---------------------------------------------------------------- */

  static char xmlparse_Parse__doc__[] =
! "Parse(data[, isfinal])\n\
  Parse XML data. `isfinal' should be true at end of input.";

  static PyObject *
***************
*** 500,506 ****
  }

  static char xmlparse_ParseFile__doc__[] =
! "ParseFile(file)
  Parse XML data from file-like object.";

  static PyObject *
--- 500,506 ----
  }

  static char xmlparse_ParseFile__doc__[] =
! "ParseFile(file)\n\
  Parse XML data from file-like object.";

  static PyObject *
***************
*** 555,561 ****
  }

  static char xmlparse_SetBase__doc__[] =
! "SetBase(base_url)
  Set the base URL for the parser.";

  static PyObject *
--- 555,561 ----
  }

  static char xmlparse_SetBase__doc__[] =
! "SetBase(base_url)\n\
  Set the base URL for the parser.";

  static PyObject *
***************
*** 573,579 ****
  }

  static char xmlparse_GetBase__doc__[] =
! "GetBase() -> url
  Return base URL string for the parser.";

  static PyObject *
--- 573,579 ----
  }

  static char xmlparse_GetBase__doc__[] =
! "GetBase() -> url\n\
  Return base URL string for the parser.";

  static PyObject *

Paul Prescod wrote:
>
> I like this solution. Work for you Mark?
>
> Juergen Hermann wrote:
> >
> > statichere struct HandlerInfo* handler_info = 0;
> >
> > ...
> >
> > statichere struct HandlerInfo handler_info_array[]=
> > {{"StartElementHandler",
> > ...
> > };
> >
> > void
> > initpyexpat(){
> >     handler_info = handler_info_array;
> >     ...
> > }
>
> --
> Paul Prescod - Not encumbered by corporate consensus
> The distinction between the real twentieth century (1914-1999) and the
> calenderical one (1900-2000) is based on the convincing idea that the
> century's bouts of unprecented violence, both within nations and between
> them, possess a definite historical coherence -- that they constitute,
> to put it simply, a single story.
> - The Unfinished Twentieth Century, Jonathan Schell
> Harper's Magazine, January 2000

--
Email - m.favas@per.dem.csiro.au Mark C Favas
Phone - +61 8 9333 6268, 0418 926 074 CSIRO Exploration & Mining
Fax - +61 8 9383 9891 Private Bag No 5, Wembley
WGS84 - 31.95 S, 115.80 E Western Australia 6913

From willey@protectix.com Fri Jul 14 01:57:05 2000
From: willey@protectix.com (Mark Willey)
Date: Thu, 13 Jul 2000 17:57:05 -0700
Subject: [XML-SIG] Python XML DOM: problem accessing length attribute in NodeList
Message-ID: <20000714005707.5BB841CFBC@dinsdale.python.org>

Hi, all.

I am new to Python and XML and DOM: caveat.

I think I have found an undocumented deviation from the spec in core.py
that I had to UTSL to get around.

I am trying to
'print "element with ", list.length, " children."'
but had to
'print "element with ", list.get_length(), " children."'
because NodeList appears to have .get_length(), but not .length.

I hope this is the right place to place this report. It was the address
in the README for "XML package v0.5.1".

I did look back at the mailing list and saw the discussion about the
attributes vs methods. I grokked that both were to be supported, and
the reality of the class differs from the documentation, so I am
assuming this is a real bug.

class NodeList(UserList.UserList):
    """An ordered collection of nodes, equivalent to a Python list. The
    only difference is that an .item() method and a .length attribute
    are added.
    """

Mark

From Juergen Hermann"

On Fri, 14 Jul 2000 05:09:18 +0800, Mark Favas wrote:
> static char xmlparse_Parse__doc__[] =
>! "Parse(data[, isfinal])\n\
> Parse XML data. `isfinal' should be true at end of input.";

Please do not use \ to escape a line break in string literals, that is
K&R1 C and long forgotten. The Right Way(tm) is to do this:

char spam[] =
    "line1\n"
    "line2\n"
    ...
    "end";

That is ANSI-C and should work everywhere.

Ciao, Jürgen

--
Jürgen Hermann (jhe@webde-ag.de)
WEB.DE AG, Amalienbadstr.41, D-76227 Karlsruhe
Tel.: 0721/94329-0, Fax: 0721/94329-22

From fdrake@beopen.com Fri Jul 14 16:56:00 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Fri, 14 Jul 2000 11:56:00 -0400 (EDT)
Subject: [XML-SIG] Re: [Python-Dev] Request for Opinions
In-Reply-To: <396EE66F.59D0D45F@prescod.net>
References: <200007140516.HAA28633@python.inrialpes.fr> <396EE66F.59D0D45F@prescod.net>
Message-ID: <14703.14352.411905.934970@cj42289-a.reston1.va.home.com>

[Note that I've simplified the headers, but cross-posted to python-dev
and xml-sig. *Please* follow up to xml-sig!]

Paul Prescod writes:
> Why aren't we ready to dicuss it?

I think the XML crew should discuss these things first. PyXML has
generally been treated as a catch all for various neat XML stuff, but
looking at it as a careful extension of the core XML support means we
need to think about it that way, including the line between what's in
the core and what isn't.

I think it's very valuable to listen to the experts on this topic
(which I think is predominantly you, the FourThought crew, and Lars,
with Sean playing a secondary role since he's usually too busy to
participate in the discussions). I'd like to see this discussed in
the SIG with an eye to creating two non-experimental packages:

1. XML support for the Python standard library, and

2. an XML extension package that adds support for more
   recommendations and candidate recommendations.

There should still be something like the current PyXML, which
contains all the neat stuff that doesn't fall in one of the other two
categories.
I think a PEP and further discussions in the XML-SIG are in order
before we add more material into the standard library. I'm firmly
committed to getting the "right stuff" in there, but I don't want to
rush headlong into adding things before they're ready and agreed
upon.

I'd love to see you or one of the other Python+XML leaders be
editor for a PEP on this topic.

-Fred

--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member

From paul@prescod.net Fri Jul 14 17:56:54 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 14 Jul 2000 11:56:54 -0500
Subject: [XML-SIG] Re: [Python-Dev] Request for Opinions
References: <200007140516.HAA28633@python.inrialpes.fr> <396EE66F.59D0D45F@prescod.net> <14703.14352.411905.934970@cj42289-a.reston1.va.home.com>
Message-ID: <396F4656.6396D7C8@prescod.net>

"Fred L. Drake, Jr." wrote:
>
> I think a PEP and further discussions in the XML-SIG are in order
> before we add more material into the standard library. I'm firmly
> committed to getting the "right stuff" in there, but I don't want to
> rush headlong into adding things before they're ready and agreed
> upon.

My impression was that we discussed this in the SIG last week but the
discussion petered out when we realized that none of us knew whether it
was even feasible to add something the size of 4XPath to Python. Nobody
could say it would or would not be a problem which makes planning
impossible and further discussion a useless cycle of "wouldn't it be
nice, but it may or may not be possible, but wouldn't it be nice, but it
may or may not be ...".

--
Paul Prescod - Not encumbered by corporate consensus
It's difficult to extract sense from strings, but they're the only
communication coin we can count on.
- http://www.cs.yale.edu/~perlis-alan/quotes.html

From fdrake@beopen.com Fri Jul 14 18:06:43 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Fri, 14 Jul 2000 13:06:43 -0400 (EDT)
Subject: [XML-SIG] Re: [Python-Dev] Request for Opinions
In-Reply-To: <396F4656.6396D7C8@prescod.net>
References: <200007140516.HAA28633@python.inrialpes.fr> <396EE66F.59D0D45F@prescod.net> <14703.14352.411905.934970@cj42289-a.reston1.va.home.com> <396F4656.6396D7C8@prescod.net>
Message-ID: <14703.18595.926991.596182@cj42289-a.reston1.va.home.com>

Paul Prescod writes:
> My impression was that we discussed this in the SIG last week but the
> discussion petered out when we realized that none of us knew whether it
> was even feasible to add something the size of 4XPath to Python. Nobody
> could say it would or would not be a problem which makes planning

Then this falls into the "Fred's falling behind on his email again"
category, and my remarks can be safely ignored until I've done my
homework. Sorry for the confusion. ;(

But I still think a PEP documenting the rationale for any decision
is valuable. ;)

-Fred

--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member

From uche.ogbuji@fourthought.com Fri Jul 14 20:03:23 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 14 Jul 2000 13:03:23 -0600
Subject: [XML-SIG] [Fwd: [ANN] 4XSLT 0.9.2beta for Windows]
Message-ID: <396F63FB.4276081E@fourthought.com>

-------- Original Message --------
Subject: [ANN] 4XSLT 0.9.2beta for Windows
Date: Fri, 14 Jul 2000 11:05:26 -0600
From: "Jeremy J Kloth"
To: "XSL List"
CC: "Consultants"

Do to an external need, we have released a beta of our upcoming 4XSLT
package. Please note that this is a binary distribution for Windows.
However, the source is included, so it can be built elsewhere.

4DOM Changes in 0.10.2 (R200007)
--------------------------------
- Support wide range of output encodings via wstring
- Updated conformance to 20000510 DOM CR
- Changed internals to use Node as the clone manager, using a pickle-style interface.
- Changed many classes to be generated in the HTML Extension
- Other bug-fixes

4XSLT Changes in 0.9.2 (R200007)
--------------------------------
- Cleaned up Processor API
- Restructured for cleanliness of stylesheet objects
- implement full range of encoding support (really done in 4DOM)
- implement extension elements and fallback (yay! XSLT 1.0 feature complete!)
- simplify and document extension functions
- Fix function-available
- BaseUri support
- Fixes to xsl:import
- Fix some performance bugs
- Better exception handling from XPatternParser
- Misc bug-fixes

This is a *BETA* release.

--
Jeremy Kloth Consultant
jeremy.kloth@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From bjrubble@dagobah.com Fri Jul 14 23:42:33 2000
From: bjrubble@dagobah.com (Adam Clark)
Date: Fri, 14 Jul 2000 15:42:33 -0700 (PDT)
Subject: [XML-SIG] No parents in cloned node?
Message-ID:

Hi,

Please let me know if this is the wrong list for this.

So you clone a node, it has no parent. That makes sense. But the
cloned node's children also have no parents. Is there a reason for
this?

>>> print node.toxml()