From paul@prescod.net Tue Aug 1 08:41:06 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 01 Aug 2000 03:41:06 -0400 Subject: [XML-SIG] Permission request Message-ID: <39867F12.FBD44F26@prescod.net> When my publisher told me "not to worry about permissions" I didn't know that they would end up spamming our public list. Forgive them, they know not what they do. Thanks for giving them a warm fuzzy, though, Ken. (now I remember why I don't usually take vacations!) -------------------------------------------------------------------------------- TO: Charles F. Goldfarb and Paul Prescod c/o Peter S. Snell - Prentice Hall Publishers I(we) grant you permission to include documentation and code for Python XML package in the book "The XML Handbook, Third Edition" to be published by Prentice Hall publishers in the Charles F. Goldfarb Series on Open Information Management, and grant you non-exclusive worldwide distribution rights. I(we) also grant you permission to include documentation and code for Python XML package on any future book in the Charles F. Goldfarb Series on Open Information Management published by Prentice Hall Publishers, and grant you non-exclusive worldwide distribution rights. -- Ken MacLeod -- Paul Prescod - Not encumbered by corporate consensus "I don't want you to describe to me -- not ever -- what you were doing to that poor boy to make him sound like that; but if you ever do it again, please cover his mouth with your hand," Grandmother said. -- John Irving, "A Prayer for Owen Meany" From zanasi@inwind.it Tue Aug 1 07:38:40 2000 From: zanasi@inwind.it (WIN ZANASI) Date: Tue, 1 Aug 2000 08:38:40 +0200 Subject: [XML-SIG] Setup Error Message-ID: <002d01bffb83$2252c300$0500a8c0@ugo> This is a multi-part message in MIME format. ------=_NextPart_000_0028_01BFFB93.E46BD660 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Under Win(98) setup.py does not work. # XXX is this correct? 
dest_dir = (sys.prefix + '\\lib\\python' + sys.version[:3] + '\\site-packages\\xml\\' )

I believe that's better:

dest_dir = (sys.prefix + '/lib/xml' )

Zanasi
From Juergen Hermann" Message-ID: <200008010837.KAA30311@statistik.cinetic.de>

--Original Message Text---
# XXX is this correct?
dest_dir = (sys.prefix + '\\lib\\python' + sys.version[:3] + '\\site-packages\\xml\\' )

I believe that's better:

dest_dir = (sys.prefix + '/lib/xml' )
----------

BTW, an option to set this via a cmd line parameter would be nice (--destdir=xxx). I'd be happy to provide & commit the change if you people agree.

Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/

From larsga@garshol.priv.no Tue Aug 1 10:26:03 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 01 Aug 2000 11:26:03 +0200
Subject: [XML-SIG] Permission request
In-Reply-To: <39867F12.FBD44F26@prescod.net>
References: <39867F12.FBD44F26@prescod.net>
Message-ID:

* Paul Prescod
|
| When my publisher told me "not to worry about permissions" I didn't
| know that they would end up spamming our public list. Forgive them,
| they know not what they do.

Uh, actually this is my fault. I auto-generated the list of email addresses they use to gather permissions, and didn't consider that for the XML-SIG package I'd put the XML-SIG mailing list address.

--Lars M.

From paul@prescod.net Tue Aug 1 17:53:42 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 01 Aug 2000 12:53:42 -0400
Subject: [XML-SIG] current status of the xml-sig
References:
Message-ID: <39870096.3601E635@prescod.net>

Robin Becker wrote:
>
> Can someone explain what the status of the XML-sig is. All of the sigs
> at www.python.org are reported as terminating in June 2000. The SAX
> tutorial is at 0.5 Jan 2000 and neither it nor the other docs mention
> minidom or pulldom which are apparently going into 1.6/2.0. Is there a
> definitive python xml way now and if so where is it?

I don't think we have committed to a definitive Python way.

There is SAX, which is still supposed to go under revision.
If Lars can't get to it soon, Fred will probably have to look at it. There is minidom, which is done and awaiting some feedback. Pulldom is currently an undocumented "implementation detail" of minidom. I'd like to expose it but nobody here is really willing to say one way or another. -- Paul Prescod - Not encumbered by corporate consensus "I don't want you to describe to me -- not ever -- what you were doing to that poor boy to make him sound like that; but if you ever do it again, please cover his mouth with your hand," Grandmother said. -- John Irving, "A Prayer for Owen Meany" From fdrake@beopen.com Tue Aug 1 18:35:55 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 1 Aug 2000 13:35:55 -0400 (EDT) Subject: [XML-SIG] current status of the xml-sig In-Reply-To: <39870096.3601E635@prescod.net> References: <39870096.3601E635@prescod.net> Message-ID: <14727.2683.3648.936194@cj42289-a.reston1.va.home.com> Paul Prescod writes: > There is SAX, which is still supposed to go under revision. If Lars > can't get to it soon, Fred will probably have to look at it. > > There is minidom, which is done and awaiting some feedback. > > Pulldom is currently an undocumented "implementation detail" of minidom. > I'd like to expose it but nobody here is really willing to say one way > or another. Let me just go on record as saying that I've not gotten back to these because I'm completely swamped. ;-{( I should be able to free up a *small* amount of time soon, but since I'm the release manager for Python 1.6, it won't be today, and probably not this week. ;(( -Fred -- Fred L. Drake, Jr. 
BeOpen PythonLabs Team Member

From gstein@lyra.org Wed Aug 2 03:00:44 2000
From: gstein@lyra.org (Greg Stein)
Date: Tue, 1 Aug 2000 19:00:44 -0700
Subject: [XML-SIG] Extending the xml package
In-Reply-To: <14719.51498.429119.458996@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Thu, Jul 27, 2000 at 01:31:22AM -0400
References: <14719.51498.429119.458996@cj42289-a.reston1.va.home.com>
Message-ID: <20000801190044.U19525@lyra.org>

I saw that you already checked this in :-(

This solution is definitely sub-optimal. Specifically:

    sys.modules['xml'].__name__ != 'xml'

Monkeying around with sys.modules is a clue that you are depending on "magic" in the import and module-handling mechanisms. Given some of the import stuff that people are trying to do (various packaging and archiving and stuff), this is dangerous stuff.

Here is my suggestion again (with some merging of your code):

xml.__init__:

    __version__ = 1

    if __name__ == 'xml':
        try:
            import _xmlplus
        except ImportError:
            pass
        else:
            _xmlplus._setup_package(__version__)

_xmlplus.__init__:

    def _setup_package(version):
        import xml   # note: available, but empty

        from _xmlplus.parsers import xmlproc, xmlsre
        xml.parsers.xmlproc = xmlproc
        xml.parsers.xmlsre = xmlsre
        # etc

The above code relies on nothing "magic" in the module or import handling. The _xmlplus package can extend the xml package in any way that it sees fit. This solution is much more robust in the face of "funny" import/packaging scenarios. _xmlplus can override stuff in the xml package and it can extend it. But it doesn't override *everything* which your suggested code does.

Cheers,
-g

On Thu, Jul 27, 2000 at 01:31:22AM -0400, Fred L. Drake, Jr. wrote: > > As promised, I brought up the package extension issue at today's > PythonLabs meeting. We decided that there are two interesting cases > for package importing involved here. > The first is package extension -- allowing one package to extend > another.
We basically agreed that the Java model got this right, with > the issue of multiple __init__ modules being a serious problem for > Python (it's not clear what the right way to deal with multiple > __init__ modules; you want to execute all of them, and the current > implementation doesn't lend itself to this). This is the approach > we've discussed here before. > Another possibility is providing an extended replacement for the > standard package. This doesn't sound like it makes sense given that > using the same name creates order dependencies for sys.path, and the > current setup would be wrong for overriding a standard package with a > package installed in site-packages or a user's or application's > private library. > However, this actually appears to be the most reasonable if we want > to be able to include bug fixes in the "enhanced" version of the > package, and doesn't require weird hacking on distutils. It does > require that the package that can be overridden in this way be written > to support this. > Here's how to do it: > Deploy the "xml" package in the standard library. Create an > "_xmlplus" package (PyXML) which provides all of the facilities from > the standard library and any extensions. The "_xmlplus" package can > be treated as any other package with distutils. > In the __init__.py for the "xml" package, include the following > code: > > ------------------------------------------------------------ > if __name__ == "xml": > try: > import _xmlplus > except ImportError: > pass > else: > import sys > sys.modules[__name__] = _xmlplus > ------------------------------------------------------------ > > Yes, this works. > The leading test for __name__ is useful to allow the same file to be > used for both the xml and _xmlplus packages. 
> The PyXML package (providing _xmlplus) could continue to be the > leading-edge development package with all the bells and whistles, and > portions adopted into the standard library could be updated before a > Python release. > Guido also suggested looking at the version-handling code from Pmw, > but I don't know how valuable that would be. > Comments? > > > -Fred > > -- > Fred L. Drake, Jr. > BeOpen PythonLabs Team Member > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Greg Stein, http://www.lyra.org/ From fdrake@beopen.com Wed Aug 2 04:07:50 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Tue, 1 Aug 2000 23:07:50 -0400 (EDT) Subject: [XML-SIG] Extending the xml package In-Reply-To: <20000801190044.U19525@lyra.org> References: <14719.51498.429119.458996@cj42289-a.reston1.va.home.com> <20000801190044.U19525@lyra.org> Message-ID: <14727.36998.895676.567623@cj42289-a.reston1.va.home.com> Greg Stein writes: > This solution is definitely sub-optimal. Specifically: > > sys.modules['xml'].__name__ != 'xml' So use this: import xml if xml.__name__ != "xml": # not the standard library... else: # is the standard library... Setting the object referenced in sys.modules is considered as legitimate as it gets, and this solution allows all path searching to be handled by the import machinery rather than the package's __init__ module, and has previously been sanctioned by Guido. (Note that the Pmw package does something similar, but I think the object inserted into sys.modules isn't actually a module object.) This seems to be the approach which has the least interference with pluggable import mechanisms, none of which should have to hack deeply in the handling of sys.modules. > Monkeying around with sys.modules is a clue that you are depending on > "magic" in the import and module-handling mechanisms. 
Given some of the
> import stuff that people are trying to do (various packaging and archiving
> and stuff), this is dangerous stuff.
>
> Here is my suggestion again (with some merging of your code):
...
> _xmlplus.__init__:
>
>     def _setup_package(version):
>         import xml   # note: available, but empty
>
>         from _xmlplus.parsers import xmlproc, xmlsre
>         xml.parsers.xmlproc = xmlproc
>         xml.parsers.xmlsre = xmlsre
...
> The above code relies on nothing "magic" in the module or import handling.

The catch being, as I recall someone pointed out earlier, that this doesn't support "lazy" importing, but imports very aggressively. This is completely unacceptable.

> The _xmlplus package can extend the xml package in any way that it sees fit.
> This solution is much more robust in the face of "funny" import/packaging
> scenarios. _xmlplus can override stuff in the xml package and it can extend
> it. But it doesn't override *everything* which your suggested code does.

The complete override is intentional. This also allows the _xmlplus package to provide bug fixes and handle compatibility issues as _xmlplus itself evolves.

I wrote:
> ------------------------------------------------------------
> if __name__ == "xml":
>     try:
>         import _xmlplus
>     except ImportError:
>         pass
>     else:
>         import sys
>         sys.modules[__name__] = _xmlplus
> ------------------------------------------------------------

This can be simplified, though: the surrounding "if __name__ ..." test can be removed. There will be three different __init__.py files: one from the standard library, one in _xmlplus, and one in an xml package supplied alongside the _xmlplus package (for Python 1.5.2 installations); the latter should include only:

------------------------------------------------------------
import _xmlplus
import sys
sys.modules[__name__] = _xmlplus
------------------------------------------------------------

-Fred

--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member From larsga@garshol.priv.no Wed Aug 2 10:10:05 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 02 Aug 2000 11:10:05 +0200 Subject: [XML-SIG] SAX 2.0 resolution? In-Reply-To: References: Message-ID: * Lars Marius Garshol | | [startElement & startElementNS] | | The only disadvantages I see are that this may cost an extra method | call per callback for some generic filters and that it does not make | it clear that it is not allowed to mix the namespace and | non-namespace methods in a single document. * Ken MacLeod | | How does this affect down-line filters? what happens when a | non-NS-using filter precedes an NS-using filter or handler? I don't think this is the right way to look at it, because whether the filter uses namespaces or not does not matter. If it doesn't use namespaces it should still support them, and if it can't support them, then that's the problem and not that it doesn't use namespaces. So what matters is whether a filter supports namespaces and whether it supports non-NS-mode. Since you can't use filters that require namespaces with filters that do not support namespaces anyway, I don't think this is a problem. That is, you should be able to assemble the filter stack regardless of what the participating filters support and do not support. However, if you try to turn off namespace support with setFeature() the filters that require NS processing should complain. If you try to run with namespace support on the filters that do not support it should complain. So you should get a proper error message if you do something you shouldn't. Otherwise things should work, provided filter writers do their job properly. | With the startElement(namespaceURI, localName, qName, attrs) model, | you'd expect upline filters to pass all the parameters whether or not | they themselves used it. Agreed. 
In this model it is expected that filters pass on both the startElement and the startElementNS events, provided that this makes sense. If you implement XInclude using a filter, this would require namespace support, and so it would not make sense to pass on startElement events, because if the filter gets startElement events something is wrong. (It should actually raise a SAXNotSupportedException in both startElement and endElement.) If you implement a filter that joins together consecutive characters() events so that you never have character data split into several events this filter should pass on both startElement and startElementNS events, since those are orthogonal to what the filter does. | (And, of course, you wouldn't be expected to mix SAX1 and SAX2 | filters.) You still aren't. A SAX1 filter will not be able to handle the skippedEntity event, for example. --Lars M. From larsga@garshol.priv.no Wed Aug 2 10:11:35 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 02 Aug 2000 11:11:35 +0200 Subject: [XML-SIG] current status of the xml-sig In-Reply-To: <39870096.3601E635@prescod.net> References: <39870096.3601E635@prescod.net> Message-ID: * Paul Prescod | | There is SAX, which is still supposed to go under revision. If Lars | can't get to it soon, Fred will probably have to look at it. To avoid possible confusion: I got to it yesterday (see the 'SAX 2.0 resolution?' thread), and am now waiting for feedback from the XML-SIG. I have three weeks of vacation now, so I expect to be able to do this once we've agreed on just what it is. :-) --Lars M. 
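[Illustration added to the archive, not part of the original thread.] The character-joining filter Lars describes in the 'SAX 2.0 resolution?' message might look roughly like the sketch below. It assumes a current Python 3 standard library, whose xml.sax.saxutils.XMLFilterBase postdates this discussion and may differ from the API being debated here. The names CharacterJoiner, Recorder and parse_joined are hypothetical and exist only for this example. The filter buffers characters() events and passes both the plain and the namespace-aware element events through untouched, since they are orthogonal to what it does.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import XMLFilterBase


class CharacterJoiner(XMLFilterBase):
    """Hypothetical filter: merge consecutive characters() events into one."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []  # text chunks seen since the last non-text event

    def _flush(self):
        # Forward all buffered text downstream as a single characters() event.
        if self._pending:
            super().characters("".join(self._pending))
            self._pending = []

    def characters(self, content):
        self._pending.append(content)

    # Element events are orthogonal to the joining, so both the plain and
    # the namespace-aware variants are passed on after flushing the text.
    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def startElementNS(self, name, qname, attrs):
        self._flush()
        super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self._flush()
        super().endElementNS(name, qname)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

    def endDocument(self):
        self._flush()
        super().endDocument()


class Recorder(ContentHandler):
    """Hypothetical downstream handler that records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def startElementNS(self, name, qname, attrs):
        self.events.append(("startNS", name))

    def endElementNS(self, name, qname):
        self.events.append(("endNS", name))

    def characters(self, content):
        self.events.append(("chars", content))


def parse_joined(xml_text, namespaces=False):
    """Parse xml_text through CharacterJoiner and return the recorded events."""
    reader = CharacterJoiner(xml.sax.make_parser())
    reader.setFeature(feature_namespaces, namespaces)
    recorder = Recorder()
    reader.setContentHandler(recorder)
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return recorder.events
```

Calling parse_joined('<p>a &amp; b</p>') and then the same document with namespaces=True exercises both event families through the same filter, which is the property Lars asks filter writers to preserve.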
From tismer@appliedbiometrics.com Wed Aug 2 15:45:35 2000 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Wed, 02 Aug 2000 16:45:35 +0200 Subject: [XML-SIG] Win32 XML installer update 08/02/00 Message-ID: <3988340F.315AD009@appliedbiometrics.com> Hi Python XML users, an updated version of the win32 installer for the current XML-sig CVS tree is available at http://www.tismer.com/xml/PythonXML.EXE Please inform me if you find any glitches. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From ainsinga@infomation.com Thu Aug 3 19:07:23 2000 From: ainsinga@infomation.com (Aron Insinga) Date: Thu, 03 Aug 2000 14:07:23 -0400 Subject: [XML-SIG] xalan/xerces In-Reply-To: <20000802160113.941F41D1A1@dinsdale.python.org> Message-ID: <4.2.2.20000803135705.00ab1100@pop.enterprise-services.com> Is there a/is anyone working on a Python interface to the Apache XML project's xalan XSLT processor? - Aron Insinga From alexandre.fayolle@free.fr Fri Aug 4 08:38:01 2000 From: alexandre.fayolle@free.fr (Alexandre Fayolle) Date: Fri, 04 Aug 2000 09:38:01 +0200 (MEST) Subject: [XML-SIG] problem with validating parser Message-ID: <965374681.398a72d94caa3@imp.free.fr> Hello, I'm trying to use the validating parser, and I run into a problem. However, I don't think that my document is invalid. Could someone help me, please ? the code : from xml.dom.ext.reader import Sax2 tree = Sax2.FromXmlFile('MainFrame.xml',validate=1) The document : I have a file called frame.dtd in the current directory. The stack trace I get : [alf@leo alui]$ python Utils.py Traceback (innermost last): File "Utils.py", line 73, in ? 
tree = Sax2.FromXmlFile('MainFrame.xml',validate=TRUE) File "/home/alf/xmlSig/repository/xml/xml/dom/ext/reader/Sax2.py", line 266, in FromXmlFile rv = FromXmlStream(fp, ownerDocument, validate, keepAllWs, catName, saxHandlerClass) File "/home/alf/xmlSig/repository/xml/xml/dom/ext/reader/Sax2.py", line 246, in FromXmlStream parser.parseFile(stream) File "/home/alf/xmlSig/repository/xml/xml/sax/drivers/drv_xmlproc.py", line 39, in parseFile self.parser.read_from(file) File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlval.py", line 102, in read_from self.parser.read_from(file) File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlutils.py", line 137, in read_from self.feed(buf) File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlutils.py", line 185, in feed self.do_parse() File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlproc.py", line 101, in do_parse self.parse_doctype() File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlproc.py", line 504, in parse_doctype sys_id)) File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlutils.py", line 77, in parse_resource self.read_from(infile,bufsize) File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlutils.py", line 137, in read_from self.feed(buf) File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlutils.py", line 185, in feed self.do_parse() File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/dtdparser.py", line 249, in do_parse self.parse_elem_type() File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/dtdparser.py", line 525, in parse_elem_type self.report_error(3004,("EMPTY, ANY","(")) File "/home/alf/xmlSig/repository/xml/xml/parsers/xmlproc/xmlutils.py", line 372, in report_error self.err.fatal(msg) File "/home/alf/xmlSig/repository/xml/xml/sax/drivers/drv_xmlproc.py", line 96, in fatal self.err_handler.fatalError(saxlib.SAXParseException(msg,None,self)) File "/home/alf/xmlSig/repository/xml/xml/dom/ext/reader/Sax2.py", line 226, in 
fatalError raise exception xml.sax.saxlib.SAXParseException: One of EMPTY, ANY or '(' expected at Unknown:2:42 Column 42 is just after the closing '>' on line 2, I cannot see why I should open a bracket or put EMPTY or ANY at this point of my file. What am I missing ? Thanks for your help. Alexandre Fayolle From larsga@garshol.priv.no Fri Aug 4 08:45:27 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 04 Aug 2000 09:45:27 +0200 Subject: [XML-SIG] problem with validating parser In-Reply-To: <965374681.398a72d94caa3@imp.free.fr> References: <965374681.398a72d94caa3@imp.free.fr> Message-ID: * Alexandre Fayolle | | xml.sax.saxlib.SAXParseException: One of EMPTY, ANY or '(' expected at | Unknown:2:42 | | Column 42 is just after the closing '>' on line 2, I cannot see why | I should open a bracket or put EMPTY or ANY at this point of my | file. What am I missing ? Whatever the real problem may be, the parser claims to have found a syntax error there. Could you show us the first 3-4 lines of your DTD? --Lars M. From cogbuji@fourthought.com Fri Aug 4 17:34:05 2000 From: cogbuji@fourthought.com (Chime Thomas-Ogbuji) Date: Fri, 4 Aug 2000 10:34:05 -0600 (MDT) Subject: [XML-SIG] XSL Template proposal Message-ID: A proposal for an XSL Template product in Zope has been posted on the Zope developer site. We would appreciate, feedback suggestions, additions, etc. to the proposal. If you would like to participate with the process and you aren't already a member you can join here http://dev.zope.org/Register/register.html, and you will then be able to participate in the collaborative process. The proposal can be found here: http://dev.zope.org/Wikis/DevSite/Proposals/XSLTMethod Chimezie Thomas-Ogbuji Consultant Fourthought Inc. 
(303) 583 9900 ext 104 cogbuji@fourthought.com From fwang2@yahoo.com Fri Aug 4 20:18:42 2000 From: fwang2@yahoo.com (fwang2@yahoo.com) Date: Fri, 4 Aug 2000 15:18:42 -0400 (EDT) Subject: [XML-SIG] build error on PyXML 0.5.5 package In-Reply-To: <965374681.398a72d94caa3@imp.free.fr> Message-ID: Hi, I tried to build PyXML 0.5.5 (also 0.5.4) on a RH 6.2 machine, but failed, something to do pyexpat.so. I am not an experienced user, could you please help me? Thanks oliver ------------ make -f ./Makefile.pre.in VPATH=. srcdir=. \ VERSION=$VERSION \ installdir=$installdir \ exec_installdir=$exec_installdir \ Makefile make[1]: Entering directory `/root/PyXML-0.5.5/extensions' make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile', needed by `sedscript'. Stop. make[1]: Leaving directory `/root/PyXML-0.5.5/extensions' make: *** [boot] Error 2 Running command: make make: *** No targets. Stop. Traceback (innermost last): File "setup.py", line 185, in ? func() File "setup.py", line 155, in build_unix shutil.copy('extensions/' + filename, 'build/xml/parsers/') File "/usr/lib/python1.5/shutil.py", line 52, in copy copyfile(src, dst) File "/usr/lib/python1.5/shutil.py", line 17, in copyfile fsrc = open(src, 'rb') IOError: [Errno 2] No such file or directory: 'extensions/pyexpat.so' From alexandre.fayolle@free.fr Sat Aug 5 16:08:33 2000 From: alexandre.fayolle@free.fr (Alexandre Fayolle) Date: Sat, 05 Aug 2000 17:08:33 +0200 (MEST) Subject: [XML-SIG] problem with validating parser In-Reply-To: References: <965374681.398a72d94caa3@imp.free.fr> <965377809.398a7f11e25cc@imp.free.fr> Message-ID: <965488113.398c2df1f4072@imp.free.fr> En réponse à Lars Marius Garshol : > No problem. This is a bug in the XML parser: it really should refer to > the line in the DTD. Which xmlproc version are you using? 
> > If you don't know, do this: > > from xml.parsers.xmlproc.xmlproc import * > print version 0.70 Alexandre Fayolle From fwang2@yahoo.com Sat Aug 5 17:43:49 2000 From: fwang2@yahoo.com (oliver) Date: Sat, 5 Aug 2000 09:43:49 -0700 (PDT) Subject: [XML-SIG] build error on PyXML 0.5.5 package Message-ID: <20000805164349.14149.qmail@web208.mail.yahoo.com> Thanks for reply, after install dev rpm, I can build PyXML 0.5.4, but compiling PyXML 0.5.5 still gave me the following errors. Executing 'build' action... Running command: make gcc -fPIC -Iexpat/xmlparse -g -O2 -I/usr/include/python1.5 -I/usr/include/python1.5 -DHAVE_CONFIG_H -c ./pyexpat.c ./pyexpat.c: In function `newxmlparseobject': ./pyexpat.c:474: parse error before `xmlparseobject' make: *** [pyexpat.o] Error 1 Traceback (innermost last): File "setup.py", line 185, in ? func() File "setup.py", line 155, in build_unix shutil.copy('extensions/' + filename, 'build/xml/parsers/') File "/usr/lib/python1.5/shutil.py", line 52, in copy copyfile(src, dst) File "/usr/lib/python1.5/shutil.py", line 17, in copyfile fsrc = open(src, 'rb') IOError: [Errno 2] No such file or directory: 'extensions/pyexpat.so' --- Dieter Maurer wrote: > fwang2@yahoo.com writes: > > make -f ./Makefile.pre.in VPATH=. srcdir=. \ > > VERSION=$VERSION \ > > installdir=$installdir \ > > exec_installdir=$exec_installdir \ > > Makefile > > make[1]: Entering directory > `/root/PyXML-0.5.5/extensions' > > make[1]: *** No rule to make target > `/usr/lib/python1.5/config/Makefile', > > needed by `sedscript'. Stop. > > make[1]: Leaving directory > `/root/PyXML-0.5.5/extensions' > > make: *** [boot] Error 2 > You need to install the Python-Dev (Python > Development) RPM. > > > Dieter __________________________________________________ Do You Yahoo!? Kick off your party with Yahoo! Invites. 
http://invites.yahoo.com/ From alexandre.fayolle@free.fr Sun Aug 6 16:28:33 2000 From: alexandre.fayolle@free.fr (Alexandre Fayolle) Date: Sun, 06 Aug 2000 17:28:33 +0200 (MEST) Subject: [XML-SIG] Gui description in Xml Message-ID: <965575713.398d8421e8089@imp.free.fr> Hello, I've started putting together a small Python program that builds wxPython graphical interfaces described in XML. There is an early release available on http://fang.free.fr (it is currently possible to play with menus and static text controls, but other controls should come quickly). I posted here since I thought some of you might be interested; my apologies if such announcements are not welcome on this mailing list. Comments welcome. Alexandre Fayolle From tpassin@home.com Sun Aug 6 17:55:07 2000 From: tpassin@home.com (tpassin@home.com) Date: Sun, 6 Aug 2000 12:55:07 -0400 Subject: [XML-SIG] Gui description in Xml References: <965575713.398d8421e8089@imp.free.fr> Message-ID: <001301bfffc7$139a0920$7cac1218@reston1.va.home.com> Alexandre Fayolle wrote, > I've started putting together a small Python program that builds wxPython graphical > interfaces described in XML. There is an early release available on > http://fang.free.fr (it is currently possible to play with menus and static text > controls, but other controls should come quickly). > GUI definition work like this is a real service. I urge Alexandre to continue extending it. When we have a good system, maybe we will be able to construct a visual GUI builder for wxPython. Even a rudimentary builder could be a great convenience. Alexandre, you might also look at ZopeEdit, where Jim Bag (I hope I've got his name right) is defining wxPython menus with XML, and Glade, the GUI builder used for many Linux GUIs, which also describes GUI elements in XML. ZopeEdit is at http://www.zope.org/Members/jimbag/ZopeEDIT/, and the XML menu definitions are in ZopeEDIT.pyw.
Regards, Tom Passin From alexandre.fayolle@free.fr Mon Aug 7 13:10:10 2000 From: alexandre.fayolle@free.fr (Alexandre Fayolle) Date: Mon, 07 Aug 2000 14:10:10 +0200 (MEST) Subject: [XML-SIG] XPath and namespaces Message-ID: <965650210.398ea722258fc@imp.free.fr> What is the syntax to get a namespaced node using xpath. Eg. With 4XPath, I tried "some-node/transform" and "some-node/xsl:transform" both returned an empty list. Thanks for your help. Alexandre Fayolle From mal@lemburg.com Mon Aug 7 14:49:44 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 07 Aug 2000 15:49:44 +0200 Subject: [XML-SIG] xmlpickle.py ?! Message-ID: <398EBE78.A01DE1D6@lemburg.com> I'm currently looking into writing a xmlpickle.py module with the intent to be able to pickle (and unpickle) arbitrary Python objects in a way that makes the objects editable through a XML editor or convertible to some other format using the existing XML tools. After looking at the archives of this SIG, I found that the idea was already tossed around a few times, but I couldn't find any downloadble outcome. I've looked at pickle.py a bit and realized that the extensible nature of the pickle mechanism would probably cause trouble because the DTD would have to be generated as well (not a good idea). There are two alternatives to this though: 1. add an element which handles all non-core Python object types (the ones registered through copy_reg) 2. use an abstract DTD altogheter Example for 1: abcdef 10 abc value 2000 8 6 Example for 2: abcdef 10 abc value 2000 8 6 Variant 1 has the nice feature of making the basic Python explicit and the DTD could also carry some context sensitive information. Variant 2 is much easier to extend and has a very simple DTD, but conversion tools would have to know much about the mechanism behind it to generate correct documents. BTW, I'm very new to XML... what's the general rule in XML on where to put the object value ? ... into an attribute or the tag content ? Thoughts ? 
Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Mike.Olson@fourthought.com Mon Aug 7 16:36:59 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 07 Aug 2000 09:36:59 -0600 Subject: [XML-SIG] xmlpickle.py ?! References: <398EBE78.A01DE1D6@lemburg.com> Message-ID: <398ED79B.F6E49C65@FourThought.com> "M.-A. Lemburg" wrote: > You may want to look at Zope in Zope/lib/python/ZODB/ImportExport.py. they do some XML pickling here. However, I think they call back to each object for help in the pickle processes (each object writes thier own chunk of XML). Though I'm not 100% sure. Mike > I'm currently looking into writing a xmlpickle.py module > with the intent to be able to pickle (and unpickle) arbitrary > Python objects in a way that makes the objects editable through > a XML editor or convertible to some other format using the > existing XML tools. > > After looking at the archives of this SIG, I found that the > idea was already tossed around a few times, but I couldn't > find any downloadble outcome. > > I've looked at pickle.py a bit and realized that the extensible > nature of the pickle mechanism would probably cause trouble > because the DTD would have to be generated as well (not a good > idea). There are two alternatives to this though: > > 1. add an element which handles all non-core Python object > types (the ones registered through copy_reg) > > 2. use an abstract DTD altogheter > > Example for 1: > > > > abcdef > > 10 > abc > > > value > > > > 2000 > 8 > 6 > > > > > > Example for 2: > > > > abcdef > > 10 > abc > > > value > > > > 2000 > 8 > 6 > > > > > > Variant 1 has the nice feature of making the basic Python > explicit and the DTD could also carry some context sensitive > information. 
Variant 2 is much easier to extend and has > a very simple DTD, but conversion tools would have to know > much about the mechanism behind it to generate correct > documents. > > BTW, I'm very new to XML... what's the general rule in XML > on where to put the object value ? ... into an attribute > or the tag content ? > > Thoughts ? > > Thanks, > -- > Marc-Andre Lemburg > ______________________________________________________________________ > Business: http://www.lemburg.com/ > Python Pages: http://www.lemburg.com/python/ > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Mon Aug 7 19:39:24 2000 From: jim@digicool.com (Jim Fulton) Date: Mon, 07 Aug 2000 14:39:24 -0400 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> Message-ID: <398F025C.A9726E23@digicool.com> I don't normally have time to follow the xml sig. Someone kindly forwarded Marc-Andre's note to me. I haven't seen the rest of this thread. "M.-A. Lemburg" wrote: > > I'm currently looking into writing a xmlpickle.py module > with the intent to be able to pickle (and unpickle) arbitrary > Python objects in a way that makes the objects editable through > a XML editor or convertible to some other format using the > existing XML tools. I wonder whether a tool that generated XML for arbitrary Python objects would really be that useful for transfer to other applications. I suspect not. > After looking at the archives of this SIG, I found that the > idea was already tossed around a few times, but I couldn't > find any downloadble outcome. Zope has a facility that I've been meaning to make more generally available but haven't had time to. 
:/ In my case, I wanted to be able to convert to/from binary pickles and xml, so I had an intern write something that works from pickles, rather than from objects. It can be used to look at existing pickles and can be used, in conjunction with pickle or cPickle, to convert objects to and from XML. If you're interested, let me know and I'll provide more details. > I've looked at pickle.py a bit and realized that the extensible > nature of the pickle mechanism would probably cause trouble > because the DTD would have to be generated as well (not a good > idea). Why would a DTD have to be generated? > There are two alternatives to this though: > > 1. add an element which handles all non-core Python object > types (the ones registered through copy_reg) > > 2. use an abstract DTD altogether > > Example for 1: > > > > abcdef > > 10 > abc > > > value > > > > 2000 > 8 > 6 > > > > This is the route I took. Here's an example that's probably a lot bigger than you want.... title raw \n

\n \n

\n This is the Document \n in the Folder.\n

\n ]]>
__ac_local_roles__ jim Owner globals __name__ m2 _vars
Note that this is pretty much a straight translation of the Python pickle "schema". :) Note the id attributes and reference tags, which allow cyclical data structures. (I recently discovered that there is a problem with my id values. Does anyone know what it is? ;) One other note. I found the XML spec to be a little ambiguous (or maybe I'm just too dense) wrt binary data and newlines, so I decided to punt and escape newlines and binary data. I encode strings either as "repr", which is a repr-like encoding that escapes things in a way that is just a tad more terse than repr, or as base64; I switch to base64 when the escaping penalty exceeds 40%. Since a lot of our pickles have marked up text, I automatically use CDATA sections when I can and where it would help. See the example above. I really need to write down a DTD for this...... Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From Fredrik Lundh" Message-ID: <004901c000a7$869dc980$f2a6b5d4@hagrid> walter wrote: > page [('test', 'test'), ('test', 'test'), ('nohome', 'nohome')] > Segmentation fault. > > (I installed the Fredrik's patch from the 5th July. The patch was flawed; you need to add an extra INCREF to avoid running out of references...
--- 1284,1291 ----
      else
          value = key; /* in SGML mode, default is same as key */
+     Py_INCREF(value);
+
      while (p < end && ISSPACE(*p))
          p++;
From kens@sightreader.python.org Mon Aug 7 20:19:06 2000 From: kens@sightreader.python.org (Ken) Date: Mon, 7 Aug 2000 15:19:06 -0400 Subject: Fwd: [XML-SIG] xmlpickle.py ?!
References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> Message-ID: <004401c000a4$5df44520$14fdff0a@ellipsys.net> I've been working on a somewhat similar project called SPOM (Sax-Pythonic Object Model). It's not quite as automatic as pickling, and it is somewhat less general (it's really designed for tree structures that map nicely to XML), but it requires very little work per element to implement. The advantage over picking is that the output is a natural looking XML file that looks as if it were designed manually (i.e. python class instances map to xml elements and python attributes map to xml attributes, and type information is external to the xml file). You couldn't tell by looking at the xml file that it was created using Spom, and you could even use Spom to work with existing xml files created by other applications and vice-versa. Beta coming soon. Search for SPOM on parnassus in about a week. - Ken ----- Original Message ----- From: "Jim Fulton" To: "M.-A. Lemburg" Cc: Sent: Monday, August 07, 2000 2:39 PM Subject: Re: Fwd: [XML-SIG] xmlpickle.py ?! > I don't normally have time to follow the xml sig. > Someone kindly forwarded Marc-Andre's note to me. > I haven't seen the rest of this thread. > > > "M.-A. Lemburg" wrote: > > > > I'm currently looking into writing a xmlpickle.py module > > with the intent to be able to pickle (and unpickle) arbitrary > > Python objects in a way that makes the objects editable through > > a XML editor or convertible to some other format using the > > existing XML tools. > > I wonder whether a tool that generated XML for arbitrary Python > objects would really be that useful for transfer to > other applications. I suspect not. > > > After looking at the archives of this SIG, I found that the > > idea was already tossed around a few times, but I couldn't > > find any downloadble outcome. > > Zope has a facility that I've been meaning to make more > generally available but haven't had time to. 
:/ > In my case, I wanted to be able to convert to/from binary > pickles and xml, so I had an intern write something that > works from pickles, rather than from objects. It can be used > to look at existing pickles and can be used, in conjunction with > pickle or cPickle to convert objects to and from XML. > > If your interested, let me know and I'll provide more details. > > > I've looked at pickle.py a bit and realized that the extensible > > nature of the pickle mechanism would probably cause trouble > > because the DTD would have to be generated as well (not a good > > idea). > > Why would a DTD have to be generated? > > > There are two alternatives to this though: > > > > 1. add an element which handles all non-core Python object > > types (the ones registered through copy_reg) > > > > 2. use an abstract DTD altogheter > > > > Example for 1: > > > > > > > > abcdef > > > > 10 > > abc > > > > > > value > > > > > > > > 2000 > > 8 > > 6 > > > > > > > > > > This is the route I took. Here's an example that's > probably alot bigger than you want.... > > > > > title > > > > raw > > \n >

\n > \n >

\n > This is the Document \n > in the Folder.\n >

\n > > > ]]>
>
> > __ac_local_roles__ > > > > jim > > > Owner > > > > > > > > globals > > > > > > __name__ > m2 > > > _vars > > > > >
>
> > > Note that this is pretty much a straight translation of > the Python pickle "schema". :) Note the id attributes > and reference tags, which allow cyclical data structures. > (I recently discovered that there is a problem with my id > values. Does anyone know what it is? ;) > > One other note. I found the XML spec to be a little > ambigouos (or maybe I'm just too dense) wrt binary data > and newlines, so I decided to punt and escape newlines and > binary data. I encode strings as either "repr" which is a > repr like encoding that escapes things in a way that is > just a tad more terse than repr. I switch to base64 when > the escaping penalty exceeds 40%. Since alot of our pickles > have marked up text, I automatically use CDATA sections when > I can and where it would help. See the example above. > > I really need to write down a DTD for this...... > > Jim > > -- > Jim Fulton mailto:jim@digicool.com Python Powered! > Technical Director (888) 344-4332 http://www.python.org > Digital Creations http://www.digicool.com http://www.zope.org > > Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email > address may not be added to any commercial mail list with out my > permission. Violation of my privacy with advertising or SPAM will > result in a suit for a MINIMUM of $500 damages/incident, $1500 for > repeats. > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig > From travish@realtime.net Tue Aug 8 00:52:44 2000 From: travish@realtime.net (travish) Date: Mon, 7 Aug 2000 18:52:44 -0500 (CDT) Subject: [XML-SIG] parsers and XML Message-ID: <200008072352.SAA98730@sullivan.realtime.net> Hi... I was taking a look at some of the docs, code, and examples, and was a bit surprised about a number of things. Below are some comments, problems, diffs, etc. You may already know some of this. 
a) most of the XML "parsers" act appear to be lexers b) none of the examples are of sufficient/substantial complexity (e.g. recursive nesting, deep/complex hierarchy) If anyone has suggestions on what kind of parser to use as a back end (yapps? kjParsing? etc.) I'd be interested to hear it. c) SGMLOP's description is substantially misleading: http://www.garshol.priv.no/download/xmltools/prod/sgmlop.html sgmlop is meant to behave identically with the sgmllib and xmllib modules and replace them invisibly if it is present, so that one does not have to change any code to use them. The saxlib package has a SAX 1.0 driver for sgmlop. This does not appear to be correct. See diffs below. In addition, it passes in a complete string of text rather than string, offset, length to the character data callback. You may thank Zope for the info leading to the working SGMLOP driver. d) xmltok: no driver drv_xmltok e) XMLtoolkit: no module named XMLFactory f) xmldc: no module named xml_dc g) Relative speeds on my hardware: sgmlop (C module) 4.15 pyexpat (C module) 2.60 xmlproc (python) 1.21 xmllib (python) 1.00 Here you go: --- drv_pyexpat.py.orig Mon Jul 17 11:17:56 2000 +++ drv_pyexpat.py Mon Jul 17 12:39:25 2000 @@ -34,6 +34,7 @@ self.parser.EndElementHandler = self.endElement self.parser.CharacterDataHandler = self.characters self.parser.ProcessingInstructionHandler = self.processingInstruction + self.unfed_so_far = 1 def startElement(self,name,attrs): # Backward compatibility code, for older versions of the @@ -118,12 +119,18 @@ self.parser.EndElementHandler = self.endElement self.parser.CharacterDataHandler = self.characters self.parser.ProcessingInstructionHandler = self.processingInstruction + self.unfed_so_far = 1 def feed(self,data): + if self.unfed_so_far: + self.doc_handler.startDocument() + self.unfed_so_far = 0 + if not self.parser.Parse(data): self.__report_error() def close(self): + self.doc_handler.endDocument() if not self.parser.Parse("",1): self.__report_error() 
self.parser = None --- drv_sgmlop.py.orig Mon Jul 17 12:57:21 2000 +++ drv_sgmlop.py Mon Jul 17 16:39:39 2000 @@ -7,28 +7,20 @@ from xml.parsers import sgmlop from xml.sax import saxlib,saxutils import urllib - -class DHWrapper: - - def __init__(self,real_dh): - self.real_dh=real_dh - - def __getattr__(self,attr): - return getattr(self.real_dh,attr) - - def startElement(self,name,attrs): - self.real_dh.startElement(name,saxutils.AttributeMap(attrs)) class Parser(saxlib.Parser): def __init__(self): saxlib.Parser.__init__(self) self.parser = sgmlop.XMLParser() + self.unfed_so_far = 1 - def setDocumentHandler(self, dh): - #self.parser.register(DHWrapper(dh), 1) - #self.parser.register(DHWrapper(dh)) - self.parser.register(dh) + def setDocumentHandler(self, dh): + # setup callbacks + self.finish_starttag = dh.startElement + self.finish_endtag = dh.endElement + self.handle_data = dh.data + self.parser.register(self) self.doc_handler=dh def parse(self, url): @@ -64,8 +56,13 @@ def reset(self): self.parser=sgmlop.XMLParser() + self.unfed_so_far = 1 def feed(self,data): + if self.unfed_so_far: + self.doc_handler.startDocument() + self.unfed_so_far = 0 + self.parser.feed(data) def close(self): From uogbuji@fourthought.com Tue Aug 8 03:06:44 2000 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 07 Aug 2000 20:06:44 -0600 Subject: [XML-SIG] XPath and namespaces In-Reply-To: Message from Alexandre Fayolle of "Mon, 07 Aug 2000 14:10:10 +0200." <965650210.398ea722258fc@imp.free.fr> Message-ID: <200008080206.UAA00922@localhost.localdomain> > What is the syntax to get a namespaced node using xpath. > > Eg. > > > > > > With 4XPath, I tried "some-node/transform" and "some-node/xsl:transform" both > returned an empty list. "some-node/xsl:transform". 
But if you're using 4XPath outside of 4XSLT, you'll have to provide the prefix -> namespace mapping in the context, for example

my_xpath_expr = xpath.Compile('some-node/xsl:transform')
ctx = xpath.Context.Context(doc, 1, 1,
          processorNss={'xsl': 'http://www.w3.org/1999/XSL/Transform'})
result_node_set = my_xpath_expr.evaluate(ctx)

-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rob@hooft.net Tue Aug 8 07:18:47 2000 From: rob@hooft.net (Rob W. W. Hooft) Date: Tue, 8 Aug 2000 08:18:47 +0200 (CEST) Subject: [XML-SIG] xmlpickle.py ?! In-Reply-To: <398EBE78.A01DE1D6@lemburg.com> References: <398EBE78.A01DE1D6@lemburg.com> Message-ID: <14735.42567.691887.481072@temoleh.chem.uu.nl> >>>>> "M-L" == M -A Lemburg writes: M-L> BTW, I'm very new to XML... what's the general rule in XML on M-L> where to put the object value ? ... into an attribute or the tag M-L> content ? The subject of many heated debates. The rule I like is if you have a piece of data that is never going to contain any structure, you can make it an attribute. I happen to use attributes a lot for numeric values. I would personally never use 20, since you might need more structure later. It is very difficult to add more sub-elements once there is character data (DTD issue). You can use 20 to be more extensible (e.g. 20 10 30 %2d; who knows when we'll have structured integers or subclassed integers like that....) Rob -- ===== rob@hooft.net http://www.hooft.net/people/rob/ ===== ===== R&D, Nonius BV, Delft http://www.nonius.nl/ ===== ===== PGPid 0xFA19277D ========================== Use Linux! ========= From mal@lemburg.com Tue Aug 8 09:21:08 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 08 Aug 2000 10:21:08 +0200 Subject: [XML-SIG] xmlpickle.py ?!
References: <398EBE78.A01DE1D6@lemburg.com> <398ED79B.F6E49C65@FourThought.com> Message-ID: <398FC2F4.1DB5EDE4@lemburg.com> Mike Olson wrote: > > "M.-A. Lemburg" wrote: > > > > You may want to look at Zope in Zope/lib/python/ZODB/ImportExport.py. > They do some XML pickling here. However, I think they call back to each > object for help in the pickle processes (each object writes their own > chunk of XML). Though I'm not 100% sure. I've had a look at that code, but it doesn't seem to take the same direction as I intend: they sort of convert Python pickles into something readable by XML, keeping e.g. the encodings used by pickle. I would like to make the xmlpickle have some truly editable format, e.g. objects are converted to string representations which do their best at not losing precision while still using a common format (repr() does wonders here for Python's basic types). BTW, I only have a vague idea about what XPath et al. do except that they are intended to address certain parts in an XML file. Is there anything to watch out for when designing a DTD in order to make addressability simple with XPath ? Ideally the xmlpickle data should be addressable using standard Python notations, e.g. a.b, a['b'] and a[0]. Which of the two possibilities I posted would fit this model with respect to XPath ? Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Aug 8 09:56:09 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 08 Aug 2000 10:56:09 +0200 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> Message-ID: <398FCB29.991FC2F6@lemburg.com> Jim Fulton wrote: > > I don't normally have time to follow the xml sig. > Someone kindly forwarded Marc-Andre's note to me. > I haven't seen the rest of this thread. > The thread is just starting...
thanks for chiming in. > "M.-A. Lemburg" wrote: > > > > I'm currently looking into writing a xmlpickle.py module > > with the intent to be able to pickle (and unpickle) arbitrary > > Python objects in a way that makes the objects editable through > > a XML editor or convertible to some other format using the > > existing XML tools. > > I wonder whether a tool that generated XML for arbitrary Python > objects would really be that useful for transfer to > other applications. I suspect not. I'm not sure either, but given that XML is becoming an industry standard and that more and more tools are becoming available, I have a feeling that xmlpickle is a good idea in the sense of making Python buzz word forward compatible ;-) Of course, a third party tool won't be able to handle arbitrary Python pickles, but for a quick transfer of object data or together with a semantic style sheet xmlpickles should make a good inter-application transport encoding between closely related software, e.g. Python on one side, C++ on the other. I wonder how well SOAP would handle pickling arbitrary Python objects... > > After looking at the archives of this SIG, I found that the > > idea was already tossed around a few times, but I couldn't > > find any downloadble outcome. > > Zope has a facility that I've been meaning to make more > generally available but haven't had time to. :/ > In my case, I wanted to be able to convert to/from binary > pickles and xml, so I had an intern write something that > works from pickles, rather than from objects. It can be used > to look at existing pickles and can be used, in conjunction with > pickle or cPickle to convert objects to and from XML. > > If your interested, let me know and I'll provide more details. 
I've had a look at ppml.py in Zope, but didn't really grok the idea behind it -- it's completely undocumented and contains some really weird callbacks :-/ My general idea for xmlpickle is to come up with a format that is human readable and editable, i.e. literal representations should be used in favour of binary ones (size is not a problem; speed can later be added via a C extension). > > I've looked at pickle.py a bit and realized that the extensible > > nature of the pickle mechanism would probably cause trouble > > because the DTD would have to be generated as well (not a good > > idea). > > Why would a DTD have to be generated? If you take the first path (see below; one element per pickle'able type), then you'd have to regenerate the DTD in case new types were registered through copy_reg. > > There are two alternatives to this though: > > > > 1. add an element which handles all non-core Python object > > types (the ones registered through copy_reg) > > > > 2. use an abstract DTD altogheter > > > > Example for 1: > > > > > > > > abcdef > > > > 10 > > abc > > > > > > value > > > > > > > > 2000 > > 8 > > 6 > > > > > > > > > > This is the route I took. Here's an example that's > probably alot bigger than you want.... > > > > > title > > > > raw > > \n >

\n > \n >

\n > This is the Document \n > in the Folder.\n >

\n > > > ]]>
>
> > __ac_local_roles__ > > > > jim > > > Owner > > > > > > > > globals > > > > > > __name__ > m2 > > > _vars > > > > >
>
> > Note that this is pretty much a straight translation of > the Python pickle "schema". :) This looks pretty much like what I had in mind (this is what ppml.py generates, right ?). The only part I don't like about ppml.py's approach is that it pickles e.g. integers to a binary format. > Note the id attributes > and reference tags, which allow cyclical data structures. Way cool, yes :-) > (I recently discovered that there is a problem with my id > values. Does anyone know what it is? ;) > > One other note. I found the XML spec to be a little > ambigouos (or maybe I'm just too dense) wrt binary data > and newlines, so I decided to punt and escape newlines and > binary data. I encode strings as either "repr" which is a > repr like encoding that escapes things in a way that is > just a tad more terse than repr. I switch to base64 when > the escaping penalty exceeds 40%. I don't really care about size... my goal is keeping data editable and human readable -- this also makes writing backends in other languages a lot easier. > Since alot of our pickles > have marked up text, I automatically use CDATA sections when > I can and where it would help. See the example above. How robust is this CDATA wrapping ? What if the data itself is XML and contains a CDATA section ? > I really need to write down a DTD for this...... You should :-) Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Aug 8 10:07:34 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 08 Aug 2000 11:07:34 +0200 Subject: [XML-SIG] xmlpickle.py ?! References: <398EBE78.A01DE1D6@lemburg.com> <14735.42567.691887.481072@temoleh.chem.uu.nl> Message-ID: <398FCDD6.740477C6@lemburg.com> "Rob W. W. Hooft" wrote: > > >>>>> "M-L" == M -A Lemburg writes: > > M-L> BTW, I'm very new to XML... what's the general rule in XML on > M-L> where to put the object value ? ... 
into an attribute or the tag > M-L> content ? > > The subject of many heated debates. The rule I like is if you have a > piece of data that is never going to contain any structure, you can > make it an attribute. I happen to use attributes a lot for numeric > values. > > I would personally never use 20, since > you might need more structure later. It is very difficult to add more > sub-elements once there is character data (DTD issue). You can use > 20 to be more > extensible (e.g. 20 > 10 30 > %2d; who knows when we'll have structured > integers or subclassed integers like that....) Good point (even though I'd put this meta data into that Object tag as attribute :-). Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim@digicool.com Tue Aug 8 12:52:08 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 08 Aug 2000 07:52:08 -0400 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> Message-ID: <398FF468.DC1B83C9@digicool.com> "M.-A. Lemburg" wrote: > > Jim Fulton wrote: > > > > I don't normally have time to follow the xml sig. > > Someone kindly forwarded Marc-Andre's note to me. > > I haven't seen the rest of this thread. > > > > The thread is just starting... thanks for chiming in. > > > "M.-A. Lemburg" wrote: > > > > > > I'm currently looking into writing a xmlpickle.py module > > > with the intent to be able to pickle (and unpickle) arbitrary > > > Python objects in a way that makes the objects editable through > > > a XML editor or convertible to some other format using the > > > existing XML tools. > > > > I wonder whether a tool that generated XML for arbitrary Python > > objects would really be that useful for transfer to > > other applications. I suspect not. 
> > I'm not sure either, but given that XML is becoming an > industry standard and that more and more tools are becoming > available, I have a feeling that xmlpickle is a good > idea in the sense of making Python buzz word forward > compatible ;-) I'm not saying that an XML pickle variant isn't useful, but that it's not going to be very useful for interoperability. I think that for interoperability, application-specific XML formats that don't need to be as complete as pickle are more useful. > Of course, a third party tool won't be able to handle arbitrary > Python pickles, but for a quick transfer of object data or > together with a semantic style sheet xmlpickles should make > a good inter-application transport encoding between closely > related software, e.g. Python on one side, C++ on the other. > > I wonder how well SOAP would handle pickling arbitrary > Python objects... In particular, I wonder if it tries to be complete. I haven't really looked at SOAP lately. In my experience, RPC mechanisms don't really need or try to handle arbitrarily complex objects. OTOH, lots of applications don't need complete transfer. > > > After looking at the archives of this SIG, I found that the > > > idea was already tossed around a few times, but I couldn't > > > find any downloadble outcome. > > > > Zope has a facility that I've been meaning to make more > > generally available but haven't had time to. :/ > > In my case, I wanted to be able to convert to/from binary > > pickles and xml, so I had an intern write something that > > works from pickles, rather than from objects. It can be used > > to look at existing pickles and can be used, in conjunction with > > pickle or cPickle to convert objects to and from XML. > > > > If your interested, let me know and I'll provide more details. 
> > I've had a look at ppml.py in Zope, but didn't really > grok the idea behind it -- it's completely undocumented > and contains some really weird callbacks :-/ Yes, well, if your interested in pusuing it, I'll provide more info. > My general idea for xmlpickle is to come up with a format that > is human readable and editable, i.e. literal representations > should be used in favour of binary ones (size is not a problem; > speed can later be added via a C extension). OK. Obviously, a gif image needs to be encoded. We could certainly modify the algorithm that decides between repr and base64 to give more prference to repr. > > > I've looked at pickle.py a bit and realized that the extensible > > > nature of the pickle mechanism would probably cause trouble > > > because the DTD would have to be generated as well (not a good > > > idea). > > > > Why would a DTD have to be generated? > > If you take the first path (see below; one element per pickle'able > type), then you'd have to regenerate the DTD in case new types > were registered through copy_reg. Yes, but why do you need t DTD. Lots of people don't seem to use DTDs and DTDs don't work very well with namsspaces. (snip) > > > > Note that this is pretty much a straight translation of > > the Python pickle "schema". :) > > This looks pretty much like what I had in mind (this is > what ppml.py generates, right ? Right. >). The only part I don't > like about ppml.py's approach is that it pickles e.g. > integers to a binary format. Nah: 123 > > Note the id attributes > > and reference tags, which allow cyclical data structures. > > Way cool, yes :-) > > > (I recently discovered that there is a problem with my id > > values. Does anyone know what it is? ;) > > > > One other note. I found the XML spec to be a little > > ambigouos (or maybe I'm just too dense) wrt binary data > > and newlines, so I decided to punt and escape newlines and > > binary data. 
I encode strings as either "repr" which is a > > repr like encoding that escapes things in a way that is > > just a tad more terse than repr. I switch to base64 when > > the escaping penalty exceeds 40%. > > I don't really care about size... my goal is keeping data editable > and human readable -- this also makes writing backends in > other languages a lot easier. So we could add some tuning to this. Note that the goal is not to reduce size, but to detect "binary" data. Python doesn't make a distinction between binary and text, but base64 is probably a much better way to encode truly binary data. > > Since alot of our pickles > > have marked up text, I automatically use CDATA sections when > > I can and where it would help. See the example above. > > How robust is this CDATA wrapping ? What if the data itself > is XML and contains a CDATA section ? Then it's not used. We will only use CDATA if we can. Jim From tpassin@home.com Tue Aug 8 13:09:19 2000 From: tpassin@home.com (tpassin@home.com) Date: Tue, 8 Aug 2000 08:09:19 -0400 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> Message-ID: <003901c00131$7b2f5a20$7cac1218@reston1.va.home.com> M.-A. Lemburg asked - > ... > How robust is this CDATA wrapping ? What if the data itself > is XML and contains a CDATA section ? > ... You can't nest CDATA sections - see sec. 2.7 of the XML Recommendation. CDATA can't contain the ']]>' sequence - it always denotes the end of the CDATA section, hence no nesting. 
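A minimal sketch of the usual workaround for that restriction, in present-day Python: since a CDATA section cannot contain ']]>', text that contains it can be split across two adjacent CDATA sections at each occurrence. The helper name `wrap_cdata` is hypothetical, and this is not what the Zope code does (it simply skips CDATA when the text does not allow it). The sketch also does nothing about characters that are illegal in XML altogether, such as \000.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in CDATA, splitting wherever ']]>' occurs.

    Each ']]>' is rewritten so that ']]' closes one section and '>'
    opens the next; a parser hands the pieces back as one text run.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Payload that is itself XML and already contains a CDATA section.
payload = "<doc><![CDATA[a < b]]></doc>"
document = "<string>" + wrap_cdata(payload) + "</string>"
recovered = ET.fromstring(document).text
```

The price is readability: the split points look odd to someone editing the file by hand, which may be a reason to fall back to ordinary escaping of '<' and '&' for such strings.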
Tom Passin From larsga@garshol.priv.no Tue Aug 8 13:05:52 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 08 Aug 2000 14:05:52 +0200 Subject: [XML-SIG] parsers and XML In-Reply-To: <200008072352.SAA98730@sullivan.realtime.net> References: <200008072352.SAA98730@sullivan.realtime.net> Message-ID: * travish@realtime.net | | a) most of the XML "parsers" act appear to be lexers You mean, since they don't build complete document trees? This is so because XML has a much simpler structure (and potentially much greater sizes) than what parsers traditionally have parsed. This makes an event-based API very useful. In Python we have so far chosen to make tree building separate utilities. If you want a document tree, look at 4DOM or qp_xml. | b) none of the examples are of sufficient/substantial complexity | (e.g. recursive nesting, deep/complex hierarchy) | | If anyone has suggestions on what kind of parser to use as a back | end (yapps? kjParsing? etc.) I'd be interested to hear it. I don't understand this question. | c) SGMLOP's description is substantially misleading: | | http://www.garshol.priv.no/download/xmltools/prod/sgmlop.html | | sgmlop is meant to behave identically with the sgmllib and xmllib | modules and replace them invisibly if it is present, so that one | does not have to change any code to use them. The saxlib package | has a SAX 1.0 driver for sgmlop. | | This does not appear to be correct. See diffs below. The diffs seem to be for the pyexpat driver. This has nothing to do with sgmlop or xmllib. What is the problem with the description? | d) xmltok: no driver drv_xmltok | e) XMLtoolkit: no module named XMLFactory | f) xmldc: no module named xml_dc If you don't have the parsers installed, the drivers won't work. :) | g) Relative speeds on my hardware: | sgmlop (C module) 4.15 | pyexpat (C module) 2.60 | xmlproc (python) 1.21 | xmllib (python) 1.00 Relative speed depends quite a bit on the document being parsed. 
Also, the speed difference when using sgmlop is probably greater when you don't use SAX. --Lars M. From mal@lemburg.com Tue Aug 8 13:38:12 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 08 Aug 2000 14:38:12 +0200 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> <398FF468.DC1B83C9@digicool.com> Message-ID: <398FFF34.C9863257@lemburg.com> Jim Fulton wrote: > > "M.-A. Lemburg" wrote: > > I wonder how well SOAP would handle pickling arbitrary > > Python objects... > > In particular, I wonder if it tries to be complete. I haven't > really looked at SOAP lately. In my experience, RPC mechanisms > don't really need or try to handle arbitrarily complex objects. > OTOH, lots of applications don't need complete transfer. True, but instead of "rolling your own" every time, I think a simple extensible definition would help. E.g. we could add a callback mechanism (__xml__ method) to aid in converting objects into a certain object defined way. > > I've had a look at ppml.py in Zope, but didn't really > > grok the idea behind it -- it's completely undocumented > > and contains some really weird callbacks :-/ > > Yes, well, if your interested in pusuing it, I'll provide more > info. I'm not sure whether I'll use ppml.py as template or just as source of ideas. The design doesn't look clear to me, e.g. it seems as if you are first pickling an object using the standard Python pickle mechanism and the pass the pickled string to the XML converter. Is that so ? ...I think I'd rather like to go the direct way. Hmm, I should really try to get the module run outside of Zope... for now I've just been looking at the code. > > My general idea for xmlpickle is to come up with a format that > > is human readable and editable, i.e. literal representations > > should be used in favour of binary ones (size is not a problem; > > speed can later be added via a C extension). > > OK. 
Obviously, a gif image needs to be encoded. We could certainly > modify the algorithm that decides between repr and base64 to give > more prference to repr. I'd say that any string containing at least one \000 character should be considered binary (and encoded in base64 or some other standard format). This should get all typical text strings into readable format. > > > > I've looked at pickle.py a bit and realized that the extensible > > > > nature of the pickle mechanism would probably cause trouble > > > > because the DTD would have to be generated as well (not a good > > > > idea). > > > > > > Why would a DTD have to be generated? > > > > If you take the first path (see below; one element per pickle'able > > type), then you'd have to regenerate the DTD in case new types > > were registered through copy_reg. > > Yes, but why do you need t DTD. Lots of people don't > seem to use DTDs and DTDs don't work very well with namsspaces. Good question ;-) I just thought that having a DTD around would be good to validate input data and perhaps help the XML editor. > >). The only part I don't > > like about ppml.py's approach is that it pickles e.g. > > integers to a binary format. > > Nah: > > 123 Ah, I was seeing all these binary formatting APIs in ppml.py especially for 64-bit ints. Looks as if these are not used anywhere in the code though... > > > Note the id attributes > > > and reference tags, which allow cyclical data structures. > > > > Way cool, yes :-) > > > > > (I recently discovered that there is a problem with my id > > > values. Does anyone know what it is? ;) > > > > > > One other note. I found the XML spec to be a little > > > ambigouos (or maybe I'm just too dense) wrt binary data > > > and newlines, so I decided to punt and escape newlines and > > > binary data. I encode strings as either "repr" which is a > > > repr like encoding that escapes things in a way that is > > > just a tad more terse than repr. 
I switch to base64 when > > > the escaping penalty exceeds 40%. > > > > I don't really care about size... my goal is keeping data editable > > and human readable -- this also makes writing backends in > > other languages a lot easier. > > So we could add some tuning to this. Note that the goal is not to > reduce size, but to detect "binary" data. Python doesn't make > a distinction between binary and text, but base64 is probably > a much better way to encode truly binary data. See above. I'd rather use the definition above for deciding on binary or not. > > > Since alot of our pickles > > > have marked up text, I automatically use CDATA sections when > > > I can and where it would help. See the example above. > > > > How robust is this CDATA wrapping ? What if the data itself > > is XML and contains a CDATA section ? > > Then it's not used. We will only use CDATA if we can. Ok. Thanks for the feedback, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Mike.Olson@fourthought.com Tue Aug 8 14:41:17 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 08 Aug 2000 07:41:17 -0600 Subject: [XML-SIG] xmlpickle.py ?! References: <398EBE78.A01DE1D6@lemburg.com> <398ED79B.F6E49C65@FourThought.com> <398FC2F4.1DB5EDE4@lemburg.com> Message-ID: <39900DFD.9E469F02@FourThought.com> "M.-A. Lemburg" wrote: > > Mike Olson wrote: > > > > "M.-A. Lemburg" wrote: > > > > > > > You may want to look at Zope in Zope/lib/python/ZODB/ImportExport.py. > > they do some XML pickling here. However, I think they call back to each > > object for help in the pickle processes (each object writes thier own > > chunk of XML). Though I'm not 100% sure. > > > BTW, I only have a vague idea about what Xpath et al. do > except that they are intended to address certain parts in > an XML file. 
Is there anything to watch out for when designing > a DTD in order to make addressability simple with Xpath ? > > Ideally the xmlpickle data should be addressable using > standard Python notations, e.g. a.b, a['b'] and a[0]. > > Which of the two possibilities I posted would fit this model > w/r to Xpath ? No matter how you do it, XPath won't look like python. The "." is not valid in XPath. Attributes are accessed as with the attribute:: axis (or @). Ex. 1234 S West Way To get Mike /Employees/Employee[@name="Mike"] To get Mikes Address with employee mike as the context: Address[0] Mike > > Thanks, > -- > Marc-Andre Lemburg > ______________________________________________________________________ > Business: http://www.lemburg.com/ > Python Pages: http://www.lemburg.com/python/ > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jim@digicool.com Tue Aug 8 14:58:10 2000 From: jim@digicool.com (Jim Fulton) Date: Tue, 08 Aug 2000 09:58:10 -0400 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> <398FF468.DC1B83C9@digicool.com> <398FFF34.C9863257@lemburg.com> Message-ID: <399011F2.CB5850E4@digicool.com> "M.-A. Lemburg" wrote: > > Jim Fulton wrote: > > (snip) > The design doesn't look clear to me, e.g. it seems as if you > are first pickling an object using the standard Python pickle > mechanism and the pass the pickled string to the XML converter. > Is that so ? Yes. > ...I think I'd rather like to go the direct way. Why? Why is this a goal? By leveraging pickle or cPickle, you can let them do alot of heavy lifting, rather than starting from scratch. 
In particular, I'd bet $.05 that cPickle+ppml is faster than a "direct" solution. It *was* a goal for me to work from pickles because it allowed me to do database binary<->xml conversion *without* creating objects in memory or even having to have the classes around. This is handy in a lot of cases, for example, for ZODB database management and making it easy for humans to read pickles. I don't see the harm in separating the pickling and xml steps. > Hmm, I should really try to get the module run outside of > Zope... for now I've just been looking at the code. It needs to be modified to work with SAX. ppml works with xyap which could easily be modified to work with a sax parser. (snip) > > > like about ppml.py's approach is that it pickles e.g. > > > integers to a binary format. > > > > Nah: > > > > 123 > > Ah, I was seeing all these binary formatting APIs in ppml.py > especially for 64-bit ints. Looks as if these are not used anywhere > in the code though... ppml can convert to and from binary pickles. Jim From Fredrik Lundh" <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> <003901c00131$7b2f5a20$7cac1218@reston1.va.home.com> Message-ID: <008701c00158$ba3291c0$f2a6b5d4@hagrid> tom passin wrote: > > ... > > How robust is this CDATA wrapping ? What if the data itself > > is XML and contains a CDATA section ? > > ... > > You can't nest CDATA sections - see sec. 2.7 of the XML Recommendation. > CDATA can't contain the ']]>' sequence - it always denotes the end of the > CDATA section, hence no nesting. CDATA sections cannot nest, but that doesn't mean that you cannot store ]]> as CDATA: output = string.replace(data, "]]>", "]]]>") From Fredrik Lundh" Message-ID: <000e01c00161$dd232f60$f2a6b5d4@hagrid> lars marius wrote: > | g) Relative speeds on my hardware: > | sgmlop (C module) 4.15 > | pyexpat (C module) 2.60 > | xmlproc (python) 1.21 > | xmllib (python) 1.00 > > Relative speed depends quite a bit on the document being parsed.
> Also, the speed difference when using sgmlop is probably greater when > you don't use SAX. fwiw, xmllib is much faster in 1.6b1: xmllib 1.5.2: 1.00 xmllib 1.6b1: 2.45 (relative speeds) ::: the SREX shallow parser still outperforms anything else, of course: srex 1.6b1: 108.48 (!) From mal@lemburg.com Tue Aug 8 21:42:16 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 08 Aug 2000 22:42:16 +0200 Subject: [XML-SIG] xmlpickle.py ?! References: <398EBE78.A01DE1D6@lemburg.com> <398ED79B.F6E49C65@FourThought.com> <398FC2F4.1DB5EDE4@lemburg.com> <39900DFD.9E469F02@FourThought.com> Message-ID: <399070A8.6CF232CB@lemburg.com> Mike Olson wrote: > > "M.-A. Lemburg" wrote: > > > > Mike Olson wrote: > > > > > > "M.-A. Lemburg" wrote: > > > > > > > > > > You may want to look at Zope in Zope/lib/python/ZODB/ImportExport.py. > > > they do some XML pickling here. However, I think they call back to each > > > object for help in the pickle processes (each object writes thier own > > > chunk of XML). Though I'm not 100% sure. > > > > > > BTW, I only have a vague idea about what Xpath et al. do > > except that they are intended to address certain parts in > > an XML file. Is there anything to watch out for when designing > > a DTD in order to make addressability simple with Xpath ? > > > > Ideally the xmlpickle data should be addressable using > > standard Python notations, e.g. a.b, a['b'] and a[0]. > > > > Which of the two possibilities I posted would fit this model > > w/r to Xpath ? > > No matter how you do it, XPath won't look like python. The "." is not > valid in XPath. Attributes are accessed as with the attribute:: axis > (or @). Ex. > > > > 1234 S West Way > > > > To get Mike > /Employees/Employee[@name="Mike"] > To get Mikes Address with employee mike as the context: > Address[0] Hmm, looks like it would be more useful to map object names to element names in this case... it doesn't really make sense to access information on a type basis, e.g. 
/dictionary/item[@key="Mike"]. But then, structure is given by type, not object name. Oh well :-/ BTW, how would one access "Mike" in this XML file without reverting to positional indexing ? Mike
1234 Main Street
It seems that these XPath lookups have to be context senstive... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Aug 8 21:52:52 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 08 Aug 2000 22:52:52 +0200 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> <398FF468.DC1B83C9@digicool.com> <398FFF34.C9863257@lemburg.com> <399011F2.CB5850E4@digicool.com> Message-ID: <39907324.ADE579EE@lemburg.com> Jim Fulton wrote: > > "M.-A. Lemburg" wrote: > > > > Jim Fulton wrote: > > > > (snip) > > The design doesn't look clear to me, e.g. it seems as if you > > are first pickling an object using the standard Python pickle > > mechanism and the pass the pickled string to the XML converter. > > Is that so ? > > Yes. > > > ...I think I'd rather like to go the direct way. > > Why? Why is this a goal? By leveraging pickle or cPickle, > you can let them do alot of heavy lifting, rather than starting > from scratch. In particular, I'd bet $.05 that cPickle+ppml > is faster than a "direct" solution. Probably... speed's currently not imporant for my application (it will use XML for structure management purposes rather than actual content storage). > It *was* a goal for me to work from pickles because it allowed > me to do database binary<->xml conversion *without* creating > objects in memory or even having to have the classes around. > This is handy in alot of cases, for example, for ZODB database > management and making it easy for humans to read pickles. > I don't see the harm in separating the pickling and > xml steps. That's a valid argument. 
My long-term thinking behind this is to store object information in an XML-aware database or database front-end and then accessing it via XPath, so there wouldn't be any conversion to and from pickles. > > Hmm, I should really try to get the module run outside of > > Zope... for now I've just been looking at the code. > > It needs to be modified to work with SAX. ppml works with xyap > which could easily be modified to work with a sax parser. What is xyap ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From uche.ogbuji@fourthought.com Wed Aug 9 04:15:11 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 08 Aug 2000 21:15:11 -0600 Subject: [XML-SIG] xmlpickle.py ?! In-Reply-To: Message from "M.-A. Lemburg" of "Tue, 08 Aug 2000 22:42:16 +0200." <399070A8.6CF232CB@lemburg.com> Message-ID: <200008090315.VAA03715@localhost.localdomain> > BTW, how would one access "Mike" in this XML file without reverting to > positional indexing ? > > > > Mike >
1234 Main Street
>
>
Hmm? No positional indexing neede, I'd think: xml.xpath.Evaluate('string(/dictionary/item/key)') would return "Mike". Maybe I misunderstand you. > It seems that these XPath lookups have to be context senstive... They do, but how does that imply "positional indexing"? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From mal@lemburg.com Wed Aug 9 09:01:24 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 09 Aug 2000 10:01:24 +0200 Subject: [XML-SIG] xmlpickle.py ?! References: <200008090315.VAA03715@localhost.localdomain> Message-ID: <39910FD4.8BDD2DA@lemburg.com> uche.ogbuji@fourthought.com wrote: > > > BTW, how would one access "Mike" in this XML file without reverting to > > positional indexing ? > > > > > > > > Mike > >
1234 Main Street
> >
> >
> > Hmm? No positional indexing neede, I'd think: > > xml.xpath.Evaluate('string(/dictionary/item/key)') > > would return "Mike". > > Maybe I misunderstand you. What happens if your dictionary has 100 entries and you want to lookup "Mike" (which is stored as content of key) ? And once you've found it, how would you get at the corresponding value ? Sorry for my ignorance shining through here. I should really read the XPath specs... ALAS, no time for this now. > > It seems that these XPath lookups have to be context senstive... > > They do, but how does that imply "positional indexing"? Well, I guess sometimes you'd have to look ahead (in terms of structure depth) to find the right tag and then back out again to read the container as a whole, e.g. in the above case the item. Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Aug 9 09:06:20 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 09 Aug 2000 10:06:20 +0200 Subject: [XML-SIG] xmlpickle.py ?! References: <398EBE78.A01DE1D6@lemburg.com> <398ED79B.F6E49C65@FourThought.com> <398FC2F4.1DB5EDE4@lemburg.com> <39900DFD.9E469F02@FourThought.com> <399070A8.6CF232CB@lemburg.com> <39908DEA.D5437FCE@FourThought.com> Message-ID: <399110FC.69D9B46C@lemburg.com> Mike Olson wrote: > > "M.-A. Lemburg" wrote: > > > > Mike Olson wrote: > > > > > > > > > No matter how you do it, XPath won't look like python. The "." is not > > > valid in XPath. Attributes are accessed as with the attribute:: axis > > > (or @). Ex. > > > > > > > > > > > > 1234 S West Way > > > > > > > > > > > > To get Mike > > > /Employees/Employee[@name="Mike"] > > > To get Mikes Address with employee mike as the context: > > > Address[0] > > > > Hmm, looks like it would be more useful to map object names > > to element names in this case... 
it doesn't really make sense > > to access information on a type basis, e.g. > > /dictionary/item[@key="Mike"]. But then, structure is given by > > type, not object name. Oh well :-/ > > I agree > > > BTW, how would one access "Mike" in this XML file without reverting to > > positional indexing ? > > > > > > > > Mike > >
1234 Main Street
> >
> >
> > > > It seems that these XPath lookups have to be context senstive... > > Not always. You can have relative or absolute paths. You can also have > paths that don't care about position. Here are three examples. > > #1 Assume you have an unknown context with the exact document use: > /dictionary/item/key[text()='Mike'] > #2 Assume you have a context, lets say at item > key[text()='Mike'] > #3 Assume there are many dictionary objects in your document, all > scattered at different levels and you want to get all of the keys names > mike > //dictionary/key[text()='Mike] The last one looks like a great way to search an XML file. I suppose you can then use the looked up tag as context, right ? In that case finding the item containing "Mike" as key wouldn't be hard. Yah, I really should go and read the specs... Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From andy@reportlab.com Wed Aug 9 12:24:07 2000 From: andy@reportlab.com (=?iso-8859-1?q?Andy=20Robinson?=) Date: Wed, 9 Aug 2000 04:24:07 -0700 (PDT) Subject: [XML-SIG] xmlpickle.py ?! Message-ID: <20000809112407.13199.qmail@web1605.mail.yahoo.com> I have a related requirement to Marc-Andre's, which is a need to map XML to (possibly pre-existing) Python object hierarchies and to slurp stuff in quickly. DOM helps, but even after making a DOM tree I still need to walk through it establishing the mapping. I've been considering a two-stage approach. The first stage is a bit like Greg Stein's qp_xml; it uses pyexpat, a Parser class whch constructs a tree, and a base Node class. However, you may pass in a "class map" to the parser. Thus, when it hits an tag, it looks for a class "Invoice" in the class map, then calls various methods on that class: xmlSetAttr(self, attr) xmlAddChild(self, childNode) xmlAddData(self, data) ... 
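The class-map idea above could be sketched roughly as follows, assuming a present-day Python 3 with the standard xml.parsers.expat module. The hook names xmlSetAttr, xmlAddChild and xmlAddData come from the message; everything else (the parse function, the Node fields, passing each attribute as a (name, value) pair, and calling mapped classes with no arguments) is an assumption made for illustration only.

```python
import xml.parsers.expat


class Node:
    """Generic node; also the base class for mapped classes in this sketch."""

    def __init__(self, name=None):
        self.name = name
        self.attrs = {}
        self.children = []
        self.data = []

    def xmlSetAttr(self, attr):
        # Assumption: attr is a (name, value) pair.
        name, value = attr
        self.attrs[name] = value

    def xmlAddChild(self, childNode):
        self.children.append(childNode)

    def xmlAddData(self, data):
        self.data.append(data)


class Invoice(Node):
    """Example of a class that interprets one attribute itself."""

    def __init__(self):
        Node.__init__(self, "Invoice")
        self.number = None

    def xmlSetAttr(self, attr):
        name, value = attr
        if name == "number":
            self.number = int(value)
        else:
            Node.xmlSetAttr(self, attr)


def parse(text, class_map=None):
    """Build a tree, instantiating class_map[tag] when the tag is mapped."""
    class_map = class_map or {}
    stack = []
    roots = []

    def start(name, attrs):
        cls = class_map.get(name)
        node = cls() if cls is not None else Node(name)
        for attr in attrs.items():
            node.xmlSetAttr(attr)
        if stack:
            stack[-1].xmlAddChild(node)
        else:
            roots.append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        if stack:
            stack[-1].xmlAddData(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return roots[0]


if __name__ == "__main__":
    doc = '<Invoice number="7"><Line>widget</Line></Invoice>'
    root = parse(doc, {"Invoice": Invoice})
    print(type(root).__name__, root.number, root.children[0].name)
```

Unmapped tags such as Line end up as generic Node instances here, so knowledge about a tag lives only in the class registered for it.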
If the corresponding class is not found, it uses a plain old Node instance. This should let you push down the knowledge of how an attribute or a child is to be interpreted into the Python class itself. For output, Node provides... xmlWrite(self, output) ...which by default will write out attributes in __dict__, but can easily be overridden. (We could be more finegrained and have xmlGetAttrs(self) -> dict and xmlGetContent(self) -> list of elements. The method names would be chosen not to conflict with anything you are likely to use in your own classes, so it can be used as a mixin. We could also write a clever "dump" routine which would do something sensible with an arbitrary Python object, but would call the xmlWrite hook if it existed. Step Two is inspired by breeze (www.breezefactor.com): use a DTD or schema to generate the Python classes. Or perhaps go the other way, and use some Python data structure to generate a DTD. The xmlpickle suggestion interests me because I think we can integrate these approaches. We have a generic xmlpickle, but the provision for hooks to let classes specify how to serialize and deserialize themselves from XML. Does this make sense? Are there any other ideas around for ways to map XML to Python objects? Andy Robinson ===== Andy Robinson ReportLab, Inc. __________________________________________________ Do You Yahoo!? Kick off your party with Yahoo! Invites. http://invites.yahoo.com/ From jim@digicool.com Wed Aug 9 12:31:59 2000 From: jim@digicool.com (Jim Fulton) Date: Wed, 09 Aug 2000 07:31:59 -0400 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> <398FF468.DC1B83C9@digicool.com> <398FFF34.C9863257@lemburg.com> <399011F2.CB5850E4@digicool.com> <39907324.ADE579EE@lemburg.com> Message-ID: <3991412F.C8BCAA@digicool.com> "M.-A. Lemburg" wrote: > > Jim Fulton wrote: > > > > "M.-A. 
Lemburg" wrote: > > > > > > Jim Fulton wrote: > > > > > > (snip) > > > The design doesn't look clear to me, e.g. it seems as if you > > > are first pickling an object using the standard Python pickle > > > mechanism and the pass the pickled string to the XML converter. > > > Is that so ? > > > > Yes. > > > > > ...I think I'd rather like to go the direct way. > > > > Why? Why is this a goal? By leveraging pickle or cPickle, > > you can let them do alot of heavy lifting, rather than starting > > from scratch. In particular, I'd bet $.05 that cPickle+ppml > > is faster than a "direct" solution. > > Probably... speed's currently not imporant for my > application (it will use XML for structure management purposes > rather than actual content storage). > > > It *was* a goal for me to work from pickles because it allowed > > me to do database binary<->xml conversion *without* creating > > objects in memory or even having to have the classes around. > > This is handy in alot of cases, for example, for ZODB database > > management and making it easy for humans to read pickles. > > I don't see the harm in separating the pickling and > > xml steps. > > That's a valid argument. > > My long-term thinking behind this is to > store object information in an XML-aware database or > database front-end and then accessing it via XPath, > so there wouldn't be any conversion to and from pickles. You should get involved in the Zope-XML effort then. This is exactly the sort of thing we are working toward. See http://www.zope.org/Wikis/zope-xml/FrontPage. > > > Hmm, I should really try to get the module run outside of > > > Zope... for now I've just been looking at the code. > > > > It needs to be modified to work with SAX. ppml works with xyap > > which could easily be modified to work with a sax parser. > > What is xyap ? Uh, it stands for XML yet another parser (framework). 
It is one of many attempts to provide a simple framework for parsing XML on top of a lower-level interface such as SAX, xmllib, or expat. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From tpassin@home.com Wed Aug 9 13:06:39 2000 From: tpassin@home.com (tpassin@home.com) Date: Wed, 9 Aug 2000 08:06:39 -0400 Subject: Fwd: [XML-SIG] xmlpickle.py ?! References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> <003901c00131$7b2f5a20$7cac1218@reston1.va.home.com> <008701c00158$ba3291c0$f2a6b5d4@hagrid> Message-ID: <002901c001fa$46875560$7cac1218@reston1.va.home.com> Fredrik Lundh wrote this amazing hack - tom passin wrote: > > ... > > How robust is this CDATA wrapping ? What if the data itself > > is XML and contains a CDATA section ? > > ... > > You can't nest CDATA sections - see sec. 2.7 of the XML Recommendation. > CDATA can't contain the ']]>' sequence - it always denotes the end of the > CDATA section, hence no nesting. > CDATA sections cannot nest, but that doesn't mean that you > cannot store ]]> as CDATA: > output = string.replace(data, "]]>", "]]]>") Holy cow, /F! But did you really mean output = string.replace(data, "]]>", "]]]>") I could never write those really tricky batch files, either. Astoundedly-yours-for-sure, Tom Passin From Juergen Hermann" On Wed, 9 Aug 2000 08:06:39 -0400, tpassin@home.com wrote: >Fredrik Lundh wrote this amazing hack - >tom passin wrote: >> output = string.replace(data, "]]>", "]]]>") >Holy cow, /F!
But did you really mean >output = string.replace(data, "]]>", "]]]>") No, he meant output = string.replace(data, "]]>", "]]>]") I did not check, but I would not be surprised if ]] was not allowed, just like -- is not allowed in comments. Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From ken@bitsko.slc.ut.us Wed Aug 9 15:31:14 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 09 Aug 2000 09:31:14 -0500 Subject: Fwd: [XML-SIG] xmlpickle.py ?! In-Reply-To: Jim Fulton's message of "Tue, 08 Aug 2000 07:52:08 -0400" References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> <398FF468.DC1B83C9@digicool.com> Message-ID: Jim Fulton writes: > "M.-A. Lemburg" wrote: > > I wonder how well SOAP would handle pickling arbitrary > > Python objects... > > In particular, I wonder if it tries to be complete. I haven't really > looked at SOAP lately. In my experience, RPC mechanisms don't really > need or try to handle arbitrarily complex objects. OTOH, lots of > applications don't need complete transfer. In SOAP v1.1, SOAP was logically split into four components. The four are: message envelope, encoding, HTTP binding, and RPC headers. The part most important for pickling is Section 5, Encoding. SOAP has an extensible typing system, basically you supply some kind of interoperable type name for every object you want to send. "Interoperable" simply means some kind of mapping, say, between the Py module/class name and an XML one (it doesn't somehow magically make all language type systems compatible). Extremely complex object types require someone to sit down and write a model for how to map the object into SOAP encoding rules, and then publish that so people with similar complex object implementations can share the mapping model. Unfortunately, Py's Mapping type falls into the "complex object type" as far as the current SOAP spec goes.
I don't recall if anyone has created a model for pickling mappings that allow keys of any type. (SOAP _does_ do "basic" mappings well, as long as the keys are strings [like object attributes] and the keys are valid XML element names.) -- Ken From robin@jessikat.fsnet.co.uk Wed Aug 9 15:46:40 2000 From: robin@jessikat.fsnet.co.uk (Robin Becker) Date: Wed, 9 Aug 2000 15:46:40 +0100 Subject: [XML-SIG] qp_xml Message-ID: Is qp_xml.py dead. I tried the version in 0.5.5 and it objects to the attrs list being passed by pyexpat to the start_element method. I guess either qp_xml has changed or pyexpat has. -- Robin Becker From Mike.Olson@fourthought.com Wed Aug 9 18:35:18 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 09 Aug 2000 11:35:18 -0600 Subject: [XML-SIG] xmlpickle.py ?! References: <200008090315.VAA03715@localhost.localdomain> <39910FD4.8BDD2DA@lemburg.com> Message-ID: <39919656.A4A94AFF@FourThought.com> "M.-A. Lemburg" wrote: > > uche.ogbuji@fourthought.com wrote: > > > > > BTW, how would one access "Mike" in this XML file without reverting to > > > positional indexing ? > > > > > > > > > > > > Mike > > >
1234 Main Street
> > >
> > >
> > > > Hmm? No positional indexing needed, I'd think: > > > > xml.xpath.Evaluate('string(/dictionary/item/key)') > > > > would return "Mike". > > > > Maybe I misunderstand you. > > What happens if your dictionary has 100 entries and you > want to lookup "Mike" (which is stored as content of key) ? > > And once you've found it, how would you get at the corresponding > value ? This will work string(/dictionary/item[key = 'Mike']/value) > > Sorry for my ignorance shining through here. I should > really read the XPath specs... ALAS, no time for this now. > > > > It seems that these XPath lookups have to be context sensitive... > > > > They do, but how does that imply "positional indexing"? > > Well, I guess sometimes you'd have to look ahead (in terms of > structure depth) to find the right tag and then back out > again to read the container as a whole, e.g. in the above > case the item. Quick XPath tutorial..... An XPath expression is made up of a series of steps. Each step contains an axis, a node test, and a set of predicates. The axis defines a set of nodes based on the context, e.g. child::, attribute::, ancestor::. The node test is then applied to each node in this set, e.g. node(), text(), item. Each node in the original set that meets the node test is then filtered by each predicate, e.g. [position() = 1], [key = 'Mike']. The nodes from the axis set that meet all of these filters are then used as context nodes for the next step (or are the results if this is the last step). There are many abbreviations. The step "dictionary" is an abbreviated syntax for child::dictionary. So, you can perform any tests at any step then move on. To break down : string(/dictionary/item[key = 'Mike']/value) In more detail 1. All children of the root that have the tag name dictionary are the results of the first step. 2. All children of all results from step 1 that have the tag name item _and_ have a child called key with a string value of Mike are the results of the second step. 3.
All children of all results from step 2 that have a tag name of value. The spec obviously explains this is much better detail, but I hope this helps. Mike > > Thanks, > -- > Marc-Andre Lemburg > ______________________________________________________________________ > Business: http://www.lemburg.com/ > Python Pages: http://www.lemburg.com/python/ -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Wed Aug 9 18:36:37 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 09 Aug 2000 11:36:37 -0600 Subject: [XML-SIG] xmlpickle.py ?! References: <398EBE78.A01DE1D6@lemburg.com> <398ED79B.F6E49C65@FourThought.com> <398FC2F4.1DB5EDE4@lemburg.com> <39900DFD.9E469F02@FourThought.com> <399070A8.6CF232CB@lemburg.com> <39908DEA.D5437FCE@FourThought.com> <399110FC.69D9B46C@lemburg.com> Message-ID: <399196A5.E92F6581@FourThought.com> "M.-A. Lemburg" wrote: > > Mike Olson wrote: > > > > "M.-A. Lemburg" wrote: > > > > > > Mike Olson wrote: > > > > > > > > > > > > No matter how you do it, XPath won't look like python. The "." is not > > > > valid in XPath. Attributes are accessed as with the attribute:: axis > > > > (or @). Ex. > > > > > > > > > > > > > > > > 1234 S West Way > > > > > > > > > > > > > > > > To get Mike > > > > /Employees/Employee[@name="Mike"] > > > > To get Mikes Address with employee mike as the context: > > > > Address[0] > > > > > > Hmm, looks like it would be more useful to map object names > > > to element names in this case... it doesn't really make sense > > > to access information on a type basis, e.g. > > > /dictionary/item[@key="Mike"]. But then, structure is given by > > > type, not object name. Oh well :-/ > > > > I agree > > > > > BTW, how would one access "Mike" in this XML file without reverting to > > > positional indexing ? > > > > > > > > > > > > Mike > > >
1234 Main Street
> > >
> > >
> > >
> > > It seems that these XPath lookups have to be context sensitive...
> >
> > Not always.  You can have relative or absolute paths.  You can also have
> > paths that don't care about position.  Here are three examples.
> >
> > #1 Assume you have an unknown context with the exact document; use:
> >    /dictionary/item/key[text()='Mike']
> > #2 Assume you have a context, let's say at item:
> >    key[text()='Mike']
> > #3 Assume there are many dictionary objects in your document, all
> >    scattered at different levels, and you want to get all of the keys
> >    named Mike:
> >    //dictionary/item/key[text()='Mike']
>
> The last one looks like a great way to search an XML file. I suppose
> you can then use the looked up tag as context, right ? In that
> case finding the item containing "Mike" as key wouldn't be hard.

Yes, but it is not recommended.  Imagine a 100 MB file.  The "//" step
will look at every node.  Yikes!  If you know the path, it is
recommended that you use it.

Mike

>
> Yah, I really should go and read the specs...
>
> Thanks,
> --
> Marc-Andre Lemburg
> ______________________________________________________________________
> Business:                                      http://www.lemburg.com/
> Python Pages:                           http://www.lemburg.com/python/

--
Mike Olson                      Principal Consultant
mike.olson@fourthought.com      (303)583-9900 x 102
Fourthought, Inc.               http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From sean@digitome.com Wed Aug 9 17:27:16 2000
From: sean@digitome.com (Sean McGrath)
Date: Wed, 09 Aug 2000 17:27:16 +0100
Subject: [XML-SIG] Re: qp_xml
In-Reply-To: <20000809161234.E0CAC1D027@dinsdale.python.org>
Message-ID: <3.0.6.32.20000809172716.00a606b0@www.digitome.com>

[Robin Becker]
>Is qp_xml.py dead. I tried the version in 0.5.5 and it objects to the
>attrs list being passed by pyexpat to the start_element method. I guess
>either qp_xml has changed or pyexpat has.

pyexpat has changed. When I wrote Pyxie, pyexpat used a list
for attributes. It now uses a dictionary.
I suspect (but don't know) that qp_xml.py is expecting
a list just like Pyxie (still) does.

regards,

http://www.pyxie.org - an Open Source XML Processing library for Python

From gstein@lyra.org Wed Aug 9 19:56:30 2000
From: gstein@lyra.org (Greg Stein)
Date: Wed, 9 Aug 2000 11:56:30 -0700
Subject: [XML-SIG] qp_xml
In-Reply-To: ; from robin@jessikat.fsnet.co.uk on Wed, Aug 09, 2000 at 03:46:40PM +0100
References:
Message-ID: <20000809115629.F19525@lyra.org>

On Wed, Aug 09, 2000 at 03:46:40PM +0100, Robin Becker wrote:
> Is qp_xml.py dead. I tried the version in 0.5.5 and it objects to the
> attrs list being passed by pyexpat to the start_element method. I guess
> either qp_xml has changed or pyexpat has.

qp_xml is alive and may even be going into the 2.0 release. There was a sync
issue between qp_xml and pyexpat. I believe it was fixed in 0.5.5.1.

At the moment, it looks like nobody has removed duplicate stuff from PyXML
after the shift of some functionality into Python itself. I believe Python's
pyexpat module is the latest. In any case, 0.5.5.1 or the latest PyXML CVS
will get you running.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From Fredrik Lundh" <398F025C.A9726E23@digicool.com> <398FCB29.991FC2F6@lemburg.com> <003901c00131$7b2f5a20$7cac1218@reston1.va.home.com> <008701c00158$ba3291c0$f2a6b5d4@hagrid> <002901c001fa$46875560$7cac1218@reston1.va.home.com>
Message-ID: <01c701c0023e$2b56a360$f2a6b5d4@hagrid>

Tom wrote:
> > output = string.replace(data, "]]>", "]]]><![CDATA[]>")
>
> Holy cow, /F!  But did you really mean
>
> output = string.replace(data, "]]>", "]]]>")

nope.  but I didn't make it clear that the idea was to put the
"output" string inside a CDATA section in the first place.

here's how it works:

1. the original "]]>" is split into two parts: "]" and "]>".

2. the "]" is put at the end of the first CDATA section, like
   this: "]" + "]]>"

3. the "]>" is put at the beginning of a second CDATA section,
   like this: "<![CDATA[" + "]>"

the reason this trick works is that "]]>" is the *only* thing that's
recognized as markup in a CDATA section (see section 2.7 of the XML
spec):

    /.../

    [18] CDSect  ::= CDStart CData CDEnd
    [19] CDStart ::= '<![CDATA['
    [20] CData   ::= (Char* - (Char* ']]>' Char*))
    [21] CDEnd   ::= ']]>'

    Within a CDATA section, only the CDEnd string is
    recognized as markup /.../

:::

also note that

    /.../ CDATA sections cannot nest /.../

doesn't mean that you cannot put a CDStart tag inside another
CDATA section (e.g. if you're embedding XML in a CDATA section).

once the parser has started parsing the CDATA section, it will simply
skip over any embedded CDATA section -- but it will stop at the first
CDEnd tag it sees, unless you escape them as shown above.

:::

one drawback here is that you may end up with more than one CDATA
segment at the receiving end, so a naive reader may mess things up.
but if it does, it's broken.

From robin@jessikat.fsnet.co.uk Wed Aug 9 21:17:05 2000
From: robin@jessikat.fsnet.co.uk (Robin Becker)
Date: Wed, 9 Aug 2000 21:17:05 +0100
Subject: [XML-SIG] Re: qp_xml
In-Reply-To: <3.0.6.32.20000809172716.00a606b0@www.digitome.com>
References: <20000809161234.E0CAC1D027@dinsdale.python.org> <3.0.6.32.20000809172716.00a606b0@www.digitome.com>
Message-ID:

In article <3.0.6.32.20000809172716.00a606b0@www.digitome.com>, Sean
McGrath writes
>[Robin Becker]
>>Is qp_xml.py dead. I tried the version in 0.5.5 and it objects to the
>>attrs list being passed by pyexpat to the start_element method. I guess
>>either qp_xml has changed or pyexpat has.
>
>pyexpat has changed. When I wrote Pyxie, pyexpat used a list
>for attributes. It now uses a dictionary.
>I suspect (but don't know) that qp_xml.py is expecting
>a list just like Pyxie (still) does.
>
>regards,
>
> ...
I'm using an old pyexpat and the new qp_xml duh :(
--
Robin Becker

From robin@jessikat.fsnet.co.uk Wed Aug 9 21:21:41 2000
From: robin@jessikat.fsnet.co.uk (Robin Becker)
Date: Wed, 9 Aug 2000 21:21:41 +0100
Subject: [XML-SIG] qp_xml
In-Reply-To: <20000809115629.F19525@lyra.org>
References: <20000809115629.F19525@lyra.org>
Message-ID:

In article <20000809115629.F19525@lyra.org>, Greg Stein writes
>On Wed, Aug 09, 2000 at 03:46:40PM +0100, Robin Becker wrote:
>> Is qp_xml.py dead. I tried the version in 0.5.5 and it objects to the
>> attrs list being passed by pyexpat to the start_element method. I guess
>> either qp_xml has changed or pyexpat has.
>
>qp_xml is alive and may even be going into the 2.0 release. There was a sync
>issue between qp_xml and pyexpat. I believe it was fixed in 0.5.5.1.
>
>At the moment, it looks like nobody has removed duplicate stuff from PyXML
>after the shift of some functionality into Python itself. I believe Python's
>pyexpat module is the latest. In any case, 0.5.5.1 or the latest PyXML CVS
>will get you running.
>
>Cheers,
>-g
>
Yes I see that I'm using old pyexpat with new qp_xml.

Has pyexpat improved? I'm having trouble beating Aaron's pure python
recursive descent parser and qp_xml with old pyexpat running on
hamlet.xml. His hackery gives 1.65" and qp_xml does 2.2"; I'm fairly
sure his parser isn't complete, but it is a bit weird that the C
tokenising etc doesn't beat it by a mile.
--
Robin Becker

From uogbuji@fourthought.com Thu Aug 10 06:14:26 2000
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 9 Aug 2000 23:14:26 -0600 (MDT)
Subject: [XML-SIG] xmlpickle.py ?!
In-Reply-To: <39910FD4.8BDD2DA@lemburg.com>
Message-ID:

On Wed, 9 Aug 2000, M.-A. Lemburg wrote:

> uche.ogbuji@fourthought.com wrote:
> >
> > > BTW, how would one access "Mike" in this XML file without reverting to
> > > positional indexing ?
> > >
> > > <dictionary>
> > >   <item>
> > >     <key>Mike</key>
> > >     <value>1234 Main Street</value>
> > >   </item>
> > > </dictionary>
> >
> > Hmm?  No positional indexing needed, I'd think:
> >
> > xml.xpath.Evaluate('string(/dictionary/item/key)')
> >
> > would return "Mike".
> >
> > Maybe I misunderstand you.
>
> What happens if your dictionary has 100 entries and you
> want to lookup "Mike" (which is stored as content of key) ?

I see what you're saying.  XPath is no help here because it's not meant to
be.  It's just a very general tree-query tool independent of higher-order
optimizations.

Now if you were using XSLT, which does have the "responsibility" of
providing tools to make tree transformation proper, you have some help.
XSLT provides two "indexing" methodologies on top of XPath: id() and
key().

With XSLT you can very easily instruct the processor to index on your
example:

and later

> And once you've found it, how would you get at the corresponding
> value ?

See above.

> Sorry for my ignorance shining through here. I should
> really read the XPath specs... ALAS, no time for this now.
>
> > > It seems that these XPath lookups have to be context sensitive...
> >
> > They do, but how does that imply "positional indexing"?
>
> Well, I guess sometimes you'd have to look ahead (in terms of
> structure depth) to find the right tag and then back out
> again to read the container as a whole, e.g. in the above
> case the item.

No problem with XSLT's key().  See the above example.  The node indexed is
really the item node, indexed against the string value of its key child.

--Uche

From keichwa@gmx.net Thu Aug 10 07:17:02 2000
From: keichwa@gmx.net (Karl Eichwalder)
Date: 10 Aug 2000 08:17:02 +0200
Subject: Fwd: [XML-SIG] xmlpickle.py ?!
In-Reply-To: Jim Fulton's message of "Mon, 07 Aug 2000 14:39:24 -0400"
References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com>
Message-ID:

Jim Fulton writes:

> This is the route I took. Here's an example that's
> probably a lot bigger than you want....
>
>
>

If "id" is considered to be of the type ID it's invalid; IDs are not
allowed to start with digits (as long as we're talking about XML).

> I really need to write down a DTD for this......

Yes :) and don't forget to validate your XML documents...

--
work : ke@suse.de                          |     ------    ,__o
     : http://www.suse.de/~ke/             |    ------   _-\_<,
home : keichwa@gmx.net                     |   ------   (*)/'(*)

From jim@digicool.com Thu Aug 10 11:54:46 2000
From: jim@digicool.com (Jim Fulton)
Date: Thu, 10 Aug 2000 06:54:46 -0400
Subject: Fwd: [XML-SIG] xmlpickle.py ?!
References: <00080711113300.02547@quadra.teleo.net> <398F025C.A9726E23@digicool.com>
Message-ID: <399289F6.2FDB7B57@digicool.com>

Karl Eichwalder wrote:
>
> Jim Fulton writes:
>
> > This is the route I took. Here's an example that's
> > probably a lot bigger than you want....
> >
> >
> >
>
> If "id" is considered to be of the type ID it's invalid; IDs are not
> allowed to start with digits (as long as we're talking about XML).

Yup. I pointed this out (as a puzzle ;) in my original
message. I need to fix this. I need to do it a bit gradually
for compatibility's sake.

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com   http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.

From travish@realtime.net Thu Aug 10 20:40:08 2000
From: travish@realtime.net (travish)
Date: Thu, 10 Aug 2000 14:40:08 -0500 (CDT)
Subject: [XML-SIG] parsers and XML
Message-ID: <200008101940.OAA61692@sullivan.realtime.net>

> | a) most of the XML "parsers" appear to be lexers
>
> You mean, since they don't build complete document trees?
I mean since they appear to be lexers:

http://nightflight.com/cgi-bin/foldoc.cgi?query=lexer
lexer --> lexical analyser
    (Or "scanner") The initial input stage of a language processor
    (e.g. a compiler), the part that performs lexical analysis.

http://nightflight.com/cgi-bin/foldoc.cgi?lexical+analysis
lexical analysis
    (Or "linear analysis", "scanning") The first stage
    of processing a language.  The stream of characters making up the
    source program or other input is read one at a time and grouped
    into lexemes (or "tokens") - word-like pieces such as keywords,
    identifiers, literals and punctuation.  The lexemes are then passed
    to the parser.

    ["Compilers - Principles, Techniques and Tools", by Alfred V. Aho,
    Ravi Sethi and Jeffrey D. Ullman, pp. 4-5]

> This is so
> because XML has a much simpler structure (and potentially much greater
> sizes) than what parsers traditionally have parsed.

I'm not so sure; I've compiled very large C files before.

> This makes an event-based API very useful.

The "event-based API" bears a striking resemblance to a lexer, and is
usually only useful if you do a certain amount of state-tracking
yourself.  (e.g. how many levels of tags deep am I, and which tags
are they?)  That is the traditional role of a parser, and the
"event-driven API" apparently does none of it.

> In Python we have so far chosen to make tree building separate utilities.

And reasonably so.

> If you want a document tree, look at 4DOM or qp_xml.

Actually, I want something between the two APIs that appear to be present
(lexing and generating an AST).  For example, in the reduce phase
of a shift-reduce parser like yacc (which corresponds to a close-tag
event from an "event driven API"), one is given the ability to
'condense' all of the subtrees of this particular node, requiring
neither a full AST nor keeping track of the stack of nested tags
you may currently be processing in.  This would be extremely handy
for (e.g.) converting XML to nested data structures.
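
The reduce-at-close-tag idea described above can be sketched on top of
an event API by keeping one list of finished children per open element
and condensing it when the end tag arrives.  This is only an
illustration, assuming Python's standard xml.sax module and the
dictionary/item/key/value layout from the xmlpickle thread; the class
name ReducingHandler, the parse() helper and the reduce rules are made
up for the example and are not part of any package discussed here.

```python
import xml.sax


class ReducingHandler(xml.sax.ContentHandler):
    """Condense each element into a plain Python value at its end tag."""

    def __init__(self):
        super().__init__()
        self.children = [[]]   # finished (name, value) pairs per open element
        self.text = [[]]       # character data seen directly in each open element
        self.result = None

    def startElement(self, name, attrs):
        # "Shift": open a fresh frame for the new element.
        self.children.append([])
        self.text.append([])

    def characters(self, content):
        self.text[-1].append(content)

    def endElement(self, name):
        # "Reduce": the element is complete, so fold its frame into one value
        # and hand that value to the parent frame.
        children = self.children.pop()
        text = "".join(self.text.pop()).strip()
        self.children[-1].append((name, self.reduce(name, children, text)))

    def reduce(self, name, children, text):
        # Hypothetical reduction rules for the dictionary/item layout.
        if name == "item":
            fields = dict(children)
            return (fields["key"], fields["value"])
        if name == "dictionary":
            return dict(value for _, value in children)
        return text            # leaf elements reduce to their text

    def endDocument(self):
        self.result = self.children[0][0][1]


def parse(xml_text):
    handler = ReducingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.result


doc = ("<dictionary><item><key>Mike</key>"
       "<value>1234 Main Street</value></item></dictionary>")
print(parse(doc))
```

Only the frames for currently open elements are held in memory; a
finished subtree survives only as whatever value its reduce rule
returned, so no full document tree is built.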

> | b) none of the examples are of sufficient/substantial complexity
> |    (e.g. recursive nesting, deep/complex hierarchy)
> |
> | If anyone has suggestions on what kind of parser to use as a back
> | end (yapps? kjParsing? etc.) I'd be interested to hear it.
>
> I don't understand this question.

Meaning, how does one utilize the existing "real" parsers to quickly
and robustly do the work which seems to be required by the "event-driven
API", namely keeping track of which tags one is in, and correlating
those to actions to take.  This is a solved problem, and has been so
for decades.

All of the examples I've seen have a fixed, shallow tag hierarchy and so
are toy problems which don't encounter these complexities.

> The diffs seem to be for the pyexpat driver. This has nothing to do
> with sgmlop or xmllib.

Perhaps you should look a little more carefully before sending back
such a pointed response.

> What is the problem with the description?

For one thing, it appears that the character accumulation callback
has a different signature than the other parsers, passing only one
argument instead of three (charstr, start, len).  If so, that hardly
makes sgmlop replace the other parsers invisibly.
--
Those who will not reason, are bigots, those who cannot, are fools, and
those who dare not, are slaves. - George Gordon Noel Byron (1788-1824)

From tpassin@home.com Thu Aug 10 22:29:39 2000
From: tpassin@home.com (tpassin@home.com)
Date: Thu, 10 Aug 2000 17:29:39 -0400
Subject: [XML-SIG] parsers and XML
References: <200008101940.OAA61692@sullivan.realtime.net>
Message-ID: <001001c00312$173ad480$7cac1218@reston1.va.home.com>

travish said -

> > | a) most of the XML "parsers" appear to be lexers
> >
> > You mean, since they don't build complete document trees?
>
> I mean since they appear to be lexers:
>
> http://nightflight.com/cgi-bin/foldoc.cgi?lexical+analysis
> lexical analysis
> (Or "linear analysis", "scanning") The first stage
> of processing a language.
The stream of characters making up the
> source program or other input is read one at a time and grouped
> into lexemes (or "tokens") - word-like pieces such as keywords,
> identifiers, literals and punctuation. The lexemes are then passed
> to the parser.
>
> ["Compilers - Principles, Techniques and Tools", by Alfred V. Aho,
> Ravi Sethi and Jeffrey D. Ullman, pp. 4-5]
>

"Parser" seems to be the right idea, although the XML "parsers" also
perform the lexical analysis.  If you look at the Dragon Book (that you
referenced), or at lex + yacc, you see that a parser (yacc-generated, for
example) won't build a syntax tree for you unless you add additional coding
to tell it to do so.  The parser, at bottom, is responsible for handling the
syntax, and possibly doing something with the syntactical elements.

A processor that checks for correct syntax and stops there is still a
"parser", as is a tree-builder or an event-stream generator.

Cheers,

Tom Passin

From mclay@nist.gov Fri Aug 11 01:35:13 2000
From: mclay@nist.gov (Michael McLay)
Date: Thu, 10 Aug 2000 20:35:13 -0400 (EDT)
Subject: [XML-SIG] parsers and XML
In-Reply-To: <200008101940.OAA61692@sullivan.realtime.net>
References: <200008101940.OAA61692@sullivan.realtime.net>
Message-ID: <14739.19009.452119.445591@fermi.eeel.nist.gov>

travish writes:
> Actually, I want something between the two APIs that appear to be present
> (lexing and generating an AST).  For example, in the reduce phase
> of a shift-reduce parser like yacc (which corresponds to a close-tag
> event from an "event driven API"), one is given the ability to
> 'condense' all of the subtrees of this particular node, requiring
> neither a full AST nor keeping track of the stack of nested tags
> you may currently be processing in.  This would be extremely handy
> for (e.g.) converting XML to nested data structures.

[...]

> All of the examples I've seen have a fixed, shallow tag hierarchy and so
> are toy problems which don't encounter these complexities.

There are major efforts underway to develop XML based standards for
engineering data, so it is likely that this kind of problem will
become very common soon.  Think of any product category and you'll
find someone working on an XML mapping.

I am working with a standards group and Georgia Tech on an XML Schema
for representing the manufacturing data needed to produce a printed
circuit board and a printed circuit assembly.  This is a fairly easy
example to grok for anyone who has ever seen a printed circuit board.
It requires a deeply nested XML tag set with a corresponding deeply
nested set of structures that must be referenced by CAD and CAM
software.

The XML Schema for the GenCAM standard is at:

    http://www.fis.marc.gatech.edu/xml/ipc-schema.html#IPC2511

The example file is at:

    http://www.gencam.org/examples/dieter6.xml

This is a typical example of a nested structure.  The GenCAM
description of a printed circuit board contains about 18 top level
sections; for this example I'll explain the interaction between the
PRIMITIVES and ROUTES sections.  A PRIMITIVES section has a list of
GROUP objects.  Each GROUP is a separate name-space.  All
GROUP name-space names are unique to a GenCAM file.  One group might
hold standard colors and another might hold line descriptions.  It is
up to the vendor to decide how to partition the name-spaces.  The
GROUP contains a PAINTDESC object definition.  This defines the fill
used inside of polygons and other closed shapes.

The ROUTES section follows the PRIMITIVES section.  ROUTES contains a
list of GROUP objects.  A ROUTES GROUP contains a
list of ROUTE objects and a ROUTE (which represents a copper trace
etched on the printed circuit board) contains a list of geometry
objects, such as PATH, PLANE, VIA, TESTPAD...

Representing pointers between objects in XML is a special case problem
that is very common in engineering data structures.  A small
example extracted from the dieter6.xml file will illustrate the
problem.  The example contains only one GROUP and only one ROUTE in
that group.  A PCB design would typically have between 100 and 100k
unique routes.

<GENCAM>
  <PRIMITIVES>
    <GROUP primitive_group_id="prim4" >
      <PAINTDESC paintdesc_name="filled" paint_type="FILL" />
    </GROUP>
  </PRIMITIVES>
</GENCAM>
<GENCAM>
  <ROUTES>
    <GROUP route_group_id="route1" >
      <ROUTE net_name="Ground" net_class="GROUND" >
        <PATH layers_ref="lay1:2" linedesc_ref="prim4:signalwidth" >
          <POLYLINE >
            <STARTAT start_xy="(1300,1400)" />
            <LINETO end_xy="(1200,2200)" />
          </POLYLINE>
        </PATH>
        <PLANE layers_ref="lay1:2" paintdesc_ref="prim4:filled" >
          <POLYGON >
            <STARTAT start_xy="(0,0)" />
            <LINETO end_xy="(1200,0)" />
            <LINETO end_xy="(1200,2400)" />
            <LINETO end_xy="(0,2400)" />
            <ENDLINE />
          </POLYGON>
        </PLANE>
        <COMPPIN component_ref="cmp1:R1" pattern_pin_ref="Pin1" />
      </ROUTE>
    </GROUP>

In the ROUTE definition there is a PLANE object defined using the
statement:

        <PLANE layers_ref="lay1:2" paintdesc_ref="prim4:filled" >

The paintdesc_ref attribute is used to specify a relationship between
the PLANE object in the ROUTE and the PAINTDESC object defined in the
PRIMITIVES section.  The paintdesc_ref contains a string that, when
split on the first ':' in the string, identifies the name of the GROUP
and the name of the PAINTDESC that is to be used to fill the PLANE.

Resolving the object pointer between the PLANE and the PAINTDESC isn't
done automatically using an XML parser.  We might have been able to use
one of the standard features of XML, such as XPATH, to define the
relationship, but this seemed like a natural point for breaking between
the standard off-the-shelf XML object handling and the custom code
that will be required to populate the structures of the CAD and CAM
tool.

This breaking point between standard XML parsing and building custom
objects is probably a pattern language, but I wouldn't know how to
define it.  I'm still struggling with the hand-off between the two.

There is a huge legacy code base of engineering software that will
eventually be reading and writing this XML format.  The industry I'm
working with is just one of many that are converting their engineering
data to XML format.  Most likely each tool vendor will do this by
attaching an XML parser to the existing code and populating their
existing data structures with the object definitions as the XML parser
reads the data.  Is there an easy way to associate which data
structures should be populated as each of the XML tags is
encountered?  I was thinking it might be possible to automatically
generate a parse tree directly from the XML Schema definition.

I'd appreciate feedback on the approach used to define pointers
between structures.  Is there a standard way that would also be
efficient for a file that may contain millions of these references?
I would be interested in seeing the example rewritten using any
alternative notations, such as XPATH.

From mclay@nist.gov Fri Aug 11 02:17:49 2000
From: mclay@nist.gov (Michael McLay)
Date: Thu, 10 Aug 2000 21:17:49 -0400 (EDT)
Subject: [XML-SIG] Parsing XML for deeply nested structures
In-Reply-To: <200008101940.OAA61692@sullivan.realtime.net>
References: <200008101940.OAA61692@sullivan.realtime.net>
Message-ID: <14739.21565.562415.995307@fermi.eeel.nist.gov>

travish writes:
> Actually, I want something between the two APIs that appear to be present
> (lexing and generating an AST).  For example, in the reduce phase
> of a shift-reduce parser like yacc (which corresponds to a close-tag
> event from an "event driven API"), one is given the ability to
> 'condense' all of the subtrees of this particular node, requiring
> neither a full AST nor keeping track of the stack of nested tags
> you may currently be processing in.  This would be extremely handy
> for (e.g.) converting XML to nested data structures.

[...]

> All of the examples I've seen have a fixed, shallow tag hierarchy and so
> are toy problems which don't encounter these complexities.

There are major efforts underway to develop XML based standards for
engineering data, so it is likely that this kind of problem will
become very common soon.  Think of any product category and you'll
find someone working on an XML mapping.

I am working with a standards group and Georgia Tech on an XML Schema
for representing the manufacturing data needed to produce a printed
circuit board and a printed circuit assembly.  This is a fairly easy
example to grok for anyone who has ever seen a printed circuit board.
It requires a deeply nested XML tag set with a corresponding deeply
nested set of structures that must be referenced by CAD and CAM
software.

The XML Schema for the GenCAM standard is at:

    http://www.fis.marc.gatech.edu/xml/ipc-schema.html#IPC2511

The example file is at:

    http://www.gencam.org/examples/dieter6.xml

This is a typical example of a nested structure.  The GenCAM
description of a printed circuit board contains about 18 top level
sections; for this example I'll explain the interaction between the
PRIMITIVES and ROUTES sections.  A PRIMITIVES section has a list of
GROUP objects.  Each GROUP is a separate name-space.  All
GROUP name-space names are unique to a GenCAM file.  One group might
hold standard colors and another might hold line descriptions.  It is
up to the vendor to decide how to partition the name-spaces.  The
GROUP contains a PAINTDESC object definition.  This defines the fill
used inside of polygons and other closed shapes.

The ROUTES section follows the PRIMITIVES section.  ROUTES contains a
list of GROUP objects.  A ROUTES GROUP contains a
list of ROUTE objects and a ROUTE (which represents a copper trace
etched on the printed circuit board) contains a list of geometry
objects, such as PATH, PLANE, VIA, TESTPAD...

Representing pointers between objects in XML is a special case problem
that is very common in engineering data structures.  A small
example extracted from the dieter6.xml file will illustrate the
problem.  The example contains only one GROUP and only one ROUTE in
that group.  A PCB design would typically have between 100 and 100k
unique routes.

<GENCAM>
  <PRIMITIVES>
    <GROUP primitive_group_id="prim4" >
      <PAINTDESC paintdesc_name="filled" paint_type="FILL" />
    </GROUP>
  </PRIMITIVES>
</GENCAM>
<GENCAM>
  <ROUTES>
    <GROUP route_group_id="route1" >
      <ROUTE net_name="Ground" net_class="GROUND" >
        <PATH layers_ref="lay1:2" linedesc_ref="prim4:signalwidth" >
          <POLYLINE >
            <STARTAT start_xy="(1300,1400)" />
            <LINETO end_xy="(1200,2200)" />
          </POLYLINE>
        </PATH>
        <PLANE layers_ref="lay1:2" paintdesc_ref="prim4:filled" >
          <POLYGON >
            <STARTAT start_xy="(0,0)" />
            <LINETO end_xy="(1200,0)" />
            <LINETO end_xy="(1200,2400)" />
            <LINETO end_xy="(0,2400)" />
            <ENDLINE />
          </POLYGON>
        </PLANE>
        <COMPPIN component_ref="cmp1:R1" pattern_pin_ref="Pin1" />
      </ROUTE>
    </GROUP>

In the ROUTE definition there is a PLANE object defined using the
statement:

        <PLANE layers_ref="lay1:2" paintdesc_ref="prim4:filled" >

The paintdesc_ref attribute is used to specify a relationship between
the PLANE object in the ROUTE and the PAINTDESC object defined in the
PRIMITIVES section.  The paintdesc_ref contains a string that, when
split on the first ':' in the string, identifies the name of the GROUP
and the name of the PAINTDESC that is to be used to fill the PLANE.

Resolving the object pointer between the PLANE and the PAINTDESC isn't
done automatically using an XML parser.  We might have been able to use
one of the standard features of XML, such as XPATH, to define the
relationship, but this seemed like a natural point for breaking between
the standard off-the-shelf XML object handling and the custom code
that will be required to populate the structures of the CAD and CAM
tool.

This breaking point between standard XML parsing and building custom
objects is probably a pattern language, but I wouldn't know how to
define it.  I'm still struggling with the hand-off between the two.

There is a huge legacy code base of engineering software that will
eventually be reading and writing this XML format.  The industry I'm
working with is just one of many that are converting their engineering
data to XML format.  Most likely each tool vendor will do this by
attaching an XML parser to the existing code and populating their
existing data structures with the object definitions as the XML parser
reads the data.  Is there an easy way to associate which data
structures should be populated as each of the XML tags is
encountered?  I was thinking it might be possible to automatically
generate a parse tree directly from the XML Schema definition.

I'd appreciate feedback on the approach used to define pointers
between structures.  Is there a standard way that would also be
efficient for a file that may contain millions of these references?
I would be interested in seeing the example rewritten using any
alternative notations, such as XPATH.

From tpassin@home.com Fri Aug 11 00:45:57 2000
From: tpassin@home.com (tpassin@home.com)
Date: Thu, 10 Aug 2000 19:45:57 -0400
Subject: [XML-SIG] parsers and XML
References: <200008101940.OAA61692@sullivan.realtime.net> <14739.19009.452119.445591@fermi.eeel.nist.gov>
Message-ID: <002201c00325$21b5a6c0$7cac1218@reston1.va.home.com>

Michael McLay told us about his PCB layout XML schema -

[much interesting description elided]

>
> I'd appreciate feedback on the approach used to define pointers
> between structures. Is there a standard way that would also be
> efficient for a file that may contain millions of these references?
> I would be interested in seeing the example rewritten using any
> alternative notations, such as XPATH.
>

Michael, this is very interesting.  Looking at your examples, I didn't think
the nesting levels were especially deep.  As for the references, you
certainly want to use IDs in elements, as you are.  But what about the colon
in the reference IDs?  That's not according to the Namespace Rec, is it?
Maybe another character would be better.
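
The "group:name" reference convention under discussion can be resolved
with one pass over the PRIMITIVES section and an ordinary dictionary,
which keeps each later lookup cheap even with very many references.
The following is a minimal sketch only, assuming Python's standard
xml.etree.ElementTree module (which postdates this thread) and a
trimmed, single-root fragment modelled on the dieter6.xml excerpt; the
helper names build_index and resolve are hypothetical, and a real
GenCAM reader would index the other *_ref targets the same way.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment modelled on the excerpt quoted in the thread; merged
# under a single root element here so that it parses as one document.
SAMPLE = """\
<GENCAM>
  <PRIMITIVES>
    <GROUP primitive_group_id="prim4">
      <PAINTDESC paintdesc_name="filled" paint_type="FILL" />
    </GROUP>
  </PRIMITIVES>
  <ROUTES>
    <GROUP route_group_id="route1">
      <ROUTE net_name="Ground" net_class="GROUND">
        <PLANE layers_ref="lay1:2" paintdesc_ref="prim4:filled" />
      </ROUTE>
    </GROUP>
  </ROUTES>
</GENCAM>
"""


def build_index(root):
    """Map (group id, paintdesc name) to the PAINTDESC element, in one pass."""
    index = {}
    for group in root.findall("./PRIMITIVES/GROUP"):
        group_id = group.get("primitive_group_id")
        for desc in group.findall("PAINTDESC"):
            index[(group_id, desc.get("paintdesc_name"))] = desc
    return index


def resolve(index, ref):
    """Split a reference on its first ':' and look the target up."""
    group_id, name = ref.split(":", 1)
    return index[(group_id, name)]


root = ET.fromstring(SAMPLE)
index = build_index(root)
plane = root.find("./ROUTES/GROUP/ROUTE/PLANE")
paint = resolve(index, plane.get("paintdesc_ref"))
print(paint.get("paint_type"))
```

Because the index is keyed on the (group, name) pair rather than on the
raw string, the separator character only matters inside resolve(), so
switching from ':' to another character would be a one-line change.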
If you are interested in alternative access methods, I'd suggest getting the following book if you don't already have it: Data on the Web Abiteboul, Buneman, and Suciu Morgan Kaufmann Publishers ISBN 1-55860-622-X This book discusses semi-structured data (and XML data), and storing and accessing it. It also covers dealing with cycles in the data graphs. I think the actual reference ID values would depend on how they structure would be stored. If they are going to be expanded into existing data structures, the numbers would probably have to be translated into some internal form expected by the existing system, so it hardly matters what their exact format is (except for human reviewability, of course). With millions of elements possible,maybe a high-powered object database would be a good thing to look at. In that case, you'd want to see what kind of object IDs the ODBMS wants to use. Since the PCB can be seen as a sort of drawing, I was thinking that using SVG as a base might be interesting. But if you are trying to map to the existing systems as closely as possible, that would be different. Regards, Tom Passin From travish@realtime.net Fri Aug 11 01:22:38 2000 From: travish@realtime.net (travish) Date: Thu, 10 Aug 2000 19:22:38 -0500 (CDT) Subject: [XML-SIG] parsers and XML, note re: xmlpickle Message-ID: <200008110022.TAA72846@sullivan.realtime.net> > A processor that checks for correct syntax and stops there is still a > "parser", as is a tree-builder Validating parsers definitely qualify as "real" parsers. DOM stuff definitely qualifies as a parser. I guess I should have singled out "event-driven XML parsers" as a misnomer. I didn't mean to disparage the others. > or a event-stream generator. Event-stream generators emit events which loosely map to tokens. That is, "foo" comes out as start_tag('HEAD'), character_data('foo'), end_tag('HEAD'). They don't appear to do anything related to the structure of the HTML. 
A lexer would generate a token stream such as TOK_BEG(HEAD) STR(foo) TOK_END(HEAD). I'm still awaiting an explanation of how these differ in any significant way, but I'm tired of this thread and am not going to clarify again. xmlpickle: Incidentally, the "nested data structure" comments are eminently relevant to XML pickling. Do you really want to re-code a parser stack, reduction dispatch routines and child tree access code for every such application? Nor would you necessarily want to read it all and generate an AST in memory. -- Those who will not reason, are bigots, those who cannot, are fools, and those who dare not, are slaves. - George Gordon Noel Byron (1788-1824) From paul@prescod.net Fri Aug 11 14:20:56 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 11 Aug 2000 08:20:56 -0500 Subject: [XML-SIG] qp_xml References: <20000809115629.F19525@lyra.org> Message-ID: <3993FDB8.749F9028@prescod.net> Robin Becker wrote: > > ... > > Has pyexpat improved? I'm having trouble beating Aaron's pure python > recursive descent parser and qp_xml with old pyexpat running on > hamlet.xml. I'm having trouble parsing this. Is it ((recursive descent + qp_xml) vs. old pyexpat) or (recursive descent vs.( qp_xml + old pyexpat)). PyExpat has mostly improved to add Unicode support. > I'm fairly > sure his parser isn't complete, but it is a bit weird that the C > tokenising etc doesn't beat it by a mile. PyExpat has the additional cost of having to cross the C->Python line all of the time. Still, I haven't heard that anyone has made a reasonably complete pure-Python parser that is as fast as PyExpat. I don't know anything about Aaron's. -- Paul Prescod - Not encumbered by corporate consensus "I don't want you to describe to me -- not ever -- what you were doing to that poor boy to make him sound like that; but if you ever do it again, please cover his mouth with your hand," Grandmother said.
-- John Irving, "A Prayer for Owen Meany" From mclay@nist.gov Fri Aug 11 17:21:08 2000 From: mclay@nist.gov (Michael McLay) Date: Fri, 11 Aug 2000 12:21:08 -0400 (EDT) Subject: [XML-SIG] Using XPATH as references Message-ID: <14740.10228.435967.788239@fermi.eeel.nist.gov> My apologies for a very long message. I think I'm getting closer to understanding how to use XML properly for the problem now. I have an example of how I think XPATH could be used to reference between objects near the end of the message. Skip to "EXAMPLE:" near the end if you aren't interested in a detailed explanation of the PCB/PCA problem domain. tpassin@home.com writes: > Michael McLay told us about his PCB layout XML schema - > > [much interesting description elided] > > > > > I'd appreciate feedback on the approach used to define pointers > > between structures. Is there a standard way that would also be > > efficient for a file that may contain millions of these references? > > I would be interested in seeing the example rewritten using any > > alternative notations, such as XPATH. > > > Michael, this is very interesting. Looking at your examples, I didn't think > the nesting levels were especially deep. As for the references, you > certainly want to use IDs in elements, as you are. But what about the colon > in the reference IDs? That's not according to the Namespace Rec, is it? > Maybe another character would be better. Thanks for the feedback. I did model the usage of element IDs in the GenCAM format after the Namespaces Rec; however, the Namespaces Rec is only concerned with element and attribute names. The namespace concept, AFAIK, is not allowed in element IDs. At least I haven't found a single reference or example that would illustrate how I would use namespaces in an ID. Unfortunately standard IDs from XML don't work for the information model defined in GenCAM. I've never seen an example or reference to building data namespaces.
My problem domain needs an efficient mechanism for resolving millions of name references to names nested inside of "data namespaces" (the "data namespaces" are implemented as GROUP elements in the GenCAM schema). I'll give an example and maybe someone can explain how I could apply IDs to the example. The GenCAM file can contain multiple printed circuit board definitions. Each BOARD has a unique name which is the ID for the board. The board names are unique within the top-level namespace of the GenCAM file. Each board includes many COMPONENTS and the components on a board are uniquely identified by reference designators, the IDs for the COMPONENT on a board. A COMPONENT is required for each instance of a DEVICE placed on a board. CAD tools traditionally use designators "R1", "R2"... for resistors; "U1", "U2"... for integrated circuits and so on. You can find the numbers next to components on older assemblies because these numbers were silkscreened onto the surface of boards when boards were hand assembled. In GenCAM all COMPONENTS used in all boards are stored in a COMPONENTS section. The COMPONENT section is divided by GROUP tags. To fully identify a COMPONENT in a GenCAM file you need the name of the GROUP and the reference designator of the COMPONENT. So if there are BOARDs named "bd1" and "bd2" there could be COMPONENTs that are identified using "bd1:R1", "bd2:R2", "bd1:U1", "bd2:U2", etc. The "bd1:R1" COMPONENT may reference a DEVICE named "stdparts:CR1206" while the "bd2:R2" might reference a DEVICE named "digikey:100ohm1W". The current implementation of GenCAM requires that all data to be included be in a single file. In the XML implementation groups can be imported by URL reference. For example: Perhaps there is a way to do this efficiently with the current XML capabilities. Any suggestions?
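One possible way to approach the lookup problem described above, sketched in present-day Python: read the file once, build a dictionary keyed by (group name, item name), and resolve each "group:name" reference with a single dictionary access. The element and attribute names used here (GROUP, DEVICE, COMPONENT, name, device) are guesses made up for illustration and are not the actual GenCAM schema; the helper functions build_index and resolve are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element and attribute names below are
# illustrative guesses, not the real GenCAM schema.
SAMPLE = """
<GENCAM>
  <DEVICES>
    <GROUP name="digikey">
      <DEVICE name="100ohm1W"/>
    </GROUP>
    <GROUP name="stdparts">
      <DEVICE name="CR1206"/>
    </GROUP>
  </DEVICES>
  <COMPONENTS>
    <GROUP name="bd1">
      <COMPONENT name="R1" device="stdparts:CR1206"/>
    </GROUP>
    <GROUP name="bd2">
      <COMPONENT name="R2" device="digikey:100ohm1W"/>
    </GROUP>
  </COMPONENTS>
</GENCAM>
"""


def build_index(section, child_tag):
    """Map (group name, item name) -> element for one section."""
    index = {}
    for group in section.findall("GROUP"):
        for item in group.findall(child_tag):
            index[(group.get("name"), item.get("name"))] = item
    return index


def resolve(index, qualified_name):
    """Look up a 'group:name' reference with one dictionary access."""
    group, _, name = qualified_name.partition(":")
    return index[(group, name)]


root = ET.fromstring(SAMPLE)
devices = build_index(root.find("DEVICES"), "DEVICE")
components = build_index(root.find("COMPONENTS"), "COMPONENT")

# Follow the reference from component bd2:R2 to its device.
r2 = resolve(components, "bd2:R2")
device = resolve(devices, r2.get("device"))
print(device.get("name"))
```

Building the index costs one pass over the document, after which each reference is resolved in roughly constant time. Whether that is adequate for millions of references, or whether the data should live in a database instead, is a separate question this sketch does not answer.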
> > If you are interested in alternative access methods, I'd suggest getting the > following book if you don't already have it: > > Data on the Web > Abiteboul, Buneman, and Suciu > Morgan Kaufmann Publishers > ISBN 1-55860-622-X > > This book discusses semi-structured data (and XML data), and storing and > accessing it. It also covers dealing with cycles in the data graphs. I'll look this up. Thanks for the reference. > > I think the actual reference ID values would depend on how they structure > would be stored. If they are going to be expanded into existing data > structures, the numbers would probably have to be translated into some > internal form expected by the existing system, so it hardly matters what > their exact format is (except for human reviewability, of course). With > millions of elements possible,maybe a high-powered object database would be > a good thing to look at. In that case, you'd want to see what kind of > object IDs the ODBMS wants to use. The standards committee spent months working out the details of how names must be used in GenCAM so that the information model fully described the manufacturing data required to manufacture and test PCB and PCB products. The committee was constrained by the need to make it reasonable for CAD and CAM vendors to implement. The community is satisfied that the object relationships are captured properly. The only problem is understanding how to use the features of XML in mapping from the current syntax to XML. EXAMPLE: In the following example, is the value of /GENCAM/COMPONENTS/GROUP[id="cmp1"]/COMPONENT[id="R1"]/DEVICEREF/part_ref the following string? "/GENCAM/DEVICES/GROUP[name="digikey"]/DEVICE/[name="100ohm1W"]" Here's the example: This may solve how to do references within a file. How would I write the path if I need to reference an object outside of this file?
It wouldn't be as simple as: "http://www.digikey.com/devices/catalog00524GENCAM/DEVICES/DEVICE/[name="100ohm1W"]" This would be cool, but the name is very long and I have millions of these references in a file. How do I shorten the repeated part of this statement, i.e., http://www.digikey.com/devices/catalog00524GENCAM/DEVICES/DEVICE/ I'd like to give that a namespace name that could be used just like element and attribute names are shortened in the meta data. > Since the PCB can be seen as a sort of drawing, I was thinking that using > SVG as a base might be interesting. But if you are trying to map to the > existing systems as closely as possible, that would be different. We are looking at SVG. Ideally GenCAM would be rewritten to use SVG and then stuff all GenCAM-specific data into the SVG extension mechanism. Legacy system issues require that we take an intermediate step. From andy@reportlab.com Fri Aug 11 16:24:50 2000 From: andy@reportlab.com (=?iso-8859-1?q?Andy=20Robinson?=) Date: Fri, 11 Aug 2000 08:24:50 -0700 (PDT) Subject: [XML-SIG] qp_xml Message-ID: <20000811152450.16231.qmail@web1601.mail.yahoo.com> > PyExpat has the additional cost of having to cross > the C->Python line > all of the time. Still, I haven't heard that anyone > has made a > reasonably complete pure-Python parser that is as > fast as PyExpat. I > don't know anything about Aaron's. That comment of Robin's wasn't supposed to have leaked yet! The cat is out of the bag, so here's what is happening: ReportLab (me, Robin and Aaron) have been looking at all the parsers around and trying to figure out a natural way to map XML to Python object models for a whole load of customer projects. So we wanted the easiest way to get a tree structure, without caring if it was DOM or not. Aaron sat down to write a simple rec-descent parser using string.find and nothing else, which outputs a tree of dictionaries. It handles tags, text, cdata and very little else.
This was mostly a learning exercise and took half a day. Amazingly, it gets similar speeds on Hamlet to qp_xml. We reckon this is because essentially the same thing is going on: C code (string.find) grabs the next token, then calls back into Python to do something with it. We've found out in the past that extensions don't give much of a speedup when you make lots of little calls to them. Don't get too excited, as there are probably a whole bunch of occasional cases it doesn't handle yet and which may slow it down. It may not be "reasonably complete" by your definition - unlike PyExpat, which is extremely well proven. It should hopefully get released in a week or two, but there's some more to do first. - Andy ===== Andy Robinson ReportLab, Inc. __________________________________________________ Do You Yahoo!? Kick off your party with Yahoo! Invites. http://invites.yahoo.com/ From tpassin@home.com Sat Aug 12 03:46:01 2000 From: tpassin@home.com (tpassin@home.com) Date: Fri, 11 Aug 2000 22:46:01 -0400 Subject: [XML-SIG] Using XPATH as references References: <14740.10228.435967.788239@fermi.eeel.nist.gov> Message-ID: <003901c00407$73f9b240$7cac1218@reston1.va.home.com> Michael McLay wrote back - > > tpassin@home.com writes: > > > ... > > Michael, this is very interesting. Looking at your examples, I didn't think > > the nesting levels were especially deep. As for the references, you > > certainly want to use IDs in elements, as you are. But what about the colon > > in the reference IDs? That's not according to the Namespace Rec, is it? > > Maybe another character would be better. > > Thanks for the feedback. I did model the usage of element IDs in the > GenCAM format after the Namespaces Rec, however the Namespaces Rec is > only concerned about element and attribute names. The namespace > concept, AFAIK, are not allowed in element IDs. At least I haven't > found a single reference or example that would illustrate how I would > use namespaces in an ID. 
I didn't mean you should have used namespaces. I meant that the use of colons in names has been reserved to denote namespace prefixes only. Since you aren't actually using XML namespaces, you should change to some other character instead of a colon. The available ones are '_', '-', and '.'. And an ID attribute is still an attribute. Speaking of which, you're not supposed to use colons in attribute values which are declared as ID or IDREF, which your example does (assuming that somewhere you've made that declaration). The namespace Rec says: "Strictly speaking, attribute values declared to be of types ID, IDREF(S), ENTITY(IES), and NOTATION are also Names, and thus should be colon-free. However, the declared type of attribute values is only available to processors which read markup declarations, for example validating processors. Thus, unless the use of a validating processor has been specified, there can be no assurance that the contents of attribute values have been checked for conformance to this specification." Tom Passin From Fredrik Lundh" <20000809115629.F19525@lyra.org> <3993FDB8.749F9028@prescod.net> Message-ID: <008201c0046f$379b9d00$f2a6b5d4@hagrid> paul wrote: > PyExpat has the additional cost of having to cross the C->Python line > all of the time. Still, I haven't heard that anyone has made a > reasonably complete pure-Python parser that is as fast as PyExpat. if the benchmarks someone posted a few days ago are correct, 1.6b1's xmllib is only a few percent slower than pyexpat. From Fredrik Lundh" Message-ID: <003b01c00477$4d3f7e80$f2a6b5d4@hagrid> travish@realtime.net wrote: > > | a) most of the XML "parsers" act appear to be lexers > > > > You mean, since they don't build complete document trees? > > I mean since they appear to be lexers: only if you think that something like: is a single token. most people don't.
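The point about tokens versus events can be illustrated with a short sketch in present-day Python, using the standard xml.parsers.expat module. It records the events reported for a small document and shows that a whole start tag, attributes included, arrives as one event, and that badly nested input is rejected rather than passed through as tokens. The sample element name and attributes are invented for illustration.

```python
import xml.parsers.expat

events = []


def start(name, attrs):
    # The complete start tag, attributes included, arrives as one event.
    events.append(("start", name, attrs))


def end(name):
    events.append(("end", name))


def chars(data):
    # Character data may in general be delivered in several pieces.
    events.append(("chars", data))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start
parser.EndElementHandler = end
parser.CharacterDataHandler = chars
parser.Parse('<comic title="Sandman" number="62">text</comic>', True)

for event in events:
    print(event)

# A pure tokenizer would happily emit tokens for badly nested tags;
# the parser checks the nesting and refuses the document instead.
rejected = False
try:
    xml.parsers.expat.ParserCreate().Parse("<a><b></a>", True)
except xml.parsers.expat.ExpatError as exc:
    rejected = True
    print("rejected:", exc)
```

The second half is the structural check that separates this kind of event stream from a plain token stream: the events are only ever delivered in properly nested order.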
From jkraai@murl.com Sun Aug 13 21:26:01 2000 From: jkraai@murl.com (jim kraai) Date: Sun, 13 Aug 2000 15:26:01 -0500 Subject: [XML-SIG] xmlpickle.py ?! References: <398EBE78.A01DE1D6@lemburg.com> <14735.42567.691887.481072@temoleh.chem.uu.nl> Message-ID: <39970459.5C7D340B@murl.com> "Rob W. W. Hooft" wrote: > > [snip] > > I would personally never use 20, since > you might need more structure later. It is very difficult to add more > sub-elements once there is character data (DTD issue). You can use > 20 to be more > extensible (e.g. 20 > 10 30 > %2d; who knows when we'll have structured > integers or subclassed integers like that....) I believe XML Schema is "when". --jim From tgagne@ix.netcom.com Mon Aug 14 02:45:39 2000 From: tgagne@ix.netcom.com (Thomas Gagne) Date: Sun, 13 Aug 2000 21:45:39 -0400 Subject: [XML-SIG] Fixed source for XML tutorial Message-ID: <39974F43.6148C497@ix.netcom.com> This is a multi-part message in MIME format. --------------FF7372391C5B4F9254B89302 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Though it was fairly easy to fix (and maybe that's part of the tutorial), the source included doesn't work. It calls parser.parseFile() with a 'file' argument, but it never opened the file.
--------------FF7372391C5B4F9254B89302 Content-Type: text/plain; charset=us-ascii; name="xmltest.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="xmltest.py"

#!/usr/bin/env python

from xml.sax import saxlib
from xml.sax import saxexts

class FindIssue(saxlib.HandlerBase):
    def __init__(self, title, number):
        self.search_title, self.search_number = title, number

    def startElement(self, name, attrs):
        # Only 'comic' elements are of interest.
        if name != 'comic': return

        title = attrs.get('title', None)
        number = attrs.get('number', None)
        if title == self.search_title and number == self.search_number:
            print title, '#' + str(number), 'found'

if __name__ == '__main__':
    parser = saxexts.make_parser()
    dh = FindIssue('Sandman', '62')
    parser.setDocumentHandler(dh)
    # Open the file before handing it to the parser.
    fh = open('test.xml', 'r')
    parser.parseFile(fh)
    parser.close()
    fh.close()

--------------FF7372391C5B4F9254B89302-- From tgagne@ix.netcom.com Mon Aug 14 18:12:20 2000 From: tgagne@ix.netcom.com (Thomas Gagne) Date: Mon, 14 Aug 2000 13:12:20 -0400 Subject: [XML-SIG] My first xml sax import Message-ID: <39982874.541615BE@ix.netcom.com> The tutorial for xmllib shows how to create a document handler class and includes sample code for the startElement() method, but it doesn't list what other methods are available and their arguments (and why should it, it's a tutorial?) Problem is, the library reference for xmllib doesn't list a method called startElement(). I assume there's probably one called endElement() but what do I do about the value for the tag? From tgagne@ix.netcom.com Mon Aug 14 19:39:29 2000 From: tgagne@ix.netcom.com (Thomas Gagne) Date: Mon, 14 Aug 2000 14:39:29 -0400 Subject: [XML-SIG] normalize_whitespace() Message-ID: <39983CE1.96F4A02F@ix.netcom.com> Is this the right function name?
It's described in the documentation for PyXML-0.5.5.1 but when I try to use it after: from xml.sax import saxlib from xml.sax import saxexts I get: tgagne:/home/tgagne/work/efinnet/xml !sa sax1.py Traceback (innermost last): File ".//sax1.py", line 32, in ? parser.parseFile(fh) File "/usr/lib/python1.5/site-packages/xml/sax/drivers/drv_pyexpat.py", line 77, in parseFile if not self.parser.Parse(fileobj.read(),1): File "/usr/lib/python1.5/site-packages/xml/sax/drivers/drv_pyexpat.py", line 48, in startElement self.doc_handler.startElement(name,saxutils.AttributeMap(attrs)) File ".//sax1.py", line 12, in startElement self.currentElement = normalize_whitespace(name) NameError: normalize_whitespace From paul@prescod.net Wed Aug 16 05:13:16 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 16 Aug 2000 00:13:16 -0400 Subject: [XML-SIG] parsers and XML References: <200008072352.SAA98730@sullivan.realtime.net> Message-ID: <399A14DC.E8FF29B5@prescod.net> travish wrote: > > Hi... I was taking a look at some of the docs, code, and examples, > and was a bit surprised about a number of things. Below are some > comments, problems, diffs, etc. You may already know some of this. > > a) most of the XML "parsers" act appear to be lexers > > b) none of the examples are of sufficient/substantial complexity > (e.g. recursive nesting, deep/complex hierarchy) > > If anyone has suggestions on what kind of parser to use as a back end > (yapps? kjParsing? etc.) I'd be interested to hear it. Let's say we divide it this way: * a lexer is a tool that recognizes lexical boundaries (usually boundaries are described as regular expressions) * parser is a tool that organizes the stream of lexical events into a *logical* tree structure. They may or may not generate an AST but they will at least call your methods in a tree nested fashion. Well, all XML parsers I know of do both functions. 
It might seem at first that the "parse" part of the task is trivial for XML but it isn't so if you consider entities. Obviously you expect more from a parser than just building your logical tree. If you state exactly what you are looking for we might be able to point you to it or develop it. Note that a lot of people with very serious parser theory backgrounds have worked with XML so the relationship with formal parsing theory is pretty well understood. -- Paul Prescod - Not encumbered by corporate consensus "I don't want you to describe to me -- not ever -- what you were doing to that poor boy to make him sound like that; but if you ever do it again, please cover his mouth with your hand," Grandmother said. -- John Irving, "A Prayer for Owen Meany" From akuchlin@mems-exchange.org Wed Aug 16 21:36:30 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 16 Aug 2000 16:36:30 -0400 Subject: [XML-SIG] Python code for RDF? Message-ID: <20000816163630.B28965@kronos.cnri.reston.va.us> Does anyone have public code for dealing with RDF? Parsing, storing, anything? --amk From akuchlin@mems-exchange.org Wed Aug 16 23:17:29 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 16 Aug 2000 18:17:29 -0400 Subject: [XML-SIG] Moving forward Message-ID: With 1.6b1 out there, 1.6final imminent, and 2.0 not far away, it's now time to seriously consider dropping the concessions made for 1.5 compatibility. These would be: 1) The attempt to make setup.py work without having the Distutils installed 2) The wstrop and intl modules. 3) For 1.6/2.0, the package should install itself as _xmlplus. Other things can probably go later; for example, the Wise installer may no longer be needed once the Distutils Windows installer code has shaken down a bit more. Should it now be OK to make 1.6/2.0-specific checkins? (I've already tagged the current state of the tree with the tag "v056" for future reference, BTW.) 
--amk From wunder@ultraseek.com Wed Aug 16 23:44:40 2000 From: wunder@ultraseek.com (Walter Underwood) Date: Wed, 16 Aug 2000 15:44:40 -0700 Subject: [XML-SIG] Expat, Unicode, and 1.6b1 Message-ID: <73039009.966440680@nosferatu.inktomi.com> Is pyexpat supposed to work with Unicode in 1.6b1? I see pyexpat.c, but no Expat, and no changes for Unicode. We've already done the work to integrate pyexpat with the 1.6a2 Unicode support. It is a bit tricky to get right, but our stuff is solid on NT, Solaris, Linux/Intel, and HP-UX. wunder -- Walter Underwood Senior Staff Engineer, Ultraseek Server, Inktomi Corp. formerly Infoseek Software, GO.com, The Walt Disney Company http://www.ultraseek.com/ http://www.inktomi.com/ All Mickey Mouse films are founded on the motif of leaving home in order to learn what fear is. -- Walter Benjamin, 1931 From fdrake@beopen.com Thu Aug 17 02:30:09 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 16 Aug 2000 21:30:09 -0400 (EDT) Subject: [XML-SIG] Moving forward In-Reply-To: References: Message-ID: <14747.16417.436732.601966@cj42289-a.reston1.va.home.com> Andrew Kuchling writes: > With 1.6b1 out there, 1.6final imminent, and 2.0 not far away, > it's now time to seriously consider dropping the concessions made for > 1.5 compatibility. These would be: > 1) The attempt to make setup.py work without having the Distutils > installed > 2) The wstrop and intl modules. These all should just go. > 3) For 1.6/2.0, the package should install itself as _xmlplus. This is true, somewhat. 1.6 does not include the xml package, but the "right" thing is to go ahead and include an xml package that gets installed in site-packages which does the same thing as the xml package in the standard library (but does not need to protect against ImportError). > Other things can probably go later; for example, the Wise installer > may no longer be needed once the Distutils Windows installer code has > shaken down a bit more. 
> > Should it now be OK to make 1.6/2.0-specific checkins? (I've already > tagged the current state of the tree with the tag "v056" for future > reference, BTW.) I'd be concerned about losing easy access to revision histories. I think we can do "the right thing" as far as keeping histories and not screwing up the tag states, but it'll take messing with the CVS tree a bit, and getting it re-installed at SF. I can do the work if that helps, but it really does make sense to work on a local copy of the tree for this; not all operations are really supported through the CVS interface. ;( (Greg Stein might say to wait for subversion, but that's too far off!) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Thu Aug 17 02:44:25 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 16 Aug 2000 21:44:25 -0400 (EDT) Subject: [XML-SIG] Expat, Unicode, and 1.6b1 In-Reply-To: <73039009.966440680@nosferatu.inktomi.com> References: <73039009.966440680@nosferatu.inktomi.com> Message-ID: <14747.17273.968595.182881@cj42289-a.reston1.va.home.com> Walter Underwood writes: > Is pyexpat supposed to work with Unicode in 1.6b1? I see > pyexpat.c, but no Expat, and no changes for Unicode. You should check the 2.0 tree for this (the head of development in the CVS repository at SourceForge). -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From akuchlin@mems-exchange.org Thu Aug 17 02:46:57 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 16 Aug 2000 21:46:57 -0400 Subject: [XML-SIG] Moving forward In-Reply-To: <14747.16417.436732.601966@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Wed, Aug 16, 2000 at 09:30:09PM -0400 References: <14747.16417.436732.601966@cj42289-a.reston1.va.home.com> Message-ID: <20000816214657.A17892@newcnri.cnri.reston.va.us> On Wed, Aug 16, 2000 at 09:30:09PM -0400, Fred L. Drake, Jr. wrote: > I'd be concerned about losing easy access to revision histories.
I >think we can do "the right thing" as far as keeping histories and not >screwing up the tag states, but it'll take messing with the CVS tree a >bit, and getting it re-installed at SF. I can do the work if that Why would we lose revision histories? The v056 tag will get the last 1.5 version, in case anyone cares, and everything else can continue to be developed. --amk From fdrake@beopen.com Thu Aug 17 02:58:21 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Wed, 16 Aug 2000 21:58:21 -0400 (EDT) Subject: [XML-SIG] Moving forward In-Reply-To: <20000816214657.A17892@newcnri.cnri.reston.va.us> References: <14747.16417.436732.601966@cj42289-a.reston1.va.home.com> <20000816214657.A17892@newcnri.cnri.reston.va.us> Message-ID: <14747.18109.861540.127476@cj42289-a.reston1.va.home.com> Andrew Kuchling writes: > Why would we lose revision histories? The v056 tag will get the last > 1.5 version, in case anyone cares, and everything else can continued > to be developed. Depends on how carelessly the changes are made. ;-) Moving files in CVS is tricky, but using "mv" on the repository isn't part of it. The "right" thing to do is to copy the ,v file into the new location, remove all the tags on the new file, and then "cvs rm" the old file. That's just really tedious, and requires access to the actual repository. *That's* what I'm proposing -- this keeps the development history as part of the file in the *new* location, which is what I mean by "easy access" to the history; you can just get it from the current location of the file. -Fred -- Fred L. Drake, Jr.
BeOpen PythonLabs Team Member From gstein@lyra.org Thu Aug 17 08:10:48 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 17 Aug 2000 00:10:48 -0700 Subject: [XML-SIG] Moving forward In-Reply-To: <14747.18109.861540.127476@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Wed, Aug 16, 2000 at 09:58:21PM -0400 References: <14747.16417.436732.601966@cj42289-a.reston1.va.home.com> <20000816214657.A17892@newcnri.cnri.reston.va.us> <14747.18109.861540.127476@cj42289-a.reston1.va.home.com> Message-ID: <20000817001048.M17689@lyra.org> On Wed, Aug 16, 2000 at 09:58:21PM -0400, Fred L. Drake, Jr. wrote: > > Andrew Kuchling writes: > > Why would we lose revision histories? The v056 tag will get the last > > 1.5 version, in case anyone cares, and everything else can continued > > to be developed. > > Depends on how carelessly the changes are made. ;-) > Moving files in CVS is tricking, but using "mv" on the repository > isn't part of it. > The "right" thing to do is to copy the ,v file into the new > location, remove all the tags on the new file, and then "cvs rm" the > old file. That's just really tedious, and requires access to the > actual repository. *That's* what I'm proposing -- this keeps the > development hist as part of the file in the *new* location, which is > what I mean by "easy access" to the history; you can just get it from > the current location of the file. We go through this every six months in the Apache team. The "right" way to move a file is to "add" it into the new location. Your checkin message points at the old location. Then "remove" the old. No mucking with tags. No messing with the repository. Any time you muck in the repository, you open yourselves to danger. 
Some of the pitfalls with the cp/tag-remove are: 1) if the file was in the Attic in the new location, then you've mangled the repository: it is illegal to have a file outside *and* inside the attic 2) if somebody pulls files by date, then you'll have an extra/unwanted file at the old location. definitely bad. Really. Never touch the repository. It is just plain bad. ---- Back to the original point: what is moving? Why did this thread about moving files even start? As I saw, we'd be deleting a bunch of crap from the PyXML package. What moves? And I still disagree with Python 2.0's "replacement" strategy for the "xml" package with the "_xmlplus" package. How does one access the old files? Does _xmlplus need to completely replicate everything that appears in 2.0? What if it replicates the file wrong? Version skew? etc. No... letting the xmlplus package specify individual pieces is much better than a wholesale replacement of the module in sys.modules. [ I'll also point out that sys.modules is a bit shaky itself. I recall a couple times during imputil development and discussion, raising the desire to totally nuke it. For backwards compat, we certainly can't. But creating add'l reliance on it just doesn't feel right. And no, I don't recall the exact issue and don't have the brain cycles right now to go back and figger it out. ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From cje2@biolpc22.york.ac.uk Thu Aug 17 10:47:26 2000 From: cje2@biolpc22.york.ac.uk (Chris Elliott) Date: Thu, 17 Aug 2000 10:47:26 +0100 (BST) Subject: [XML-SIG] compile of Version 0.5.5.1, Message-ID: I downloaded the code to my Redhat 6.1 system and tried to install it as described in the README file (with a view to using it with the Sketch program) I got the following error message on trying to install: [root@localhost]# python setup.py build Executing 'build' action... Running command: make -f Makefile.pre.in boot rm -f *.o *~ rm -f `find . -name '*.pyc'` rm -f `find .
-name '*.o'` rm -f `find . -name '*~'` cd expat ; make clean make[1]: Entering directory `/usr/src/PyXML-0.5.5.1/extensions/expat' rm -f xmltok/xmltok.o xmltok/xmlrole.o xmlwf/xmlwf.o xmlwf/xmlfile.o xmlwf/codepage.o xmlparse/xmlparse.o xmlparse/hashtable.o xmlwf/unixfilemap.o xmlwf/xmlwf make[1]: Leaving directory `/usr/src/PyXML-0.5.5.1/extensions/expat' rm -f *.a tags TAGS config.c Makefile.pre python sedscript rm -f *.so *.sl so_locations cd expat ; make clobber make[1]: Entering directory `/usr/src/PyXML-0.5.5.1/extensions/expat' rm -f xmltok/xmltok.o xmltok/xmlrole.o xmlwf/xmlwf.o xmlwf/xmlfile.o xmlwf/codepage.o xmlparse/xmlparse.o xmlparse/hashtable.o xmlwf/unixfilemap.o xmlwf/xmlwf rm -f libexpat.a make[1]: Leaving directory `/usr/src/PyXML-0.5.5.1/extensions/expat' VERSION=`python -c "import sys; print sys.version[:3]"`; \ installdir=`python -c "import sys; print sys.prefix"`; \ exec_installdir=`python -c "import sys; print sys.exec_prefix"`; \ make -f ./Makefile.pre.in VPATH=. srcdir=. \ VERSION=$VERSION \ installdir=$installdir \ exec_installdir=$exec_installdir \ Makefile make[1]: Entering directory `/usr/src/PyXML-0.5.5.1/extensions' make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile', needed by `sedscript'. Stop. make[1]: Leaving directory `/usr/src/PyXML-0.5.5.1/extensions' make: *** [boot] Error 2 Running command: make make: *** No targets. Stop. Traceback (innermost last): File "setup.py", line 185, in ? 
func() File "setup.py", line 155, in build_unix shutil.copy('extensions/' + filename, 'build/xml/parsers/') File "/usr/lib/python1.5/shutil.py", line 52, in copy copyfile(src, dst) File "/usr/lib/python1.5/shutil.py", line 17, in copyfile fsrc = open(src, 'rb') IOError: [Errno 2] No such file or directory: 'extensions/pyexpat.so' From akuchlin@mems-exchange.org Thu Aug 17 12:45:06 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 17 Aug 2000 07:45:06 -0400 Subject: [XML-SIG] Moving forward In-Reply-To: <14747.18109.861540.127476@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Wed, Aug 16, 2000 at 09:58:21PM -0400 References: <14747.16417.436732.601966@cj42289-a.reston1.va.home.com> <20000816214657.A17892@newcnri.cnri.reston.va.us> <14747.18109.861540.127476@cj42289-a.reston1.va.home.com> Message-ID: <20000817074506.A19955@newcnri.cnri.reston.va.us> On Wed, Aug 16, 2000 at 09:58:21PM -0400, Fred L. Drake, Jr. wrote: > Moving files in CVS is tricking, but using "mv" on the repository >isn't part of it. Yes, but who's moving files? The files can be installed to _xmlplus simply by adding "package_dir: {'_xmlplus':'xml'}" to the setup script, so there's no need to move any files around, beyond deleting them. And I'm less and less enthused by the _xmlplus solution, since it means the PyXML package will have to duplicate code that's in the Python CVS. Unfortunately that's necessary to let the package fix a buggy module that comes with Python, so there seems little choice. --amk From nicolas.menoux@wanadoo.fr Thu Aug 17 16:41:00 2000 From: nicolas.menoux@wanadoo.fr (Nicolas MENOUX) Date: Thu, 17 Aug 2000 17:41:00 +0200 Subject: [XML-SIG] *** No rule to make target (XML package) Message-ID: <001201c00861$8bb32fc0$4500a8c0@wanadoo.fr> This is a multi-part message in MIME format. 
------=_NextPart_000_000F_01C00872.4EDEEBA0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi, I'm using sketch on RedHat 6.2 Linux and I need the Python xml package = to import some files like svg files. I've downloaded the PyXML package from = the python org site but I'm not able to compile it. As it is written in the README file, I did "python setup.py build" but I always got an error = telling me *** No rule to make target. Does anyone know how to add this xml = module to my python installation ? Thanks a lot, Nicolas Menoux ------=_NextPart_000_000F_01C00872.4EDEEBA0 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_000F_01C00872.4EDEEBA0-- From wunder@inktomi.com Thu Aug 17 18:32:48 2000 From: wunder@inktomi.com (Walter Underwood) Date: Thu, 17 Aug 2000 10:32:48 -0700 Subject: [XML-SIG] Expat, Unicode, and 1.6b1 In-Reply-To: <14747.17273.968595.182881@cj42289-a.reston1.va.home.com> Message-ID: <140726915.966508368@nosferatu.inktomi.com> --On Wednesday, August 16, 2000 9:44 PM -0400 "Fred L. Drake, Jr." wrote: > > Walter Underwood writes: > > Is pyexpat supposed to work with Unicode in 1.6b1? I see > > pyexpat.c, but no Expat, and no changes for Unicode. > > You should check the 2.0 tree for this (the head of development in > the CVS repository at SourceForge). So the XML in 1.6 will not return Python Unicode objects? If so, we'll probably stay with our tested mods to pyexpat. wunder -- Walter Underwood Senior Staff Engineer, Ultraseek Server, Inktomi Corp. formerly Infoseek Software, GO.com, The Walt Disney Company http://www.ultraseek.com/ http://www.inktomi.com/ All Mickey Mouse films are founded on the motif of leaving home in order to learn what fear is. -- Walter Benjamin, 1931 From akuchlin@mems-exchange.org Thu Aug 17 19:38:34 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 17 Aug 2000 14:38:34 -0400 Subject: [XML-SIG] Expat, Unicode, and 1.6b1 In-Reply-To: <140726915.966508368@nosferatu.inktomi.com>; from wunder@inktomi.com on Thu, Aug 17, 2000 at 10:32:48AM -0700 References: <14747.17273.968595.182881@cj42289-a.reston1.va.home.com> <140726915.966508368@nosferatu.inktomi.com> Message-ID: <20000817143834.A26404@kronos.cnri.reston.va.us> On Thu, Aug 17, 2000 at 10:32:48AM -0700, Walter Underwood wrote: >So the XML in 1.6 will not return Python Unicode objects? If so, >we'll probably stay with our tested mods to pyexpat. Yes, it will. Parser objects have a .returns_unicode attribute, and the handler functions are passed Unicode or UTF-8 strings depending on the attribute's value. (Note to self: add this to the docs.) 
Expat is not going into the Python CVS tree, though. --amk From fdrake@beopen.com Thu Aug 17 20:37:40 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 17 Aug 2000 15:37:40 -0400 (EDT) Subject: [XML-SIG] Expat, Unicode, and 1.6b1 In-Reply-To: <140726915.966508368@nosferatu.inktomi.com> References: <14747.17273.968595.182881@cj42289-a.reston1.va.home.com> <140726915.966508368@nosferatu.inktomi.com> Message-ID: <14748.16132.677235.649049@cj42289-a.reston1.va.home.com> Walter Underwood writes: > So the XML in 1.6 will not return Python Unicode objects? If so, > we'll probably stay with our tested mods to pyexpat. That's probably the right approach. The Python 2.0 library will include the updated Expat interface which can produce Unicode objects directly. You won't have too long a wait, but there are a few things left to do before Python 2.0 is ready. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Thu Aug 17 20:46:07 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 17 Aug 2000 15:46:07 -0400 (EDT) Subject: [XML-SIG] Expat, Unicode, and 1.6b1 In-Reply-To: <20000817143834.A26404@kronos.cnri.reston.va.us> References: <14747.17273.968595.182881@cj42289-a.reston1.va.home.com> <140726915.966508368@nosferatu.inktomi.com> <20000817143834.A26404@kronos.cnri.reston.va.us> Message-ID: <14748.16639.682532.838553@cj42289-a.reston1.va.home.com> Andrew Kuchling writes: > Yes, it will. Parser objects have a .returns_unicode attribute, and > the handler functions are passed Unicode or UTF-8 strings depending on > the attribute's value. (Note to self: add this to the docs.) This did not make it in for Python 1.6; this is with the Python 2.0 version of pyexpat. Using PyXML should fix this, once we've got the code re-shaped to the new package structure. -Fred -- Fred L. Drake, Jr. 
BeOpen PythonLabs Team Member From akuchlin@mems-exchange.org Thu Aug 17 21:04:18 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 17 Aug 2000 16:04:18 -0400 Subject: [XML-SIG] Expat, Unicode, and 1.6b1 In-Reply-To: <14748.16639.682532.838553@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Thu, Aug 17, 2000 at 03:46:07PM -0400 References: <14747.17273.968595.182881@cj42289-a.reston1.va.home.com> <140726915.966508368@nosferatu.inktomi.com> <20000817143834.A26404@kronos.cnri.reston.va.us> <14748.16639.682532.838553@cj42289-a.reston1.va.home.com> Message-ID: <20000817160418.A26587@kronos.cnri.reston.va.us> On Thu, Aug 17, 2000 at 03:46:07PM -0400, Fred L. Drake, Jr. wrote: > This did not make it in for Python 1.6; this is with the Python 2.0 >version of pyexpat. Using PyXML should fix this, once we've got the >code re-shaped to the new package structure. Why is pyexpat in Python 1.6 at all, then? CNRI didn't write it, and it wasn't in 1.5.2. --amk From fdrake@beopen.com Thu Aug 17 21:15:04 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 17 Aug 2000 16:15:04 -0400 (EDT) Subject: [XML-SIG] Expat, Unicode, and 1.6b1 In-Reply-To: <20000817160418.A26587@kronos.cnri.reston.va.us> References: <14747.17273.968595.182881@cj42289-a.reston1.va.home.com> <140726915.966508368@nosferatu.inktomi.com> <20000817143834.A26404@kronos.cnri.reston.va.us> <14748.16639.682532.838553@cj42289-a.reston1.va.home.com> <20000817160418.A26587@kronos.cnri.reston.va.us> Message-ID: <14748.18376.370058.687357@cj42289-a.reston1.va.home.com> Andrew Kuchling writes: > Why is pyexpat in Python 1.6 at all, then? CNRI didn't write it, > and it wasn't in 1.5.2. It happened to be in the tree as of 16 May 2000, which was the last day any PythonLabs people were CNRI employees. Nothing sinister, but removing it could too easily turn into more delay, as CNRI would question the removal. (Well, not *everyone* at CNRI, I'm sure, but certain people.) -Fred -- Fred L. 
Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Fri Aug 18 15:29:37 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Fri, 18 Aug 2000 10:29:37 -0400 (EDT) Subject: [XML-SIG] Future direction of Expat development Message-ID: <14749.18513.252853.411982@cj42289-a.reston1.va.home.com> Paul Prescod, Clark Cooper, and myself, are in the process of setting up a new project at SourceForge to continue the development of Expat. Clark is the maintainer of the Perl bindings to Expat. He is bringing improved DTD reporting to the API, and Paul will be doing some work to improve the Namespaces support in the API, allowing more effective support for SAX2 without losing efficiency. I'll be the documentation guy (hard to see that one coming, huh? ;). We are doing this with the blessing of James Clark, who will be providing his CVS tree for us to work from. I'll announce more when I have more details. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From akuchlin@mems-exchange.org Fri Aug 18 15:51:38 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Fri, 18 Aug 2000 10:51:38 -0400 Subject: [XML-SIG] Future direction of Expat development In-Reply-To: <14749.18513.252853.411982@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Fri, Aug 18, 2000 at 10:29:37AM -0400 References: <14749.18513.252853.411982@cj42289-a.reston1.va.home.com> Message-ID: <20000818105138.B27419@kronos.cnri.reston.va.us> On Fri, Aug 18, 2000 at 10:29:37AM -0400, Fred L. Drake, Jr. wrote: > Paul Prescod, Clark Cooper, and myself, are in the process of >setting up a new project at SourceForge to continue the development of >Expat. Clark is the maintainer of the Perl bindings to Expat. He is Great news! Is James Clark no longer going to maintain Expat at all? One feature I'd like to see: as suggested a while ago, a way is needed to tell at run-time how the Expat library was compiled. Once the CVS tree is public, I'll take a look at adding that. 
(Or maybe there should simply be a way to compile "Fat Expat" that supports both Unicode and UTF-8...) --amk From fdrake@beopen.com Fri Aug 18 16:35:08 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Fri, 18 Aug 2000 11:35:08 -0400 (EDT) Subject: [XML-SIG] Future direction of Expat development In-Reply-To: <20000818105138.B27419@kronos.cnri.reston.va.us> References: <14749.18513.252853.411982@cj42289-a.reston1.va.home.com> <20000818105138.B27419@kronos.cnri.reston.va.us> Message-ID: <14749.22444.161278.551964@cj42289-a.reston1.va.home.com> Andrew Kuchling writes: > Great news! Is James Clark no longer going to maintain Expat at all? James Clark will be a contributor, and has registered on SourceForge so that he can continue, but is no longer interested in sole control of the package. > One feature I'd like to see: as suggested a while ago, a way is needed > to tell at run-time how the Expat library was compiled. Once the CVS > tree is public, I'll take a look at adding that. (Or maybe there > should simply be a way to compile "Fat Expat" that supports both > Unicode and UTF-8...) This would be an excellent idea. We'd like to see Expat installable and usable as a separate library, which certainly entails making all this information at least *available*, if not actually configurable at runtime. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From gstein@lyra.org Fri Aug 18 22:29:24 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 18 Aug 2000 14:29:24 -0700 Subject: [XML-SIG] Future direction of Expat development In-Reply-To: <14749.18513.252853.411982@cj42289-a.reston1.va.home.com>; from fdrake@beopen.com on Fri, Aug 18, 2000 at 10:29:37AM -0400 References: <14749.18513.252853.411982@cj42289-a.reston1.va.home.com> Message-ID: <20000818142924.H17689@lyra.org> On Fri, Aug 18, 2000 at 10:29:37AM -0400, Fred L. Drake, Jr. 
wrote: > > Paul Prescod, Clark Cooper, and myself, are in the process of > setting up a new project at SourceForge to continue the development of > Expat. Clark is the maintainer of the Perl bindings to Expat. He is > bringing improved DTD reporting to the API, and Paul will be doing > some work to improve the Namespaces support in the API, allowing more > effective support for SAX2 without losing efficiency. I'll be the > documentation guy (hard to see that one coming, huh? ;). > We are doing this with the blessing of James Clark, who will be > providing his CVS tree for us to work from. > I'll announce more when I have more details. Excellent. I have some Expat changes from the ASF that I'd like to supply a patch for. When you're set up, I'll generate and deliver them. Great news! Cheers, -g -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Sat Aug 19 15:35:53 2000 From: paul@prescod.net (Paul Prescod) Date: Sat, 19 Aug 2000 10:35:53 -0400 Subject: [XML-SIG] Future direction of Expat development References: <14749.18513.252853.411982@cj42289-a.reston1.va.home.com> Message-ID: <399E9B49.326B682D@prescod.net> Rather than setting up some kind of complicated system of governance, I am happy to let Clark review patches and decide on their incorporation. He is more knowledgable about and interested in Expat than almost anyone else in the world. I consider myself just another patch contributor. I trust him to keep Python's needs in mind as long as we keep him informed of what we need. -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html From terrytv@1st.net Sun Aug 20 22:54:38 2000 From: terrytv@1st.net (THERE IS AN ANSWER) Date: Sun, 20 Aug 2000 21:54:38 Subject: [XML-SIG] TRAVEL AND MAKE MONEY Message-ID: Prospect Mailer 20009:54:38 PM ANNOUNCING IT'S TIME TO MAKE MONEY WITH VACTIONS INSTEAD OF ALWAYS SPENDING IT ALL ON THEM. 
PEOPLE SPEND THEIR MONEY ON VACTIONS NOW, SO HERE'S AN OPPORTUNITY TO MAKE MONEY AND HAVE FUN TOO. EVERY YEAR MORE AND MORE PEOPLE ARE SPENDING THIER MONEY ON THE EXPERINCES OF LIFE RATHER THAN ON IT'S POSSESSIONS. CHANCES ARE YOU GO ON VACTION AT LEAST ONCE A YEAR, AND HOW MANY TIMES IN THE LAST TEN YEARS HAVE YOU HAD TO JUMP ON A PLANE AND FLY SOME WHERE, WELL YOUR NOT THE ONLY ONE. I HAVE A INFORMATION KIT THAT'S GOING TO BE A BIG HELP TO YOU. HERE'S THE OPPORTUNITY YOU'VE BEEN WAITING FOR. YOU WILL BE ABLE TO STAY HOME AND WORK OR ADD TO YOUR INCOME. THROUGH THE INDUSTRY OF TRAVEL, YOU CAN TRAVEL FOR LESS, MAKE COMMISSIONS OWN YOUR OWN TRIP, MAKE COMMISSIONS WHEN OTHERS TRAVEL, AND BECOME A LICENSED TRAVEL PROFESSIONAL. WITH SO MANY PEOPLE WORKING FROM HOME NOW, AND WITH HOME BASED BUSSINESSES OFFERING MORE SECURITY, THIS IS THE WAY TO GO. AS A TRAVEL PROFESSIONAL,(FULL OR PART-TIME) YOU WILL GET PERKS SUCH AS FAMILIARIZATION TRIPS-TRIPS GIVEN BY SPONSERS TO FAMILIARIZE THE TRAVEL PROFESSIONAL WITH WHAT THE RESORTS ARE OFFERING. IN ADDITION THERE ARE RENTAL CAR DISCOUNTS, THEME PARK DISCOUNTS, AIRLINE DISCOUNTS AND MORE. THE BEST THING OF ALL IS, YOU GET TO MAKE MONEY. AN ADDED BONUS IS WHEN YOU REFER OTHERS YOU GET PAID TOO. WORKING FROM YOUR HOME, YOU WILL GET PAID!!!. SO IF YOUR READY FOR A NEW OR PART-TIME CAREER, OR YOU JUST WANT TO SAVE MONEY ON VACTION EVERY YEAR, ORDER THE INFORMATION KIT NOW. IT EXPLAINS WHAT YOU NEED TO GET STARTED BEING A TRAVEL PROFESSIONAL. SEND $14.95 PLUS $4.95 FOR SHIPPING AND HANDLING. OUTSIDE THE UNITED STATES ADD $10.00 FOR A TOTAL OF $29.90 TO FAITH HOLMES INC, AT P.O. BOX 834405 HOLLYWOOD,FL 33083-4405. ALLOW TWO WEEKS FOR DELIVERY. THE MONEY YOU WILL INVEST FOR YOUR LICENSE, AND MATERIALS NECESSARY FOR YOU TO BECOME A TRAVEL PROFESSIONAL, YOU WILL SAVE ON YOUR FIRST VACTION ALONE. SO ODER YOUR INFORMATION KIT NOW! 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- THIS MESSAGE IS SENT IN COMPLIANCE OF THE NEW eMAIL BILL SECTION 301. PER SECTION 301, PARAGRAPH (a)(2)(c), 1618 FURTHER TRANSMISSION TO YOU BY THE SENDER OF THIS MESSAGE MAY BE STOPPED AT NO COST TO YOU BY SENDING A REPLY TO THE SENDER WITH THE WORD "REMOVE" IN THE SUBJECT LINE ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- From alexandre.fayolle@free.fr Mon Aug 21 13:53:03 2000 From: alexandre.fayolle@free.fr (Alexandre Fayolle) Date: Mon, 21 Aug 2000 14:53:03 +0200 (MEST) Subject: [XML-SIG] DTD location Message-ID: <966862383.39a1262f256cc@imp.free.fr> Hello, I'm using a SAX parser that I get from xml.sax.saxexts.XMLValParserFactory.make_parser(), and I would like to know if there is a way to tell the parser where it can find the DTD, so that I don't have to set an absolute path in the document. Thanks for your help. Alexandre Fayolle From larsga@garshol.priv.no Mon Aug 21 15:20:17 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 21 Aug 2000 16:20:17 +0200 Subject: [XML-SIG] parsers and XML In-Reply-To: <200008101940.OAA61692@sullivan.realtime.net> References: <200008101940.OAA61692@sullivan.realtime.net> Message-ID: * Lars Marius Garshol | | This is so because XML has a much simpler structure (and potentially | much greater sizes) than what parsers traditionally have parsed. * travish@realtime.net | | I'm not so sure; I've compiled very large C files before. Upwards of 100 MB? In a single file? * Lars Marius Garshol | | This makes an event-based API very useful. 
* travish@realtime.net | | The "event-based API" bears a striking resemblance to a lexer, and | is usually only useful if you do a certain amount of state-tracking | yourself. (e.g. how many levels of tags deep am I, and which tags | are they?) That is the traditional role of a parser, and the | "event-driven API" apparently does none of it. This is all correct, but XML documents and computer programs have very different uses and structures and so it is really most productive if you try to forget how things are done in compilers and start with a clean slate when learning XML. | Actually, I want something between the two APIs that appear to be | present (lexing and generating an AST). For example, in the reduce | phase of a shift-reduce parser like yacc (which corresponds to a | close-tag event from an "event driven API"), one is given the | ability to 'condense' all of the subtrees of this particular node, | requiring neither a full AST nor keeping track of the stack of | nested tags you may currently be processing in. This would be | extremely handy for (e.g.) converting XML to nested data structures. This is a very reasonable wish and such tools are already in existence. Pyxie and eventdom can both do this. * Lars Marius Garshol | | The diffs seem to be for the pyexpat driver. This has nothing to do | with sgmlop or xmllib. * travish@realtime.net | | Perhaps you should look a little more carefully before sending back | such a pointed response. I'm sorry if I have offended you. My response was not at all intended to be pointed. * Lars Marius Garshol | | What is the problem with the description? * travish@realtime.net | | For one thing, it appears that the character accumulation callback has | a different signature than the other parsers, passing only one argument | instead of three (charstr, start, len). If so, that hardly makes sgmlop | replace the other parsers invisibly. It seems that you have confused the levels here. 
sgmlop's native API (not the SAX driver) is intended to be a drop-in
replacement for xmllib. Of course, the SAX drivers are all intended to
be (as far as possible) interchangeable, so if you've found a version
of the sgmlop driver that has this problem I would be glad to hear
where you have it from.

--Lars M.

From larsga@garshol.priv.no Mon Aug 21 15:26:18 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 21 Aug 2000 16:26:18 +0200
Subject: [XML-SIG] My first xml sax import
In-Reply-To: <39982874.541615BE@ix.netcom.com>
References: <39982874.541615BE@ix.netcom.com>
Message-ID:

* Thomas Gagne
|
| The tutorial for xmllib shows how to create a document handler class
| and includes sample code for the startElement() method,
| but it doesn't list what other methods are available and their
| arguments (and why should it, it's a tutorial?) Problem is, the
| library reference for xmllib doesn't list a method called
| startElement().

What tutorial is this? The DocumentHandler is a SAX class, and so
something different from xmllib. The reason why the xmllib library
reference does not show the startElement() method is that it belongs
to SAX and not to xmllib.

If you want a SAX reference you can look here:

| I assume there's probably one called endElement() but what do I do
| about the value for the tag?

There is an event called endElement(), but XML does not have a concept
of the value of a tag (or even of an element, which is what I think
you really mean). Elements have content, which may be a single piece
of text or quite a few other things. If you want the textual content
of the element you must accumulate it yourself using the characters()
events.

--Lars M.
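
The accumulation Lars describes can be sketched roughly as follows.
This is only an illustration, written against the xml.sax module in
the present-day Python standard library (ContentHandler) rather than
the PyXML SAX 1 DocumentHandler discussed in the thread. The names
TextCollector and collect_text and the sample document are made up
for the example. Each open element gets its own list of chunks,
because characters() may be called more than once for a single run
of text.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: records the direct text content of each element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self._buffers = []   # one list of text chunks per currently open element
        self.results = []    # (element name, accumulated text) pairs, in closing order

    def startElement(self, name, attrs):
        # A new element opens: start a fresh chunk list for it.
        self._buffers.append([])

    def characters(self, content):
        # The parser may deliver one run of text in several pieces,
        # so append each piece instead of overwriting.
        if self._buffers:
            self._buffers[-1].append(content)

    def endElement(self, name):
        # The element is complete: join its chunks once.
        text = "".join(self._buffers.pop())
        self.results.append((name, text))


def collect_text(xml_bytes):
    """Parse xml_bytes and return the (name, text) pairs that were collected."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.results


if __name__ == "__main__":
    doc = b"<order><item>tea</item><qty>2</qty></order>"
    for name, text in collect_text(doc):
        print(name, repr(text))
```

Only text that sits directly inside an element is recorded for that
element; text inside child elements is recorded for the children.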
From larsga@garshol.priv.no Mon Aug 21 15:33:42 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 21 Aug 2000 16:33:42 +0200 Subject: [XML-SIG] DTD location In-Reply-To: <966862383.39a1262f256cc@imp.free.fr> References: <966862383.39a1262f256cc@imp.free.fr> Message-ID: * Alexandre Fayolle | | I'm using a SAX parser that I get from | xml.sax.saxexts.XMLValParserFactory.make_parser(), and I would like | to know if there is a way to tell the parser where it can find the | DTD, so that I don't have to set an absolute path in the document. The best way to do that is to use a public identifier in your document, like so: Then you can use the SAX EntityResolver based on catalog files provided by xmlproc (which is always the parser you get back from that method, since it is the only alternative) to map the public identifier to the location you want. Alternatively you can write your own EntityResolver and make it do the mapping for you without using a catalog file at all. See for information on catalog files. --Lars M. From ale@sift.co.uk Tue Aug 22 10:53:33 2000 From: ale@sift.co.uk (Alejandro Fernandez) Date: Tue, 22 Aug 2000 10:53:33 +0100 Subject: [XML-SIG] Problems installing sig. Message-ID: <0008221102570A.01220@kubric.office.sift.co.uk> Here is what I get when I try to run PyXML on my red hat 6.2 box: I downloaded and untarred http://www.python.org/sigs/xml-sig/files/PyXML-0.5.5.1.tar.gz It seems to not find libexpat.so, which I thought was one of the packages it was trying to install in the first place. Is there anything I am doing wrong? Thanks for your help, Alejandro --- [root@kubric PyXML-0.5.5.1]# python setup.py build Executing 'build' action... Running command: make -f Makefile.pre.in boot rm -f *.o *~ rm -f `find . -name '*.pyc'` rm -f `find . -name '*.o'` rm -f `find . 
-name '*~'` cd expat ; make clean make[1]: Entering directory `/usr/archive/PyXML-0.5.5.1/extensions/expat' rm -f xmltok/xmltok.o xmltok/xmlrole.o xmlwf/xmlwf.o xmlwf/xmlfile.o xmlwf/codepage.o xmlparse/xmlparse.o xmlparse/hashtable.o xmlwf/unixfilemap.o xmlwf/xmlwf make[1]: Leaving directory `/usr/archive/PyXML-0.5.5.1/extensions/expat' rm -f *.a tags TAGS config.c Makefile.pre python sedscript rm -f *.so *.sl so_locations cd expat ; make clobber make[1]: Entering directory `/usr/archive/PyXML-0.5.5.1/extensions/expat' rm -f xmltok/xmltok.o xmltok/xmlrole.o xmlwf/xmlwf.o xmlwf/xmlfile.o xmlwf/codepage.o xmlparse/xmlparse.o xmlparse/hashtable.o xmlwf/unixfilemap.o xmlwf/xmlwf rm -f libexpat.a make[1]: Leaving directory `/usr/archive/PyXML-0.5.5.1/extensions/expat' VERSION=`python -c "import sys; print sys.version[:3]"`; \ installdir=`python -c "import sys; print sys.prefix"`; \ exec_installdir=`python -c "import sys; print sys.exec_prefix"`; \ make -f ./Makefile.pre.in VPATH=. srcdir=. \ VERSION=$VERSION \ installdir=$installdir \ exec_installdir=$exec_installdir \ Makefile make[1]: Entering directory `/usr/archive/PyXML-0.5.5.1/extensions' make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile', needed by `sedscript'. Stop. make[1]: Leaving directory `/usr/archive/PyXML-0.5.5.1/extensions' make: *** [boot] Error 2 Running command: make make: *** No targets specified and no makefile found. Stop. Traceback (innermost last): File "setup.py", line 185, in ? func() File "setup.py", line 155, in build_unix shutil.copy('extensions/' + filename, 'build/xml/parsers/') File "/usr/lib/python1.5/shutil.py", line 52, in copy copyfile(src, dst) File "/usr/lib/python1.5/shutil.py", line 17, in copyfile fsrc = open(src, 'rb') IOError: [Errno 2] No such file or directory: 'extensions/pyexpat.so' [root@kubric PyXML-0.5.5.1]# python setup.py install Executing 'build' action... Running command: make -f Makefile.pre.in boot rm -f *.o *~ rm -f `find . 
-name '*.pyc'` rm -f `find . -name '*.o'` rm -f `find . -name '*~'` cd expat ; make clean make[1]: Entering directory `/usr/archive/PyXML-0.5.5.1/extensions/expat' rm -f xmltok/xmltok.o xmltok/xmlrole.o xmlwf/xmlwf.o xmlwf/xmlfile.o xmlwf/codepage.o xmlparse/xmlparse.o xmlparse/hashtable.o xmlwf/unixfilemap.o xmlwf/xmlwf make[1]: Leaving directory `/usr/archive/PyXML-0.5.5.1/extensions/expat' rm -f *.a tags TAGS config.c Makefile.pre python sedscript rm -f *.so *.sl so_locations cd expat ; make clobber make[1]: Entering directory `/usr/archive/PyXML-0.5.5.1/extensions/expat' rm -f xmltok/xmltok.o xmltok/xmlrole.o xmlwf/xmlwf.o xmlwf/xmlfile.o xmlwf/codepage.o xmlparse/xmlparse.o xmlparse/hashtable.o xmlwf/unixfilemap.o xmlwf/xmlwf rm -f libexpat.a make[1]: Leaving directory `/usr/archive/PyXML-0.5.5.1/extensions/expat' VERSION=`python -c "import sys; print sys.version[:3]"`; \ installdir=`python -c "import sys; print sys.prefix"`; \ exec_installdir=`python -c "import sys; print sys.exec_prefix"`; \ make -f ./Makefile.pre.in VPATH=. srcdir=. \ VERSION=$VERSION \ installdir=$installdir \ exec_installdir=$exec_installdir \ Makefile make[1]: Entering directory `/usr/archive/PyXML-0.5.5.1/extensions' make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile', needed by `sedscript'. Stop. make[1]: Leaving directory `/usr/archive/PyXML-0.5.5.1/extensions' make: *** [boot] Error 2 Running command: make make: *** No targets specified and no makefile found. Stop. Traceback (innermost last): File "setup.py", line 185, in ? 
    func()
  File "setup.py", line 155, in build_unix
    shutil.copy('extensions/' + filename, 'build/xml/parsers/')
  File "/usr/lib/python1.5/shutil.py", line 52, in copy
    copyfile(src, dst)
  File "/usr/lib/python1.5/shutil.py", line 17, in copyfile
    fsrc = open(src, 'rb')
IOError: [Errno 2] No such file or directory: 'extensions/pyexpat.so'

--
Alejandro Fernandez
Webmaster
Sift Group plc., 100 Victoria Street, BRISTOL, BS1 6HZ
tel:+44 117 915 9600 fax:+44 117 915 9630
http://www.sift.co.uk
------------
"Vertical B2B Communities"

From tpassin@home.com Thu Aug 24 04:39:58 2000
From: tpassin@home.com (tpassin@home.com)
Date: Wed, 23 Aug 2000 23:39:58 -0400
Subject: [XML-SIG] New data on speed of string appending
References: <3935D13D.F4EAD64B@roguewave.com> <393C3FEB.A2DAB88E@roguewave.com>
Message-ID: <011a01c00d7c$fa636e60$7cac1218@reston1.va.home.com>

Remember back in June (June 22,2000), Bjorn Pettersen showed how using
StringIO instead of just appending to a string was **much** faster?
Then others showed that using list.append/join(list) was also very fast.

Well, a colleague wrote a Python script to convert csv data to a
particular xml format. It took 17 seconds to run on my computer, even
though the output wasn't that long. He was producing the entire output
string before writing any of it. I changed all the str=str+char
statements to use list.append(). The script executed nearly
instantaneously - much less than a second to run.

I decided to do a little checking, and I wanted to share the results.
I tried all three candidate methods:

1) str=''
   for char in data: str=str+char

2) l=[]
   for char in data: l.append(char)
   str=string.join(l,'')

3) st=cStringIO.StringIO()
   for char in data: st.write(char)
   str=st.getvalue()

The test program simply builds an output string equal to the input
string one character at a time. The times include the time to create
the string/list/cStringIO object.

The results are dramatic. Method 1) is as good as or better than
anything until the string length exceeds about 1000 bytes. Then
Method 1 starts slowing down. Above about 4000 bytes, it's really
getting ssslllooowww. Here is a table of the results on my system -
450 MHz PIII running Win98, Python 1.5.2.

Rate of generating output string, char/sec

length of input    Method 1    Method 2    Method 3
50-1000            3.3e5       1.8e5       2.3e5
1200               3.2e5       1.8e5       2.6e5
1500               1.2e5       1.8e5       2.5e5
2000               1.2e5       2.7e5       2.6e5
4000               6.1e4       1.8e5       2.6e5
8000               3.6e4       1.9e5       2.5e5
15000              1.7e4       1.4e5       2.5e5
30,000             8200        1.8e5       2.7e5
40,000             6600        1.8e5       2.4e5
60,000             4500        2.1e5       2.2e5
100,000            ---         1.8e5       2.4e5
200,000            ---         1.8e5       2.4e5

These figures include some averaging. The few numbers that are a
little different - like Method 2 at 60,000 char - probably don't mean
anything.

Oh, yes, plain StringIO was definitely slower than cStringIO, as you
might think - I don't have any figures, though.

Below is the test program, followed by parts of Bjorn's post.

Cheers,

Tom Passin
========================================================
import time
import string
import cStringIO
import sys

def Min(x,y):
    if x>=y:return y
    return x

if __name__=='__main__':
    args=sys.argv
    if len(args)>1:
        file=args[1]
    else:
        print "No file to test..."
        sys.exit(0)

    f=open(file)
    data=f.read()
    f.close()

    # Use this variable to set the effective length of the test.
    # Or, just make it larger than the input file size.
    testlength=300000
    data=data[:testlength]
    datalen=len(data) # Input might have been shorter than testlength

    print "\nTesting Speed of adding characters to strings"
    print 'Data consists of %s characters' % datalen

    # String appending takes too long for long string sizes,
    # so it's better to force the data length to some maximum value.
    # We don't force the data size for the other methods (below).
    d1len=Min(40000,datalen)
    d1=data[:d1len]
    d1len=len(d1)
    start=time.time()
    reps=1
    # Set up multiple reps to get a usable test duration
    if d1len<5000: reps=int(20000/d1len)
    for n in range(reps):
        s=''
        for c in d1:
            s=s+c # Method 1
    duration=time.time()-start
    print '\tAdd-to-string for %s characters: %.3g seconds (%.3g char/sec)'\
        % (d1len,duration/reps,(reps*d1len/duration))
    d1=s=None

    print 'Times for %s characters:' % datalen

    # Append-to-list method
    start=time.time()
    reps=1
    if datalen<10000: reps=int(100000/datalen)
    for n in range(reps):
        l=[]
        for c in data:
            l.append(c) # Method 2
        # Join with an empty separator; string.join defaults to a space.
        s=string.join(l,'')
    duration=time.time()-start
    print '\tAppend-to-list: %.3g sec (%.3g char/sec)' \
        % (duration/reps,(reps*datalen)/duration)
    l=[]

    # StringIO method
    start=time.time()
    reps=1
    if datalen<10000: reps=int(100000/datalen)
    for n in range(reps):
        st=cStringIO.StringIO()
        for c in data:
            st.write(c) # Method 3
        s=st.getvalue()
        st.close()
    duration=time.time()-start
    print '\tcStringIO: %0.3g sec (%.3g char/sec)'\
        % (duration/reps,(reps*datalen)/duration)
    st=[]; s=''
==================================================

----- Original Message -----
From: "Bjorn Pettersen"
To: "Lars Marius Garshol"
Cc:
Sent: Monday, June 05, 2000 8:03 PM
Subject: Re: [XML-SIG] speed question re DOM parsing

> Lars Marius Garshol wrote:
> >
> > * Bjorn Pettersen
> > |
> > | Question: does using StringIO (or perhaps array) and __getattr__
> > | sound like the right thing to do?
> >
> > StringIO sounds like the right thing, at least for that particular
> > document. Probably it wouldn't be too bad for the other documents
> > either, but I have no experience with its performance.
> >
> > I'm afraid I don't have the necessary context to answer the
> > __getattr__ questions, but: I would definitely like to see your
> > sources. If you could post them somewhere, I, at least, would be happy
> > to have a look at them.
>
> I've included the patched file as an attachment.
My changes are > confined to: > > - importing (c)StringIO at the top > - changing the constructor call to _element (line 82) to pass > a StringIO object rather than an empty string. > - hiding the "first_cdata" member in the __init__ method of _element > - adding a __getattr__ method to _element. > > With limited performance testing I got: > > File Size Original Patched > 37K 0.14s 0.07s > 968K 103.77s 1.68s > From robin@jessikat.fsnet.co.uk Thu Aug 24 10:36:49 2000 From: robin@jessikat.fsnet.co.uk (Robin Becker) Date: Thu, 24 Aug 2000 10:36:49 +0100 Subject: [XML-SIG] bad c extension practice Message-ID: Many of the modules in the current CVS tree are handling errors in a manner which I find unpythonic. A typical example is pyexpat.c which at the end of initpyexpat does /* Check for errors */ if (PyErr_Occurred()) Py_FatalError("can't initialize module pyexpat"); the xml-sig group might regard this as a tragedy, but I might wish to continue and use another parser. The correct behaviour for this sort of error ought IMHO to be to raise an ImportError clean up any privately allocated resources and return. c sources which raise fatal errors _cursesmodule.c: _localemodule.c: _tkinter.c: almodule.c: cdmodule.c: errnomodule.c: fcntlmodule.c: fpectlmodule.c: linuxaudiodev.c: main.c: mathmodule.c: mpzmodule.c: parsermodule.c: pcremodule.c: posixmodule.c: puremodule.c: pyexpat.c: shamodule.c: stropmodule.c: syslogmodule.c: timemodule.c: timingmodule.c: -- Robin Becker From jack@oratrix.nl Thu Aug 24 11:17:37 2000 From: jack@oratrix.nl (Jack Jansen) Date: Thu, 24 Aug 2000 12:17:37 +0200 Subject: [XML-SIG] bad c extension practice In-Reply-To: Message by Robin Becker , Thu, 24 Aug 2000 10:36:49 +0100 , Message-ID: <20000824101737.C27A1303181@snelboot.oratrix.nl> Robin, it's probably a good idea to enter this into the standard Python bug database. 
Modules have had this initialization code since the very beginning, but it is something that should change now that we have dynamically loaded modules and all such. Not doing the Py_FatalError may not be a good option, though: it could well be that the outer layers of the importing code also expect the initxxxx() routines to always succeed, so fixing this may be more work than it looks. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From robin@jessikat.fsnet.co.uk Thu Aug 24 11:58:41 2000 From: robin@jessikat.fsnet.co.uk (Robin Becker) Date: Thu, 24 Aug 2000 11:58:41 +0100 Subject: [XML-SIG] bad c extension practice In-Reply-To: <20000824101737.C27A1303181@snelboot.oratrix.nl> References: <20000824101737.C27A1303181@snelboot.oratrix.nl> Message-ID: In article <20000824101737.C27A1303181@snelboot.oratrix.nl>, Jack Jansen writes >Robin, >it's probably a good idea to enter this into the standard Python bug database. >Modules have had this initialization code since the very beginning, but it is >something that should change now that we have dynamically loaded modules and >all such. > >Not doing the Py_FatalError may not be a good option, though: it could well be >that the outer layers of the importing code also expect the initxxxx() >routines to always succeed, so fixing this may be more work than it looks. >-- >Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ >Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ >www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm The reason I got to this is that somewhen between 1.5.2 final and stackless 1.5.42+ somebody cleaned up the cPickle module so that it didn't fatal error. 
This has few ramifications for most as they normally have the copy_reg module available, but it changed the behaviour when using Gordon's installer from a non-recoverable error to a recoverable one. -- Robin Becker From akuchlin@mems-exchange.org Thu Aug 24 15:39:34 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 24 Aug 2000 10:39:34 -0400 Subject: [XML-SIG] bad c extension practice In-Reply-To: ; from robin@jessikat.fsnet.co.uk on Thu, Aug 24, 2000 at 10:36:49AM +0100 References: Message-ID: <20000824103934.A31485@kronos.cnri.reston.va.us> On Thu, Aug 24, 2000 at 10:36:49AM +0100, Robin Becker wrote: >the xml-sig group might regard this as a tragedy, but I might wish to >continue and use another parser. The correct behaviour for this sort of >error ought IMHO to be to raise an ImportError clean up any privately >allocated resources and return. This means the module's init*() function could not be completed for some reason, and hence the module object might be incomplete in some way. A fatal error therefore seems appropriate. --amk From scarraway@farmcreditproductions.com Thu Aug 24 17:25:46 2000 From: scarraway@farmcreditproductions.com (Carraway, Shawn) Date: Thu, 24 Aug 2000 12:25:46 -0400 Subject: [XML-SIG] Questions about this application Message-ID: This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. ------_=_NextPart_001_01C00DE5.E673FAAA Content-Type: text/plain; charset="iso-8859-1" Hi all, This may sound like a bunch of stupid questions, but I need to ask some things about Python and XML and don't know anyone who programs with Python or where to turn for help. I am a graduate student at the University of South Carolina in the College of Library and Information Science. I am currently part of a group that is building a database using XML and would like to use Python to search the database and then have the whole business available over the web. 
The reason we are looking at Python is because we have read that it is an easier language to learn if you are not a programmer and because it is open source. We chose XML for building the database because it is also open source and we believe it is the future of the WWW. If we can make this work, we plan to open source the code and the idea, and have other projects in mind for the future. What I need to know is this: 1) First, can this be done? What I read indicates that it can, but I want to make sure. If it cannot be done, there is no need to go on to the other questions. 2) Is SaX only available for Linux? I run Linux at home (as an alternative to M$), but I am not a programmer and am by no means skilled with Linux - I am an end user, although I did build the machine and installed/configured Linux myself. 3) What browser do you use to view XML with Linux? I use Netscape, but the only browser I know that supports XML is IE5 - not available for Linux, and Amaya - the W3C browser. 4) Can you direct me to more resources - like a user's group, maybe - or a mailing list that would not flame stupid questions? We did a database project with ASP this summer and while there was a lot of documentation on the web, it would have been nice to have someone to ask questions, especially since our project differed somewhat from what we read on the web. Thanks so much for all your help, Shawna Carraway ------_=_NextPart_001_01C00DE5.E673FAAA Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Questions about this application


------_=_NextPart_001_01C00DE5.E673FAAA--

From jeremy@beopen.com Thu Aug 24 17:36:14 2000
From: jeremy@beopen.com (Jeremy Hylton)
Date: Thu, 24 Aug 2000 12:36:14 -0400 (EDT)
Subject: [XML-SIG] New data on speed of string appending
In-Reply-To: <011a01c00d7c$fa636e60$7cac1218@reston1.va.home.com>
References: <3935D13D.F4EAD64B@roguewave.com>
    <393C3FEB.A2DAB88E@roguewave.com>
    <011a01c00d7c$fa636e60$7cac1218@reston1.va.home.com>
Message-ID: <14757.20222.790968.68955@bitdiddle.concentric.net>

>>>>> "TP" == tpassin writes:

  TP> Remember back in June (June 22,2000), Bjorn Pettersen showed how
  TP> using StringIO instead of just appending to a string was
  TP> **much** faster? Then others showed that using
  TP> list.append/join(list) was also very fast.

  [...]

  TP> The results are dramatic. Method 1) is as good as or better
  TP> than anything until the string length exceeds about 1000 bytes.
  TP> Then Method 1 starts slowing down. Above about 4000 bytes, it's
  TP> really getting ssslllooowww. Here is a table of the results on
  TP> my system - 450 MHz PIII running Win98, Python 1.5.2.

This is an empirical confirmation of something that analysis shows
clearly. The repeated string concatenation method (buf = buf + s)
creates a new string object each time; this costs a malloc and two
memcpys. As the string grows, the beginning of the string is copied
repeatedly.

The other methods defer the creation of a new string object until it
is needed; they defer the mallocs and memcpys until the end and do
them once.

Jeremy

From larsga@garshol.priv.no Thu Aug 24 20:39:13 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 24 Aug 2000 21:39:13 +0200
Subject: [XML-SIG] Questions about this application
In-Reply-To:
References:
Message-ID:

* Shawn Carraway
|
| We chose XML for building the database because it is also open
| source and we believe it is the future of the WWW.

First, XML is not open source.
XML is just a standardized syntax, it's not a piece of software, although most XML software is open source. Also, XML might not be a good choice for a database system since XML in itself has very little of what a database system generally offers. The data model of XML is also rather different from most others, which means that it may not fit your data very well. I'm writing this as general information and advice, just so you know. | 1) First, can this be done? Well, you haven't said what you want to do, just what technologies you want to use. :) Those technologies fit well together, but whether they can do what you want to do I have no idea. | 2) Is SaX only available for Linux? At the moment, SAX is available in Java, Python and Perl, which means that it is available pretty much anywhere. Note that the XML-SIG package for Python contains SAX 1.0, and that we are working on SAX 2.0 for Python. | 3) What browser do you use to view XML with Linux? I use Netscape, | but the only browser I know that supports XML is IE5 - not available | for Linux, and Amaya - the W3C browser. Will this system only be available to users running Linux? If so, it sounds like a very strange requirement to me for a web-based system. Web-based systems are, after all, among the very few kinds of systems that are truly platform-independent. Why do you want to send XML to the browser? All browsers support HTML, so why not use that instead? BTW, Amaya does not support arbitrary XML, only XHTML and MathML. | 4) Can you direct me to more resources - like a user's group, maybe | - or a mailing list that would not flame stupid questions? Welcome to the XML-SIG. :-) | We did a database project with ASP this summer and while there was a | lot of documentation on the web, it would have been nice to have | someone to ask questions, especially since our project differed | somewhat from what we read on the web. 
The XML-L mailing list may also be a useful forum, but I think you will find many more people familiar with Python here. See for a list of mailing lists. --Lars M. From jp_sc@yahoo.com Fri Aug 25 02:20:46 2000 From: jp_sc@yahoo.com (JP S-C) Date: Thu, 24 Aug 2000 18:20:46 -0700 (PDT) Subject: [XML-SIG] XML for the Visually Impaired Message-ID: <20000825012046.4713.qmail@web2206.mail.yahoo.com> Dear XML Mailing List, Project Ocularis is looking for volunteer XML, C/C++, Perl, and Python developers. In brief, Ocularis is a distribution of the Linux Operating System that aims to allow the visually impaired to communicate, work, and express themselves through computers as well as to install and customize their system, independent of sighted assistance. More detailed information about Ocularis in included below. Developer Positions: XML developers will be working on one of Ocularis' subprojects, User Interface Markup Language (UIML) Implementation, which separates an application's functions from its User Interface. The UIML 2.0 specification, which is available at the UIML web site, "www.uiml.org", is "fully XML compliant" ("http://www.uiml.org/specs/UIML2/specification.html"). The ultimate goal of the Ocularis' UIML Implementation subproject is to aid developers in making their pre-existing or new applications easily and freely accessible in a wide variety of interfaces, including in the form of an Audio User Interface (AUI). C/C++ developers will be focusing on modifying current Linux installers, pre-existing speech synthesizers, and other basic applications that have already been created. In some cases the developers will be writing code from scratch. The basic applications that Ocularis will possess are a word processor, calendar, calculator, basic accounting or finance application, file manager, Internet browser, and e-mail client. 
Perl and Python developers will be working on various scripts that filter and organize information for the visually impaired, that provide links between commonly used, basic applications or utilites and pre-existing speech synthesis software, and that comprise fundamental applications. Details about Ocularis: The computing enviroment and suite of applications that are the goal of Ocularis will be free software (see "www.gnu.org" for a definition of free software) and will be based on Linux. The basic applications that Ocularis will possess are a word processor, calendar, calculator, basic accounting or finance application, file manager, Internet browser, and e-mail client. All of these programs will run smoothly on computers consisting of commonly available hardware costing less than $500 that can be bought at almost any local computer store. In comparison to current adaptive technology, this is both a drastic price drop and an increase in the availability of the required hardware. Ocularis was started in response to research on current adaptive technology, which culminated in the editorial "The Potential of Open Source for the Visually Impaired" (available at the Ocularis web site). The project is run and the software is developed completely by volunteers. If you would like to become involved in Project Ocularis, there are many areas (not pertaining to programming or technology) in which we would greatly appreciate assistance. We would be similarly grateful if you could help spread the word about Ocularis or could forward this message to someone who might be interested in the project. For more information, please visit the Ocularis web site, "http://ocularis.sourceforge.net/", or contact me directly. Thank you very much. --JP Schnapper-Casteras jpsc@users.sourceforge.net __________________________________________________ Do You Yahoo!? Yahoo! Mail - Free email you can access from anywhere! 
http://mail.yahoo.com/ From frank63@ms5.hinet.net Fri Aug 25 17:25:40 2000 From: frank63@ms5.hinet.net (Frank J.S. Chen) Date: Fri, 25 Aug 2000 16:25:40 -0000 Subject: [XML-SIG] Questions about this application Message-ID: <200008250810.QAA02126@ms5.hinet.net> Hi: 1) First, can this be done? What I read indicates that it can, but I want to make sure. If it cannot be done, there is no need to go on to the other questions. Ans:I think it might be done. But the data structures of XML documents should be designed carefully to be applicable for the backend database system, for example, XML with XQL.You may also need to develop tools with Python to 'query' XML documents to get stuff, and then make the results into another XML file. After this, use XSL/XSLT tool to transform XML into HTML to present data on your Web site. 2) Is SaX only available for Linux? I run Linux at home (as an alternative to M$), but I am not a programmer and am by no means skilled with Linux - I am an end user, although I did build the machine and installed/configured Linux myself. Ans: No. SAX is a specification for XML API, not a parser, not a driver, not a 'real' API. Implementation of SAX is based on programmig languages, like java, tcl/tk, perl, C++, and Python, etc. Python's xml.sax package is by now the implementation of SAX 1.0.I am waiting for the SAX 2.0 implementation:-). 3) What browser do you use to view XML with Linux? I use Netscape, but the only browser I know that supports XML is IE5 - not available for Linux, and Amaya - the W3C browser. Ans: I've heard that IE5 supports XML well, and Netscape 6 supports XML too. But I have not tried yet. However, it's should not be the point. But you should notice that Microsoft intends to develop their own XML, if you can get good presentation on IE5, but get bad presentation on Netscape 6, that's perhaps because you use the specific XML features of Microsoft's MSXML. 
4) Can you direct me to more resources - like a user's group, maybe - or a
mailing list that would not flame stupid questions? We did a database
project with ASP this summer and while there was a lot of documentation on
the web, it would have been nice to have someone to ask questions,
especially since our project differed somewhat from what we read on the web.

Ans: Actually, you need to do it yourselves. Documentation and technical
support cannot cover everything you need.

~~By Frank

From dream@aevum.net Fri Aug 25 14:22:24 2000
From: dream@aevum.net (Syn.Terra)
Date: Fri, 25 Aug 2000 13:22:24 +0000
Subject: [XML-SIG] PyXML and BeOS
Message-ID: <967209744_PM_BeOS.dream@aevum.net>

I'm trying to get the latest PyXML distribution to run on BeOS, with
catastrophic results. I run the setup.py script with the build argument,
and when it gets to the first gcc calls, BeOS immediately freezes and goes
to kernel debugging land.

I'm wondering if this dilemma is known, and when/how this might be
resolved. If not, can you just interface with the xmllib included with the
Python distro? What's the best place to learn how?

Thanks,
Syn.Terra

----
Syn.Terra
Aevum Industries
http://www.aevum.net

From fdrake@beopen.com Fri Aug 25 19:10:44 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Fri, 25 Aug 2000 14:10:44 -0400 (EDT)
Subject: [XML-SIG] Two pyexpat questions
Message-ID: <14758.46756.389919.120300@cj42289-a.reston1.va.home.com>

I have two questions about the pyexpat module.

1. The module actually defines a sub-module, pyexpat.errors, which
provides a number of constants that give the error numbers reported by
Expat. Does this really need to be a separate module? One issue with
the way this is defined is that you can't say import pyexpat.errors
unless you've also done an "import pyexpat" already (which was an
improvement over not being able to import it at all, as in the
original code). How much objection would there be to making it a single
module?

2.
Guido sees the pyexpat module as part of the "new XML order" for
Python (my description), and I agree. He'd like it to be importable
as xml.parsers.expat instead of pyexpat. This would break code, but
I'm not sure how much. Would this be terribly objectionable if
pyexpat became _expat, and xml/parsers/expat.py contained something
like:

    """Really useful docstring..."""

    from _expat import *

  -Fred

--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member

From walter@livinglogic.de Mon Aug 28 14:33:18 2000
From: walter@livinglogic.de ("Walter Dörwald")
Date: Mon, 28 Aug 2000 15:33:18 +0200
Subject: [XML-SIG] Processing instructions in sgmlop
Message-ID: <200008281533180968.011F4EB8@mail.tmt.de>

Hello all!

I think I discovered another bug in sgmlop. If I understood the XML
standard (http://www.w3.org/TR/2000/WD-xml-2e-20000814#sec-pi)
correctly, a processing instruction terminates with the next
occurrence of '?>':

[16] PI       ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
[2]  Char     ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
                  | [#x10000-#x10FFFF]

This would mean that the PI data may contain literal '>' characters.
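
The rule that a processing instruction runs up to the first '?>' can
be sketched as a regular expression. This is only a rough illustration
and not what sgmlop or any other parser does internally: the name
split_pi and the pattern are made up here, the target pattern assumes
ASCII-only names, and the reserved target 'xml' is not excluded.

```python
import re

# Approximation of XML production [16]:
#   '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
# Assumption: targets are ASCII-only names; real XML Names allow more.
PI_RE = re.compile(
    r"<\?([A-Za-z_:][A-Za-z0-9_.:-]*)(?:\s+(.*?))?\?>", re.DOTALL
)


def split_pi(text):
    """Return (target, data) for a PI at the start of text, else None."""
    match = PI_RE.match(text)
    if match is None:
        return None
    # The data group is absent when the PI has a target only.
    return match.group(1), match.group(2) or ""


if __name__ == "__main__":
    # The '>' inside the data does not end the PI; only '?>' does.
    print(split_pi("<?demo a > b?>"))
```

With this pattern, split_pi("<?demo a > b?>") gives the target 'demo'
and the data 'a > b', so a bare '>' stays part of the data.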
But sgmlop seems to end the PI at the next occurrence of '>':

#!/usr/bin/env python

from xml.parsers import sgmlop

class Handler:
    def handle_proc(self,target,data):
        print "pi", target, data
    def handle_data(self,data):
        print "data", data

parser = sgmlop.XMLParser()
parser.register(Handler())
parser.parse('<?echo $foo->bar?>')

The output from this short test is:

> python test.py
pi echo $foo-
data bar?>

Bye,
   Walter Dörwald

--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de

From guenther@e-mundo.de Mon Aug 28 17:28:23 2000
From: guenther@e-mundo.de (Guenther Palfinger)
Date: Mon, 28 Aug 2000 18:28:23 +0200
Subject: [XML-SIG] precompiled version of PyXML
Message-ID: <001401c0110c$fc27a430$1601a8c0@apollo>

> Hello,
>
> where may I get the precompiled version for Windows of PyXML
>
> Thank you,
> Günther Palfinger
>
> ___________________________________________________________________________
> eMundo GmbH - Schwanthalerstr. 102 - 80336 Munich - Germany
> phone/fax: +49-89-130398-40/41 -- mobile: +49-178-8185002
> palfinger@e-mundo.de - www.e-mundo.de
>
>

From thomas.heller@ion-tof.com Tue Aug 29 09:10:17 2000
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Tue, 29 Aug 2000 10:10:17 +0200
Subject: [XML-SIG] precompiled version of PyXML
References: <001401c0110c$fc27a430$1601a8c0@apollo>
Message-ID: <005401c01190$91447740$4500a8c0@thomasnb>

I can send you a compiled version packed up with
distutils windows installer (which needs testing anyway).
Do you need it for 1.5 or 1.6?

Thomas Heller

----- Original Message -----
From: "Guenther Palfinger"
To:
Sent: Monday, August 28, 2000 6:28 PM
Subject: [XML-SIG] precompiled version of PyXML

> > Hello,
> >
> > where may I get the precompiled version for Windows of PyXML
> >
> > Thank you,
> > Günther Palfinger
> >
> > ___________________________________________________________________________
> > eMundo GmbH - Schwanthalerstr.
102 - 80336 Munich - Germany > > phone/fax: +49-89-130398-40/41 -- mobile: +49-178-8185002 > > palfinger@e-mundo.de - www.e-mundo.de > > > > > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig From neeloy_saha@infy.com Tue Aug 29 12:08:25 2000 From: neeloy_saha@infy.com (neeloy_saha) Date: Tue, 29 Aug 2000 16:38:25 +0530 Subject: [XML-SIG] PyXML : winodws precompiled version-0.5.2 Message-ID: <8EE756E49A17D21194860008C7F49AFE0452908C@TWRMSG01> Hi, I want to get the precompiled version of pyxml for win95. Is there someplace that i can get it. The python/XM howto doc does not give the location...Can somebody help me to install it.. i am a python and xml newbie. Thx in advance -neeloy ---------------------------------------------------------------------------- ------------------------------------------------------- Python/XML HOWTO _________________________________________________________________ _________________________________________________________________ Python/XML HOWTO The Python/XML Special Interest Group xml-sig@python.org (edited by akuchling@acm.org) This is a draft document; 'XXX' in the text indicates that something has to be filled in later, or rewritten, or verified, or something. .... ..... Windows users should get the precompiled version at XXX From wkiri@CS.Cornell.EDU Wed Aug 30 20:58:33 2000 From: wkiri@CS.Cornell.EDU (Kiri Wagstaff) Date: Wed, 30 Aug 2000 15:58:33 -0400 (EDT) Subject: [XML-SIG] Problems installing PyXML-0.5.5.1 Message-ID: Hello, I recently downloaded PyXML-0.5.5.1 but have been unable to install it. I'm running Linux. The README says to > 1) > Run "python setup.py build" to copy *.py files and compile the C > extensions. When I do this, I get the following output: Executing 'build' action... Running command: make -f Makefile.pre.in boot rm -f *.o *~ rm -f `find . -name '*.pyc'` rm -f `find . -name '*.o'` rm -f `find . 
-name '*~'`
cd expat ; make clean
make[1]: Entering directory `/usr/src/PyXML-0.5.5.1/extensions/expat'
rm -f xmltok/xmltok.o xmltok/xmlrole.o xmlwf/xmlwf.o xmlwf/xmlfile.o
xmlwf/codepage.o xmlparse/xmlparse.o xmlparse/hashtable.o
xmlwf/unixfilemap.o xmlwf/xmlwf
make[1]: Leaving directory `/usr/src/PyXML-0.5.5.1/extensions/expat'
rm -f *.a tags TAGS config.c Makefile.pre python sedscript
rm -f *.so *.sl so_locations
cd expat ; make clobber
make[1]: Entering directory `/usr/src/PyXML-0.5.5.1/extensions/expat'
rm -f xmltok/xmltok.o xmltok/xmlrole.o xmlwf/xmlwf.o xmlwf/xmlfile.o
xmlwf/codepage.o xmlparse/xmlparse.o xmlparse/hashtable.o
xmlwf/unixfilemap.o xmlwf/xmlwf
rm -f libexpat.a
make[1]: Leaving directory `/usr/src/PyXML-0.5.5.1/extensions/expat'
VERSION=`python -c "import sys; print sys.version[:3]"`; \
installdir=`python -c "import sys; print sys.prefix"`; \
exec_installdir=`python -c "import sys; print sys.exec_prefix"`; \
make -f ./Makefile.pre.in VPATH=. srcdir=. \
        VERSION=$VERSION \
        installdir=$installdir \
        exec_installdir=$exec_installdir \
        Makefile
make[1]: Entering directory `/usr/src/PyXML-0.5.5.1/extensions'
make[1]: *** No rule to make target
`/usr/lib/python1.5/config/Makefile', needed by `sedscript'.  Stop.
make[1]: Leaving directory `/usr/src/PyXML-0.5.5.1/extensions'
make: *** [boot] Error 2
Running command: make
make: *** No targets.  Stop.
Traceback (innermost last):
  File "setup.py", line 185, in ?
    func()
  File "setup.py", line 155, in build_unix
    shutil.copy('extensions/' + filename, 'build/xml/parsers/')
  File "/usr/lib/python1.5/shutil.py", line 52, in copy
    copyfile(src, dst)
  File "/usr/lib/python1.5/shutil.py", line 17, in copyfile
    fsrc = open(src, 'rb')
IOError: [Errno 2] No such file or directory: 'extensions/pyexpat.so'

Can you offer any suggestions?

Thanks,
Kiri

------------- Kiri Wagstaff, M.S.
-------- wkiri@cs.cornell.edu -------------
         And on the brave and crazy wings of youth
         They went flying around in the rain
         And their feathers, once so fine,
         Grew torn and tattered
                  -- Jackson Browne, Before the Deluge
-----------------------------------------------------------------------------

From frank63@ms5.hinet.net Thu Aug 31 06:08:55 2000
From: frank63@ms5.hinet.net (Frank J.S. Chen)
Date: Thu, 31 Aug 2000 05:08:55 -0000
Subject: [XML-SIG] DOM:core.py
Message-ID: <200008302055.EAA07306@ms5.hinet.net>

Hi:

When I imported sax_builder to write a DOM parser, I found that my
Chinese text was not displayed as usual. I then tracked down the code
in core.py, and the problem seems to be located in the class Text.
Here is the skeleton:

class Text(CharacterData):
    childNodeTypes = []
    nodeName = "#text"

    # Methods

    def __repr__(self):
        if len(self._node.value)<20: s=self._node.value
        else: s=self._node.value[:17] + '...'
        return '' % (repr(s),)

The built-in function repr() converts its argument into a form
suitable for eval(), which damages text in the encodings used by other
locales. It is better to use str() in place of repr() for now. str()
returns a string unchanged, without any conversion, if the value
passed is already a string:

        return '' % (str(s),)

I also think the truncation done by the if-else block above needs to
be refined, or multi-byte characters will still be displayed badly. It
shouldn't do any harm to print out the whole contents of these
elements.

~Frank Chen

From Anthony Baxter Thu Aug 31 17:26:22 2000
From: Anthony Baxter (Anthony Baxter)
Date: Thu, 31 Aug 2000 16:26:22 +0000
Subject: [XML-SIG] Problems installing PyXML-0.5.5.1
In-Reply-To: Message from Kiri Wagstaff
    of "Wed, 30 Aug 2000 15:58:33 -0400."
Message-ID: <200008311626.DAA03592@mbuna.arbhome.com.au>

>>> Kiri Wagstaff wrote
> I recently downloaded PyXML-0.5.5.1 but have been unable to install
> it. I'm running Linux.
The README says to
>
> make[1]: *** No rule to make target
> `/usr/lib/python1.5/config/Makefile', needed by `sedscript'.  Stop.
> make[1]: Leaving directory `/usr/src/PyXML-0.5.5.1/extensions'

You haven't installed the python-dev package.

Anthony

From dieter@handshake.de Wed Aug 30 21:47:07 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Wed, 30 Aug 2000 22:47:07 +0200 (CEST)
Subject: [XML-SIG] Problems installing PyXML-0.5.5.1
In-Reply-To:
References:
Message-ID: <14765.29210.512630.475763@lindm.dm>

Kiri Wagstaff writes:
 > make[1]: *** No rule to make target
 > `/usr/lib/python1.5/config/Makefile', needed by `sedscript'.  Stop.

This is a FAQ. I have seen it at least 5 times in the last 2 months.
I hope people slowly realize that the mailing list is archived.

Okay, again:

  You must install the Python development RPM/package.

Dieter
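
A quick way to tell whether the development files are present is to
look for the config Makefile named in the error message. The following
is a minimal sketch, assuming the Unix layout shown in the log
(<prefix>/lib/python<version>/config/Makefile). The helper names
config_makefile and have_dev_files are made up for illustration, and
the default version lookup uses sys.version_info, which the 1.5
interpreter in the log does not have, so pass the version explicitly
there.

```python
import os
import sys


def config_makefile(prefix=None, version=None):
    """Build the path of the config Makefile the extension build needs.

    Assumes the <prefix>/lib/python<version>/config/Makefile layout
    seen in the error message; other installations may differ.
    """
    if prefix is None:
        prefix = sys.exec_prefix
    if version is None:
        version = "%d.%d" % sys.version_info[:2]
    return os.path.join(prefix, "lib", "python" + version,
                        "config", "Makefile")


def have_dev_files(prefix=None, version=None):
    """True if the config Makefile exists at the assumed location."""
    return os.path.isfile(config_makefile(prefix, version))


if __name__ == "__main__":
    path = config_makefile()
    if have_dev_files():
        print("found " + path)
    else:
        print("missing " + path + " (development package not installed?)")
```

Calling config_makefile("/usr", "1.5") on a Unix system yields the same
path that make complains about above.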