From martin at v.loewis.de Sat Mar 8 13:11:52 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 08 Mar 2008 13:11:52 +0100 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <47D28288.7020403@v.loewis.de> > What about changing the "XML" link on the Python homepage to point to a > Wiki page? I think this one would come close: > > http://wiki.python.org/moin/PythonXml Ok, I changed it so. 
Regards, Martin From martin at v.loewis.de Mon Mar 10 08:06:31 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 10 Mar 2008 08:06:31 +0100 Subject: [XML-SIG] Converting XML Schema to data struture and then to XML In-Reply-To: <479DB671.8000608@behnel.de> References: <116467.61512.qm@web35906.mail.mud.yahoo.com> <479DB671.8000608@behnel.de> Message-ID: <47D4DDF7.1070801@v.loewis.de> > BTW, from the POV of objectify, generating Python classes from a schema would > basically mean infering a document instance from an XML Schema (sort of a > meta-model to model transformation). I find that an interesting relation, but > maybe that's just me... It's just you. It would *not* be a meta-model to model transformation, but a meta-model-to-meta-model one. The schema defines a type system, just as a set of Python classes does. Instances of the schema (i.e. a document) then correspond to a set of instances of these classes. It's a very natural thing to do, and has been done in other languages dozens of time. It gives the term "document type" a true representation in the programming language. Regards, Martin From stefan_ml at behnel.de Mon Mar 10 08:46:51 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 10 Mar 2008 08:46:51 +0100 Subject: [XML-SIG] Converting XML Schema to data struture and then to XML In-Reply-To: <47D4DDF7.1070801@v.loewis.de> References: <116467.61512.qm@web35906.mail.mud.yahoo.com> <479DB671.8000608@behnel.de> <47D4DDF7.1070801@v.loewis.de> Message-ID: <47D4E76B.1090606@behnel.de> Hi, Martin v. L?wis wrote: >> BTW, from the POV of objectify, generating Python classes from a schema would >> basically mean inferring a document instance from an XML Schema (sort of a >> meta-model to model transformation). I find that an interesting relation, but >> maybe that's just me... > > It's just you. It would *not* be a meta-model to model transformation, > but a meta-model-to-meta-model one. 
It obviously is a meta-to-meta model transformation to generate Python classes from a schema, but "inferring a document instance from an XML Schema" is not. > The schema defines a type system, > just as a set of Python classes does. Instances of the schema (i.e. > a document) then correspond to a set of instances of these classes. Objectify doesn't generate code. Instead, it comes with an extensible meta-model that resembles the basic Python type system, and which gets mapped on a tree at runtime. So the act of inferring the classes from the schema is actually linked to the instance, not the meta model. And the link is done through validation, which assures that the document really is an instance. So we end up with classes that represent an instance of a meta-model. There is no intermediate step of a meta-to-meta model transformation. Stefan From somayeh.farnoush at gmail.com Mon Mar 10 11:31:12 2008 From: somayeh.farnoush at gmail.com (Somayeh Farnoush) Date: Mon, 10 Mar 2008 02:31:12 -0800 Subject: [XML-SIG] Pyxml Message-ID: Dear sir, I have installed PyXml , but when I run this $ rpm -qa | grep python-xml it does not returned anything. does it mean that python-xml is missed? How can I fix it? regards, SF From stefan_ml at behnel.de Mon Mar 10 12:05:34 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 10 Mar 2008 12:05:34 +0100 Subject: [XML-SIG] Pyxml In-Reply-To: References: Message-ID: <47D515FE.7010200@behnel.de> Hi, Somayeh Farnoush wrote: > Dear sir, You've just missed half of the world population here. > I have installed PyXml , but when I run this > > $ rpm -qa | grep python-xml > > it does not returned anything. does it mean that python-xml is missed? How > can I fix it? Depends on how you installed it. Did you use rpm for it? Is the package called "python-xml" on your platform? 
What did rpm tell you when it installed it? Did you also try $ rpm -qa | grep -i python | grep -i xml Stefan From stefan_ml at behnel.de Mon Mar 10 12:34:28 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 10 Mar 2008 12:34:28 +0100 Subject: [XML-SIG] Pyxml In-Reply-To: References: <47D515FE.7010200@behnel.de> Message-ID: <47D51CC4.9090003@behnel.de> Hi, Somayeh Farnoush wrote: > I've installed it on redhat Enterprise4 and use the > PyXML-0.8.4.tar.gz > form http://sourceforge.net/project/showfiles.php?group_id=6473 and you've done *what* with that tar.gz file? In case you ran "setup.py install", note that you can also run python setup.py bdist_rpm to build a .rpm file which you can then install with the normal rpm tool. Stefan From strangest at comcast.net Mon Mar 10 12:49:53 2008 From: strangest at comcast.net (Gloria) Date: Mon, 10 Mar 2008 07:49:53 -0400 Subject: [XML-SIG] Pyxml In-Reply-To: <47D515FE.7010200@behnel.de> References: <47D515FE.7010200@behnel.de> Message-ID: <47D52061.5030100@comcast.net> Stefan Behnel wrote: > Hi, > > Somayeh Farnoush wrote: > >> Dear sir, >> > > You've just missed half of the world population here. > LOL! > > >> I have installed PyXml , but when I run this >> >> $ rpm -qa | grep python-xml >> >> it does not returned anything. does it mean that python-xml is missed? How >> can I fix it? >> > > Depends on how you installed it. Did you use rpm for it? Is the package called > "python-xml" on your platform? What did rpm tell you when it installed it? > > Did you also try > > $ rpm -qa | grep -i python | grep -i xml > > Stefan > > _______________________________________________ > XML-SIG maillist - XML-SIG at python.org > http://mail.python.org/mailman/listinfo/xml-sig > > 
From stefan_ml at behnel.de Mon Mar 10 13:13:16 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 10 Mar 2008 13:13:16 +0100 Subject: [XML-SIG] Pyxml In-Reply-To: References: <47D515FE.7010200@behnel.de> <47D51CC4.9090003@behnel.de> Message-ID: <47D525DC.4060303@behnel.de> Hi, please keep this discussion on the list. Somayeh Farnoush wrote: > I've just run > setup.py build > setup.py install > ..... > I am trying to install mpi intel which needs Python and Pyxml ... in > troubleshooting part of installing intel mpi suggested to use the command > rpm -qa | grep python-xml > to enshure existance of pyxml properly. That's ok, they just didn't know what they were doing either. If you ran "setup.py install" as root (and it didn't fail), then PyXML should be correctly installed. It's just that RPM doesn't know about it as you didn't install it using the rpm command. Stefan From info at mosp-tech.se Mon Mar 10 23:30:44 2008 From: info at mosp-tech.se (info at mosp-tech.se) Date: Mon, 10 Mar 2008 23:30:44 +0100 (CET) Subject: [XML-SIG] PyXML Howto Message-ID: <63893.85.228.252.45.1205188244.squirrel@webmail01.one.com> Hi! I am writing to you to see if there is a tar.bz2 or tar.gz file of http://pyxml.sourceforge.net/topics/howto/xml-howto.html available to download. I am currently translating and posting different python related tutorials and howtos to swedish and i would like to do this with that howto. 
Best regards, Mikael J From stefan_ml at behnel.de Tue Mar 11 08:34:51 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 11 Mar 2008 08:34:51 +0100 Subject: [XML-SIG] PyXML Howto In-Reply-To: <63893.85.228.252.45.1205188244.squirrel@webmail01.one.com> References: <63893.85.228.252.45.1205188244.squirrel@webmail01.one.com> Message-ID: <47D6361B.6020402@behnel.de> Hi, info at mosp-tech.se wrote: > I am writing to you to see if there is a tar.bz2 or tar.gz file of > http://pyxml.sourceforge.net/topics/howto/xml-howto.html available to > download. I am currently translating and posting different python related > tutorials and howtos to swedish and i would like to do this with that > howto. Hmmm, that's an old version (0.7.1) of a tutorial for a no-longer-maintained library. I don't think there's much use in translating it. If you want to translate an XML tutorial for Python, especially for people who have little experience with XML processing, try this: http://effbot.org/zone/element.htm Or, ask in the comp.lang.python newsgroup what others consider a good "Python and XML" tutorial that's worth being translated. Stefan From ht at inf.ed.ac.uk Thu Mar 13 11:24:36 2008 From: ht at inf.ed.ac.uk (Henry S. Thompson) Date: Thu, 13 Mar 2008 10:24:36 +0000 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C10103.20908@v.loewis.de> (Martin v. =?iso-2022-int-1?B?TPZ3aXMncw==?= message of "Sun, 24 Feb 2008 06:30:43 +0100") References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Martin v. L?wis writes: > If you found that validation is a processing need, I strongly recommend > that you re-evaluate your processing needs (whether you use Python > or not). 
IMHO, validation is much over-rated and over-used. Strong words, which I strongly disagree with. "Validate at trust boundaries" is a long-standing and helpful mantra, IMO. If you're only processing XML you produce yourself, sure, validation is probably unnecessary. But if you're accepting XML from others, a validating parser will simplify your code and give your users better error reporting. ht - -- Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh Half-time member of W3C Team 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht at inf.ed.ac.uk URL: http://www.ltg.ed.ac.uk/~ht/ [mail really from me _always_ has this .sig -- mail without it is forged spam] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (GNU/Linux) iD8DBQFH2QDkkjnJixAXWBoRAqt5AJ9Np/DQ9YlscIIkIda9fMJDt8AegQCdGAS3 5T+DZHKYZnKzazF/C1w6i2g= =XEqC -----END PGP SIGNATURE----- From naeemkhans79 at hotmail.com Sun Mar 16 19:45:43 2008 From: naeemkhans79 at hotmail.com (Khan1814) Date: Sun, 16 Mar 2008 11:45:43 -0700 (PDT) Subject: [XML-SIG] XMI Access in Netbeans 5.5 Message-ID: <16082402.post@talk.nabble.com> Hello everyone, I have transformed the UML diagram into XMI file and wana to use it in Netbeans for the purpose of checking logical errors in the class diagram. In this connection, I have also added the MDR library in the project. But I dont know how to assign my XMI file to it and access it in java. Can any body help me out to make it posible?? Regards, Khan -- View this message in context: http://www.nabble.com/XMI-Access-in-Netbeans-5.5-tp16082402p16082402.html Sent from the Python - xml-sig mailing list archive at Nabble.com. 
From spammb at gmail.com Thu Mar 20 23:27:09 2008 From: spammb at gmail.com (Michael Becker) Date: Thu, 20 Mar 2008 15:27:09 -0700 (PDT) Subject: [XML-SIG] Issues with XMLTreeBuilder in cElementTree and ElementTree (Cross-post from comp.lang.python) Message-ID: <697339f4-c549-4022-b945-434f2909cdfc@13g2000hsb.googlegroups.com> I had some xmls being output by an application whose formatting did not allow for easy editing by humans so I was trying to write a short python app to pretty print xml files. Most of the data in these xml files is in the attributes so I wanted each attribute on its own line. I wrote a short app using xml.etree.ElementTree.XMLTreeBuilder(). To my dismay the attributes were getting reordered. I found that the implementation of XMLTreeBuilder did not make proper use of the ordered_attributes attribute of the expat parser (which it defaults to). The constructor sets ordered_attributes = 1 but then the _start_list method iterates through the ordered list of attributes and stores them in a dictionary! This is incredibly unintuitive and seems to me to be a bug. I would recommend the following changes to ElementTree.py: class XMLTreeBuilder: ... def _start_list(self, tag, attrib_in): fixname = self._fixname tag = fixname(tag) attrib = [] if attrib_in: for i in range(0, len(attrib_in), 2): attrib.append((fixname(attrib_in[i]),self._fixtext(attrib_in[i+1]))) return self._target.start(tag, attrib) class _ElementInterface: ... def items(self): try: return self.attrib.items() except AttributeError: return self.attrib These changes would allow the user to take advantage of the ordered_attributes attribute in the expat parser to use either ordered or unorder attributes as desired. For backwards compatibility it might be desirable to change XMLTreeBuilder to default to ordered_attributes = 0. I've never submitted a bug fix to a python library so if this seems like a real bug please let me know how to proceed. 
Secondly, I found a potential issue with the cElementTree module. My understanding (which could be incorrect) of python C modules is that they should work the same as the python versions but be more efficient. The XMLTreeBuilder class in cElementTree doesn't seem to be using the same parser as that in ElementTree. The following code illustrates this issue: >>> import xml.etree.cElementTree >>> t1=xml.etree.cElementTree.XMLTreeBuilder() >>> t1._parser.ordered_attributes = 1 Traceback (most recent call last): File "", line 1, in AttributeError: _parser >>> import xml.etree.ElementTree >>> t1=xml.etree.ElementTree.XMLTreeBuilder() >>> t1._parser.ordered_attributes = 1 In case it is relevant, here is the version and environment information: tpadmin at osswlg1{/tpdata/ossgw/config} $ python -V Python 2.5.1 tpadmin at osswlg1{/tpdata/ossgw/config} $ uname -a SunOS localhost 5.10 Generic_118833-33 sun4u sparc SUNW,Netra-240 From smcg4191 at frii.com Mon Mar 24 04:56:59 2008 From: smcg4191 at frii.com (Stuart McGraw) Date: Sun, 23 Mar 2008 21:56:59 -0600 Subject: [XML-SIG] lxml iterparse and comments Message-ID: Hello, I am probably mising something elementary (I am new to both xml and lxml), but I am having problems figuring out how to get comments when using lxml's iterparse(). When I parse xml with parse() and iterate though the result, I get the comments. But when I try to do the same thing (approximately I think) with iterparse, I don't see any comments. See example code below. (lxml-2.02, Python-2.5.1) (I was using the standard Python ElementTree but my understanding is that it doesn't save comments at all. If that's wrong I would go back to using it). The real file is ~50MB and has about 1M nodes under the root so I have to use iterparse and I also have to process comments, so I would really appreciate a clue about how to do it. Thanks. Example code: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ import lxml.etree as ET from cStringIO import StringIO # XML data... 
#============================================= xmltxt = \ ''' ]> text 1 text 2 ''' #============================================= print 'Parse:\n------' et = ET.parse( StringIO (xmltxt)) for elem in et.iter(): print elem print '\nIterparse:\n----------' xx = ET.iterparse( StringIO (xmltxt), ("start","end")) for event, elem in iter(xx): print event, elem From stefan_ml at behnel.de Mon Mar 24 08:33:53 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 24 Mar 2008 08:33:53 +0100 Subject: [XML-SIG] lxml iterparse and comments In-Reply-To: References: Message-ID: <47E75961.2040503@behnel.de> Hi, Stuart McGraw wrote: > I am probably mising something elementary (I am new > to both xml and lxml), but I am having problems figuring > out how to get comments when using lxml's iterparse(). > When I parse xml with parse() and iterate though the > result, I get the comments. But when I try to do the > same thing (approximately I think) with iterparse, > I don't see any comments. While the comments end up in the tree that iterparse generates, they do not show up in the events. Now that you mention it, I actually think that should change. There should be events "comment" and "pi" that yield them if requested. > I was using the standard Python ElementTree but my > understanding is that it doesn't save comments at all. ElementTree strips comments in the parser, that's right. > The real file is ~50MB and has about 1M nodes under the > root so I have to use iterparse and I also have to process > comments, so I would really appreciate a clue about how > to do it. Thanks. Have you tried the parser target interface? It's a SAX-like interface that uses callbacks. 
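A minimal sketch of the callback style described above. It rests on assumptions not taken from the thread: it uses the standard library's xml.etree.ElementTree.XMLParser rather than lxml, it is written for Python 3, and the class name CommentCollector, the helper scan() and the sample element names are made up for illustration. The target object only needs the methods it cares about; comments arrive through comment().

```python
from xml.etree.ElementTree import XMLParser


class CommentCollector:
    """Parser target that records element boundaries and comments (illustrative name)."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, data):
        # Character data is ignored in this sketch.
        pass

    def comment(self, text):
        # Called once per XML comment, with the comment body as text.
        self.events.append(("comment", text))

    def close(self):
        # Whatever is returned here becomes the result of parser.close().
        return self.events


def scan(xml_text):
    """Feed a document to the parser and return the recorded events."""
    parser = XMLParser(target=CommentCollector())
    parser.feed(xml_text)
    return parser.close()
```

Because no tree is built, memory use stays flat; for a large file, feed() can be called repeatedly with chunks read from disk instead of one big string.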
http://codespeak.net/lxml/parsing.html#the-target-parser-interface http://effbot.org/elementtree/elementtree-xmlparser.htm#the-target-interface Stefan From jmaze at fas.harvard.edu Tue Mar 25 01:22:02 2008 From: jmaze at fas.harvard.edu (Jero Maze) Date: Mon, 24 Mar 2008 20:22:02 -0400 Subject: [XML-SIG] How do I test PyXML Message-ID: <47E845AA.80809@fas.harvard.edu> To whom it may concern, I'm trying to use the extension for "Inkscape", "textext" which needs "PyXML". I've been trying to install PyXML and then uses "textext" without success so I don't know if I'm installing PyXML correctly. My OS is Mac OS X version 10.5.2 When I run "python regrtest.py" I got the message below (this might be helpful) Sincerely, Jero test_c14n test test_c14n skipped -- an optional feature could not be imported test_dom test test_dom skipped -- an optional feature could not be imported test_domreg test_encodings test_expatreader test test_expatreader failed -- Traceback (most recent call last): File "/Applications/MyApplications/PyXML-0.8.4/test/test_expatreader.py", line 21, in setUp self.parser.setFeature(handler.feature_namespace_prefixes, 1) File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/sax/expatreader.py", line 157, in setFeature "expat does not report namespace prefixes") SAXNotSupportedException: expat does not report namespace prefixes test_filter test test_filter failed -- Writing: u'textmoreabcxyz', expected: '\ntextmoreabcxyz : 'module' object has no attribute 'DefaultHandler' test_htmlb test test_htmlb skipped -- an optional feature could not be imported test_javadom test test_javadom skipped -- an optional feature could not be imported test_marshal test test_marshal skipped -- an optional feature could not be imported test_minidom test test_minidom failed -- Writing: 'Test Failed: ', expected: '' test_ns test test_ns skipped -- an optional feature could not be imported test_pyexpat test_sax test test_sax skipped -- an optional feature 
could not be imported test_sax2 test test_sax2 skipped -- an optional feature could not be imported test_sax2_xmlproc test_sax_xmlproc test test_sax_xmlproc skipped -- an optional feature could not be imported test_saxdrivers test test_saxdrivers skipped -- an optional feature could not be imported test_utils test test_utils skipped -- an optional feature could not be imported test_xmlbuilder test test_xmlbuilder failed -- errors occurred; run in verbose mode for details test_xmlproc test test_xmlproc skipped -- an optional feature could not be imported 4 tests OK. 5 tests failed: test_expatreader test_filter test_howto test_minidom test_xmlbuilder 12 tests skipped: test_c14n test_dom test_htmlb test_javadom test_marshal test_ns test_sax test_sax2 test_sax_xmlproc test_saxdrivers test_utils test_xmlproc From smcg4191 at frii.com Tue Mar 25 05:19:15 2008 From: smcg4191 at frii.com (Stuart McGraw) Date: Mon, 24 Mar 2008 22:19:15 -0600 Subject: [XML-SIG] lxml iterparse and comments Message-ID: <47E87D43.1090802@frii.com> Hello Stefan, Thanks for your response. > Stuart McGraw wrote: > > I am probably mising something elementary (I am new > > to both xml and lxml), but I am having problems figuring > > out how to get comments when using lxml's iterparse(). > > When I parse xml with parse() and iterate though the > > result, I get the comments. But when I try to do the > > same thing (approximately I think) with iterparse, > > I don't see any comments. > > While the comments end up in the tree that iterparse generates, > they do not show up in the events. Now that you mention it, I > actually think that should change. There should be events > "comment" and "pi" that yield them if requested. That would be ideal, from my perspective. It also seems more consistent with the other interfaces (parse, parse target, etc) > > I was using the standard Python ElementTree but my > > understanding is that it doesn't save comments at all. 
> > ElementTree strips comments in the parser, that's right. > > > The real file is ~50MB and has about 1M nodes under the > > root so I have to use iterparse and I also have to process > > comments, so I would really appreciate a clue about how > > to do it. Thanks. > > Have you tried the parser target interface? It's a SAX-like > interface that uses callbacks. > > http://codespeak.net/lxml/parsing.html#the-target-parser-interface > http://effbot.org/elementtree/elementtree-xmlparser.htm#the-target-interfa ce Thanks for pointing that out. I'd seen it in the docs but hadn't appreciated that it was relevant. However, I am having trouble getting it to work. Specifically, the test code below produces the output I expected when run with cElementTree, but with lxml, it is missing "end" callbacks, the second "start(entry) " callback, and the resolved entity text. Am I doing something wrong? Test code: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #import xml.etree.cElementTree as ET import lxml.etree as ET from cStringIO import StringIO # XML data... #============================================= xmltxt = \ ''' ]> text 1 is &ex; text 2 ''' #============================================= print '\nTargetParser:\n-------------' try: XMLParser = ET.XMLParser except AttributeError: XMLParser = ET.XMLTreeBuilder class EchoTarget: def comment(self, tag): print "comment", tag def start(self, tag, attrib): print "start", tag, attrib def end(self, tag): print "end", tag def data(self, data): print "data", repr(data) def close(self): print "close" return "closed!" 
parser = XMLParser( target = EchoTarget()) result = ET.parse( StringIO (xmltxt), parser) From stefan_ml at behnel.de Tue Mar 25 22:04:02 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 25 Mar 2008 22:04:02 +0100 Subject: [XML-SIG] lxml iterparse and comments In-Reply-To: <47E87D43.1090802@frii.com> References: <47E87D43.1090802@frii.com> Message-ID: <47E968C2.6030905@behnel.de> Hi, Stuart McGraw wrote: >> Stuart McGraw wrote: >> > I am probably mising something elementary (I am new >> > to both xml and lxml), but I am having problems figuring >> > out how to get comments when using lxml's iterparse(). >> > When I parse xml with parse() and iterate though the >> > result, I get the comments. But when I try to do the >> > same thing (approximately I think) with iterparse, >> > I don't see any comments. >> >> While the comments end up in the tree that iterparse generates, they >> do not show up in the events. Now that you mention it, I >> actually think that should change. There should be events >> "comment" and "pi" that yield them if requested. > > That would be ideal, from my perspective. It also seems > more consistent with the other interfaces (parse, parse target, > etc) Implemented on the trunk, will be in lxml 2.1. >> Have you tried the parser target interface? > I am having trouble getting it to work. Specifically, the test > code below produces the output I expected when run with > cElementTree, but with lxml, it is missing "end" callbacks, > the second "start(entry) " callback, and the resolved entity > text. Am I doing something wrong? > > Test code: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > #import xml.etree.cElementTree as ET > import lxml.etree as ET > from cStringIO import StringIO > > # XML data... 
> #============================================= > xmltxt = \ > ''' > > > > > > ]> > > > > text 1 is &ex; > > text 2 > ''' > #============================================= > > print '\nTargetParser:\n-------------' > > try: XMLParser = ET.XMLParser > except AttributeError: XMLParser = ET.XMLTreeBuilder > > class EchoTarget: > def comment(self, tag): > print "comment", tag > def start(self, tag, attrib): > print "start", tag, attrib > def end(self, tag): > print "end", tag > def data(self, data): > print "data", repr(data) > def close(self): > print "close" > return "closed!" > > parser = XMLParser( target = EchoTarget()) > result = ET.parse( StringIO (xmltxt), parser) I can reproduce that. Seems to require an entity reference in the data, though. I'll look into it. Stefan From stefan_ml at behnel.de Tue Mar 25 23:04:37 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 25 Mar 2008 23:04:37 +0100 Subject: [XML-SIG] Issues with XMLTreeBuilder in cElementTree and ElementTree In-Reply-To: <697339f4-c549-4022-b945-434f2909cdfc@13g2000hsb.googlegroups.com> References: <697339f4-c549-4022-b945-434f2909cdfc@13g2000hsb.googlegroups.com> Message-ID: <47E976F5.3020704@behnel.de> Hi again, Michael Becker wrote: > These changes would allow the user to take advantage of the > ordered_attributes attribute in the expat parser to use either ordered > or unorder attributes as desired. For backwards compatibility it might > be desirable to change XMLTreeBuilder to default to ordered_attributes > = 0. I've never submitted a bug fix to a python library so if this > seems like a real bug please let me know how to proceed. > > Secondly, I found a potential issue with the cElementTree module. My > understanding (which could be incorrect) of python C modules is that > they should work the same as the python versions but be more > efficient. The XMLTreeBuilder class in cElementTree doesn't seem to be > using the same parser as that in ElementTree. 
The following code > illustrates this issue: > >>>> import xml.etree.cElementTree >>>> t1=xml.etree.cElementTree.XMLTreeBuilder() >>>> t1._parser.ordered_attributes = 1 > Traceback (most recent call last): > File "", line 1, in > AttributeError: _parser (c)ET's XMLParser has an attribute "parser" that references the expat parser instance. It was renamed in newer versions. Stefan From 2huggie at gmail.com Wed Mar 26 08:12:28 2008 From: 2huggie at gmail.com (Timothy Wu) Date: Wed, 26 Mar 2008 15:12:28 +0800 Subject: [XML-SIG] Content is split into two Message-ID: Hi, I posted the following on the Python mailing list but no one responded. So I'm posting here again. ------------ Hi, I have created a very, very simple parser for an XML. class FindGoXML2(ContentHandler): def characters(self, content): print content I have made it simple because I want to debug. This prints out any content enclosed by tags (right?). The XML is publicly available here: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml I show a few lines embedded in this XML: GO 3824 catalytic activity evidence: IEA Notice the third line before the last. I expect my content printout to print out "evidence:IEA". However this is what I get. ------------------------- catalytic activity ==> this is the print out the line before e vidence: IEA ------------------------- I don't understand why a few blank lines were printed after "catalytic activity". But that doesn't matter. What matters is where the string "evidence: IEA" is split into two printouts. First it prints only "e", then "vidence: IEA". I parsed 825 such XMLs without a problem, this occurs on my 826th XML. Any explanations?? From jcd at unc.edu Wed Mar 26 14:39:21 2008 From: jcd at unc.edu (J. 
Cliff Dyer) Date: Wed, 26 Mar 2008 09:39:21 -0400 Subject: [XML-SIG] Content is split into two In-Reply-To: References: Message-ID: <1206538761.3328.3.camel@aalcdl07.lib.unc.edu> On Wed, 2008-03-26 at 15:12 +0800, Timothy Wu wrote: > Hi, I post the following in the Python mailing list but no one > responded. So I'm posting here again. > > ------------ > > Hi, > > I have created a very, very simple parser for an XML. > > class FindGoXML2(ContentHandler): > def characters(self, content): > print content > > I have made it simple because I want to debug. This prints out any > content enclosed by tags (right?). > > The XML is publicly available here: > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml > > I show a few line embedded in this XML: > > > > > > GO > > > 3824 _id> > > > > > catalytic > activity > evidence: > IEA > > > > Notice the third line before the last. I expect my content printout to > print out "evidence:IEA". > However this is what I get. > > ------------------------- > catalytic activity ==> this is the print out the line before > > > > e > vidence: IEA > ------------------------- > > I don't understand why a few blank lines were printed after "catalytic > activity". But that > doesn't matter. What matters is where the string "evidence: IEA" is > split into two printouts. > First it prints only "e", then "vidence: IEA". I parsed 825 such XMLs > without a problem, > this occurs on my 826th XML. > > Any explanations?? The parser will retrieve input in chunks of unspecified size. There is no guarantee that a text block will all get returned at once. You are seeing this problem because the print statement adds a newline after it prints. If you want to see the text itself, without phantom newlines, try replacing print with sys.stdout.write(). 
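A small sketch of how a handler can cope with chunked delivery: buffer whatever characters() hands over and join the pieces when the element ends. Everything here is an assumption made for illustration, not taken from the original post: it is written for Python 3, the class name TextCollector and the helper collect_text() are invented, and it only tracks text of leaf elements.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Buffer character data until the enclosing element ends (illustrative name)."""

    def __init__(self):
        super().__init__()
        self._chunks = []  # pieces delivered by characters()
        self.texts = []    # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # A new element begins: drop any text collected for the parent so far.
        self._chunks = []

    def characters(self, content):
        # May be called several times for a single run of text.
        self._chunks.append(content)

    def endElement(self, name):
        # Only now is the text known to be complete.
        text = "".join(self._chunks).strip()
        if text:
            self.texts.append((name, text))
        self._chunks = []


def collect_text(xml_bytes):
    """Parse a byte string and return the text of each leaf element."""
    handler = TextCollector()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.texts
```

With this shape it no longer matters whether the parser delivers "evidence: IEA" in one call or as "e" followed by "vidence: IEA"; the joined result is the same.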
Cheers, Cliff From smcg4191 at frii.com Wed Mar 26 17:11:10 2008 From: smcg4191 at frii.com (Stuart McGraw) Date: Wed, 26 Mar 2008 10:11:10 -0600 Subject: [XML-SIG] lxml iterparse and comments Message-ID: <47EA759E.1080103@frii.com> Stefan Behnel wrote: [...re adding comment and pi events to iterparse...] > Implemented on the trunk, will be in lxml 2.1. Thanks. [... re missing callbacks from target parser...] > I can reproduce that. Seems to require an entity reference in the data, > though. I'll look into it. [and(from lxml-dev)] > Fixed for 2.0.3. Thanks again! From martin at v.loewis.de Wed Mar 26 20:54:19 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 26 Mar 2008 20:54:19 +0100 Subject: [XML-SIG] How do I test PyXML In-Reply-To: <47E845AA.80809@fas.harvard.edu> References: <47E845AA.80809@fas.harvard.edu> Message-ID: <47EAA9EB.8020205@v.loewis.de> > I'm trying to use the extension for "Inkscape", "textext" which needs > "PyXML". I've been trying to install PyXML and then uses "textext" > without success so I don't know if I'm installing PyXML correctly. > > My OS is Mac OS X version 10.5.2 > > When I run "python regrtest.py" I got the message below (this might be > helpful) Did you install PyXML, using "setup.py install"? It seems you are not picking up the installed copy, but the standard XML packages from your Python 2.5 installation. Regards, Martin From 2huggie at gmail.com Thu Mar 27 05:01:38 2008 From: 2huggie at gmail.com (Timothy Wu) Date: Thu, 27 Mar 2008 12:01:38 +0800 Subject: [XML-SIG] Content is split into two In-Reply-To: <1206538761.3328.3.camel@aalcdl07.lib.unc.edu> References: <1206538761.3328.3.camel@aalcdl07.lib.unc.edu> Message-ID: On Wed, Mar 26, 2008 at 9:39 PM, J. Cliff Dyer wrote: > The parser will retrieve input in chunks of unspecified size. There is > no guarantee that a text block will all get returned at once. 
> You are seeing this problem because the print statement adds a newline
> after it prints. If you want to see the text itself, without phantom
> newlines, try replacing print with sys.stdout.write().
>
> Cheers,
> Cliff

Thanks for the help. Now I see that the page
http://pyxml.sourceforge.net/topics/howto/node14.html says:

"You also shouldn't assume that all the characters are passed in a single
function call."

Wow, totally unexpected. Wonder why it's designed as it is? This is
especially weird to me since the string size isn't big (small buffer) and
this adds a bit of complexity to the text processing. Now I have to set a
flag to make sure that I finish off when moving out of the tag.

This now all sounds kind of like déjà vu, maybe I ran into this before. =/
I don't process XML that often.

Timothy

From stefan_ml at behnel.de  Thu Mar 27 09:23:54 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 27 Mar 2008 09:23:54 +0100
Subject: [XML-SIG] Content is split into two
In-Reply-To:
References: <1206538761.3328.3.camel@aalcdl07.lib.unc.edu>
Message-ID: <47EB599A.1060802@behnel.de>

Hi,

Timothy Wu wrote:
> "You also shouldn't assume that all the characters are passed in a single
> function call."
>
> Wow, totally unexpected. Wonder why it's designed as it is? This is
> especially weird to me since the string size isn't big (small buffer) and

For you maybe, but nothing keeps an XML document from having text entries of a
couple of megabytes, possibly separated by entity references. Aggregating all
that in memory could be quite expensive, so it's a good design choice not to
require that in the parser.

> this adds a bit of complexity to the text processing.

Not that much. The usual pattern is: append text content to a list and join it
when you see something that's not text.
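
A minimal sketch of that pattern, assuming a plain xml.sax ContentHandler
and written for Python 3. TextCollector, collect_text and the element
names in SAMPLE are hypothetical; the real markup of the document under
discussion did not survive in the archive.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Buffer character data and join it once a tag is reached."""

    def __init__(self):
        ContentHandler.__init__(self)
        self._chunks = []  # pieces delivered since the last tag
        self.texts = []    # completed, non-blank runs of text

    def characters(self, content):
        # May be called more than once for a single run of text.
        self._chunks.append(content)

    def _flush(self):
        # Join whatever has accumulated and start a fresh buffer.
        text = "".join(self._chunks).strip()
        self._chunks = []
        if text:
            self.texts.append(text)

    def startElement(self, name, attrs):
        # A start tag is "something that's not text".
        self._flush()

    def endElement(self, name):
        # So is an end tag.
        self._flush()


def collect_text(xml_bytes):
    """Parse a document and return its runs of text, each one whole."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts


# Made-up stand-in for the real document; only the text values come
# from the thread.
SAMPLE = (b"<entry><name>catalytic activity</name>"
          b"<note>evidence: IEA</note></entry>")
```
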
That works very well unless your strings are really long.

Stefan

From fredrik at pythonware.com  Sun Mar 30 15:28:38 2008
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sun, 30 Mar 2008 15:28:38 +0200
Subject: [XML-SIG] Issues with XMLTreeBuilder in cElementTree and ElementTree
In-Reply-To: <47E976F5.3020704@behnel.de>
References: <697339f4-c549-4022-b945-434f2909cdfc@13g2000hsb.googlegroups.com> <47E976F5.3020704@behnel.de>
Message-ID:

Stefan Behnel wrote:

> (c)ET's XMLParser has an attribute "parser" that references the expat parser
> instance. It was renamed in newer versions.

cElementTree doesn't use the pyexpat API, and the expat binding it uses
doesn't support the ordered_attributes nonsense (*) at all.

*) it's an XML parser, after all. bugs in downstream tools should be
fixed in those tools, or by post-processing, not by hacking XML tools to
produce things that are not XML.

From HDoran at air.org  Mon Mar 31 19:34:58 2008
From: HDoran at air.org (Doran, Harold)
Date: Mon, 31 Mar 2008 13:34:58 -0400
Subject: [XML-SIG] Learning to use elementtree
Message-ID: <2323A6D37908A847A7C32F1E3662C80E017BDC9A@dc1ex01.air.org>

Dear List:

I am brand new to xml and have some experience with python using it to
parse through text files. Now, however, I need to use python to parse
through some xml files.
I am working with elementtree right now and am able to make this work on
some toy examples. Things are going well with these toy examples. But now
I am trying to apply the code I have written to a real xml file I need to
work with, and things are hitting a road block.

Is anyone on this list willing to look at an xml file I can send them and
work with me through a small example to see if I can get this to work?

I am working with python 2.5.2 for windows XP.

Harold