From martin at v.loewis.de Sun Dec 2 19:10:53 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 02 Dec 2007 19:10:53 +0100
Subject: [XML-SIG] Problems with PyXML Mac OS 10.5 install
In-Reply-To:
References:
Message-ID: <4752F52D.90902@v.loewis.de>

> It can load the xml module OK, but I presume that this is simply the old
> version. It seems to me that it's simply not being installed - if I use
> the spotlight to find xpath, it's only found in the local directory, not
> where it should be installed.

Well, "setup.py install" should tell you what files it copies - does
that not look right?

Notice that PyXML installs itself as _xmlplus.

Regards,
Martin

From Pierre.DeWet at BITC.ORG.UK Mon Dec 3 12:03:22 2007
From: Pierre.DeWet at BITC.ORG.UK (Pierre DeWet)
Date: Mon, 03 Dec 2007 11:03:22 +0000
Subject: [XML-SIG] XML-SIG Digest, Vol 56, Issue 1 (Out of office)
Message-ID:

I will be out of the office until Monday 10 December. If your request is
urgent, please contact the helpdesk at: helpdesk at bitc.org.uk,
alternatively, please dial: 0207 566 8771

Cheers
Pierre

From martin at v.loewis.de Tue Dec 4 00:39:29 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 04 Dec 2007 00:39:29 +0100
Subject: [XML-SIG] How to parse an XML in SAX
In-Reply-To:
References:
Message-ID: <475493B1.7020900@v.loewis.de>

> Hi I want to parse an XML using sax but my big issue are the
> WhiteSpaces when they get reported. I want to know how to efficiently
> ignore them. I know there are some DocumentHandlers and one specific
> for ignore Whitespace but I still come up with a bunch of invisible
> nodes like \t or \n.
>
> Anyone have a tutorial on how to handle SAX for this kind of parsing?

In general, the notion of "significant whitespace" is pretty weak in XML
(independent of SAX, so I don't think Stefan's bashing of SAX was of any
help).
Here is what I know about it:

- white space should be preserved if the attribute xml:space was given
  on an element, and has the value of "preserve". Otherwise, it's up
  to the application what precisely to do with white space.
- white space in "element content" is usually considered ignorable,
  and the XML spec requires that it is reported as such. However,
  whether an element has element content depends on the DTD, so
  only a validating parser can know. If you turn validation on
  in SAX, white space in element content will be reported through
  the "ignorableWhitespace" event.

So, it's your own choice, and you should make that choice based on
your knowledge of the actual XML application. Typical options are

a) preserve all whitespace
b) perform validation, then strip all whitespace in element content
c) drop white space that completely spans from one tag to another,
   assuming the element has element content. In SAX, track the
   character data reported since either the last startElement or
   endElement, and then choose to drop the whitespace at the next
   startElement or endElement.
d) In many cases, you have either element content or simple text
   content, so in SAX, you can drop the white space if you see
   nested elements.
e) strip whitespace, in the sense of Python's string.strip. I.e.
   at endElement, perform .strip() on the collected data.

HTH,
Martin

From gelston at doosanbabcock.com Wed Dec 5 11:38:18 2007
From: gelston at doosanbabcock.com (Elston, Gareth R)
Date: Wed, 5 Dec 2007 10:38:18 -0000
Subject: [XML-SIG] Amara xml_xpath() behaviour
Message-ID: <9D4464CAAAB788439D66EE2432F9B5F10292EE28@00001EXCH.uk.mitsuibabcock.com>

Hi,

I'm new to XML and I've just started using Amara - I'm very impressed.

I've been trying to use xml_xpath() on a bindery object itself created
with xml_xpath(). I didn't get what I expected, which may be my
misunderstanding of what xml_xpath() is doing.
Here's a short example to illustrate (I'm using Amara 1.2.0.2 and
Python 2.4.3 on Windows XP.):

In [1]: import amara

In [2]: l = amara.parse('file:///F:/lines.xml')

In [3]: print l.xml()

In [4]: l.xml_xpath('//Line')
Out[4]: [, , ]

In [5]: print l.xml_xpath('//Line')[0].xml()

In [6]: l.xml_xpath('//Line')[0].xml_xpath('//Point')
Out[6]: [, , , , , ]

I expected only 2 amara.bindery.Point objects in the last step. Is this
(all 6 Points in the XML data) the expected behaviour?

Thanks,

Gareth

From morillas at gmail.com Wed Dec 5 12:29:03 2007
From: morillas at gmail.com (Luis Miguel Morillas)
Date: Wed, 5 Dec 2007 12:29:03 +0100
Subject: [XML-SIG] Amara xml_xpath() behaviour
In-Reply-To: <9D4464CAAAB788439D66EE2432F9B5F10292EE28@00001EXCH.uk.mitsuibabcock.com>
References: <9D4464CAAAB788439D66EE2432F9B5F10292EE28@00001EXCH.uk.mitsuibabcock.com>
Message-ID: <68d25cbc0712050329q3cf4b58etce7793d3e1b9b38f@mail.gmail.com>

2007/12/5, Elston, Gareth R :
> Hi,
>
> I'm new to XML and I've just started using Amara - I'm very impressed.
>
> I've been trying to use xml_xpath() on a bindery object itself created
> with xml_xpath(). I didn't get what I expected, which may be my
> misunderstanding of what xml_xpath() is doing. Here's a short example to
> illustrate (I'm using Amara 1.2.0.2 and Python 2.4.3 on Windows XP.):
>
> In [1]: import amara
>
> In [2]: l = amara.parse('file:///F:/lines.xml')
>
> In [3]: print l.xml()
>
> In [4]: l.xml_xpath('//Line')
> Out[4]:
> [,
> ,
> ]
>
> In [5]: print l.xml_xpath('//Line')[0].xml()
>
> In [6]: l.xml_xpath('//Line')[0].xml_xpath('//Point')
> Out[6]:
> [,
> ,
> ,
> ,
> ,
> ]
>
> I expected only 2 amara.bindery.Point objects in the last step. Is this
> (all 6 Points in the XML data) the expected behaviour?
>

About XPath: http://www.w3.org/TR/xpath

//Point selects all the Point descendants of the document root.
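The same distinction can be checked outside Amara. What follows is only a minimal sketch: it uses the standard library's xml.etree.ElementTree rather than Amara's xml_xpath(), and the sample document is hypothetical, shaped to match the counts in the question (three Line elements with two Point children each). ElementTree accepts only relative paths, so './/Point' evaluated from the root stands in for the absolute '//Point'.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three Line elements, two Point children each.
SAMPLE = """
<Lines>
  <Line id="a"><Point x="0"/><Point x="1"/></Line>
  <Line id="b"><Point x="2"/><Point x="3"/></Line>
  <Line id="c"><Point x="4"/><Point x="5"/></Line>
</Lines>
"""

root = ET.fromstring(SAMPLE)

# Searching from the top of the document finds every Point.
all_points = root.findall(".//Point")

# A path relative to one Line finds only that Line's own Point children.
first_line = root.findall(".//Line")[0]
child_points = first_line.findall("Point")

# The same selection written as a single path with a position predicate.
same_points = root.findall(".//Line[1]/Point")

print(len(all_points), len(child_points), len(same_points))
```

The point of the sketch is only that the starting node of the path, not the object the method is called on, decides how many Point elements come back.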
To select only the Point children of the first Line, use a relative path:

    l.xml_xpath('//Line')[0].xml_xpath('Point')

or better:

    l.xml_xpath('//Line[1]/Point')

But be careful, because Amara's XPath has some problems ordering nodes
(see http://lists.fourthought.com/pipermail/4suite/2007-June/008285.html),
and this will not be fixed until Amara 2.0.

--
lm

From info at thegrantinstitute.com Thu Dec 6 09:01:39 2007
From: info at thegrantinstitute.com (Anthony Jones)
Date: 06 Dec 2007 00:01:39 -0800
Subject: [XML-SIG] Professional Grant Proposal Writing Workshop (January 2008: San Diego, CA)
Message-ID: <20071206000138.817203F8CD6F9852@thegrantinstitute.com>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/xml-sig/attachments/20071206/0c0ea2ef/attachment.htm

From noreply at sourceforge.net Fri Dec 21 23:08:15 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri, 21 Dec 2007 14:08:15 -0800
Subject: [XML-SIG] [ pyxml-XBEL-1856104 ] getAttribute returns blank string
Message-ID:

XBEL item #1856104, was opened at 2007-12-21 16:08
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=707658&aid=1856104&group_id=6473

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Gazi Alankus (alanic7)
Assigned to: Nobody/Anonymous (nobody)
Summary: getAttribute returns blank string

Initial Comment:
When I read an HTML document using xml.dom.ext.reader.HtmlLib.Reader, the
getAttribute() method of the elements returns a blank string. Attached is a
test source with comments on the prints.

As far as I can see, xml/dom/Element.py implements getAttribute() as:

    def getAttribute(self, name):
        att = self.attributes.getNamedItem(name)
        return att and att.value or ''

The last three prints are from that return line. I'm not sure if
xml/dom/Element.py is the source of the getAttribute() I used, but my trials
show that this code should work but does not. So either this is not the
source, or maybe the python source is compiled and the compiler messes
something up.

Version info:
pyXml pyxml-0.8.4, installed through Gentoo ebuild.
Python 2.4.4 (#1, Nov 6 2007, 18:42:27)
[GCC 4.1.2 (Gentoo 4.1.2)] on linux2

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=707658&aid=1856104&group_id=6473
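For the getAttribute() report in tracker item #1856104, the quoted return line can be tried in isolation. This is a rough sketch only: the Attr and EmptyLookingAttr classes and the plain dict used as the attribute map are hypothetical stand-ins, not PyXML classes, and nothing here establishes what the HtmlLib reader actually does. It shows just the cases in which the "att and att.value or ''" idiom evaluates to an empty string.

```python
class Attr(object):
    """Hypothetical stand-in for a DOM attribute node."""
    def __init__(self, value):
        self.value = value


class EmptyLookingAttr(Attr):
    """Hypothetical node that is falsy because it reports a length of 0."""
    def __len__(self):
        return 0


def get_attribute(attributes, name):
    # dict.get() plays the role of attributes.getNamedItem(name):
    # it returns None when no attribute of that name exists.
    att = attributes.get(name)
    return att and att.value or ''


attrs = {
    "href": Attr("http://example.org/"),
    "alt": Attr(""),
    "id": EmptyLookingAttr("main"),
}

present = get_attribute(attrs, "href")      # attribute exists, non-empty value
empty_value = get_attribute(attrs, "alt")   # attribute exists, value is ''
missing = get_attribute(attrs, "title")     # no such attribute
falsy_node = get_attribute(attrs, "id")     # node itself evaluates as false

print(repr(present), repr(empty_value), repr(missing), repr(falsy_node))
```

So the idiom yields '' not only for a missing attribute but also whenever the looked-up object or its value is falsy; checking what getNamedItem() returns for the failing name would narrow down which case, if any, applies.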