From and-xml at doxdesk.com Sun Feb 1 07:43:42 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Sun Feb 1 08:17:46 2004
Subject: [XML-SIG] DOM memory problem
In-Reply-To:
References:
Message-ID: <20040201124342.GA4412@doxdesk.com>

Bruce Jewell wrote:

> My problem is that this threw an exception IOError:[Errno 12]Not
> enough space.

You're using Windows, right?

sys.stdin behaves oddly under Python for Windows; it's not a
PyXML-specific problem. It was mentioned here, with no obvious solution:

  http://groups.google.de/groups?threadm=mailman.1020460208.1269.python-list%40python.org&rnum=1

(On Win 9x, stdout may be affected too, judging by this:)

  http://mail.python.org/pipermail/python-bugs-list/2002-November/014356.html

PyXML is trying to read stdin to an arbitrarily-sized buffer. If the
buffer is larger than a certain critical size, Windows will give it an
IOError. This size is 0x6FFF on my Py2.3/WinXP setup, unless Python is
run with the -u switch in which case (for some reason) it is 0x4FFF.
This may vary; 1.5.2 also seems to get stuck at 0x4FFFF.

Currently xml.sax.xmlreader is defaulting to 0x10000:

  def __init__(self, bufsize=2**16):

If the default on this line (111) is changed to 0x4000 it seems to make
the problems go away. It may be worth making this change to PyXML to
avoid this problem cropping up for Win users. But in the longer term I
think Python itself ought to include a workaround (assuming it is indeed
a bug in Microsoft's stdio implementation).

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From morillas at posta.unizar.es Mon Feb 2 19:24:45 2004
From: morillas at posta.unizar.es (luis miguel morillas)
Date: Mon Feb 2 19:05:23 2004
Subject: [XML-SIG] libxslt error
Message-ID: <20040203002445.GA1686@marmota>

I've installed libxml2-2.6.4 and libxslt-1.1.2 in a linux debian with
python2.3. When i import libxslt, i get this error:

>>> import libxml2
>>> import libxslt
Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.3/site-packages/libxslt.py", line 46, in ?
    import libxsltmod
ImportError: /usr/lib/python2.3/site-packages/libxsltmod.so: undefined symbol: xsltSetCtxtParseOptions

I don't know how to solve this error. Any suggestions?

-- 
Luis Miguel

No to software patents in Europe
EuropeSwPatentFree
http://EuropeSwPatentFree.hispalinux.es

From bortzmeyer at nic.fr Tue Feb 3 04:15:49 2004
From: bortzmeyer at nic.fr (Stephane Bortzmeyer)
Date: Tue Feb 3 04:16:08 2004
Subject: [XML-SIG] Re: libxslt error
In-Reply-To: <20040203002445.GA1686@marmota>
References: <20040203002445.GA1686@marmota>
Message-ID: <20040203091549.GA6975@nic.fr>

On Tue, Feb 03, 2004 at 01:24:45AM +0100,
 luis miguel morillas wrote
 a message of 26 lines which said:

> I've installed libxml2-2.6.4 and libxslt-1.1.2 in a linux debian
> with python2.3.
> When i import libxslt, i get this error:

It works for me. Python 2.3.3, libxml 2.6.4, libxslt 1.1.2.

~ % python
Python 2.3.3 (#2, Jan 13 2004, 00:47:05)
[GCC 3.3.3 20040110 (prerelease) (Debian)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import libxml2
Traceback (most recent call last):
  File "", line 1, in ?
ImportError: No module named libxml2
>>> import libxml2
>>> import libxslt
>>>

From bortzmeyer at nic.fr Tue Feb 3 04:45:59 2004
From: bortzmeyer at nic.fr (Stephane Bortzmeyer)
Date: Tue Feb 3 04:46:09 2004
Subject: [XML-SIG] Re: libxslt error
In-Reply-To: <20040203091549.GA6975@nic.fr>
References: <20040203002445.GA1686@marmota> <20040203091549.GA6975@nic.fr>
Message-ID: <20040203094559.GA7638@nic.fr>

On Tue, Feb 03, 2004 at 10:15:49AM +0100,
 Stephane Bortzmeyer wrote
 a message of 25 lines which said:

> >>> import libxml2
> Traceback (most recent call last):
>   File "", line 1, in ?
> ImportError: No module named libxml2
> >>> import libxml2

I forgot to add that I typed "apt-get install libxml2-python2.3" between
the two imports :-)

From cuba at iotacenter.org Tue Feb 3 17:25:58 2004
From: cuba at iotacenter.org (Larry Cuba)
Date: Tue Feb 3 17:26:27 2004
Subject: [XML-SIG] Basic Question
Message-ID: <4.3.2.7.2.20040203132427.03e21ef0@mail.well.com>

Hi Folks,

I'm new to PyXML and having trouble finding the answer to a most basic
question.

With this slight modification of the sample code in section 6.5 of the
Python/XML How To, Walking Over the Entire Tree:

>walker = xml_dom_object.createTreeWalker(xml_dom_object.documentElement,
>                NodeFilter.SHOW_ALL, None, 0)
>while 1:
>    print walker.currentNode.nodeName, walker.currentNode.nodeValue
>    next = walker.nextNode()
>    if next is None: break

I was able to get this output (every node name followed by its value):

City None
#text Santa Monica
#text
State None
#text CA
#text
Country None
#text US
#text

The output I want is just the element name and the *element* value (not
node value):

City Santa Monica
State CA
Country US
...

The relevant methods and/or attributes and how to use them are not well
documented (not for a beginner, at least). Can someone explain this to
me?

Thanks muchly.

Larry Cuba

From chrish at cryptocard.com Wed Feb 4 12:06:53 2004
From: chrish at cryptocard.com (Chris Herborth)
Date: Wed Feb 4 12:51:58 2004
Subject: [XML-SIG] DOM with EntityReference support?
Message-ID: <402126AD.1020209@cryptocard.com>

Is there a Python DOM (other than xml.dom.javadom) that supports
EntityReference nodes?

-- 
Chris Herborth                                  chrish@cryptocard.com
Documentation Overlord, CRYPTOCard Corp.   http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.

From and-xml at doxdesk.com Thu Feb 5 11:09:08 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Thu Feb 5 11:53:53 2004
Subject: [XML-SIG] DOM with EntityReference support?
In-Reply-To: <402126AD.1020209@cryptocard.com>
References: <402126AD.1020209@cryptocard.com>
Message-ID: <20040205160908.GA9782@doxdesk.com>

Chris Herborth wrote:

> Is there a Python DOM (other than xml.dom.javadom) that supports
> EntityReference nodes?

Yes:

  http://www.doxdesk.com/software/py/pxdom.html

However, the replacement content of external entities is not resolved.

(Yet. You can expect support for external entities in version 1.1. 1.0
will be out shortly, dependent on the release of the next spec draft.)

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From John.Ertl at fnmoc.navy.mil Thu Feb 5 12:55:32 2004
From: John.Ertl at fnmoc.navy.mil (Ertl, John)
Date: Thu Feb 5 12:53:00 2004
Subject: [XML-SIG] Document style SOAP Web Service with ZSI (docstyle)
Message-ID:

I was following a thread about a doc style web service using Python's ZSI
and I could not follow the solution. Would anyone have a full simple
example of a ZSI server and client doc style? I am also trying to use
dispatch.AsCGI.

John C. Ertl
Fleet Numerical Meteorology & Oceanography Center
7 Grace Hopper Ave
Monterey, CA 93943
phone: (831) 656-5704
fax: (831) 656-4363

From JRBoverhof at lbl.gov Thu Feb 5 23:56:38 2004
From: JRBoverhof at lbl.gov (Joshua Boverhof)
Date: Thu Feb 5 23:46:15 2004
Subject: [XML-SIG] Document style SOAP Web Service with ZSI (docstyle)
References:
Message-ID: <40231E86.70403@lbl.gov>

Creating a client is simple: just use the wsdl2py tool to generate a
couple of modules.

% wsdl2py -u http://ws.cdyne.com/ziptogeo/zip2geo.asmx?wsdl
% ls
Zip2Geo_services.py Zip2Geo_services_types.py
% python
Python 2.3 (#1, Aug 8 2003, 13:12:42)
[GCC 3.2 20020903 (Red Hat Linux 8.0 3.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Zip2Geo_services import Zip2GeoLocator, GetLatLongSoapInWrapper
>>> import sys
>>> kw = {'tracefile':sys.stdout}
>>> loc = Zip2GeoLocator()
>>> port = loc.getZip2GeoSoap(**kw)
>>> msg = GetLatLongSoapInWrapper()
>>> msg._zipcode = '94610'
>>> response = port.GetLatLong(msg)
_________________________________ Thu Feb 5 20:26:31 2004 REQUEST:
< xmlns="http://ws.cdyne.com"> 94610
_________________________________ Thu Feb 5 20:26:32 2004 RESPONSE:
Date: Fri, 06 Feb 2004 04:19:39 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
MicrosoftOfficeWebServer: 5.0_Pub
X-AspNet-Version: 1.1.4322
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
Content-Type: text/xml; charset=utf-8
Content-Length: 779

falsefalseOAKLANDCA94610ALAMEDA-122.27948037.799127-122.27948037.828727-122.24732937.81392773625775
>>> response._GetLatLongResult
>>> response._GetLatLongResult._ServiceError
0
>>> response._GetLatLongResult._City
u'OAKLAND'
>>> response._GetLatLongResult._AvgLatitude
37.813927

There isn't currently a way to automatically generate the server side,
but basically you create modules that contain callback functions. Using
the dispatch module in ZSI is problematic, though. This is a way to do
it, but it is not 100% correct.

$ cat run.py
#!/usr/bin/env python
from ZSI.dispatch import AsServer
import ZipServer

AsServer(port=8080, modules=(ZipServer,), docstyle=1)

$ cat ZipServer.py
from xml.dom.ext.reader import PyExpat

def GetLatLong(msg):
    reply = """falsefalseOAKLANDCA94610ALAMEDA-122.27948037.799127-122.27948037.828727-122.24732937.81392773625775"""
    reader = PyExpat.Reader()
    doc = reader.fromString(reply)
    return doc

To change your client to send a message to your server, just change the
portAddress of the port when you get it from the locator object.

port = loc.getZip2GeoSoap(portAddress='http://localhost:8080', **kw)

-josh

Ertl, John wrote:

>I was following a thread about a doc style web service using Pythons ZSI and
>I could not follow the solution.
>Would anyone have a full simple example of
>a ZSI server and client doc style? I am also trying to use dispatch.AsCGI.
>
>John C. Ertl
>Fleet Numerical Meteorology & Oceanography Center
>7 Grace Hopper Ave
>Monterey, CA 93943
>phone: (831) 656-5704
>fax: (831) 656-4363
>
>
>_______________________________________________
>XML-SIG maillist - XML-SIG@python.org
>http://mail.python.org/mailman/listinfo/xml-sig
>
>

From list-matt at reprocessed.org Sat Feb 7 05:35:40 2004
From: list-matt at reprocessed.org (Matt Patterson)
Date: Sat Feb 7 05:35:47 2004
Subject: [XML-SIG] libxslt error
In-Reply-To: <20040203002445.GA1686@marmota>
References: <20040203002445.GA1686@marmota>
Message-ID: <5FF98B10-5959-11D8-9B0F-000393CBB978@reprocessed.org>

On 3 Feb 2004, at 00:24, luis miguel morillas wrote:

> I've installed libxml2-2.6.4 and libxslt-1.1.2 in a linux debian
> with python2.3.
> When i import libxslt, i get this error:
>
>>>> import libxml2
>>>> import libxslt

if you built them from source and didn't specify the --with-python
option then you'll need to rebuild libxml2 and libxslt using
--with-python

# ./configure --with-python=/usr/lib/python2.2 (on my elderly Debian box)

then make, make install as usual.

If you installed from binary then you need to install the
libxml2-python, libxslt-python packages as well: The Python bindings
aren't installed as part of the usual package. Also, libxml2-python is
not libxml2 + python, it's just the python bits.

Best,

Matt

-- 
Matt Patterson | Typographer | http://www.emdash.co.uk/ | http://reprocessed.org/

From list-matt at reprocessed.org Sat Feb 7 14:33:37 2004
From: list-matt at reprocessed.org (Matt Patterson)
Date: Sat Feb 7 14:33:56 2004
Subject: [XML-SIG] PyXML XPath woes
Message-ID: <86E0FD1B-59A4-11D8-954F-000393CBB978@reprocessed.org>

Hello,

I've got an XML file in which I want to locate all elements with the
attribute boundary set to 'true'.
I use the following XPath with 4DOM:

//*[@boundary='true']

like so:

boundaryFinder = Compile("//*[@boundary='true']")
context = Context(self.document)
# evaluate the expression and get a nodeList
boundaryNodes = boundaryFinder.evaluate(context)

But the results of the XPath do not return all the nodes which match!

The file I'm parsing is an external entity which I've reached by parsing
it as if it were stand-alone XML and not by following an entity
reference from a different XML doc. It's well-formed, but missing its
declaration (it's a FrameMaker generated entity: Frame outputs
multi-document projects ('books') as a single XML file with all the
component documents referenced as entities).

The project I'm working on involves paginating large XML files in an
arbitrary way using DOM Range. To figure out where the page boundaries
lie I'm using XPath to locate the nodes which cause a new page to start.
The boundary="true" attributes are added in a pre-processing step.

Can anyone shed any light onto why only some of the boundary="true"
nodes are being found?

The files in question are here:

http://www.emdash.co.uk/opf/12_%20Chldrns_edction.e12
http://www.emdash.co.uk/opf/tmp603_12_%20Chldrns_edction.e12

Thanks,

Matt

As an aside, I originally did the whole thing using PyXML but the XPaths
were too complex for it (example below) and it would return no results!
I now pre-process the file by running it through libxslt, which adds the
boundary attributes using the complex XPath, and then search for the
attributes with PyXML.
This is the XPath to find the boundary nodes without help:

//H1[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
|//H2[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
    [preceding-sibling::*[1][name() != 'H1']]
|//H2[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
    [count(preceding-sibling::*) = 0]
|//H3[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
    [preceding-sibling::*[1][name() != 'H2'][name() != 'H1']]
|//H3[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
    [count(preceding-sibling::*) = 0]

-- 
Matt Patterson | Typographer | http://www.emdash.co.uk/ | http://reprocessed.org/

From morillas at posta.unizar.es Sat Feb 7 20:19:23 2004
From: morillas at posta.unizar.es (luis miguel morillas)
Date: Sat Feb 7 19:59:44 2004
Subject: [XML-SIG] PyXML XPath woes
In-Reply-To: <86E0FD1B-59A4-11D8-954F-000393CBB978@reprocessed.org>
References: <86E0FD1B-59A4-11D8-954F-000393CBB978@reprocessed.org>
Message-ID: <20040208011923.GA3849@marmota>

Subject: [XML-SIG] PyXML XPath woes
Date: Sat, Feb 07, 2004 at 07:33:37 +0000
Quoting Matt Patterson (list-matt@reprocessed.org):

> Hello,
>
> I've got an XML file in which I want to locate all elements with the
> attribute boundary set to 'true'. I use the following XPath with 4DOM:
>
> //*[@boundary='true']
>
> like so:
> boundaryFinder = Compile("//*[@boundary='true']")
> context = Context(self.document)
> # evaluate the expression and get a nodeList
> boundaryNodes = boundaryFinder.evaluate(context)
>
> But the results of the XPath do not return all the nodes which match!
>

Doesn't it?
I get 47 nodes with
http://www.emdash.co.uk/opf/tmp603_12_%20Chldrns_edction.e12

from Ft.Xml import Domlette
from Ft.Xml.XPath.Context import Context
from Ft.Xml.XPath import Compile

reader = Domlette.DEFAULT_NONVALIDATING_READER()
doc = reader.fromStream(open('tmp603_12_%20Chldrns_edction.e12'))
context=Context(doc)
bf = Compile("//*[@boundary='true']")
result = bf.evaluate(context)
print len(result)

> The project I'm working on involves paginating large XML files in an
> arbitrary way using DOM Range. To figure out where the page boundaries
> lie I'm using XPath to locate the nodes which cause a new page to
> start. The boundary="true" attributes are added in a pre-processing
> step
>

Very interesting!!

-- 
Luis Miguel

No to software patents in Europe
EuropeSwPatentFree
http://EuropeSwPatentFree.hispalinux.es

From tpassin at comcast.net Sat Feb 7 21:49:14 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Sat Feb 7 21:47:22 2004
Subject: [XML-SIG] PyXML XPath woes
In-Reply-To: <86E0FD1B-59A4-11D8-954F-000393CBB978@reprocessed.org>
References: <86E0FD1B-59A4-11D8-954F-000393CBB978@reprocessed.org>
Message-ID: <4025A3AA.8020202@comcast.net>

Matt Patterson wrote:

>
> I've got an XML file in which I want to locate all elements with the
> attribute boundary set to 'true'. I use the following XPath with 4DOM:
>
> //*[@boundary='true']
>
> like so:
> boundaryFinder = Compile("//*[@boundary='true']")
> context = Context(self.document)
> # evaluate the expression and get a nodeList
> boundaryNodes = boundaryFinder.evaluate(context)
>
> But the results of the XPath do not return all the nodes which match!
>

How many nodes did you get and how many are actually there?

You have an encoding problem with the file you linked to. It is encoded
in iso-8859-1 but with no encoding declaration it is treated as utf-8.
Unfortunately there are some non-utf-8 characters in it, so it is not
well-formed. Thus any results you get would be suspect.
In fact, it should not parse successfully at all.

Once I added an encoding declaration for iso-8859-1, running the xpath
expression in XML Cooktop I got 47 @boundary nodes, the same as Luis
Miguel said he found.

Fix your encoding (I have no idea how you will do that, but presumably
either a Framemaker setting or a bit more preprocessing would do it).
Then follow Luis Miguel's example and see if you get the same results.

Cheers,

Tom P

From mike at skew.org Sun Feb 8 01:18:04 2004
From: mike at skew.org (Mike Brown)
Date: Sun Feb 8 01:18:07 2004
Subject: [XML-SIG] PyXML XPath woes
In-Reply-To: <86E0FD1B-59A4-11D8-954F-000393CBB978@reprocessed.org> "from Matt Patterson at Feb 7, 2004 07:33:37 pm"
Message-ID: <200402080618.i186I4gZ065456@chilled.skew.org>

Matt Patterson wrote:

> As an aside, I originally did the whole thing using PyXML but the XPath
> were too complex for it (example below) and it would return no results!
> I now pre-process the file by running with the complex XPaths through
> libxslt to add the boundary attributes using the complex XPath, and
> then searching for the attributes with PyXML.
> This is the XPath to find the boundary nodes without help:
>
> //H1[ancestor::boxtexttable = false()][ancestor::casestudy =
> false()][ancestor::casetexttable = false()][ancestor::checklist =
> false()]|//H2[ancestor::boxtexttable = false()][ancestor::casestudy =
> false()][ancestor::casetexttable = false()][ancestor::checklist =
> false()][preceding-sibling::*[1][name() !=
> 'H1']]|//H2[ancestor::boxtexttable = false()][ancestor::casestudy =
> false()][ancestor::casetexttable = false()][ancestor::checklist =
> false()][count(preceding-sibling::*) = 0]|//H3[ancestor::boxtexttable =
> false()][ancestor::casestudy = false()][ancestor::casetexttable =
> false()][ancestor::checklist = false()][preceding-sibling::*[1][name()
> != 'H2'][name() != 'H1']]|//H3[ancestor::boxtexttable =
> false()][ancestor::casestudy = false()][ancestor::casetexttable =
> false()][ancestor::checklist = false()][count(preceding-sibling::*) =
> 0]

Some XPath hints for you here...

1. These predicates don't have to be chained. For example, instead of

  [ancestor::boxtexttable = false()][ancestor::casestudy = false()]
  [ancestor::casetexttable = false()][ancestor::checklist = false()]
  [preceding-sibling::*[1][name() != 'H1']]

you could just say

  [not(ancestor::boxtexttable or ancestor::casestudy or ancestor::casetexttable
  or ancestor::checklist) and name(preceding-sibling::*[1]) != 'H1']

2. count(preceding-sibling::*) = 0

is more succinctly written as not(preceding-sibling::*)

However I think this predicate may have been interfering with your
results. Take your first H2, for example... it does have a preceding
sibling: standfirst, but you did in fact want it to be recorded as a
boundary, right?

I am guessing that your boundary elements are those H1, H2 and H3
elements that are not descendants of boxtexttable, casestudy,
casetexttable, or checklist elements, and that are not immediately
preceded by a higher-level heading element (H1 being higher than H2
being higher than H3).
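
The boundary rule guessed at above can also be checked in a single walk
of the tree. What follows is only a sketch in plain Python using the
standard library's ElementTree, not anything from PyXML or 4Suite: the
function name mark_boundaries, the EXCLUDED and HEADINGS tables, and the
sample document are all made up for illustration, and the document root
itself is never treated as a candidate heading.

```python
import xml.etree.ElementTree as ET

# Hypothetical tables for this sketch: container elements whose headings
# never count as boundaries, and the rank of each heading element.
EXCLUDED = {"boxtexttable", "casestudy", "casetexttable", "checklist"}
HEADINGS = {"H1": 1, "H2": 2, "H3": 3}


def mark_boundaries(root):
    """Set boundary="true" on qualifying headings; return how many were set.

    A heading qualifies when none of its ancestors is in EXCLUDED and its
    immediately preceding sibling element is not a higher-level heading.
    """
    count = 0
    # Each stack entry is (element, is that element inside an excluded container?)
    stack = [(root, root.tag in EXCLUDED)]
    while stack:
        parent, excluded = stack.pop()
        previous = None
        for child in parent:
            level = HEADINGS.get(child.tag)
            if level is not None and not excluded:
                prev_level = None
                if previous is not None:
                    prev_level = HEADINGS.get(previous.tag)
                # No preceding sibling, a non-heading sibling, or a heading
                # of the same or lower rank: this heading starts a new page.
                if prev_level is None or prev_level >= level:
                    child.set("boundary", "true")
                    count += 1
            stack.append((child, excluded or child.tag in EXCLUDED))
            previous = child
    return count


if __name__ == "__main__":
    sample = ET.fromstring(
        "<doc><H1/><H2/><p/><H2/><casestudy><H1/></casestudy></doc>"
    )
    print(mark_boundaries(sample))
    print(ET.tostring(sample, encoding="unicode"))
```

Each element is visited once and the "inside an excluded container" flag
is carried down the stack, so no ancestor lookups are needed per node.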
I think this last clause is not really necessary; you could more easily
just rule out a given element as being a boundary if its immediate
preceding sibling's name starts with 'H' (since you don't appear to have
any other element names starting with 'H').

This expression is much simpler and I think will do what you want:

  (//H1|//H2|//H3)[not(ancestor::boxtexttable or
                       ancestor::casestudy or
                       ancestor::casetexttable or
                       ancestor::checklist) and
                   not(starts-with(local-name(preceding-sibling::*[1]),'H'))]

And it could be easily made to be more specific if you do in fact have
other element names starting with 'H'.

However, any XPath expression that uses "//" for a full descent into the
tree and "ancestor::" (multiple times!) to traverse all the way back up
is putting a serious stress test on an XPath processor's optimization
abilities.

If efficiency is critical, I'd look into other mechanisms involving a
single pass through the tree. For example, this XSLT stylesheet, which
does a recursive copy-through ("identity transform", see XSLT spec under
Copying) is far more efficient for what you want to do, which is
generate a new document that has a boundary="true" attribute added to
the appropriate elements:

true

-Mike

From Alexandre.Fayolle at logilab.fr Tue Feb 10 05:46:24 2004
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Tue Feb 10 05:46:31 2004
Subject: [XML-SIG] CVS notification message failed
Message-ID: <20040210104624.GB1121@calvin>

Hi,

I've just committed some (cosmetic) changes in the man pages of xmlproc,
and the mailing of the notification message failed:

Mailing pyxml-checkins@lists.sourceforge.net...
Generating notification message...
Traceback (most recent call last):
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 336, in ?
    main()
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 329, in main
    blast_mail(subject, people, specs[1:], contextlines, fromhost)
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 227, in blast_mail
    conn.connect(MAILHOST, MAILPORT)
  File "/usr/lib/python2.2/smtplib.py", line 276, in connect
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
socket.gaierror: (-2, 'Name or service not known')

This is probably not a major problem, though.
SMTP server down on sourceforge or something like that.

-- 
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Advanced software development - Artificial Intelligence - Training

From fdrake at acm.org Tue Feb 10 09:57:42 2004
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue Feb 10 09:57:51 2004
Subject: [XML-SIG] CVS notification message failed
In-Reply-To: <20040210104624.GB1121@calvin>
References: <20040210104624.GB1121@calvin>
Message-ID: <200402100957.42579.fdrake@acm.org>

On Tuesday 10 February 2004 05:46 am, Alexandre Fayolle wrote:

> I've juste committed some (cosmetic) changes in the man pages of
> xmlproc, and the mailing of the notification message failed:
...
> This is probably not a major problem, though. SMTP server down on
> sourceforge or something like that.

Actually, this is a really weird little matter of configuration. SF now
runs their SMTP server on a machine called "localhost" which doesn't
appear to be the machine the CVS server is running on.

This is an issue that came up for Python some time ago (a month ago,
perhaps?), and was resolved for several projects. I made a pass over all
the syncmail-using projects I had checkouts of and made the change in
syncmail, but CVS was pretty much down by the time I'd done that, so
none of that got checked in. (This was when they were syncing data to
new CVS servers, so it was down for days, not hours.)

I've committed the change for PyXML now, so things should be working
again. The error only affected the generation of the checkin emails, not
the actual checkin.

Thanks for bringing this back to my attention!

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From list-matt at reprocessed.org Wed Feb 11 04:04:48 2004
From: list-matt at reprocessed.org (Matt Patterson)
Date: Wed Feb 11 04:05:01 2004
Subject: [XML-SIG] PyXML XPath woes
In-Reply-To: <200402080618.i186I4gZ065456@chilled.skew.org>
References: <200402080618.i186I4gZ065456@chilled.skew.org>
Message-ID: <583DC4F6-5C71-11D8-B0A3-000393CBB978@reprocessed.org>

On 8 Feb 2004, at 06:18, Mike Brown wrote:

> Some XPath hints for you here...
>
> 1. These predicates don't have to be chained. For example, instead of
>
> you could just say
>
> [not(ancestor::boxtexttable or ancestor::casestudy or
> ancestor::casetexttable
> or ancestor::checklist) and name(preceding-sibling::*[1]) != 'H1']

Aha. The original reason for such hideously complex predicates was that
I had a view to generating the XPath from a much simpler source, or
maybe even a little app, so I wanted to keep things as crude as
possible. That's all gone out of the window now...

> 2. count(preceding-sibling::*) = 0
>
> is more succinctly written as not(preceding-sibling::*)
> However I think this predicate may have been interfering with your
> results.
> Take your first H2, for example... it does have a preceding sibling:
> standfirst, but you did in fact want it to be recorded as a boundary,
> right?

That's why there were two lines matching H2, one without a preceding
sibling predicate and one with...

> I am guessing that your boundary elements are those H1, H2 and H3
> elements
> that are not descendants of boxtexttable, casestudy, casetexttable, or
> checklist elements, and that are not immediately preceded by a
> higher-level
> heading element (H1 being higher than H2 being higher than H3).

Exactly right.

> This expression is much simpler and I think will do what you want:
>
> (//H1|//H2|//H3)[not(ancestor::boxtexttable or
> ancestor::casestudy or
> ancestor::casetexttable or
> ancestor::checklist) and
>
> not(starts-with(local-name(preceding-sibling::*[1]),'H'))]

Blimey, just a little simpler... :)

> If efficiency is critical, I'd look into other mechanisms involving a
> single
> pass through the tree. For example, this XSLT stylesheet, which does a
> recursive copy-through ("identity transform", see XSLT spec under
> Copying) is
> far more efficient for what you want to do, which is generate a new
> document
> that has a boundary="true" attribute added to the appropriate elements:

I originally had wanted to avoid an intermediate XSL pass of the
content, but it became unavoidable: I am already doing an XSL pass, and
just moving the whole boundary-finding thing into XSL sounds mighty
sensible to me.

Thanks for all your advice: very helpful indeed!

Best,

Matt

-- 
Matt Patterson | Typographer | http://www.emdash.co.uk/ | http://reprocessed.org/

From list-matt at reprocessed.org Wed Feb 11 09:35:28 2004
From: list-matt at reprocessed.org (Matt Patterson)
Date: Wed Feb 11 09:35:34 2004
Subject: [XML-SIG] PyXML XPath woes
In-Reply-To: <4025A3AA.8020202@comcast.net>
References: <86E0FD1B-59A4-11D8-954F-000393CBB978@reprocessed.org> <4025A3AA.8020202@comcast.net>
Message-ID: <89A28F13-5C9F-11D8-B0A3-000393CBB978@reprocessed.org>

On 8 Feb 2004, at 02:49, Thomas B. Passin wrote:

> Matt Patterson wrote:
>> I've got an XML file in which I want to locate all elements with the
>> attribute boundary set to 'true'. I use the following XPath with
>> 4DOM:
>> //*[@boundary='true']
>> like so:
>> boundaryFinder = Compile("//*[@boundary='true']")
>> context = Context(self.document)
>> # evaluate the expression and get a nodeList
>> boundaryNodes = boundaryFinder.evaluate(context)
>> But the results of the XPath do not return all the nodes which match!
>
> How many nodes did you get and how many are actually there?

Okay, this is weird: the XPath _is_ returning 47 nodes, as it should. I
should have checked closer: I thought that the XPath was returning
wrongly because in the final paginated output there were pages which had
several elements with boundary="true" attributes, but I panicked and
assumed that XPath was to blame.

This seems to have been caused by heinous problems with the 4DOM DOM
Range implementation: either the range is storing its boundary points
very strangely, or the range.cloneContents() method is simply bonkers.
The range causing the problem has start and end points with the same
parent, and the cloneContents() method is returning all of that parent
node's children.

I had to do some poking, but it seems that when the start point of a
range is the child of the range's common ancestor node and the end point
is a grand-child or greater descendant then the cloneContents() method
of the range returns all preceding-siblings of the last part of the
end-point's ancestor chain:

If the start and end-points of a range were set to and respectively
below:

Then range.cloneContents() would return:

Which is clearly bonkers.

> You have an encoding problem with the file you linked to. It is
> encoded in iso-8859-1 but with no encoding declaration it is treated
> as utf-8. Unfortunately there are some non-utf-8 characters in it, so
> it is not well-formed. Thus any results you get would be suspect. In
> fact, it should not parse successfully at all

Hmmm. The Frame XML file (and thus its entities) claim to be utf-8, and
I've had no problems with them. It could be a file-transfer issue, I
suppose. I'll have to investigate closer.

Thanks for the heads up!
Thanks for all your help,

Matt

-- 
Matt Patterson | Typographer | http://www.emdash.co.uk/ | http://reprocessed.org/

From phthenry at earthlink.net Thu Feb 12 00:09:44 2004
From: phthenry at earthlink.net (Paul Tremblay)
Date: Thu Feb 12 00:10:22 2004
Subject: [XML-SIG] serializing with xslt with SAX
Message-ID: <20040212050944.GA3173@localhost.localdomain>

I need to chain several stylesheets together in order to transform a
document. I was told that I should use SAX to perform this
serialization. However, the only documentation I came across was doing
so with Java.

I have used Python SAX in the past, but only to process XML documents. I
don't know how to use it in conjunction with an XSLT processor. I have
the O'Reilly book *Python & XML*, but this didn't help me.

Thanks

Paul

-- 

************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************

From asc at vineyard.net Sat Feb 14 14:21:02 2004
From: asc at vineyard.net (Aaron Straup Cope)
Date: Sat Feb 14 14:20:17 2004
Subject: [XML-SIG] [FYI] Net::Delicious::Export::Post::XBEL
Message-ID: <1076786461.13787.21.camel@localhost>

http://search.cpan.org/dist/Net-Delicious-Export-Post-XBEL/

see also : http://del.icio.us

Cheers,

From mike at skew.org Sat Feb 14 21:07:43 2004
From: mike at skew.org (Mike Brown)
Date: Sat Feb 14 21:07:46 2004
Subject: [XML-SIG] serializing with xslt with SAX
In-Reply-To: <20040212050944.GA3173@localhost.localdomain> "from Paul Tremblay at Feb 12, 2004 00:09:44 am"
Message-ID: <200402150207.i1F27hGS095611@chilled.skew.org>

Paul Tremblay wrote:

> I need to chain several stylesheets together in order to transform a
> document. I was told that I should use SAX to perform this
> serialization. However, the only documentation I came across was doing so
> with java.
>
> I have used Python SAX in the past, but only to process XML documents. I
> don't know how to use it in conjunction with an xslt processor. I have
> the Oreilly book *Python & XML,* but this didn't help me.

Stylesheet chaining is feeding the output of one transformation as the
input to the next. The simplest way to do that is of course to just do
transformations normally, serializing the result as XML at the end of
each transformation, and then letting this be reparsed by the next
transformation process.

To make this more efficient, it is preferable to feed the result
documents to the next transform in the manner that would be most
efficient for the XSLT processor. You need to research the APIs that are
supported by your XSLT processor and determine what's the most efficient
way of supplying a source document and see whether it's an option to
directly output result documents in that manner, or to convert them to
that format without doing a full serialization and reparse.
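
To make the contrast concrete, here is a rough sketch of the "hand the
tree to the next stage" style of chaining. It is not 4Suite code and
involves no XSLT at all: plain Python functions over the standard
library's ElementTree stand in for the transformations, and the names
rotate and run_chain are invented for this illustration. The source is
parsed once, every stage receives the previous stage's in-memory tree,
and serialization happens only at the very end.

```python
import copy
import string
import xml.etree.ElementTree as ET


def rotate(n, upper=False):
    """Build a tree -> tree stage that rotates lowercase ASCII letters by n.

    With upper=True the rotated letters are also uppercased. This is a
    stand-in for a real stylesheet, purely for illustration.
    """
    lower = string.ascii_lowercase
    shifted = lower[n % 26:] + lower[:n % 26]
    if upper:
        shifted = shifted.upper()
    table = str.maketrans(lower, shifted)

    def transform(root):
        result = copy.deepcopy(root)  # leave the input tree untouched
        for elem in result.iter():
            if elem.text:
                elem.text = elem.text.translate(table)
            if elem.tail:
                elem.tail = elem.tail.translate(table)
        return result

    return transform


def run_chain(source_xml, stages):
    """Parse once, pass the tree through every stage, serialize once."""
    tree = ET.fromstring(source_xml)
    for stage in stages:
        tree = stage(tree)  # no serialize/reparse between stages
    return ET.tostring(tree, encoding="unicode")


if __name__ == "__main__":
    # 6 + 6 + 6 + 6 + 2 = 26, so the letters come full circle and are uppercased.
    stages = [rotate(6)] * 4 + [rotate(2, upper=True)]
    print(run_chain("<line>wash it white as snow</line>", stages))
```

The naive alternative would call ET.tostring and ET.fromstring between
every pair of stages; the result is the same, but each hop pays for a
full serialization and reparse.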
In the Java world, the advice is to use SAX because it is fast and it is natively supported by all the processors that support JAXP -- the processor can be given a source document via SAX event calls and can generate SAX event calls for the result.

In the Python world, SAX is not necessarily the most efficient. For example, 4Suite uses Expat to do parsing of serialized XML, and it builds Domlette documents from Expat's native callbacks (which are somewhat SAX-like, but different). It's more efficient to supply a Domlette to the processor than it is to supply an unparsed document or even Expat callbacks. The processor does support SAX, and Domlette (as Result Tree Fragment) output, though, so we could perhaps write a SAX-to-Expat layer for use in conjunction with the SAX XSLT output writer, or we could write an Expat XSLT output writer, but we're better off just using our Result Tree Fragment writer, which generates Domlette nodes that can be fed directly to the next transformation instance.

We don't yet have a good chaining API or recipe for 4Suite in general, and in researching our capabilities in order to answer this question, Jeremy & I found some bugs that have since been fixed in CVS. The code sample below is an example that should work with a current CVS snapshot, and is pretty fast, although Jeremy points out that Processor re-use is not thoroughly tested and the overhead of creating a new Processor instance is minimal in comparison to going through all the things that happen when the Processor.reset() is called.

-Mike

# Just some experimentation...
#
# Don't take this as being the one and only way to do chaining in 4Suite
# (outside of the repository).
# The point is just to demonstrate using different output writers, and to
# give us some food for thought on making a less cumbersome chaining API.
#
src_xml = """
This cursed hand, for thicker than itself with brother's blood -- Is there not rain enough in the sweet heavens to wash it white as snow?
"""

# A 6-letter rotation of the lowercase characters
#
xslt1 = """ """

# A 2-letter rotation + uppercasing of the lowercase characters
#
xslt2 = """ """

from Ft.Xml import InputSource, Domlette
from Ft.Xml.Xslt import Processor, RtfWriter

class Test:
    # we're going to try to reuse the processor
    p = Processor.Processor()

    def run(self, src_isrc, chain):
        i = 0
        # no writer yet; stays None when the chain has only one stylesheet
        w = None
        if not chain:
            return ''
        for (sty, uri) in chain:
            sty_isrc = InputSource.DefaultFactory.fromString(sty, uri)
            self.p.appendStylesheet(sty_isrc)
            # not on last stylesheet in chain?
            if i < len(chain) - 1:
                # use an RtfWriter
                w = RtfWriter.RtfWriter(None, 'urn:temp.xml')
                # not on first stylesheet in chain?
                if i:
                    # use last RtfWriter's buffer as source doc
                    self.p.execute(result, src_isrc, writer=w)
                else:
                    # use original source doc
                    self.p.run(src_isrc, writer=w)
                # save result to use as source doc next time
                result = w.getResult()
            # last stylesheet in chain
            else:
                if w:
                    result = self.p.execute(result, src_isrc)
                else:
                    result = self.p.run(src_isrc)
            self.p.reset()
            i += 1
        return result


xml_isrc = InputSource.DefaultFactory.fromString(src_xml, 'urn:hamlet.xml')

# four 6-letter rotations + a 2-letter rotation and uppercasing
# should result in a full rotation and uppercasing...
# expected output is an uppercase version of the Hamlet quotation
#
chain = [(xslt1, 'urn:lc-rot6.xsl'),
         (xslt1, 'urn:lc-rot6.xsl'),
         (xslt1, 'urn:lc-rot6.xsl'),
         (xslt1, 'urn:lc-rot6.xsl'),
         (xslt2, 'urn:lc-rot2-uc.xsl'),
         ]

t = Test()
print t.run(xml_isrc, chain)

From phthenry at earthlink.net Sun Feb 15 03:20:01 2004
From: phthenry at earthlink.net (Paul Tremblay)
Date: Sun Feb 15 03:20:39 2004
Subject: [XML-SIG] serializing with xslt with SAX
In-Reply-To: <200402150207.i1F27hGS095611@chilled.skew.org>
References: <20040212050944.GA3173@localhost.localdomain> <200402150207.i1F27hGS095611@chilled.skew.org>
Message-ID: <20040215082001.GA3575@localhost.localdomain>

On Sat, Feb 14, 2004 at 07:07:43PM -0700, Mike Brown wrote:
>
> In the Python world, SAX is not necessarily the most efficient. For example,
> 4Suite uses Expat to do parsing of serialized XML, and it builds Domlette
> documents from Expat's native callbacks (which are somewhat SAX-like, but
> different). It's more efficient to supply a Domlette to the processor than it
> is to supply an unparsed document or even Expat callbacks. The processor does
> support SAX, and Domlette (as Result Tree Fragment) output, though, so we
> could perhaps write a SAX-to-Expat layer for use in conjunction with the SAX
> XSLT output writer, or we could write an Expat XSLT output writer, but we're
> better off just using our Result Tree Fragment writer, which generates
> Domlette nodes that can be fed directly to the next transformation instance.

I had suspected that the advice, from Java guru Michael Kay, was biased towards Java.

>
> We don't yet have a good chaining API or recipe for 4Suite in general, and in
> researching our capabilities in order to answer this question, Jeremy & I
> found some bugs that have since been fixed in CVS.
> The code sample below is an
> example that should work with a current CVS snapshot, and is pretty fast,
> although Jeremy points out that Processor re-use is not thoroughly tested and
> the overhead of creating a new Processor instance is minimal in comparison to
> going through all the things that happen when the Processor.reset() is called.

So if the cost of creating a new Processor is minimal, I can use this code below?

from Ft.Xml import InputSource
from Ft.Xml.Xslt.Processor import Processor

# first run
document = InputSource.DefaultFactory.fromUri(xmlfile)
stylesheet = InputSource.DefaultFactory.fromUri(xsltfile)
processor = Processor()
processor.appendStylesheet(stylesheet)
result = processor.run(document)

# second run. And so on.
document = InputSource.DefaultFactory.fromString(result)
stylesheet = InputSource.DefaultFactory.fromUri(xsltfile)
processor = Processor()
processor.appendStylesheet(stylesheet)
result2 = processor.run(document)

I'll have to download a CVS snapshot to test the code below. But I think I need something more standard, since the scripts I'm working with will be published.

I'm coming to the realization that XSLT isn't absolutely standard. TrAX was supposed to allow a universal interface. But as of now, it only works with two processors: Saxon and Xalan. That means if you write an application to process XML with XSLT stylesheets, you will be either using Java or Perl/Python (etc.) with C++ libraries.

By the way, do you know how to read from and write to a string using libxslt? I couldn't find anything on the web on that.

Okay, I have a lot of questions on this example.

> from Ft.Xml import InputSource, Domlette
> from Ft.Xml.Xslt import Processor, RtfWriter

I actually don't know what Rtf is, though I keep hearing this term.
>
> class Test:
>     # we're going to try to reuse the processor
>     p = Processor.Processor()
>
>     def run(self, src_isrc, chain):
>         i = 0
>         if not chain:
>             return ''
>         for (sty, uri) in chain:
>             sty_isrc = InputSource.DefaultFactory.fromString(sty, uri)
>             self.p.appendStylesheet(sty_isrc)
>             # not on last stylesheet in chain?
>             if i < len(chain) - 1:
>                 # use an RtfWriter
>                 w = RtfWriter.RtfWriter(None, 'urn:temp.xml')

You are setting up an RtfWriter--what is that? Why the "urn" prefix?

>                 # not on first stylesheet in chain?
>                 if i:
>                     # use last RtfWriter's buffer as source doc
>                     self.p.execute(result, src_isrc, writer=w)

But here you use p.execute.

>                 else:
>                     # use original source doc
>                     self.p.run(src_isrc, writer=w)

Okay, so the first time you use p.run. Why is that?

>                 # save result to use as source doc next time
>                 result = w.getResult()

Save to a string

>             # last stylesheet in chain
>             else:
>                 if w:

Why wouldn't the Rtf writer be defined?

>                     result = self.p.execute(result, src_isrc)
>                 else:
>                     result = self.p.run(src_isrc)
>             self.p.reset()
>             i += 1
>         return result
>
>
> xml_isrc = InputSource.DefaultFactory.fromString(src_xml, 'urn:hamlet.xml')
>
> # four 6-letter rotations + a 2-letter rotation and uppercasing
> # should result in a full rotation and uppercasing...
> # expected output is an uppercase version of the Hamlet quotation
> #
> chain = [(xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt2, 'urn:lc-rot2-uc.xsl'),
>          ]

Sorry to be dense here, but what does each tuple represent? Is the first item a name or a path? Is the second item some type of URI address?

>
> t = Test()
> print t.run(xml_isrc, chain)

Thanks for all your help.

Paul

--
************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************

From mike at skew.org Sun Feb 15 16:44:00 2004
From: mike at skew.org (Mike Brown)
Date: Sun Feb 15 16:47:29 2004
Subject: [XML-SIG] serializing with xslt with SAX
In-Reply-To: <20040215082001.GA3575@localhost.localdomain> "from Paul Tremblay at Feb 15, 2004 03:20:01 am"
Message-ID: <200402152144.i1FLi0wK098466@chilled.skew.org>

Paul Tremblay wrote:
> So if creating a new Processor is minimal, I can use this code below?

As the topic is now 4Suite specific, I am continuing the discussion over on the 4suite list.

http://lists.fourthought.com/mailman/listinfo/4suite

-Mike

From phthenry at earthlink.net Mon Feb 16 04:19:28 2004
From: phthenry at earthlink.net (Paul Tremblay)
Date: Mon Feb 16 04:20:09 2004
Subject: [XML-SIG] using libxslt with strings
Message-ID: <20040216091928.GA17947@localhost.localdomain>

Can someone tell me how to save to a string when using libxslt?

import libxml2
import libxslt

styledoc = libxml2.parseFile("/home/paul/paultemp/test.xsl")
style = libxslt.parseStylesheetDoc(styledoc)
doc = libxml2.parseFile("/home/paul/paultemp/test.xml")
result = style.applyStylesheet(doc, None)
style.freeStylesheet()
doc.freeDoc()
# style.saveResultToFilename("foo", result, 0)

Now how do I store my result as a string so I can do further processing? I actually want to process a document with several xslt stylesheets, and then process it with SAX.

Thanks

Paul

--
************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************

From paul.boddie at ementor.no Mon Feb 16 04:28:34 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Feb 16 04:28:41 2004
Subject: [XML-SIG] using libxslt with strings
Message-ID:

Paul Tremblay [mailto:phthenry@earthlink.net] wrote:
>
> Can someone tell me how to save to a string when using libxslt?

[...]

> styledoc = libxml2.parseFile("/home/paul/paultemp/test.xsl")
> style = libxslt.parseStylesheetDoc(styledoc)
> doc = libxml2.parseFile("/home/paul/paultemp/test.xml")
> result = style.applyStylesheet(doc, None)
> style.freeStylesheet()
> doc.freeDoc()
> # style.saveResultToFilename("foo", result, 0)

Try this:

s = style.saveResultToString(result)

> Now how do I store my result as a string so I can do further processing?
> I actually want to process a document with several xslt stylesheets, and
> then process it with SAX.

What would be really nice with libxslt/libxml2 is a means to write the results to a stream that could be read from using the normal Python file/stream objects, but without going via a temporary file. It is possible to use StringIO on a serialised document, but I doubt that it is particularly efficient.

Paul

From veillard at redhat.com Mon Feb 16 04:48:53 2004
From: veillard at redhat.com (Daniel Veillard)
Date: Mon Feb 16 04:49:00 2004
Subject: [XML-SIG] using libxslt with strings
In-Reply-To:
References:
Message-ID: <20040216094853.GH12603@redhat.com>

On Mon, Feb 16, 2004 at 10:28:34AM +0100, Paul Boddie wrote:
> Paul Tremblay [mailto:phthenry@earthlink.net] wrote:
> > Now how do I store my result as a string so I can do further processing?
> > I actually want to process a document with several xslt stylesheets, and
> > then process it with SAX.
>
> What would be really nice with libxslt/libxml2 is a means to write the
> results to a stream that could be read from using the normal Python
> file/stream objects, but without going via a temporary file. It is
> possible to use StringIO on a serialised document, but I doubt that it
> is particularly efficient.

Hum, at the C level there is a generic I/O handling layer, but I never made the custom wrappers for Python in libxslt.

Daniel

--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

From phthenry at earthlink.net Mon Feb 16 05:12:06 2004
From: phthenry at earthlink.net (Paul Tremblay)
Date: Mon Feb 16 05:12:44 2004
Subject: [XML-SIG] using libxslt with strings
In-Reply-To:
References:
Message-ID: <20040216101206.GA24633@localhost.localdomain>

On Mon, Feb 16, 2004 at 10:28:34AM +0100, Paul Boddie wrote:
> > styledoc = libxml2.parseFile("/home/paul/paultemp/test.xsl")
> > style = libxslt.parseStylesheetDoc(styledoc)
> > doc = libxml2.parseFile("/home/paul/paultemp/test.xml")
> > result = style.applyStylesheet(doc, None)
> > style.freeStylesheet()
> > doc.freeDoc()
> > # style.saveResultToFilename("foo", result, 0)
>
> Try this:
>
> s = style.saveResultToString(result)
>

I got this error message:

Traceback (most recent call last):
  File "/home/paul/paultemp/python_check3.py", line 32, in ?
    s = style.saveResultToString(result)
AttributeError: stylesheet instance has no attribute 'saveResultToString'

Also, before I forget, I also need to read *from* a string. I tried:

doc = libxml2.parseString("/home/paul/paultemp/test.xml")

But had no luck.

--
************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************

From paul.boddie at ementor.no Mon Feb 16 05:24:43 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Feb 16 05:24:48 2004
Subject: [XML-SIG] using libxslt with strings
Message-ID:

Daniel Veillard [mailto:veillard@redhat.com] wrote:
>
> Hum, at the C level there is a generic I/O handling layer, but
> I never made the custom wrappers for Python in libxslt.

On a related subject, I promised to look into wrapping the existing Python bindings for libxml2 in an API more closely resembling the other PyXML-style DOMs.
The resulting package isn't exactly fast, but in situations where interoperability is required, it is very practical - I personally use it to edit existing XML documents before shipping them to libxslt for the really heavy processing. Under the covers, libxml2dom uses libxml2's parsing and serialising routines, giving a significant performance advantage over a lot of the other solutions. Here are the locations of the libxml2dom and qtxmldom packages: http://www.boddie.org.uk/python/downloads/libxml2dom-0.1.tar.gz http://www.boddie.org.uk/python/downloads/qtxmldom-0.1.tar.gz The latter package, which wraps qtxml (PyQt required) and KHTML (PyQt and PyKDE required), is mostly intended for deployment within Qt and KDE applications. However, by combining these packages it should be possible, for example, to copy libxml2 documents (and indeed any PyXML-style document) into and out of qtxml documents and KHTML widgets. Unfortunately, the KHTML stuff could need a fair amount of attention before being considered usable. Paul From veillard at redhat.com Mon Feb 16 05:56:15 2004 From: veillard at redhat.com (Daniel Veillard) Date: Mon Feb 16 05:56:21 2004 Subject: [XML-SIG] using libxslt with strings In-Reply-To: References: Message-ID: <20040216105615.GK12603@redhat.com> On Mon, Feb 16, 2004 at 11:24:43AM +0100, Paul Boddie wrote: > Daniel Veillard [mailto:veillard@redhat.com] wrote: > > > > Hum, at the C level there is a generic I/O handling layer, but > > I never made the custom wrappers for Python in libxslt. > > On a related subject, I promised to look into wrapping the existing Python > bindings for libxml2 in an API more closely resembling the other PyXML-style > DOMs. The resulting package isn't exactly fast, but in situations where > interoperability is required, it is very practical - I personally use it to > edit existing XML documents before shipping them to libxslt for the really > heavy processing. 
> Under the covers, libxml2dom uses libxml2's parsing and
> serialising routines, giving a significant performance advantage over a lot
> of the other solutions.

Hum, interesting, maybe checking on the libxml2 list (especially with Stephane Bidoul, who seems to work a lot with the Python bindings) would allow you to integrate your own layer into the existing bindings.

Daniel

--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

From and-xml at doxdesk.com Tue Feb 17 14:57:38 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Tue Feb 17 15:19:59 2004
Subject: [XML-SIG] ANN: pxdom 1.0 released
Message-ID: <20040217195738.GA18917@doxdesk.com>

Oh yeah, I probably should mention this here.

I released pxdom 1.0 [final] a few days ago. It's a full stand-alone pure-Python DOM Level 3 Core/XML + Load/Save implementation based on the recently-released W3C Proposed Recommendations.

http://www.doxdesk.com/software/py/pxdom.html

That's all.

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From andyjmaclean at hotmail.com Tue Feb 10 08:01:15 2004
From: andyjmaclean at hotmail.com (Andrew Maclean)
Date: Sun Feb 22 22:30:10 2004
Subject: [XML-SIG] XSV string length checking
Message-ID:

Dear All,

I've come across an error running XSV and I was wondering if I could have an opinion. Basically, it doesn't seem to enforce restrictions on the length of string values. The output tells me that it's being "strict", but I supplied a zero-length string in the test XML file.
Below are the source XSD, XML and output files: //////////////////////////////////////////////////////////////////////////////// XSD ////////////////////////////////////////////////////////////////////////////////// XML HG Y ////////////////////////////////////////////////////////////////////////////////////////// OUTPUT //////////////////////////////////////////////////////////////////////// Other checks work, and I get the "element content failed type check..." if I try and put something strange into the tag. String lengths aren't being checked though. Thank you, Mr Andrew Maclean _________________________________________________________________ It's fast, it's easy and it's free. Get MSN Messenger today! http://www.msn.co.uk/messenger From ht at inf.ed.ac.uk Mon Feb 23 04:36:23 2004 From: ht at inf.ed.ac.uk (Henry S. Thompson) Date: Mon Feb 23 04:36:30 2004 Subject: [XML-SIG] XSV string length checking In-Reply-To: (Andrew Maclean's message of "Tue, 10 Feb 2004 13:01:15 +0000") References: Message-ID: String length checking has only recently been added to XSV, and is currently only available from CVS and/or the online version. Next packaged release within a week or so will have this. ht -- Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh Half-time member of W3C Team 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk URL: http://www.ltg.ed.ac.uk/~ht/ [mail really from me _always_ has this .sig -- mail without it is forged spam] From list-matt at reprocessed.org Tue Feb 24 07:58:16 2004 From: list-matt at reprocessed.org (Matt Patterson) Date: Tue Feb 24 07:58:23 2004 Subject: [XML-SIG] error tracking with libxslt Message-ID: <1C92E298-66C9-11D8-B0B2-000393CBB978@reprocessed.org> Hello, Is there any way to get libxslt to output line numbers (corresponding to the source XSL) with its errors? 

Thanks,

Matt

--
Matt Patterson | Designer / Coder | http://www.emdash.co.uk/ | http://reprocessed.org/

From veillard at redhat.com Tue Feb 24 08:07:45 2004
From: veillard at redhat.com (Daniel Veillard)
Date: Tue Feb 24 08:07:50 2004
Subject: [XML-SIG] error tracking with libxslt
In-Reply-To: <1C92E298-66C9-11D8-B0B2-000393CBB978@reprocessed.org>
References: <1C92E298-66C9-11D8-B0B2-000393CBB978@reprocessed.org>
Message-ID: <20040224130745.GG31124@redhat.com>

On Tue, Feb 24, 2004 at 12:58:16PM +0000, Matt Patterson wrote:
> Hello,
>
> Is there any way to get libxslt to output line numbers (corresponding
> to the source XSL) with its errors?

It's best to follow the guidelines for libxslt help:

http://xmlsoft.org/XSLT/bugs.html

Making sure you're using the latest versions is important. xsltproc outputs line numbers on errors, so it might be a version problem or a Python binding one.

Daniel

--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

From mertz at gnosis.cx Tue Feb 24 15:23:15 2004
From: mertz at gnosis.cx (David Mertz, Ph.D.)
Date: Tue Feb 24 15:25:17 2004
Subject: [XML-SIG] [Announce] Gnosis Utils 1.1.1
Message-ID:

This release adds gnosis.xml.relax to the package.

Miscellaneous speedups and bugfixes to gnosis.xml.objectify.
  - addChild() convenience function

Improvements to floating point handling in gnosis.xml.pickle.
  - minor improvements to tests

Little used gnosis.trigramlib updated per needs of my personal spam filter.

Minor fixes in gnosis.util.introspect.

It may be obtained at:

http://gnosis.cx/download/Gnosis_Utils-1.1.1.tar.gz

The current release is always available as:

http://gnosis.cx/download/Gnosis_Utils-current.tar.gz

You may browse a snapshot at:

http://gnosis.cx/download/gnosis/

Try it out, have fun, send feedback!

David Mertz (mertz@gnosis.cx)
Frank McIngvale (frankm@hiwaay.net)

------------------------------------------------------------------------

BACKGROUND:

Gnosis Utilities contains a number of Python libraries, most (but not all) related to working with XML. These include:

    gnosis.indexer           (Full-text indexing/searching)
    gnosis.xml.pickle        (XML pickling of Python objects)
    gnosis.xml.objectify     (Any XML to "native" Python objects)
    gnosis.xml.validity      (Enforce validity constraints)
    gnosis.xml.relax         (Tools for working with RelaxNG)
    gnosis.xml.indexer       (XPATH indexing of XML documents)
    [...].convert.txt2html   (Convert ASCII source files to HTML)
    gnosis.util.dtd2sql      (DTD -> SQL 'CREATE TABLE' statements)
    gnosis.util.sql2dtd      (SQL query -> DTD for query results)
    gnosis.util.xml2sql      (XML -> SQL 'INSERT INTO' statements)
    gnosis.util.combinators  (Combinatorial higher-order functions)
    gnosis.util.introspect   (Introspect Python objects)
    gnosis.magic             (Multimethods, metaclasses, etc)

...and so much more! :-)

From jim at deltaxresearch.com Fri Feb 27 14:59:18 2004
From: jim at deltaxresearch.com (Jim Dukarm)
Date: Fri Feb 27 14:59:24 2004
Subject: [XML-SIG] wddx 1.0
Message-ID: <163395148763.20040227115918@deltaxresearch.com>

Hi -

I am developing a Python web application that will be using XML data files for various purposes, and WDDX seems to solve the problem of making those data files more friendly to third-party applications.

The wddx.py module in the PyXML marshal package is for WDDX 0.9, but the current WDDX SDK is for version 1.0. I have modified wddx.py to produce wddx_0100.py for WDDX 1.0, including 'null' implemented as Python None, 'boolean' implemented as Python bool, and also the new complex type 'binary'. Details are spelled out in the header comments. Because there are quite a few changes, the diff comparison with wddx.py is large.

I would be happy to send wddx_0100.py to anyone interested; I don't want to bug everyone by sending it to this list as an attachment.
I am not up to speed on CVS, so I apologize for not being able to submit this in a more respectable way.

By the way, I note that in both wddx.py and generic.py in the marshal package, the marshalling methods have an argument 'dict'. In recent versions of Python, 'dict' is the name of a type, so this is a little like having an integer argument called 'int'. It would seem advisable to change the arg's name to 'id_dict' or something similar.

Also, you might want to change the WDDX link in the PyXML project home page (http://pyxml.sourceforge.net/) to point to 'openwddx.org' instead of 'wddx.org', which is now defunct.

Best regards to all,

Jim Dukarm
DELTA-X RESEARCH
Victoria BC Canada

From lewy0lewy at poczta.onet.pl Sat Feb 28 12:22:09 2004
From: lewy0lewy at poczta.onet.pl (Pawel Lewicki)
Date: Sat Feb 28 12:30:50 2004
Subject: [XML-SIG] Some questions from a beginner
Message-ID:

Hi,

I have just started playing with XML and Python and some issues are still not clear to me. I need to analyze one large XML file. It consists of schema and data. The namespace is some Microsoft stuff (I don't remember the name, but if it matters I can check it after the weekend).

I guess that having the schema I can analyze the file. It has 3 kinds of nodes (DB tuples). I found out that I can use the dom or sax module. And as the file is about 100MB, sax is the way to go. What should I do to understand the data? What should I ask Google for? The only way I found to analyze a large file is to write a doc handler specifying startElement/characters/... methods. I can't believe that there is no way to use the given logical structure (and types of element).

The only thing I need to do (now) is to parse the file and move the data to an RDBMS. Please, at least some keywords to start with.

Regards,

Pawel

From derekfountain at yahoo.co.uk Sat Feb 28 18:48:57 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Sat Feb 28 18:45:49 2004
Subject: [XML-SIG] Some questions from a beginner
In-Reply-To:
References:
Message-ID: <200402290748.57630.derekfountain@yahoo.co.uk>

> I guess that having schema I can analyze the file. It has 3 kinds of the
> nodes (DB tuples). I found out that I can use dom or sax module. And as a
> file is about 100MB sax is the way to go. What should I do to understand
> data? What should I ask google for? The only way I found to analyze a large
> file is to write a doc handler specifying startElement/characters/...
> methods. I don't believe that there is no way to use given logical
> structure (and types of element).
> The only thing I need to do (now) is to parse a file and move data to
> RDBMS. Please, at least some keywords to start with.

The truth is you already have your answer. I'll be interested to see if anyone else describes a different process, but as far as I am concerned, DOM and SAX are the alternatives to choose from. 100MB isn't that much to handle in DOM on a modern machine (my desktop has 1GB of RAM so can handle data several times that size without swapping), so DOM is a valid option. However, SAX is also valid, and perhaps applies better to your hardware or data.

So you do what you say. Write a handler with code to catch the elements and characters and deal with them as you wish. Remember that since you have valid XML (what are you using to validate against the schema?) your code can be quite simple. You know what sort of data is coming, and the exact order it's coming in. Your error handling can be minimal.

To use the "given logical structure" as you put it, you are better off using the DOM parser, then writing your code to walk around the data tree. But in your case - for transferring data to an RDBMS - you don't need to do that. Just parse over it using SAX and pick out each tuple as it goes past.
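
A minimal sketch of such a handler, written against the standard xml.sax module. The layout is an assumption made for illustration: each tuple is taken to be a hypothetical `<row>` element whose child elements are the columns, and the sample data is invented. The database insert is replaced by appending to a list.

```python
import xml.sax


class RowHandler(xml.sax.ContentHandler):
    """Collect each <row> element as a dict of column name -> text."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.rows = []       # finished records; a real script would INSERT them
        self.current = None  # record being built, or None outside a <row>
        self.field = None    # name of the column element we are inside
        self.text = []       # text fragments of the current column

    def startElement(self, name, attrs):
        if name == "row":
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        # the parser may deliver one text node in several pieces
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "row":
            self.rows.append(self.current)
            self.current = None
        elif self.field == name:
            self.current[name] = "".join(self.text)
            self.field = None


# invented sample data standing in for the real 100MB file
sample = (b"<data>"
          b"<row><id>1</id><name>HG</name></row>"
          b"<row><id>2</id><name>Y</name></row>"
          b"</data>")
handler = RowHandler()
xml.sax.parseString(sample, handler)
print(handler.rows)
```

For a file on disk, `xml.sax.parse(filename, handler)` drives the same handler without loading the whole document; only one record is held in memory at a time if each row is written out as soon as its end tag arrives.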
-- > eatapple core dump From fredrik at pythonware.com Sun Feb 29 02:27:15 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Feb 29 02:27:42 2004 Subject: [XML-SIG] Re: Some questions from a beginner References: <200402290748.57630.derekfountain@yahoo.co.uk> Message-ID: Derek Fountain wrote: > The truth is you already have your answer. I'll be interested to see if anyone > else describes a different process, but as far as I am concerned, DOM and SAX > are the alternatives to choose from. 100MB isn't that much to handle in DOM > on a modern machine (my desktop has 1GB of RAM so can handle data several > times that size without swapping), so DOM is a valid option. what DOM library are you using that only needs a few bytes in memory for each byte on disk? (last time I checked, minidom and friends used around 50 bytes per source byte, on typical samples. libxml may do a better job, but it's hard to get under 20 bytes with a Python-based object model.). if the source file is relatively structured (e.g. it contains many thousand records, all having an identical structure), you can use an incremental DOM parsing approach. here's an example for the elementtree library: http://effbot.org/zone/element-pull.htm I'm sure you use a similar approach with many other DOM libraries. From kmmcdonald at wisc.edu Sun Feb 29 22:40:51 2004 From: kmmcdonald at wisc.edu (Kenneth McDonald) Date: Sun Feb 29 22:40:52 2004 Subject: [XML-SIG] Any standard (and python libs) for storing basic tables in XML? Message-ID: <3C3C7972-6B32-11D8-B175-000A956870AC@wisc.edu> I'm going to be doing some UIs with tables soon (using either the Tk Text or TkTable widgets--still need to check the latter), and thought it would be a no-brainer to store the table data in some sort of XML format. 
I can make up my own easily enough, of course, but first thought I would check to see if there was a standard of any sort for representing simple tabular data in XML, and if so, if there are any python libs for working with it. Thanks, Ken McDonald From tpassin at comcast.net Sun Feb 29 23:15:19 2004 From: tpassin at comcast.net (Thomas B. Passin) Date: Sun Feb 29 23:13:22 2004 Subject: [XML-SIG] Any standard (and python libs) for storing basic tables in XML? In-Reply-To: <3C3C7972-6B32-11D8-B175-000A956870AC@wisc.edu> References: <3C3C7972-6B32-11D8-B175-000A956870AC@wisc.edu> Message-ID: <4042B8D7.9090900@comcast.net> Kenneth McDonald wrote: > I'm going to be doing some UIs with tables soon (using either the Tk > Text or TkTable widgets--still need to check the latter), and thought it > would be a no-brainer to store the table data in some sort of XML > format. I can make up my own easily enough, of course, but first thought > I would check to see if there was a standard of any sort for > representing simple tabular data in XML, and if so, if there are any > python libs for working with it. > There isn't any real standard, but you might like to look at RAX - http://www.xml.com/pub/a/2000/04/26/rax/index.html This is a simple API/implementation for working with basic record-oriented documents. It is rather old, but should still be useful. Cheers, Tom P From kzurawel at umich.edu Sun Feb 29 13:42:40 2004 From: kzurawel at umich.edu (Kevin Zurawel) Date: Mon Mar 1 13:11:41 2004 Subject: [XML-SIG] Re: Some questions from a beginner Message-ID: <200402291342.40885.kzurawel@umich.edu> Try using the pulldom or "lazy DOM" method, it's worked great for me before. It's in xml.dom.pulldom.
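
A minimal sketch of the pulldom approach, assuming the same kind of record-oriented layout discussed earlier in the thread. The `<row>` and `<id>` element names and the sample data are hypothetical. The event stream is scanned like SAX, and `expandNode` builds a small DOM subtree only for the record currently being looked at.

```python
from xml.dom import pulldom

# invented sample data; a large file would be opened with pulldom.parse(path)
sample = "<data><row><id>1</id></row><row><id>2</id></row></data>"

ids = []
events = pulldom.parseString(sample)
for event, node in events:
    if event == pulldom.START_ELEMENT and node.tagName == "row":
        # pull the rest of this record into a DOM subtree; the rest of
        # the document is never held in memory as a tree
        events.expandNode(node)
        id_element = node.getElementsByTagName("id")[0]
        ids.append(id_element.firstChild.data)
print(ids)
```

Each expanded record can be handled with ordinary DOM calls and then dropped, so memory use depends on the size of one record, not of the whole file.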