From tennis at tripit.com Thu Jul 9 21:06:38 2009
From: tennis at tripit.com (Tennis Smith)
Date: Thu, 9 Jul 2009 12:06:38 -0700
Subject: [XML-SIG] Advice On Testing With XML
Message-ID:

Hi,

I'm looking for some guidance in handling a testing issue. I'm new to XML/XSLT, so please bear with me.

First, a little background. My charter is to generate XML test messages to make sure we process them correctly. These messages are validated against a schema. I'm using generateDS to generate the test messages. This ensures the xml is correct.

Everything works great except for one problem that keeps cropping up. Some elements cannot be defined easily ahead of time when generating the final test document.

For example, a field of type "xs:date" will have to be modified because tests are based on a relative date, not an absolute one. That is, dates in tests are based on things like "3 days before today".

Therefore, I'd like to figure out some way to change certain fields like date so that I can pass a string and _still validate_ it against the schema. Using the example, "-3" would be passed in the date field so that the test harness will recognize it as "today - 3 days".

Put another way, the goal is to make this:

* *

...behave like this:

**

Naturally, I can edit and copy/paste into a completely new schema file. But I was hoping someone could tell me if I can do some kind of XSLT or whatever to get the same effect.

Thanks,

From stefan_ml at behnel.de Thu Jul 9 22:13:42 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 09 Jul 2009 22:13:42 +0200
Subject: [XML-SIG] Advice On Testing With XML
In-Reply-To:
References:
Message-ID: <4A564F76.5080700@behnel.de>

Hi,

Tennis Smith wrote:
> First, a little background. My charter is to generate XML test messages to make sure we process them correctly. These messages are validated against a schema. I'm using generateDS to generate the test messages.
> This ensures the xml is correct.

Hmm, I never (really) used generateDS. AFAIR, it generates Python objects that you work with. Does it validate their structure while you do so? Or did you refer to the schema validation that "ensures" the message correctness?

> Everything works great except for one problem that keeps cropping up. Some elements cannot be defined easily ahead of time when generating the final test document.
>
> For example, a field of type "xs:date" will have to be modified because tests are based on a relative date, not an absolute one. That is, dates in tests are based on things like "3 days before today".
>
> Therefore, I'd like to figure out some way to change certain fields like date so that I can pass a string and _still validate_ it against the schema. Using the example, "-3" would be passed in the date field so that the test harness will recognize it as "today - 3 days".

Why can't you just write the corresponding date into the messages when you generate them?

> Put another way, the goal is to make this:
> * *
> ...behave like this:
> **
>
> Naturally, I can edit and copy/paste into a completely new schema file. But I was hoping someone could tell me if I can do some kind of XSLT or whatever to get the same effect.

I'd just change the schema on the way in. You didn't say what tool you use for validation, but at least in lxml, modifying the schema tree is pretty trivial. You can simply use XPath to find all date types and then fix their type attribute.

Stefan

From bigotp at acm.org Fri Jul 10 00:46:50 2009
From: bigotp at acm.org (Peter A.
Bigot)
Date: Thu, 09 Jul 2009 17:46:50 -0500
Subject: [XML-SIG] Advice On Testing With XML
In-Reply-To:
References:
Message-ID: <4A56735A.6090006@acm.org>

I don't grasp exactly what you're trying to do, but if you need a program that generates XML documents that conform to a schema for which date values are relative to today, I agree having the harness write the older date seems to make sense. If generateDS doesn't fully support all the XML Schema date types, you could do that using PyXB with a program like this:

    import schema
    import pyxb.binding.datatypes as xsd
    import datetime
    delta = xsd.duration('P3D')
    s = schema.instance()
    s.setElt(datetime.date.today() - delta)
    print s.toxml()

with output:

    2009-07-06

assuming the schema is:

PyXB (see http://pyxb.sourceforge.net) is definitely beta software, but it's coming along nicely. It makes a strong effort to validate the data written into the binding instances (in fact, a weakness is that you can't stop it from trying to validate). It can also handle very complex schemas, such as those from OpenGIS.

If you really need to change the type of an element in a complex type at runtime, it could be done by generating a customized binding (though you'd have to modify the runtime support class pyxb.binding.basis.element to allow this particular kind of customization).

Peter

From evdo.hsdpa at gmail.com Fri Jul 10 01:49:08 2009
From: evdo.hsdpa at gmail.com (Robert Kim Wireless Internet Advisor)
Date: Thu, 9 Jul 2009 16:49:08 -0700
Subject: [XML-SIG] Advice On Testing With XML
In-Reply-To:
References:
Message-ID: <1ec620e90907091649odb2872dj494d7503ab134ca0@mail.gmail.com>

Are you guys on Twitter? What's your Twitter address? I'm @journik

--
Robert Q Kim, Wireless Internet Provider
http://journik.com
http://journik.posterous.com
http://twitter.com/journik

From smcg4191 at frii.com Sun Jul 12 21:35:10 2009
From: smcg4191 at frii.com (Stuart McGraw)
Date: Sun, 12 Jul 2009 13:35:10 -0600
Subject: [XML-SIG] my own entity defs when parsing with etree?
Message-ID: <4A5A3AEE.3040109@frii.com>

Hello,

I could use some really basic help about using Etree. I have tried reading the etree and expat doc but I don't understand most of it.

I have an xml file that contains a dtd that defines a number of entities that are subsequently referenced in the xml.

What I would like to do:

1) Parse the xml file but override some or all of the entity definitions in the dtd with my own definitions.

2) Parse strings containing elements extracted from the full xml file, without the dtd, and supplying my own entity map to resolve any entities.
I am nearly clueless when it comes to xml processing so if I could get a code snippet illustrating how to do the above, that would be wonderful! I am currently using the stock Python 2.6 elementTree, but could switch to lxml's if that would help.

From stefan_ml at behnel.de Sun Jul 12 22:27:45 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 12 Jul 2009 22:27:45 +0200
Subject: [XML-SIG] my own entity defs when parsing with etree?
In-Reply-To: <4A5A3AEE.3040109@frii.com>
References: <4A5A3AEE.3040109@frii.com>
Message-ID: <4A5A4741.7040105@behnel.de>

Stuart McGraw wrote:
> I could use some really basic help about using Etree. I have tried reading the etree and expat doc but I don't understand most of it.

In that case, you should read up on XML in general first. The Wikipedia article isn't all that bad:

http://en.wikipedia.org/wiki/XML

> I have an xml file that contains a dtd that defines a number of entities that are subsequently referenced in the xml.
>
> What I would like to do:
>
> 1) Parse the xml file but override some or all of the entity definitions in the dtd with my own definitions.
>
> 2) Parse strings containing elements extracted from the full xml file, without the dtd, and supplying my own entity map to resolve any entities.

http://effbot.org/elementtree/elementtree-xmlparser.htm#tag-ET.XMLParser.entity

> I am nearly clueless when it comes to xml processing so if I could get a code snippet illustrating how to do the above, that would be wonderful! I am currently using the stock Python 2.6 elementTree, but could switch to lxml's if that would help.

ElementTree (i.e. the xml.etree package) does not support DTDs at all. If you want to use DTDs, e.g. to do validation, to inject default attributes, or to resolve entity references, you can switch to the external lxml.etree package.

Note, however, that lxml does not support the ".entity" dictionary on parsers.
It doesn't currently have a way to override entity definitions outside of a DTD.

Stefan

From joshua.r.english at gmail.com Mon Jul 13 02:24:25 2009
From: joshua.r.english at gmail.com (Josh English)
Date: Sun, 12 Jul 2009 17:24:25 -0700
Subject: [XML-SIG] my own entity defs when parsing with etree?
In-Reply-To: <4A5A4741.7040105@behnel.de>
References: <4A5A3AEE.3040109@frii.com> <4A5A4741.7040105@behnel.de>
Message-ID:

I gave up on Entities ages ago, but thought I'd try it after seeing your link.

I tried this simple code:

    from elementtree import ElementTree as ET

    p = ET.XMLParser()
    p.entity["me"] = "Josh"
    text = """&me;"""
    p.feed(text)
    e = p.close()
    print e
    ET.dump(e)

And got an error:

    >pythonw -u "ETParserWithEntities.py"
    Traceback (most recent call last):
      File "ETParserWithEntities.py", line 9, in <module>
        p.feed(text)
      File "C:\Python26\lib\site-packages\elementtree\ElementTree.py", line 1524, in feed
        self._raiseerror(v)
      File "C:\Python26\lib\site-packages\elementtree\ElementTree.py", line 1426, in _raiseerror
        raise err
    elementtree.ElementTree.ParseError: undefined entity: line 1, column 6
    >Exit code: 1

As far as I can tell, the XMLParser is using pyexpat, which only comes as a .pyd file, so I can't look into this. Any ideas?

Windows XP, Python 2.6, elementtree 1v3a2

Josh English

--
Josh English
Joshua.R.English at gmail.com
http://joshenglish.livejournal.com

From stefan_ml at behnel.de Mon Jul 13 08:08:37 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 13 Jul 2009 08:08:37 +0200
Subject: [XML-SIG] my own entity defs when parsing with etree?
In-Reply-To:
References: <4A5A3AEE.3040109@frii.com> <4A5A4741.7040105@behnel.de>
Message-ID: <4A5ACF65.5040404@behnel.de>

Hi,

Josh English wrote:
> I gave up on Entities ages ago, but thought I'd try it after seeing your link.
>
> I tried this simple code:
>
>     from elementtree import ElementTree as ET
>
>     p = ET.XMLParser()
>     p.entity["me"] = "Josh"
>     text = """&me;"""
>     p.feed(text)
>     e = p.close()
>     print e
>     ET.dump(e)
>
> And got an error:
>
>     >pythonw -u "ETParserWithEntities.py"
>     Traceback (most recent call last):
>       File "ETParserWithEntities.py", line 9, in <module>
>         p.feed(text)
>       File "C:\Python26\lib\site-packages\elementtree\ElementTree.py", line 1524, in feed
>         self._raiseerror(v)
>       File "C:\Python26\lib\site-packages\elementtree\ElementTree.py", line 1426, in _raiseerror
>         raise err
>     elementtree.ElementTree.ParseError: undefined entity: line 1, column 6
>     >Exit code: 1

Interesting. I just tried and got the same result. I guess I never even tried to do this, given that I knew lxml won't support it anyway...

Without debugging into this, it seems that expat raises that exception before ElementTree even gets to handle the unknown entity.

I just found this post, but didn't try it:

http://mail.python.org/pipermail/python-list/2007-April/607256.html

Stefan

From tennis at tripit.com Thu Jul 9 23:20:53 2009
From: tennis at tripit.com (Tennis Smith)
Date: Thu, 9 Jul 2009 14:20:53 -0700
Subject: [XML-SIG] Advice On Testing With XML
In-Reply-To: <4A564F76.5080700@behnel.de>
References: <4A564F76.5080700@behnel.de>
Message-ID:

On Thu, Jul 9, 2009 at 1:13 PM, Stefan Behnel wrote:
> Hi,
>
> Tennis Smith wrote:
> > First, a little background. My charter is to generate XML test messages to make sure we process them correctly. These messages are validated against a schema. I'm using generateDS to generate the test messages. This ensures the xml is correct.
>
> Hmm, I never (really) used generateDS. AFAIR, it generates Python objects that you work with. Does it validate their structure while you do so? Or did you refer to the schema validation that "ensures" the message correctness?
genDS ensures correctness because there are several layers of object types cascaded in the schema. Since genDS creates wrappers for all these, it makes creating schema-compliant objects really easy.

> > Everything works great except for one problem that keeps cropping up. Some elements cannot be defined easily ahead of time when generating the final test document.
> >
> > For example, a field of type "xs:date" will have to be modified because tests are based on a relative date, not an absolute one. That is, dates in tests are based on things like "3 days before today".
> >
> > Therefore, I'd like to figure out some way to change certain fields like date so that I can pass a string and _still validate_ it against the schema. Using the example, "-3" would be passed in the date field so that the test harness will recognize it as "today - 3 days".
>
> Why can't you just write the corresponding date into the messages when you generate them?

The messages are generated long before they are actually transmitted. There are literally thousands of tests which are created this way. After generation, they're stored in svn and then used much later.

> > Put another way, the goal is to make this:
> > * *
> > ...behave like this:
> > **
> >
> > Naturally, I can edit and copy/paste into a completely new schema file. But I was hoping someone could tell me if I can do some kind of XSLT or whatever to get the same effect.
>
> I'd just change the schema on the way in. You didn't say what tool you use for validation, but at least in lxml, modifying the schema tree is pretty trivial. You can simply use XPath to find all date types and then fix their type attribute.

The tool I'm using is etree. That's a great suggestion concerning xpath. That sounds pretty easy. Thanks, Stefan!

> Stefan
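
A minimal sketch of the schema-rewriting idea from this thread: load the schema as a tree, find the element declarations typed as xs:date, and retype them as xs:string so that a placeholder such as "-3" could pass validation. It uses the standard-library xml.etree.ElementTree rather than lxml. The sample schema, its element names, and the helper name relax_dates are hypothetical, and the literal comparison against "xs:date" assumes the schema binds the XML Schema namespace to the xs prefix.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema, made up for this sketch.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="trip">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="start" type="xs:date"/>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def relax_dates(schema_text):
    """Parse the schema and retype every xs:date element as xs:string."""
    root = ET.fromstring(schema_text)
    for elem in root.iter("{%s}element" % XS):
        # Assumption: the type is written with the literal "xs" prefix.
        if elem.get("type") == "xs:date":
            elem.set("type", "xs:string")
    return root


relaxed = relax_dates(SCHEMA)

# Keep the "xs" prefix on output; otherwise ElementTree invents a prefix
# and the "xs:string" attribute values would no longer resolve.
ET.register_namespace("xs", XS)
print(ET.tostring(relaxed, encoding="unicode"))
```

Validation itself is left out, since xml.etree does not validate against XML Schema. The rewritten tree would be handed to whatever validator the harness already uses (lxml's XMLSchema class is one option).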
From sklein at cpcug.org Wed Jul 15 17:37:43 2009
From: sklein at cpcug.org (Stanley A. Klein)
Date: Wed, 15 Jul 2009 11:37:43 -0400 (EDT)
Subject: [XML-SIG] Is anyone implementing EXI in Python?
Message-ID: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>

Efficient XML Interchange (EXI) is moving toward adoption by W3C. It provides a format for efficiently representing XML documents with schema-informed and schema-less modes.

There is an open-source Java implementation available.

Is anyone working to implement EXI in Python?

Stan Klein

From ht at inf.ed.ac.uk Wed Jul 15 19:37:34 2009
From: ht at inf.ed.ac.uk (Henry S. Thompson)
Date: Wed, 15 Jul 2009 18:37:34 +0100
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org> (Stanley A. Klein's message of "Wed, 15 Jul 2009 11:37:43 -0400 (EDT)")
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>
Message-ID:

Stanley A. Klein writes:

> Efficient XML Interchange (EXI) is moving toward adoption by W3C. It provides a format for efficiently representing XML documents with schema-informed and schema-less modes.
>
> There is an open-source Java implementation available.
>
> Is anyone working to implement EXI in Python?

Don't get me wrong, I think EXI is useful, in the right places, but, could I ask, why would you want to implement it in Python? I'd be very surprised if any Python XML application is spending anything like enough time in the raw parsing activity (as opposed to the structure-building activity) to make the marginal gain you might get from EXI worth it. . .

EXI is, IMO, for closely coupled systems in particular messaging environments where every bit counts, and I guess I'm having difficulty imagining Python in such a context. . .

ht
--
Henry S.
Thompson, School of Informatics, University of Edinburgh
Half-time member of W3C Team
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht at inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

From sklein at cpcug.org Wed Jul 15 21:51:12 2009
From: sklein at cpcug.org (Stanley A. Klein)
Date: Wed, 15 Jul 2009 15:51:12 -0400 (EDT)
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To:
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>
Message-ID: <51461.71.163.219.209.1247687472.squirrel@www.cpcug.org>

EXI is for data interchange. That can mean messaging or document/data storage. SOAP messages are very verbose, and SOAP messaging can benefit from EXI, especially if the communications channels have bandwidth or transit time considerations.

SOAP is increasingly being considered in a variety of control system applications for which Python makes sense as an implementation language. Similarly, scientific applications involving large amounts of XML-formatted data could benefit from EXI in storing the data or interchanging it for purposes such as grid processing.

The original application that contributed the technology for EXI was sending web pages to cell phones.

In general, any application implemented in Python that involves messaging or data storage with either bandwidth or storage volume concerns could benefit from EXI. And as best I know there are a growing number of such applications implemented in Python.

Also, why would Java make sense and Python not?

Stan Klein

From stefan_ml at behnel.de Wed Jul 15 22:26:57 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 15 Jul 2009 22:26:57 +0200
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <51461.71.163.219.209.1247687472.squirrel@www.cpcug.org>
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org> <51461.71.163.219.209.1247687472.squirrel@www.cpcug.org>
Message-ID: <4A5E3B91.4070401@behnel.de>

Hi,

Stanley A. Klein wrote:
> On Wed, July 15, 2009 1:37 pm, Henry S. Thompson wrote:
>> Stanley A. Klein writes:
>>
>>> Efficient XML Interchange (EXI) is moving toward adoption by W3C. It provides a format for efficiently representing XML documents with schema-informed and schema-less modes.
>>>
>>> There is an open-source Java implementation available.
>>>
>>> Is anyone working to implement EXI in Python?
>>
>> Don't get me wrong, I think EXI is useful, in the right places, but, could I ask, why would you want to implement it in Python?
>> I'd be very surprised if any Python XML application is spending anything like enough time in the raw parsing activity (as opposed to the structure-building activity) to make the marginal gain you might get from EXI worth it. . .
>>
>> EXI is, IMO, for closely coupled systems in particular messaging environments where every bit counts, and I guess I'm having difficulty imagining Python in such a context. . .
>
> EXI is for data interchange. That can mean messaging or document/data storage. SOAP messages are very verbose, and SOAP messaging can benefit from EXI, especially if the communications channels have bandwidth or transit time considerations.
>
> SOAP is increasingly being considered in a variety of control system applications for which Python makes sense as an implementation language. Similarly, scientific applications involving large amounts of XML-formatted data could benefit from EXI in storing the data or interchanging it for purposes such as grid processing.
>
> The original application that contributed the technology for EXI was sending web pages to cell phones.
>
> In general, any application implemented in Python that involves messaging or data storage with either bandwidth or storage volume concerns could benefit from EXI. And as best I know there are a growing number of such applications implemented in Python.

Any XML transmission or storage can benefit from *compression*, often shrinking the data volume by factors up to 100. I doubt that the savings of EXI are sufficiently large compared to a well compressed XML stream that they compensate for the drawbacks of yet another new non-readable format.

A well chosen compression method is a lot better suited to such applications and is already supported by most available XML parsers (or rather outside of the parsers themselves, which is a huge advantage).

> Also, why would Java make sense and Python not?
Because pretty much all XML technologies come from the Java environment? That doesn't mean that Java is a suitable language for working with them. It only means that it supports them because Java is used for developing them (often as a reference implementation).

Stefan

From sklein at cpcug.org Thu Jul 16 20:34:45 2009
From: sklein at cpcug.org (Stanley A. Klein)
Date: Thu, 16 Jul 2009 14:34:45 -0400 (EDT)
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <4A5E3B91.4070401@behnel.de>
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org> <51461.71.163.219.209.1247687472.squirrel@www.cpcug.org> <4A5E3B91.4070401@behnel.de>
Message-ID: <47353.207.188.248.157.1247769285.squirrel@www.cpcug.org>

On Wed, 2009-07-15 at 22:26 +0200, Stefan Behnel wrote:
> Any XML transmission or storage can benefit from *compression*, often shrinking the data volume by factors up to 100. I doubt that the savings of EXI are sufficiently large compared to a well compressed XML stream that they compensate for the drawbacks of yet another new non-readable format.
>
> A well chosen compression method is a lot better suited to such applications and is already supported by most available XML parsers (or rather outside of the parsers themselves, which is a huge advantage).
>
> > Also, why would Java make sense and Python not?
>
> Because pretty much all XML technologies come from the Java environment? That doesn't mean that Java is a suitable language for working with them. It only means that it supports them because Java is used for developing them (often as a reference implementation).
>
> Stefan

It depends on the nature of the XML application. One feature of EXI is to support representation of numeric data as bits rather than characters. That is very useful in appropriate applications.
There is a measurements document that shows the compression that was achieved on a wide variety of test cases. Straight use of a common compression algorithm does not necessarily achieve the best results. Besides, EXI incorporates elements of common compression algorithm(s) as both a fallback for its schema-less mode and an additional capability in its schema-informed mode.

EXI is intended for use outboard of the parser, and that would apply equally well to a Python version. For example, EXI gets rid of the need to constantly resend over-the-wire all the namespace definitions with each message. The relevant strings would just go into the string table and get restored from there when the message is converted back.

However, for something like SOAP in certain applications, it may be eventually desirable to integrate the EXI implementation within the communications system. The message sender could reasonably create a schema-informed EXI version without actually starting from and converting an XML object. The recipient would have to convert the EXI back to XML, parse it, and use the data.

Regarding the format readability, it converts to XML and is readable there. Numeric data is most efficiently sent as bits, so that data is necessarily unreadable until converted.

The value of EXI necessarily depends on the application.

Stan Klein

From stefan_ml at behnel.de Fri Jul 17 10:06:01 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 17 Jul 2009 10:06:01 +0200
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <47353.207.188.248.157.1247769285.squirrel@www.cpcug.org>
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org> <51461.71.163.219.209.1247687472.squirrel@www.cpcug.org> <4A5E3B91.4070401@behnel.de> <47353.207.188.248.157.1247769285.squirrel@www.cpcug.org>
Message-ID: <4A6030E9.6010909@behnel.de>

Hi,

Stanley A.
Klein wrote:
> On Wed, 2009-07-15 at 22:26 +0200, Stefan Behnel wrote:
>> A well chosen compression method is a lot better suited to such applications and is already supported by most available XML parsers (or rather outside of the parsers themselves, which is a huge advantage).
>
> It depends on the nature of the XML application. One feature of EXI is to support representation of numeric data as bits rather than characters. That is very useful in appropriate applications.

One drawback is that this requires a schema to make sure the number of bits is sufficient. Otherwise, you'd need to add the information how many bits you use for their representation, which would add to the data volume.

> There is a measurements document that shows the compression that was achieved on a wide variety of test cases. Straight use of a common compression algorithm does not necessarily achieve the best results.

Repetitive data like an XML byte stream compresses extremely well, though, and the 'best' compression isn't always required anyway. I worked on a Python SOAP application where we sent some 3MB of XML as a web service response. That took a couple of seconds to transmit. Injecting the standard gzip algorithm into the WSGI stack got it down to some 48KB. Nothing more to do here.

If you need 'the best' compression, there's no way around benchmarking a couple of different algorithms that are suitable for your application, and choosing the one that works best for your data. That may or may not include EXI.

> Besides, EXI incorporates elements of common compression algorithm(s) as both a fallback for its schema-less mode and an additional capability in its schema-informed mode.

Makes sense, as compression also applies to text content, for example.

> EXI is intended for use outboard of the parser, and that would apply equally well to a Python version.
For example, EXI gets rid of the need > to constantly resend over-the-wire all the namespace definitions with each > message. The relevant strings would just go into the string table and get > restored from there when the message is converted back. That's how any dictionary-based compression algorithm (such as the LZ77 stage of gzip) works anyway. Plus, namespace definitions usually only happen once in a document, so they are pretty much negligible in a larger XML document. > However, for something like SOAP in certain applications, it may be > eventually desirable to integrate the EXI implementation within the > communications system. The message sender could reasonably create a > schema-informed EXI version without actually starting from and converting > an XML object. The recipient would have to convert the EXI back to XML, > parse it, and use the data. Ok, that's where I see it, too. At the level where you'd normally apply a compression algorithm anyway. > Numeric data is most efficiently sent as bits Depends on how you select the bits. When I write into my schema that I use a 32-bit integer value in my XML, and all I really send happens to be within [0-9] in, say, 95% of the cases with a few exceptions that really require 32 bits, a general-purpose compression algorithm will easily beat anything that sends the value as a 4-byte sequence. That's the advantage of general compression: it sees the real data, not only its schema. I do not question EXI in general; I'm fine with it having its niche (wherever that turns out to be). I'm just saying that common compression algorithms are a lot more broadly available and achieve similar results. So EXI is just another way of compressing XML, with the disadvantage of not being as widely implemented. Compare it to the ubiquity of the gzip compression algorithm, for example. It's just the usual trade-off that you make between efficiency and cross-platform compatibility.
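The 32-bit integer argument above can be tried out with a rough sketch like the following. Everything in it is an assumption made for illustration: the data is synthetic (10,000 values, roughly 95% single digits), the element names are made up, and zlib from the standard library stands in for gzip. It says nothing about what a real EXI encoder would produce.

```python
import random
import struct
import zlib

rng = random.Random(0)  # fixed seed so repeated runs use the same data


def sample_value():
    # Roughly 95% single-digit values, 5% values that need all 32 bits.
    if rng.random() < 0.95:
        return rng.randrange(10)
    return rng.randrange(2**31, 2**32)


values = [sample_value() for _ in range(10000)]

# The values as plain XML text (hypothetical element names).
xml_text = (
    "<values>" + "".join("<v>%d</v>" % v for v in values) + "</values>"
).encode("ascii")

# The values as a fixed-width sequence: 4 bytes per value, big-endian.
packed = struct.pack(">%dI" % len(values), *values)

compressed_xml = zlib.compress(xml_text, 9)
compressed_packed = zlib.compress(packed, 9)

print("XML text, uncompressed:   ", len(xml_text))
print("XML text, zlib level 9:   ", len(compressed_xml))
print("4-byte ints, uncompressed:", len(packed))
print("4-byte ints, zlib level 9:", len(compressed_packed))
```

The last line is included because the fixed-width form can of course be compressed as well; which variant wins depends on the data, which is the point about benchmarking made earlier in this message.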
Stefan From sklein at cpcug.org Fri Jul 17 17:01:12 2009 From: sklein at cpcug.org (Stanley A. Klein) Date: Fri, 17 Jul 2009 11:01:12 -0400 (EDT) Subject: [XML-SIG] Is anyone implementing EXI in Python? In-Reply-To: <4A6030E9.6010909@behnel.de> References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org> <51461.71.163.219.209.1247687472.squirrel@www.cpcug.org> <4A5E3B91.4070401@behnel.de> <47353.207.188.248.157.1247769285.squirrel@www.cpcug.org> <4A6030E9.6010909@behnel.de> Message-ID: <4153.207.188.248.157.1247842872.squirrel@www.cpcug.org> I think the issue here is the nature of the data exchange. EXI essentially provides a compression algorithm that saves information between instances of a message or file and can be seeded with what is known in advance about certain characteristics of the instances. The gzip algorithm learns the characteristics of each instance separately from that instance and does not retain information between instances. If you are occasionally sending a large file, gzip makes sense. There is little gain from retaining information. However, if you have frequent small messages or separate small files based on a schema, the namespace definitions are repeated for each instance and can take up an appreciable fraction of what is sent over-the-wire for each instance. There isn't much for gzip to learn, and it has to start all over for the next instance. Similarly, the tags recur across instances but gzip will only learn them as it encounters them in a particular instance. Again, gzip forgets between instances. I think in the absence of prior information and when used only occasionally (without information retention between instances), EXI provides something close to gzip compression. What EXI provides is a variant of compression technology that has information retention between instances and the ability to use prior information across instances. 
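As a loose analogy only (this is not EXI, and not how EXI is specified), the difference between starting cold on each small message and being seeded with what is known in advance can be sketched with zlib's preset dictionary support. The message template, the namespace URI and the field values below are all made up for the illustration.

```python
import zlib

# Hypothetical small message; the namespace URI and element names are made up.
TEMPLATE = (
    '<m:reading xmlns:m="http://example.com/metering/v1" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "<m:meter>%s</m:meter><m:value>%s</m:value></m:reading>"
)
message = (TEMPLATE % ("A17", "42")).encode("ascii")

# What both ends are assumed to know in advance: the boilerplate that
# every instance shares (namespace declarations and tags).
shared = (TEMPLATE % ("", "")).encode("ascii")

# Plain zlib has to learn everything from this one instance.
plain = zlib.compress(message, 9)

# zlib seeded with the shared boilerplate as a preset dictionary.
compressor = zlib.compressobj(level=9, zdict=shared)
seeded = compressor.compress(message) + compressor.flush()

# The receiver needs the same dictionary to restore the message.
decompressor = zlib.decompressobj(zdict=shared)
restored = decompressor.decompress(seeded) + decompressor.flush()

print("message bytes:       ", len(message))
print("zlib, no dictionary: ", len(plain))
print("zlib, with dictionary:", len(seeded))
```

The dictionary has to be agreed on out of band, much as a schema would be; the sketch only shows why prior knowledge matters more for many small messages than for one large file.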
In applications with frequent repetitive data exchanges, the information retention and ability to use prior information can provide significant benefits. Stan Klein From bo.laurent at canonical.com Tue Jul 21 10:59:31 2009 From: bo.laurent at canonical.com (Bo Laurent) Date: Tue, 21 Jul 2009 01:59:31 -0700 Subject: [XML-SIG] help getting started with xpath Message-ID: I'm new to lxml. I've parsed a simple document, as shown below. But I every simple xpath() expression I try returns empty list. What am I doing wrong? Perhaps I need to spec the namespace to the parser? * CustomObject 16.0 self.doc = etree.parse( self.package_xml_path ) (Pdb) root = self.doc.getroot() (Pdb) root.getchildren() [, ] (Pdb) root.xpath('//Package') [] (Pdb) root.xpath('/Package') [] (Pdb) root.xpath('Package') [] (Pdb) root.xpath('types') [] (Pdb) root.xpath('/types') [] ===== environment ==== Python 2.5.2 lxml-2.2-py2.5-macosx-10.3-i386.egg OSX 10.5.7 From stefan_ml at behnel.de Tue Jul 21 17:27:50 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 21 Jul 2009 17:27:50 +0200 Subject: [XML-SIG] help getting started with xpath In-Reply-To: References: Message-ID: <4A65DE76.7040504@behnel.de> Bo Laurent wrote: > I'm new to lxml. I've parsed a simple document, as shown below. But I > every simple xpath() expression I try returns empty list. What am I > doing wrong? Perhaps I need to spec the namespace to the parser? Yes, exactly.
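A minimal sketch of the namespace issue, under stated assumptions: the XML from the original post did not survive, so the document below, its namespace URI and the child elements other than Package and types are made up. The standard library's ElementTree is used so the snippet runs without lxml installed; lxml's xpath() method takes a namespaces= mapping in the same spirit.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URI and most of the layout are
# assumptions, since the original XML was lost from the post.
DOC = """<Package xmlns="http://example.com/metadata">
  <types>
    <members>*</members>
    <name>CustomObject</name>
  </types>
  <version>16.0</version>
</Package>"""

root = ET.fromstring(DOC)

# The default namespace becomes part of every element name.
print(root.tag)

# An unqualified path looks for elements in no namespace: nothing matches.
unqualified = root.findall("types")

# Map a prefix of your choice to the namespace URI and use it in the path.
ns = {"md": "http://example.com/metadata"}
qualified = root.findall("md:types", ns)
name = root.find("md:types/md:name", ns).text

print(unqualified, len(qualified), name)

# With lxml the equivalent call would be along the lines of:
#   root.xpath("md:types", namespaces=ns)
```

The prefix used in the path ("md" here) does not have to match anything in the document; only the URI it maps to matters.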
See here: http://codespeak.net/lxml/xpathxslt.html#xpath Stefan From uraniumore238 at gmail.com Tue Jul 21 19:19:13 2009 From: uraniumore238 at gmail.com (uche) Date: Tue, 21 Jul 2009 10:19:13 -0700 (PDT) Subject: [XML-SIG] python parser project Message-ID: <6369de2b-a579-4a31-a6bf-9e627ef14b54@a37g2000prf.googlegroups.com> Hi All, I am developing a Python parsing program. This program takes two inputs: a comma-delimited txt file and an XML file, which represents the structure of the data file. I am using Python's minidom to read in the XML file and create a tree structure in an object file. The next thing to do is to insert the data into the respective fields of the tree. Once I am done, I'd like to send this object to an SQL database. Has anyone attempted to do this? Is there example code online that I can refer to? More specifically, what code will allow me to combine the data and tree structure into a complete object that I can use to populate the SQL database? Thanks. From uraniumore238 at gmail.com Tue Jul 21 22:40:43 2009 From: uraniumore238 at gmail.com (uche) Date: Tue, 21 Jul 2009 13:40:43 -0700 (PDT) Subject: [XML-SIG] direction needed Message-ID: <389487ac-4b66-431c-b698-ef1e5f0b76ef@y4g2000prf.googlegroups.com> I have an XML file that describes the schema of a database, but this file does not contain the records (just the attributes column names). I have another file that has the data in a txt file. I would like to use minidom in Python to combine these two files into an object file, which will then be stored in a database. Has anyone done this? Is there example code out there that I can reference?
From jriveramerla at gmail.com Wed Jul 22 00:09:50 2009 From: jriveramerla at gmail.com (Jose Rivera Merla) Date: Tue, 21 Jul 2009 17:09:50 -0500 Subject: [XML-SIG] python parser project In-Reply-To: <6369de2b-a579-4a31-a6bf-9e627ef14b54@a37g2000prf.googlegroups.com> References: <6369de2b-a579-4a31-a6bf-9e627ef14b54@a37g2000prf.googlegroups.com> Message-ID: <6f495610907211509i4cdaaeb8q57fd42829b2f3690@mail.gmail.com> Hi Uche: In my opinion you could do this easily with lxml for the XML part. Just Google "Python lxml" and look at this page: http://codespeak.net/lxml/tutorial.html The txt file is easy to handle with the split(',') method. What I don't understand is the part about sending the XML to a SQL database; it's easier to load the text file with the SQL bulk insert command, etc. Regards, Jose Rivera
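One possible shape for this, as a sketch only: the thread never shows the real files, so the XML layout (a table element with column children), the sample data and the choice of an in-memory SQLite database are all assumptions. It uses minidom as in the original question, the csv module rather than a bare split(','), and sqlite3 from the standard library.

```python
import csv
import io
import sqlite3
from xml.dom import minidom

# Hypothetical inputs: the element and attribute names are made up.
SCHEMA_XML = """<table name="readings">
  <column name="station" type="TEXT"/>
  <column name="day" type="TEXT"/>
  <column name="value" type="REAL"/>
</table>"""
DATA_TXT = "KSFO,2009-07-21,17.5\nKJFK,2009-07-21,28.0\n"

# Read the structure description with minidom.
doc = minidom.parseString(SCHEMA_XML)
table = doc.documentElement.getAttribute("name")
columns = [
    (col.getAttribute("name"), col.getAttribute("type"))
    for col in doc.getElementsByTagName("column")
]

# Read the comma-delimited data; csv also copes with quoted fields.
rows = list(csv.reader(io.StringIO(DATA_TXT)))

# Create the table from the XML description and load the rows.
# Names are interpolated into the SQL text, so this assumes the XML
# file is trusted; the data values go through "?" placeholders.
conn = sqlite3.connect(":memory:")
col_defs = ", ".join('"%s" %s' % (name, sqltype) for name, sqltype in columns)
conn.execute('CREATE TABLE "%s" (%s)' % (table, col_defs))
placeholders = ", ".join("?" for _ in columns)
conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, placeholders), rows)
conn.commit()

result = conn.execute(
    'SELECT station, value FROM "%s" ORDER BY station' % table
).fetchall()
print(result)
```

In this sketch the XML is only used to build the table definition and the rows go straight from the text file into the database, which is close to what the reply above suggests; building an intermediate DOM tree that holds the data is not needed for the load itself.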