From tennis at tripit.com  Thu Jul  9 21:06:38 2009
From: tennis at tripit.com (Tennis Smith)
Date: Thu, 9 Jul 2009 12:06:38 -0700
Subject: [XML-SIG] Advice On Testing With XML
Message-ID: <e84cff280907091206o8a45927gbd70f42b17e4b2fd@mail.gmail.com>

Hi,

I'm looking for some guidance in handling a testing issue.  I'm new to
XML/XSLT, so please bear with me.

First, a little background.  My charter is to generate XML test messages to
make sure we process them correctly.  These messages are validated against a
schema.  I'm using generateDS to generate the test messages.  This ensures
the xml is correct.

Everything works great except for one problem that keeps cropping up.  Some
elements cannot be defined easily ahead of time when generating the final
test document.

For example, a field of type "xs:date" will have to be modified because tests
are based on a relative date, not an absolute one. That is, dates in tests
are based on things like "3 days before today".

Therefore, I'd like to figure out some way to change certain fields like
date so that I can pass a string and _still validate_ it against the
schema.  Using the example, "-3" would be passed in the date field so that
the test harness will recognize it as "today - 3 days".
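
A minimal sketch of how the harness side of this idea might look, assuming the stored messages carry a signed day offset such as "-3" in their date elements and the harness rewrites it just before sending. The function name `resolve_relative_dates` and the sample `<trip>` message are made up for illustration; nothing here comes from generateDS.

```python
import datetime
import re
import xml.etree.ElementTree as ET

# A relative date is a signed or unsigned integer number of days, e.g. "-3".
RELATIVE_DAYS = re.compile(r"^[+-]?\d+$")


def resolve_relative_dates(root, tag="date", today=None):
    """Replace day offsets in <tag> elements with absolute ISO dates, in place."""
    today = today or datetime.date.today()
    for elem in root.iter(tag):
        text = (elem.text or "").strip()
        if RELATIVE_DAYS.match(text):
            resolved = today + datetime.timedelta(days=int(text))
            elem.text = resolved.isoformat()
    return root


message = ET.fromstring("<trip><date>-3</date></trip>")
resolve_relative_dates(message)
print(ET.tostring(message, encoding="unicode"))
```

After this step the message holds a real date again, so it could be checked against the unmodified schema; only the stored form with the offset would need a relaxed type.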

Put another way, the goal is to make this:
  <xs:element maxOccurs="1" minOccurs="0" name="date" type="xs:date"/>
...behave like this:
  <xs:element maxOccurs="1" minOccurs="0" name="date" type="xs:string"/>

Naturally, I can edit and copy/paste into a completely new schema file. But
I was hoping someone could tell me if I can do some kind of XSLT or whatever
to get the same effect.

Thanks,

From stefan_ml at behnel.de  Thu Jul  9 22:13:42 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 09 Jul 2009 22:13:42 +0200
Subject: [XML-SIG] Advice On Testing With XML
In-Reply-To: <e84cff280907091206o8a45927gbd70f42b17e4b2fd@mail.gmail.com>
References: <e84cff280907091206o8a45927gbd70f42b17e4b2fd@mail.gmail.com>
Message-ID: <4A564F76.5080700@behnel.de>

Hi,

Tennis Smith wrote:
> First, a little background.  My charter is to generate XML test messages to
> make sure we process them correctly.  These messages are validated against a
> schema.  I'm using generateDS to generate the test messages.  This ensures
> the xml is correct.

Hmm, I never (really) used generateDS. AFAIR, it generates Python objects
that you work with. Does it validate their structure while you do so? Or
did you refer to the schema validation that "ensures" the message correctness?


> Everything works great except for one problem that keeps cropping up.  Some
> elements cannot be defined easily ahead of time when generating the final
> test document.
> 
> For example, a field of type "xs:date" will have to be modified because tests
> are based on a relative date, not an absolute one. That is, dates in tests
> are based on things like "3 days before today".
> 
> Therefore, I'd like to figure out some way to change certain fields like
> date so that I can pass a string and _still validate_ it against the
> schema.  Using the example, "-3" would be passed in the date field so that
> the test harness will recognize it as "today - 3 days".

Why can't you just write the corresponding date into the messages when you
generate them?


> Put another way, the goal is to make this:
>   <xs:element maxOccurs="1" minOccurs="0" name="date" type="xs:date"/>
> ...behave like this:
>   <xs:element maxOccurs="1" minOccurs="0" name="date" type="xs:string"/>
> 
> Naturally, I can edit and copy/paste into a completely new schema file. But
> I was hoping someone could tell me if I can do some kind of XSLT or whatever
> to get the same effect.

I'd just change the schema on the way in. You didn't say what tool you use
for validation, but at least in lxml, modifying the schema tree is pretty
trivial. You can simply use XPath to find all date types and then fix their
type attribute.
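
A minimal sketch of that approach, written against the standard-library ElementTree API, which lxml.etree largely mirrors. It assumes the schema binds the XML Schema namespace to the `xs` prefix, so the type attribute literally reads "xs:date"; the function name `relax_date_elements` and the sample schema are illustrative only.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="instance" type="structure"/>
  <xs:complexType name="structure">
    <xs:sequence>
      <xs:element name="date" minOccurs="0" maxOccurs="1" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def relax_date_elements(schema_root):
    """Retype every xs:element declared as xs:date to xs:string, in place.

    Returns the number of declarations that were changed.
    """
    changed = 0
    for decl in schema_root.iter("{%s}element" % XS):
        # Assumption: the attribute value uses the literal "xs" prefix.
        if decl.get("type") == "xs:date":
            decl.set("type", "xs:string")
            changed += 1
    return changed


root = ET.fromstring(SCHEMA)
print(relax_date_elements(root))
```

With lxml, the same loop should work on a tree parsed by lxml.etree, and the modified tree can then be handed to lxml.etree.XMLSchema to build the relaxed validator.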

Stefan


From bigotp at acm.org  Fri Jul 10 00:46:50 2009
From: bigotp at acm.org (Peter A. Bigot)
Date: Thu, 09 Jul 2009 17:46:50 -0500
Subject: [XML-SIG] Advice On Testing With XML
In-Reply-To: <e84cff280907091206o8a45927gbd70f42b17e4b2fd@mail.gmail.com>
References: <e84cff280907091206o8a45927gbd70f42b17e4b2fd@mail.gmail.com>
Message-ID: <4A56735A.6090006@acm.org>

I don't grasp exactly what you're trying to do, but if you need a 
program that generates XML documents that conform to a schema for which 
date values are relative to today, I agree having the harness write the 
older date seems to make sense.

If generateDS doesn't fully support all the XML Schema date types, you 
could do that using PyXB with a program like this:

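  import datetime

  import pyxb.binding.datatypes as xsd
  import schema  # the binding module generated from the schema shown below

  # An xs:duration of three days, used here like a datetime.timedelta
  delta = xsd.duration('P3D')

  s = schema.instance()
  # Store "today minus three days" in the xs:date element
  s.setElt(datetime.date.today() - delta)
  print(s.toxml())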

with output:

 <?xml version="1.0" ?><instance><elt>2009-07-06</elt></instance>

assuming the schema is:

  <?xml version="1.0" encoding="UTF-8"?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="instance" type="structure"/>
    <xs:complexType name="structure">
      <xs:sequence>
        <xs:element name="elt" minOccurs="0" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:schema>

PyXB (see http://pyxb.sourceforge.net) is definitely beta software, but 
it's coming along nicely.  It makes a strong effort to validate the data 
written into the binding instances (in fact, a weakness is that you 
can't stop it from trying to validate).  It can also handle very complex 
schemas, such as those from OpenGIS.

If you really need to change the type of an element in a complex type at 
runtime, it could be done by generating a customized binding (though 
you'd have to modify the runtime support class 
pyxb.binding.basis.element to allow this particular kind of customization).

Peter


From evdo.hsdpa at gmail.com  Fri Jul 10 01:49:08 2009
From: evdo.hsdpa at gmail.com (Robert Kim Wireless Internet Advisor)
Date: Thu, 9 Jul 2009 16:49:08 -0700
Subject: [XML-SIG] Advice On Testing With XML
In-Reply-To: <e84cff280907091206o8a45927gbd70f42b17e4b2fd@mail.gmail.com>
References: <e84cff280907091206o8a45927gbd70f42b17e4b2fd@mail.gmail.com>
Message-ID: <1ec620e90907091649odb2872dj494d7503ab134ca0@mail.gmail.com>

Are you guys on Twitter? What's your Twitter address?
I'm @journik


-- 
Robert Q Kim, Wireless Internet Provider
http://journik.com
http://journik.posterous.com
http://twitter.com/journik

From smcg4191 at frii.com  Sun Jul 12 21:35:10 2009
From: smcg4191 at frii.com (Stuart McGraw)
Date: Sun, 12 Jul 2009 13:35:10 -0600
Subject: [XML-SIG] my own entity defs when parsing with etree?
Message-ID: <4A5A3AEE.3040109@frii.com>

Hello,

I could use some really basic help about using Etree.
I have tried reading the etree and expat doc but I
don't understand most of it.

I have an xml file that contains a dtd that defines a 
number of entities that are subsequently referenced 
in the xml. 

What I would like to do:

1) Parse the xml file but override some or all of the 
entity definitions in the dtd with my own definitions.

2) Parse strings containing elements extracted from
the full xml file, without the dtd, and supplying my 
own entity map to resolve any entities.

I am nearly clueless when it comes to XML processing,
so if I could get a code snippet illustrating how to 
do the above, that would be wonderful!  I am currently 
using the stock Python 2.6 ElementTree, but could 
switch to lxml's if that would help.

From stefan_ml at behnel.de  Sun Jul 12 22:27:45 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 12 Jul 2009 22:27:45 +0200
Subject: [XML-SIG] my own entity defs when parsing with etree?
In-Reply-To: <4A5A3AEE.3040109@frii.com>
References: <4A5A3AEE.3040109@frii.com>
Message-ID: <4A5A4741.7040105@behnel.de>


Stuart McGraw wrote:
> I could use some really basic help about using Etree.
> I have tried reading the etree and expat doc but I
> don't understand most of it.

In that case, you should read up on XML in general first. The Wikipedia
article isn't all that bad:

http://en.wikipedia.org/wiki/XML


> I have an xml file that contains a dtd that defines a 
> number of entities that are subsequently referenced 
> in the xml. 
> 
> What I would like to do:
> 
> 1) Parse the xml file but override some or all of the 
> entity definitions in the dtd with my own definitions.
> 
> 2) Parse strings containing elements extracted from
> the full xml file, without the dtd, and supplying my 
> own entity map to resolve any entities.

http://effbot.org/elementtree/elementtree-xmlparser.htm#tag-ET.XMLParser.entity


> I am nearly clueless when it comes to XML processing,
> so if I could get a code snippet illustrating how to 
> do the above, that would be wonderful!  I am currently 
> using the stock Python 2.6 ElementTree, but could 
> switch to lxml's if that would help.

ElementTree (i.e. the xml.etree package) does not support DTDs at all. If
you want to use DTDs, e.g. to do validation, to inject default attributes,
or to resolve entity references, you can switch to the external lxml.etree
package. Note, however, that lxml does not support the ".entity" dictionary
on parsers. It doesn't currently have a way to override entity definitions
outside of a DTD.

Stefan

From joshua.r.english at gmail.com  Mon Jul 13 02:24:25 2009
From: joshua.r.english at gmail.com (Josh English)
Date: Sun, 12 Jul 2009 17:24:25 -0700
Subject: [XML-SIG] my own entity defs when parsing with etree?
In-Reply-To: <4A5A4741.7040105@behnel.de>
References: <4A5A3AEE.3040109@frii.com> <4A5A4741.7040105@behnel.de>
Message-ID: <e53a3a5d0907121724m133940a8scceeb820128586e1@mail.gmail.com>

I gave up on Entities ages ago, but thought I'd try it after seeing your link.

I tried this simple code:

from elementtree import ElementTree as ET

p = ET.XMLParser()

p.entity["me"] = "Josh"

text = """<test>&me;</test>"""

p.feed(text)

e = p.close()

print e
ET.dump(e)

And got an error:

>pythonw -u "ETParserWithEntities.py"
Traceback (most recent call last):
  File "ETParserWithEntities.py", line 9, in <module>
    p.feed(text)
  File "C:\Python26\lib\site-packages\elementtree\ElementTree.py",
line 1524, in feed
    self._raiseerror(v)
  File "C:\Python26\lib\site-packages\elementtree\ElementTree.py",
line 1426, in _raiseerror
    raise err
elementtree.ElementTree.ParseError: undefined entity: line 1, column 6
>Exit code: 1


As far as I can tell, the XMLParser is using pyexpat, which only comes
as a .pyd file, so I can't look into this.

Any ideas?

Windows XP, Python 2.6, elementtree 1v3a2

Josh English


-- 
Josh English
Joshua.R.English at gmail.com
http://joshenglish.livejournal.com

From stefan_ml at behnel.de  Mon Jul 13 08:08:37 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 13 Jul 2009 08:08:37 +0200
Subject: [XML-SIG] my own entity defs when parsing with etree?
In-Reply-To: <e53a3a5d0907121724m133940a8scceeb820128586e1@mail.gmail.com>
References: <4A5A3AEE.3040109@frii.com> <4A5A4741.7040105@behnel.de>
	<e53a3a5d0907121724m133940a8scceeb820128586e1@mail.gmail.com>
Message-ID: <4A5ACF65.5040404@behnel.de>

Hi,

Josh English wrote:
> I gave up on Entities ages ago, but thought I'd try it after seeing your link.
> 
> I tried this simple code:
> 
> from elementtree import ElementTree as ET
> 
> p = ET.XMLParser()
> 
> p.entity["me"] = "Josh"
> 
> text = """<test>&me;</test>"""
> 
> p.feed(text)
> 
> e = p.close()
> 
> print e
> ET.dump(e)
> 
> And got an error:
> 
>> pythonw -u "ETParserWithEntities.py"
> Traceback (most recent call last):
>   File "ETParserWithEntities.py", line 9, in <module>
>     p.feed(text)
>   File "C:\Python26\lib\site-packages\elementtree\ElementTree.py",
> line 1524, in feed
>     self._raiseerror(v)
>   File "C:\Python26\lib\site-packages\elementtree\ElementTree.py",
> line 1426, in _raiseerror
>     raise err
> elementtree.ElementTree.ParseError: undefined entity: line 1, column 6
>> Exit code: 1

Interesting. I just tried and got the same result. I guess I never even
tried to do this, given that I knew lxml won't support it anyway...

Without debugging into this, it seems that expat raises that exception
before ElementTree even gets to handle the unknown entity.

I just found this post, but didn't try it:

http://mail.python.org/pipermail/python-list/2007-April/607256.html

Stefan

From tennis at tripit.com  Thu Jul  9 23:20:53 2009
From: tennis at tripit.com (Tennis Smith)
Date: Thu, 9 Jul 2009 14:20:53 -0700
Subject: [XML-SIG] Advice On Testing With XML
In-Reply-To: <4A564F76.5080700@behnel.de>
References: <e84cff280907091206o8a45927gbd70f42b17e4b2fd@mail.gmail.com> 
	<4A564F76.5080700@behnel.de>
Message-ID: <e84cff280907091420t566f97b4jc89ddccdf0dab331@mail.gmail.com>

On Thu, Jul 9, 2009 at 1:13 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:

> Hi,
>
> Tennis Smith wrote:
> > First, a little background.  My charter is to generate XML test messages to
> > make sure we process them correctly.  These messages are validated against a
> > schema.  I'm using generateDS to generate the test messages.  This ensures
> > the xml is correct.
>
> Hmm, I never (really) used generateDS. AFAIR, it generates Python objects
> that you work with. Does it validate their structure while you do so? Or
> did you refer to the schema validation that "ensures" the message
> correctness?


genDS ensures correctness because there are several layers of object types
cascaded in the schema.  Since genDS creates wrappers for all these, it
makes creating schema-compliant objects really easy.


>
>
> > Everything works great except for one problem that keeps cropping up.  Some
> > elements cannot be defined easily ahead of time when generating the final
> > test document.
> >
> > For example, a field of type "xs:date" will have to be modified because tests
> > are based on a relative date, not an absolute one. That is, dates in tests
> > are based on things like "3 days before today".
> >
> > Therefore, I'd like to figure out some way to change certain fields like
> > date so that I can pass a string and _still validate_ it against the
> > schema.  Using the example, "-3" would be passed in the date field so that
> > the test harness will recognize it as "today - 3 days".
>
> Why can't you just write the corresponding date into the messages when you
> generate them?


The messages are generated long before they are actually transmitted.  There
are literally thousands of tests which are created this way.  After
generation, they're stored in svn and then used much later.

> > Put another way, the goal is to make this:
> >   <xs:element maxOccurs="1" minOccurs="0" name="date" type="xs:date"/>
> > ...behave like this:
> >   <xs:element maxOccurs="1" minOccurs="0" name="date" type="xs:string"/>
> >
> > Naturally, I can edit and copy/paste into a completely new schema file.
> > But I was hoping someone could tell me if I can do some kind of XSLT or
> > whatever to get the same effect.
>
> I'd just change the schema on the way in. You didn't say what tool you use
> for validation, but at least in lxml, modifying the schema tree is pretty
> trivial. You can simply use XPath to find all date types and then fix their
> type attribute.


The tool I'm using is etree.  That's a great suggestion concerning xpath.
That sounds pretty easy.

Thanks, Stefan!



From sklein at cpcug.org  Wed Jul 15 17:37:43 2009
From: sklein at cpcug.org (Stanley A. Klein)
Date: Wed, 15 Jul 2009 11:37:43 -0400 (EDT)
Subject: [XML-SIG] Is anyone implementing EXI in Python?
Message-ID: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>

Efficient XML Interchange (EXI) is moving toward adoption by W3C.  It
provides a format for efficiently representing XML documents with
schema-informed and schema-less modes.

There is an open-source Java implementation available.

Is anyone working to implement EXI in Python?


Stan Klein


From ht at inf.ed.ac.uk  Wed Jul 15 19:37:34 2009
From: ht at inf.ed.ac.uk (Henry S. Thompson)
Date: Wed, 15 Jul 2009 18:37:34 +0100
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>
	(Stanley A. Klein's message of "Wed,
	15 Jul 2009 11:37:43 -0400 (EDT)")
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>
Message-ID: <f5btz1dn8yp.fsf@hildegard.inf.ed.ac.uk>

Stanley A. Klein writes:

> Efficient XML Interchange (EXI) is moving toward adoption by W3C.  It
> provides a format for efficiently representing XML documents with
> schema-informed and schema-less modes.
>
> There is an open-source Java implementation available.
>
> Is anyone working to implement EXI in Python?

Don't get me wrong, I think EXI is useful, in the right places, but,
could I ask, why would you want to implement it in Python?  I'd be
very surprised if any Python XML application is spending anything like
enough time in the raw parsing activity (as opposed to the
structure-building activity) to make the marginal gain you might get
from EXI worth it. . .

EXI is, IMO, for closely coupled systems in particular messaging
environments where every bit counts, and I guess I'm having difficulty
imagining Python in such a context. . .

ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht at inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

From sklein at cpcug.org  Wed Jul 15 21:51:12 2009
From: sklein at cpcug.org (Stanley A. Klein)
Date: Wed, 15 Jul 2009 15:51:12 -0400 (EDT)
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <f5btz1dn8yp.fsf@hildegard.inf.ed.ac.uk>
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>
	<f5btz1dn8yp.fsf@hildegard.inf.ed.ac.uk>
Message-ID: <51461.71.163.219.209.1247687472.squirrel@www.cpcug.org>

EXI is for data interchange.  That can mean messaging or document/data
storage.  SOAP messages are very verbose, and SOAP messaging can benefit
from EXI, especially if the communications channels have bandwidth or
transit time considerations.  SOAP is increasingly being considered in a
variety of control system applications for which Python makes sense as an
implementation language.  Similarly, scientific applications involving
large amounts of XML-formatted data could benefit from EXI in storing the
data or interchanging it for purposes such as grid processing.

The original application that contributed the technology for EXI was
sending web pages to cell phones.

In general, any application implemented in Python that involves messaging
or data storage with either bandwidth or storage volume concerns could
benefit from EXI.  And as best I know there are a growing number of such
applications implemented in Python.

Also, why would Java make sense and Python not?


Stan Klein



From stefan_ml at behnel.de  Wed Jul 15 22:26:57 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 15 Jul 2009 22:26:57 +0200
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <51461.71.163.219.209.1247687472.squirrel@www.cpcug.org>
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>	<f5btz1dn8yp.fsf@hildegard.inf.ed.ac.uk>
	<51461.71.163.219.209.1247687472.squirrel@www.cpcug.org>
Message-ID: <4A5E3B91.4070401@behnel.de>

Hi,

Stanley A. Klein wrote:
> On Wed, July 15, 2009 1:37 pm, Henry S. Thompson wrote:
>> Stanley A. Klein writes:
>>
>>> Efficient XML Interchange (EXI) is moving toward adoption by W3C.  It
>>> provides a format for efficiently representing XML documents with
>>> schema-informed and schema-less modes.
>>>
>>> There is an open-source Java implementation available.
>>>
>>> Is anyone working to implement EXI in Python?
>>
>> Don't get me wrong, I think EXI is useful, in the right places, but,
>> could I ask, why would you want to implement it in Python?  I'd be
>> very surprised if any Python XML application is spending anything like
>> enough time in the raw parsing activity (as opposed to the
>> structure-building activity) to make the marginal gain you might get
>> from EXI worth it. . .
>>
>> EXI is, IMO, for closely coupled systems in particular messaging
>> environments where every bit counts, and I guess I'm having difficulty
>> imagining Python in such a context. . .
>
> EXI is for data interchange.  That can mean messaging or document/data
> storage.  SOAP messages are very verbose, and SOAP messaging can benefit
> from EXI, especially if the communications channels have bandwidth or
> transit time considerations.
>
> SOAP is increasingly being considered in a
> variety of control system applications for which Python makes sense as an
> implementation language.  Similarly, scientific applications involving
> large amounts of XML-formatted data could benefit from EXI in storing the
> data or interchanging it for purposes such as grid processing.
>
> The original application that contributed the technology for EXI was
> sending web pages to cell phones.
>
> In general, any application implemented in Python that involves messaging
> or data storage with either bandwidth or storage volume concerns could
> benefit from EXI.  And as best I know there are a growing number of such
> applications implemented in Python.

Any XML transmission or storage can benefit from *compression*, often
shrinking the data volume by factors up to 100. I doubt that the savings of
EXI are sufficiently large compared to a well compressed XML stream that
they compensate for the drawbacks of yet another new non-readable format.

A well chosen compression method is a lot better suited to such
applications and is already supported by most available XML parsers (or
rather outside of the parsers themselves, which is a huge advantage).
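
To illustrate the point about compression living outside the parser, here is a small sketch that gzips a deliberately repetitive XML document with the standard library and reports the size ratio. The sample document and the function name `compression_ratio` are invented for the example; real documents will compress less well than this artificial one, so the number it prints says nothing about the factor quoted above.

```python
import gzip
import xml.etree.ElementTree as ET


def compression_ratio(xml_bytes):
    """Return original size divided by gzip-compressed size."""
    compressed = gzip.compress(xml_bytes)
    # The parser never sees the compressed form: decompress, then parse.
    restored = gzip.decompress(compressed)
    ET.fromstring(restored)
    return len(xml_bytes) / len(compressed)


record = b'<reading sensor="t1" unit="C"><value>21.5</value></reading>'
document = b"<readings>" + record * 1000 + b"</readings>"
print("ratio: %.1f" % compression_ratio(document))
```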


> Also, why would Java make sense and Python not?

Because pretty much all XML technologies come from the Java environment?
That doesn't mean that Java is a suitable language for working with them.
It only means that it supports them because Java is used for developing
them (often as a reference implementation).

Stefan

From sklein at cpcug.org  Thu Jul 16 20:34:45 2009
From: sklein at cpcug.org (Stanley A. Klein)
Date: Thu, 16 Jul 2009 14:34:45 -0400 (EDT)
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <4A5E3B91.4070401@behnel.de>
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org> 
	<f5btz1dn8yp.fsf@hildegard.inf.ed.ac.uk> 
	<51461.71.163.219.209.1247687472.squirrel@www.cpcug.org> 
	<4A5E3B91.4070401@behnel.de>
Message-ID: <47353.207.188.248.157.1247769285.squirrel@www.cpcug.org>

On Wed, 2009-07-15 at 22:26 +0200, Stefan Behnel wrote:
> Hi,
>
> Stanley A. Klein wrote:
> > On Wed, July 15, 2009 1:37 pm, Henry S. Thompson wrote:
> >> Stanley A. Klein writes:
> >>
> >>> Efficient XML Interchange (EXI) is moving toward adoption by W3C.  It
> >>> provides a format for efficiently representing XML documents with
> >>> schema-informed and schema-less modes.
> >>>
> >>> There is an open-source Java implementation available.
> >>>
> >>> Is anyone working to implement EXI in Python?
> >>
> >> Don't get me wrong, I think EXI is useful, in the right places, but,
> >> could I ask, why would you want to implement it in Python?  I'd be
> >> very surprised if any Python XML application is spending anything like
> >> enough time in the raw parsing activity (as opposed to the
> >> structure-building activity) to make the marginal gain you might get
> >> from EXI worth it. . .
> >>
> >> EXI is, IMO, for closely coupled systems in particular messaging
> >> environments where every bit counts, and I guess I'm having difficulty
> >> imagining Python in such a context. . .
> >
> > EXI is for data interchange.  That can mean messaging or document/data
> > storage.  SOAP messages are very verbose, and SOAP messaging can benefit
> > from EXI, especially if the communications channels have bandwidth or
> > transit time considerations.
> >
> > SOAP is increasingly being considered in a
> > variety of control system applications for which Python makes sense as an
> > implementation language.  Similarly, scientific applications involving
> > large amounts of XML-formatted data could benefit from EXI in storing the
> > data or interchanging it for purposes such as grid processing.
> >
> > The original application that contributed the technology for EXI was
> > sending web pages to cell phones.
> >
> > In general, any application implemented in Python that involves messaging
> > or data storage with either bandwidth or storage volume concerns could
> > benefit from EXI.  And as best I know there are a growing number of such
> > applications implemented in Python.
>
> Any XML transmission or storage can benefit from *compression*, often
> shrinking the data volume by factors up to 100. I doubt that the savings of
> EXI are sufficiently large compared to a well compressed XML stream that
> they compensate for the drawbacks of yet another new non-readable format.
>
> A well chosen compression method is a lot better suited to such
> applications and is already supported by most available XML parsers (or
> rather outside of the parsers themselves, which is a huge advantage).
>
>
> > Also, why would Java make sense and Python not?
>
> Because pretty much all XML technologies come from the Java environment?
> That doesn't mean that Java is a suitable language for working with them.
> It only means that it supports them because Java is used for developing
> them (often as a reference implementation).
>
> Stefan


It depends on the nature of the XML application.  One feature of EXI is to
support representation of numeric data as bits rather than characters. 
That is very useful in appropriate applications.  There is a measurements
document that shows the compression that was achieved on a wide variety of
test cases.  Straight use of a common compression algorithm does not
necessarily achieve the best results.  Besides, EXI incorporates elements
of common compression algorithm(s) as both a fallback for its schema-less
mode and an additional capability in its schema-informed mode.
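
As a rough illustration of the general idea of sending numbers as bits rather than characters, the sketch below packs a few floats as fixed-width binary doubles and compares that with their text form. This is not EXI's actual encoding; the helper names and the sample readings are invented, and whether the binary form comes out smaller depends on the values.

```python
import struct


def pack_doubles(values):
    """Encode floats as consecutive little-endian IEEE 754 doubles."""
    return struct.pack("<%dd" % len(values), *values)


def unpack_doubles(data):
    """Decode the byte string produced by pack_doubles."""
    count = len(data) // 8  # each double occupies 8 bytes
    return list(struct.unpack("<%dd" % count, data))


readings = [1013.2499999, -273.15, 3.141592653589793]
as_text = " ".join(repr(v) for v in readings).encode("ascii")
as_bits = pack_doubles(readings)
print(len(as_text), len(as_bits))
```

The receiver has to know the width and layout in advance, which is the kind of information a schema would supply.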

EXI is intended for use outboard of the parser, and that would apply
equally well to a Python version.  For example, EXI gets rid of the need
to constantly resend over-the-wire all the namespace definitions with each
message.  The relevant strings would just go into the string table and get
restored from there when the message is converted back.
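
The string-table idea can be shown with a toy encoder that sends each distinct string once and refers to it by index afterwards. This is only a sketch of the principle: EXI's real string tables are more elaborate, and the function names and sample tokens below are invented.

```python
def encode(strings):
    """Replace repeated strings with the index of their first occurrence."""
    table = {}
    encoded = []
    for s in strings:
        if s in table:
            encoded.append(table[s])  # already sent: emit its index
        else:
            table[s] = len(table)
            encoded.append(s)         # first occurrence: emit the literal
    return encoded


def decode(items):
    """Rebuild the original strings from literals and table indices."""
    table = []
    decoded = []
    for item in items:
        if isinstance(item, int):
            decoded.append(table[item])
        else:
            table.append(item)
            decoded.append(item)
    return decoded


SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
tokens = [SOAP_NS, "Envelope", SOAP_NS, "Body", SOAP_NS, "Envelope"]
print(encode(tokens))
```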

However, for something like SOAP in certain applications, it may be
eventually desirable to integrate the EXI implementation within the
communications system.  The message sender could reasonably create a
schema-informed EXI version without actually starting from and converting
an XML object.  The recipient would have to convert the EXI back to XML,
parse it, and use the data.

Regarding the format readability, it converts to XML and is readable
there.  Numeric data is most efficiently sent as bits, so that data is
necessarily unreadable until converted.  The value of EXI necessarily
depends on the application.


Stan Klein


From stefan_ml at behnel.de  Fri Jul 17 10:06:01 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 17 Jul 2009 10:06:01 +0200
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <47353.207.188.248.157.1247769285.squirrel@www.cpcug.org>
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>
	<f5btz1dn8yp.fsf@hildegard.inf.ed.ac.uk>
	<51461.71.163.219.209.1247687472.squirrel@www.cpcug.org>
	<4A5E3B91.4070401@behnel.de>
	<47353.207.188.248.157.1247769285.squirrel@www.cpcug.org>
Message-ID: <4A6030E9.6010909@behnel.de>

Hi,

Stanley A. Klein wrote:
> On Wed, 2009-07-15 at 22:26 +0200, Stefan Behnel wrote:
>> A well chosen compression method is a lot better suited to such
>> applications and is already supported by most available XML parsers (or
>> rather outside of the parsers themselves, which is a huge advantage).
> 
> It depends on the nature of the XML application.  One feature of EXI is to
> support representation of numeric data as bits rather than characters. 
> That is very useful in appropriate applications.

One drawback is that this requires a schema to make sure the number of bits
is sufficient. Otherwise, you'd need to send along how many bits each
value uses, which would add to the data volume.


> There is a measurements
> document that shows the compression that was achieved on a wide variety of
> test cases.  Straight use of a common compression algorithm does not
> necessarily achieve the best results.

Repetitive data like an XML byte stream compresses extremely well, though,
and the 'best' compression isn't always required anyway. I worked on a
Python SOAP application where we sent some 3MB of XML as a web service
response. That took a couple of seconds to transmit. Injecting the standard
gzip algorithm into the WSGI stack got it down to some 48KB. Nothing more
to do here.

If you need 'the best' compression, there's no way around benchmarking a
couple of different algorithms that are suitable for your application, and
choosing the one that works best for your data. That may or may not include
EXI.


> Besides, EXI incorporates elements
> of common compression algorithm(s) as both a fallback for its schema-less
> mode and an additional capability in its schema-informed mode.

Makes sense, as compression also applies to text content, for example.


> EXI is intended for use outboard of the parser, and that would apply
> equally well to a Python version.  For example, EXI gets rid of the need
> to constantly resend over-the-wire all the namespace definitions with each
> message.  The relevant strings would just go into the string table and get
> restored from there when the message is converted back.

That's how any run-length based compression algorithm works anyway. Plus,
namespace definitions usually only happen once in a document, so they are
pretty much negligible in a larger XML document.


> However, for something like SOAP in certain applications, it may be
> eventually desirable to integrate the EXI implementation within the
> communications system.  The message sender could reasonably create a
> schema-informed EXI version without actually starting from and converting
> an XML object.  The recipient would have to convert the EXI back to XML,
> parse it, and use the data.

Ok, that's where I see it, too. At the level where you'd normally apply a
compression algorithm anyway.


> Numeric data is most efficiently sent as bits

Depends on how you select the bits. When I write into my schema that I use
a 32 bit integer value in my XML, and all I really send happens to be
within [0-9] in, say, 95% of the cases with a few exceptions that really
require 32 bits, a general run-length compression algorithm will easily
beat anything that sends the value as a 4-byte sequence. That's the
advantage of general compression: it sees the real data, not only its schema.
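
The claim above can be checked with a small experiment. This is only a
sketch under stated assumptions: the value distribution is invented to
match the "95% single digits" example, the element names are made up, and
the fixed 4-byte encoding stands in for "anything that sends the value as
a 4-byte sequence". It is not a model of how EXI actually encodes
integers, and compressing the binary form as well would change the
comparison.

```python
import random
import struct
import zlib

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical data: about 95% of the values are single digits, the rest
# need (nearly) the full 32-bit range.
values = [
    random.randrange(10) if random.random() < 0.95 else random.randrange(2**31)
    for _ in range(10000)
]

# Fixed-width binary: every value costs 4 bytes, whatever its magnitude.
fixed_width = struct.pack(">%di" % len(values), *values)

# The same values as textual XML, run through a general-purpose compressor.
xml_text = "<values>" + "".join("<v>%d</v>" % v for v in values) + "</values>"
compressed_text = zlib.compress(xml_text.encode("ascii"), 9)

print("fixed-width binary:", len(fixed_width))
print("compressed XML:    ", len(compressed_text))
```

With this distribution the compressed text comes out well below the
40000 bytes of the fixed-width form; a different distribution can of
course tip the balance the other way.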

I do not question EXI in general, I'm fine with it having its niche
(wherever that turns out to be). I'm just saying that common compression
algorithms are a lot more broadly available and achieve similar results. So
EXI is just another way of compressing XML, with the disadvantage of not
being as widely implemented. Compare it to the ubiquity of the gzip
compression algorithm, for example. It's just the usual trade-off that you
make between efficiency and cross-platform compatibility.

Stefan

From sklein at cpcug.org  Fri Jul 17 17:01:12 2009
From: sklein at cpcug.org (Stanley A. Klein)
Date: Fri, 17 Jul 2009 11:01:12 -0400 (EDT)
Subject: [XML-SIG] Is anyone implementing EXI in Python?
In-Reply-To: <4A6030E9.6010909@behnel.de>
References: <49531.207.188.248.157.1247672263.squirrel@www.cpcug.org>
	<f5btz1dn8yp.fsf@hildegard.inf.ed.ac.uk>
	<51461.71.163.219.209.1247687472.squirrel@www.cpcug.org>
	<4A5E3B91.4070401@behnel.de>
	<47353.207.188.248.157.1247769285.squirrel@www.cpcug.org>
	<4A6030E9.6010909@behnel.de>
Message-ID: <4153.207.188.248.157.1247842872.squirrel@www.cpcug.org>

I think the issue here is the nature of the data exchange.  EXI
essentially provides a compression algorithm that saves information
between instances of a message or file and can be seeded with what is
known in advance about certain characteristics of the instances.  The gzip
algorithm learns the characteristics of each instance separately from that
instance and does not retain information between instances.

If you are occasionally sending a large file, gzip makes sense.  There is
little gain from retaining information.  However, if you have frequent
small messages or separate small files based on a schema, the namespace
definitions are repeated for each instance and can take up an appreciable
fraction of what is sent over-the-wire for each instance.  There isn't
much for gzip to learn, and it has to start all over for the next
instance.  Similarly, the tags recur across instances but gzip will only
learn them as it encounters them in a particular instance.  Again, gzip
forgets between instances.

I think in the absence of prior information and when used only
occasionally (without information retention between instances), EXI
provides something close to gzip compression.  What EXI provides is a
variant of compression technology that has information retention between
instances and the ability to use prior information across instances.  In
applications with frequent repetitive data exchanges, the information
retention and ability to use prior information can provide significant
benefits.
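
The effect of prior information on small messages can be demonstrated
with plain zlib, which accepts a preset dictionary. This is an analogy
only, not EXI: EXI's string tables and schema-informed grammars work
differently, and the message, namespace URI and dictionary contents below
are all made up for illustration.

```python
import zlib

# Hypothetical small message; the namespace URI and tag names are invented.
message = (
    b'<reading xmlns="http://example.com/ns/telemetry/2009">'
    b"<sensor>42</sensor><value>17.5</value></reading>"
)

# Prior knowledge agreed on by sender and receiver ahead of time: the
# strings that recur in every instance of the message.
shared = (
    b'<reading xmlns="http://example.com/ns/telemetry/2009">'
    b"<sensor></sensor><value></value></reading>"
)

# Plain zlib: has to learn everything from this one small message.
plain = zlib.compress(message)

# zlib seeded with the shared dictionary.
comp = zlib.compressobj(zdict=shared)
seeded = comp.compress(message) + comp.flush()

# The receiver needs the same dictionary to decode.
decomp = zlib.decompressobj(zdict=shared)
restored = decomp.decompress(seeded) + decomp.flush()
assert restored == message

print("uncompressed:", len(message))
print("plain zlib:  ", len(plain))
print("seeded zlib: ", len(seeded))
```

For a message this small, the unseeded compressor barely gains anything,
while the seeded one only has to send the few bytes that differ from the
shared dictionary.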


Stan Klein




From bo.laurent at canonical.com  Tue Jul 21 10:59:31 2009
From: bo.laurent at canonical.com (Bo Laurent)
Date: Tue, 21 Jul 2009 01:59:31 -0700
Subject: [XML-SIG] help getting started with xpath
Message-ID: <D13F99B8-93B0-41A3-9419-66D67F80CABE@canonical.com>

I'm new to lxml. I've parsed a simple document, as shown below.  But
every simple xpath() expression I try returns an empty list. What am I
doing wrong? Perhaps I need to specify the namespace to the parser?


<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
     <types>
         <members>*</members>
         <name>CustomObject</name>
     </types>
     <version>16.0</version>
</Package>

self.doc = etree.parse( self.package_xml_path )


(Pdb) root = self.doc.getroot()
(Pdb) root.getchildren()
[<Element {http://soap.sforce.com/2006/04/metadata}types at e1dc60>,  
<Element {http://soap.sforce.com/2006/04/metadata}version at e1dc90>]
(Pdb) root.xpath('//Package')
[]
(Pdb) root.xpath('/Package')
[]
(Pdb) root.xpath('Package')
[]
(Pdb) root.xpath('types')
[]
(Pdb) root.xpath('/types')
[]

===== environment ====
Python 2.5.2
lxml-2.2-py2.5-macosx-10.3-i386.egg
OSX 10.5.7


From stefan_ml at behnel.de  Tue Jul 21 17:27:50 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 21 Jul 2009 17:27:50 +0200
Subject: [XML-SIG] help getting started with xpath
In-Reply-To: <D13F99B8-93B0-41A3-9419-66D67F80CABE@canonical.com>
References: <D13F99B8-93B0-41A3-9419-66D67F80CABE@canonical.com>
Message-ID: <4A65DE76.7040504@behnel.de>


Bo Laurent wrote:
> I'm new to lxml. I've parsed a simple document, as shown below.  But
> every simple xpath() expression I try returns an empty list. What am I
> doing wrong? Perhaps I need to specify the namespace to the parser?

Yes, exactly. See here:

http://codespeak.net/lxml/xpathxslt.html#xpath
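
For readers without the linked page at hand: the elements in the posted
document live in the default namespace, and XPath 1.0 has no notion of a
default namespace, so each name in the expression needs a prefix that is
mapped to the namespace URI. The sketch below uses the standard library's
ElementTree so that it runs without extra packages; the prefix "m" is an
arbitrary choice. In lxml, the same mapping is passed to xpath() through
its namespaces keyword, e.g. root.xpath('//m:types', namespaces=NS).

```python
import xml.etree.ElementTree as ET

PACKAGE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
     <types>
         <members>*</members>
         <name>CustomObject</name>
     </types>
     <version>16.0</version>
</Package>
"""

root = ET.fromstring(PACKAGE_XML)

# Map a prefix of our own choosing to the document's default namespace.
NS = {"m": "http://soap.sforce.com/2006/04/metadata"}

# Without a namespace the name "types" matches nothing.
unqualified = root.findall("types")

# With the prefix, the element is found.
qualified = root.findall("m:types", NS)
name = root.findtext("m:types/m:name", namespaces=NS)

print(unqualified)
print(qualified)
print(name)
```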

Stefan

From uraniumore238 at gmail.com  Tue Jul 21 19:19:13 2009
From: uraniumore238 at gmail.com (uche)
Date: Tue, 21 Jul 2009 10:19:13 -0700 (PDT)
Subject: [XML-SIG] python parser project
Message-ID: <6369de2b-a579-4a31-a6bf-9e627ef14b54@a37g2000prf.googlegroups.com>

Hi All,

I am developing a Python parsing program. This program takes two
inputs: a comma-delimited txt file and an XML file that describes
the structure of the data file. I am using Python's minidom to read in
the XML file and create a tree structure in an object file. The next
step is to insert the data into the respective fields of the
tree. Once I am done, I'd like to send this object to an SQL database.
Has anyone attempted to do this? Is there example code online that
I can refer to? More specifically, what code will allow me to
combine the data and tree structure into a complete object that I can
use to populate the SQL database?

Thanks.

From uraniumore238 at gmail.com  Tue Jul 21 22:40:43 2009
From: uraniumore238 at gmail.com (uche)
Date: Tue, 21 Jul 2009 13:40:43 -0700 (PDT)
Subject: [XML-SIG] direction needed
Message-ID: <389487ac-4b66-431c-b698-ef1e5f0b76ef@y4g2000prf.googlegroups.com>

I have an XML file that describes the schema of a database, but this
file does not contain the records (just the attribute column names). I
have another txt file that holds the data. I would like to use
minidom in Python to combine these two files into an object file, which
will then be stored in a database. Has anyone done this? Is there
example code out there that I can reference?

From jriveramerla at gmail.com  Wed Jul 22 00:09:50 2009
From: jriveramerla at gmail.com (Jose Rivera Merla)
Date: Tue, 21 Jul 2009 17:09:50 -0500
Subject: [XML-SIG] python parser project
In-Reply-To: <6369de2b-a579-4a31-a6bf-9e627ef14b54@a37g2000prf.googlegroups.com>
References: <6369de2b-a579-4a31-a6bf-9e627ef14b54@a37g2000prf.googlegroups.com>
Message-ID: <6f495610907211509i4cdaaeb8q57fd42829b2f3690@mail.gmail.com>

Hi Uche:
   It's my opinion that you could do this easily with lxml for the XML part.
Just Google "Python LXML"

   Look at this page http://codespeak.net/lxml/tutorial.html

   The txt file is easy to handle with the split(',') method.

   What I don't understand is the part about sending the XML to
a SQL database; it's easier to load the text file with a SQL bulk insert
command, etc.
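
A minimal sketch of the whole pipeline, using only the standard library
(minidom, csv and sqlite3). The layout of the structure file was never
shown in the thread, so the <table>/<column> shape, the column names and
the sample records below are all assumptions, and SQLite merely stands in
for whatever SQL database is actually used.

```python
import csv
import io
import sqlite3
from xml.dom import minidom

# Hypothetical inputs; adapt the element and attribute names to the real
# structure file.
STRUCTURE_XML = """<table name="people">
  <column name="name"/>
  <column name="city"/>
  <column name="age"/>
</table>"""
DATA_TXT = "Ada,London,36\nLinus,Helsinki,39\n"

# 1. Read the table and column names from the XML description.
dom = minidom.parseString(STRUCTURE_XML)
table = dom.documentElement.getAttribute("name")
columns = [c.getAttribute("name") for c in dom.getElementsByTagName("column")]

# 2. Read the comma-delimited records. The csv module copes with quoted
#    fields that contain commas, which a bare split(',') does not.
rows = list(csv.reader(io.StringIO(DATA_TXT)))

# 3. Create the table and insert the rows with a parameterised statement.
#    Identifiers cannot be bound as parameters, so they are quoted into
#    the SQL text; only do this with a trusted structure file.
conn = sqlite3.connect(":memory:")
col_sql = ", ".join('"%s"' % c for c in columns)
placeholders = ", ".join("?" for _ in columns)
conn.execute('CREATE TABLE "%s" (%s)' % (table, col_sql))
conn.executemany(
    'INSERT INTO "%s" (%s) VALUES (%s)' % (table, col_sql, placeholders), rows
)
conn.commit()

result = conn.execute('SELECT * FROM "%s" ORDER BY "name"' % table).fetchall()
print(result)
```

Going straight from the parsed rows to the database avoids building an
intermediate XML object at all, which is the point made above about bulk
loading the text file.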

Regards,
Jose Rivera
