From Tim.Arnold at sas.com  Mon Jun  1 16:14:53 2009
From: Tim.Arnold at sas.com (Tim Arnold)
Date: Mon, 1 Jun 2009 10:14:53 -0400
Subject: [XML-SIG] docbook 5, lxml and rng
In-Reply-To: <4A221DFF.1080006@behnel.de>
References: <F62CC1C9DC2ABC4A8D986537A79D324F0DD361CF48@MERCMBX14.na.sas.com>
	<4A221DFF.1080006@behnel.de>
Message-ID: <F62CC1C9DC2ABC4A8D986537A79D324F0DD36923D1@MERCMBX14.na.sas.com>

> -----Original Message-----
> From: Stefan Behnel [mailto:stefan_ml at behnel.de]
> Sent: Sunday, May 31, 2009 2:05 AM
> To: Tim Arnold
> Cc: xml-sig at python.org
> Subject: Re: [XML-SIG] docbook 5, lxml and rng
> 
> Hi,
> 
> Tim Arnold wrote:
> > Hi, this is a newbie question I'm sure. I'm trying to validate an
> > example straight out of the docbook 5 documentation (example given
> > on the 'inlineequation' page). As it stands, the file doesn't pass
> > as valid.
> >
> > The code:
> > =======================================
> > from lxml import etree
> > import os
> > # RNGDIR = 'path to docbook.rng'
> > # XMLDIR = 'path to the xml file'
> > relaxng_doc = etree.parse(os.path.join(RNGDIR,'docbook.rng'))
> > relaxng = etree.RelaxNG(relaxng_doc)
> >
> > doc = etree.parse(os.path.join(XMLDIR,'myfile.xml'))
> > print relaxng.validate(doc)
> 
> What does the validator tell you why it's not considered valid? Note that
> there's a property "error_log" which returns a sequence of messages that
> were collected during validation.
> 
> http://codespeak.net/lxml/validation.html#relaxng
> 
> Stefan
> 

Thanks, I should have looked at the documentation more before posting. I see what you're talking about now and I think I might have an explanation of what's going on.
The error_log says:
---------------------
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMWRONG: Did not expect element para there
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element example, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element bridgehead, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element para has extra content: text
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element annotation, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_CONTENTVALID: Element article failed to validate content
---------------------
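
For anyone reproducing this, here is a minimal sketch of how such a log can be collected after a failed validation. The tiny inline schema and document are stand-ins invented for illustration (they are not docbook.rng or the file discussed here), and the code is written for current Python 3 rather than the Python 2 of the original post.

```python
from lxml import etree

# Tiny stand-in schema (an assumption for illustration, not docbook.rng):
# an <article> may contain only <title> elements.
RNG = b"""<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="article">
      <oneOrMore>
        <element name="title"><text/></element>
      </oneOrMore>
    </element>
  </start>
</grammar>"""

relaxng = etree.RelaxNG(etree.fromstring(RNG))
doc = etree.fromstring(b"<article>\n<para>some text</para>\n</article>")

# validate() returns a boolean; the details end up in error_log.
is_valid = relaxng.validate(doc)
print(is_valid)
for entry in relaxng.error_log:
    # Each entry carries the line number, the error type and a message.
    print(entry.line, entry.type_name, entry.message)
```

The same loop should work unchanged with a schema parsed from a file, as in the code quoted above.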

But my libxml2 version is 5, which I think means that schematron isn't supported. And the docbook.rng contains some embedded schematron. From the DocBook 5 documentation:
---------------------
If you want to validate against the DocBook 5 RelaxNG schema, then you have to find the right validation tool. The DocBook 5 RelaxNG schema includes embedded Schematron rules to express certain constraints on some content models. For example, a Schematron rule is added to prevent a sidebar element from containing another sidebar. For complete validation, a validator needs to check both the RelaxNG content models and the Schematron rules.
---------------------


Does that make sense?
thanks,
--Tim Arnold


From swtest123 at gmail.com  Thu Jun  4 11:29:35 2009
From: swtest123 at gmail.com (testing123 test)
Date: Thu, 4 Jun 2009 14:59:35 +0530
Subject: [XML-SIG] Regarding 2 XML Files Comparision using Python
Message-ID: <8f7c146d0906040229i2d8a0a46i70a01886d119543d@mail.gmail.com>

Hi all,
       I am Prasad. I need help writing a Python script to compare two XML
files. Is there a tutorial? Do I need to include a library? Please help me
get started.

Rgds,
Prasad

From stefan_ml at behnel.de  Thu Jun  4 13:35:16 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 4 Jun 2009 13:35:16 +0200 (CEST)
Subject: [XML-SIG] Regarding 2 XML Files Comparision using Python
In-Reply-To: <8f7c146d0906040229i2d8a0a46i70a01886d119543d@mail.gmail.com>
References: <8f7c146d0906040229i2d8a0a46i70a01886d119543d@mail.gmail.com>
Message-ID: <13f45ea2de40bad6ab3039730ad7442a.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

testing123 test wrote:
> Hi all,
>   I am Prasad. I need help writing a Python script to compare two XML
> files. Is there a tutorial? Do I need to include a library? Please help me
> get started.

... by looking at the Python package index?

If your XML files are small, you may get away with the xmldiff package.

Also, a very simple way to do that is to pretty print your XML files and
then run a normal line diff on them. Depends on what you want to achieve
with your 'script'.
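
A minimal sketch of the pretty-print-then-diff idea, using only the standard library (xml.dom.minidom and difflib). The function names and sample documents are made up for illustration. Note that pretty printing changes whitespace, so this is only reasonable when whitespace inside the documents is not significant.

```python
import difflib
from xml.dom import minidom


def pretty_lines(xml_text):
    """Re-indent an XML string and return its non-blank lines."""
    pretty = minidom.parseString(xml_text).toprettyxml(indent="  ")
    return [line for line in pretty.splitlines() if line.strip()]


def xml_line_diff(old_xml, new_xml):
    """Return a unified line diff of two pretty-printed XML strings."""
    return list(difflib.unified_diff(
        pretty_lines(old_xml), pretty_lines(new_xml),
        fromfile="old.xml", tofile="new.xml", lineterm=""))


# Hypothetical sample documents that differ in one text node.
old = "<root><item id='1'>a</item><item id='2'>b</item></root>"
new = "<root><item id='1'>a</item><item id='2'>c</item></root>"
print("\n".join(xml_line_diff(old, new)))
```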

If you need more than that and want to implement it in Python, you may
consider using lxml (or cElementTree if you can afford to ignore comments)
to parse the two files and then run through the two trees to look for
differences. But note that this is not trivial. There is some scientific
literature on good algorithms to compare XML tree structures.
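
To give an idea of the tree-walking approach, here is a deliberately naive sketch using the standard library's ElementTree (which, like cElementTree, drops comments by default). It compares children strictly by position, so it is not one of the proper tree-diff algorithms from the literature; the function name and sample data are invented for illustration.

```python
import xml.etree.ElementTree as ET


def tree_diff(a, b, path=""):
    """Yield human-readable differences between two elements.

    Children are compared pairwise in document order, so one inserted
    element makes every later sibling look different. Tail text is
    ignored, and leading/trailing whitespace in text is not compared.
    """
    here = "%s/%s" % (path, a.tag)
    if a.tag != b.tag:
        yield "%s: tag %r != %r" % (here, a.tag, b.tag)
        return
    if a.attrib != b.attrib:
        yield "%s: attributes %r != %r" % (here, a.attrib, b.attrib)
    if (a.text or "").strip() != (b.text or "").strip():
        yield "%s: text %r != %r" % (here, a.text, b.text)
    if len(a) != len(b):
        yield "%s: %d children != %d children" % (here, len(a), len(b))
    for child_a, child_b in zip(a, b):
        yield from tree_diff(child_a, child_b, here)


# Hypothetical sample documents: the second item differs in attribute and text.
old = ET.fromstring("<root><item id='1'>a</item><item id='2'>b</item></root>")
new = ET.fromstring("<root><item id='1'>a</item><item id='3'>c</item></root>")
differences = list(tree_diff(old, new))
for line in differences:
    print(line)
```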

Note that lxml.html comes with an HTML diff algorithm, which you can look
at for inspiration.

Stefan


From stefan_ml at behnel.de  Sat Jun  6 17:29:39 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 06 Jun 2009 17:29:39 +0200
Subject: [XML-SIG] docbook 5, lxml and rng
In-Reply-To: <F62CC1C9DC2ABC4A8D986537A79D324F0DD36923D1@MERCMBX14.na.sas.com>
References: <F62CC1C9DC2ABC4A8D986537A79D324F0DD361CF48@MERCMBX14.na.sas.com>	<4A221DFF.1080006@behnel.de>
	<F62CC1C9DC2ABC4A8D986537A79D324F0DD36923D1@MERCMBX14.na.sas.com>
Message-ID: <4A2A8B63.2040205@behnel.de>


Tim Arnold wrote:
> my libxml2 version is 5, which I think means that schematron isn't
> supported. And the docbook.rng contains some embedded schematron. From
> the DocBook 5 documentation:
>
> ---------------------
> If you want to validate against the DocBook 5 RelaxNG schema, then you
> have to find the right validation tool. The DocBook 5 RelaxNG schema
> includes embedded Schematron rules to express certain constraints on
> some content models. For example, a Schematron rule is added to prevent
> a sidebar element from containing another sidebar. For complete
> validation, a validator needs to check both the RelaxNG content models
> and the Schematron rules.
> ---------------------

Yes, it looks like libxml2 can't handle Schematron annotations that are
embedded in RelaxNG schemas, even if both languages are supported separately.

Stefan

From billk at sunflower.com  Sat Jun  6 23:08:27 2009
From: billk at sunflower.com (Bill Kinnersley)
Date: Sat, 06 Jun 2009 16:08:27 -0500
Subject: [XML-SIG] docbook 5, lxml and rng
In-Reply-To: <4A2A8B63.2040205@behnel.de>
References: <F62CC1C9DC2ABC4A8D986537A79D324F0DD361CF48@MERCMBX14.na.sas.com>	<4A221DFF.1080006@behnel.de>	<F62CC1C9DC2ABC4A8D986537A79D324F0DD36923D1@MERCMBX14.na.sas.com>
	<4A2A8B63.2040205@behnel.de>
Message-ID: <4A2ADACB.3010000@sunflower.com>

Stefan Behnel wrote:
> Tim Arnold wrote:
>> my libxml2 version is 5, which I think means that schematron isn't
>> supported. And the docbook.rng contains some embedded schematron. From
>> the DocBook 5 documentation:
>>
>> ---------------------
>> If you want to validate against the DocBook 5 RelaxNG schema, then you
>> have to find the right validation tool. The DocBook 5 RelaxNG schema
>> includes embedded Schematron rules to express certain constraints on
>> some content models. For example, a Schematron rule is added to prevent
>> a sidebar element from containing another sidebar. For complete
>> validation, a validator needs to check both the RelaxNG content models
>> and the Schematron rules.
>> ---------------------
> 
> Yes, it looks like libxml2 can't handle Schematron annotations that are
> embedded in RelaxNG schemas, even if both languages are supported separately.

Doesn't that just mean it skips over them?  I don't see how the 
error_log entries Tim was getting would implicate Schematron.

Anyway, the RelaxNG specification for Docbook, I believe, is still quite 
experimental.  Both jing and trang choke on it, so perhaps libxml2 may 
be forgiven for choking also.


From AndiDog at web.de  Wed Jun 17 16:53:15 2009
From: AndiDog at web.de (Andreas Sommer)
Date: Wed, 17 Jun 2009 15:53:15 +0100
Subject: [XML-SIG] XSLT 2.0 implementation in Python?
Message-ID: <4A39035B.8050506@web.de>

Hi,

I just wanted to ask if there's any Python XML implementation which 
supports XSLT 2.0 (e.g. <xsl:analyze-string>). The only thing I found was 
Saxon, but it's only for Java/.NET (and I don't want to use Jython).

Cheers
 Andreas

From stefan_ml at behnel.de  Wed Jun 17 18:02:05 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 17 Jun 2009 18:02:05 +0200
Subject: [XML-SIG] XSLT 2.0 implementation in Python?
In-Reply-To: <4A39035B.8050506@web.de>
References: <4A39035B.8050506@web.de>
Message-ID: <4A39137D.1060808@behnel.de>

Hi,

Andreas Sommer wrote:
> I just wanted to ask if there's any Python XML implementation which
> supports XSLT 2.0 (e.g. <xsl:analyze-string>). The only thing I found was
> Saxon, but it's only for Java/.NET (and I don't want to use Jython).

This is a bit of a FAQ. You may want to search the list archives for some
answers.

Stefan


From csad7 at t-online.de  Thu Jun 18 12:38:15 2009
From: csad7 at t-online.de (Christof Hoeke)
Date: Thu, 18 Jun 2009 12:38:15 +0200
Subject: [XML-SIG] XSLT 2.0 implementation in Python?
Message-ID: <4A3A1917.3060301@t-online.de>

 > I just wanted to ask if there's any Python XML implementation which
 > supports XSLT 2.0 (e.g. <xsl:analyze-string>). The only thing I found was
 > Saxon, but it's only for Java/.NET (and I don't want to use Jython).

I have looked for a native Python implementation for some time now, but 
without luck, it seems. With Jython 2.5 final out, Saxon is an alternative 
(I am currently trying to use e.g. web.py with it so that I can use XSLT 2 
for web site templating). You could also try Saxon with IronPython; it 
should work, but I have not tried it yet.

The only option for using Java/Saxon from (C)Python would be to call Saxon 
as an OS command and pipe the result back to your Python program. That 
works, but you still need Java in addition to Python; at least you can 
write your program in (C)Python. Also, you cannot transform e.g. an lxml 
tree directly; you would have to reserialize the XML first.
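
A rough sketch of the call-Saxon-as-a-command approach, written for current Python 3. The jar name `saxon9.jar`, the stylesheet path, and both function names are assumptions for illustration. The `-s:-` option (read the source from standard input) and `-xsl:` follow the Saxon 9 command-line conventions as I understand them; please check them against the documentation of your Saxon version.

```python
import subprocess


def saxon_command(jar_path, stylesheet_path):
    """Build the argument list for a Saxon run that reads XML on stdin."""
    # Assumed Saxon 9 style options: -s:- for stdin, -xsl: for the stylesheet.
    return ["java", "-jar", jar_path, "-s:-", "-xsl:" + stylesheet_path]


def transform(xml_bytes, jar_path, stylesheet_path):
    """Pipe serialized XML through Saxon and return the result as bytes."""
    completed = subprocess.run(
        saxon_command(jar_path, stylesheet_path),
        input=xml_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


# Only show the command here; running it needs Java and a Saxon jar.
print(saxon_command("saxon9.jar", "style.xsl"))
```

With lxml, the input bytes would come from something like `etree.tostring(tree)` and the result could be parsed back with `etree.fromstring(...)`, which is the reserialization step mentioned above.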

But if you find anything let me know!

Chris

From stefan_ml at behnel.de  Thu Jun 18 14:54:02 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 18 Jun 2009 14:54:02 +0200 (CEST)
Subject: [XML-SIG] XSLT 2.0 implementation in Python?
In-Reply-To: <4A3A1917.3060301@t-online.de>
References: <4A3A1917.3060301@t-online.de>
Message-ID: <4d06f0bd68dce9b32ba723764c058c90.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

Christof Hoeke wrote:
>> I just wanted to ask if there's any Python XML implementation which
>> supports XSLT 2.0 (e.g. <xsl:analyze-string>). The only thing I found was
>> Saxon, but it's only for Java/.NET (and I don't want to use Jython).
>
> The only option for using Java/Saxon from (C)Python would be to call Saxon
> as an OS command and pipe the result back to your Python program.

Sounds awfully slow, given the startup time of the average JVM, plus the
time it takes hotspot to heat up.

There's also JPype, GCJ or JCC if running Java is an option, see e.g.

http://ubuntuforums.org/archive/index.php/t-593327.html

http://pypi.python.org/pypi/JCC/
http://jpype.sourceforge.net/


> you still need Java in addition to Python; at least you can write your
> program in (C)Python. Also, you cannot transform e.g. an lxml tree
> directly; you would have to reserialize the XML first.

Should I say it? Serialisation and parsing are *fast* in lxml - don't know
about Saxon in Java, though. But given that both XSLT input and output can
be streamed, the I/O performance might not be that much of a problem
either (assuming large documents). Benchmarks will tell.

You could also write an HTTP based transformation service in Jython that
calls Saxon, and just run it in a permanently running JVM.

Stefan


From bigotp at acm.org  Sun Jun 21 02:22:20 2009
From: bigotp at acm.org (Peter A. Bigot)
Date: Sat, 20 Jun 2009 19:22:20 -0500
Subject: [XML-SIG] Python Bindings to XML Schema system released
Message-ID: <4A3D7D3C.4040508@acm.org>

PyXB ("pixbee") is a pure Python package that generates Python source 
code for classes that correspond to data structures defined by 
XMLSchema.  The generated classes support bi-directional conversion 
between XML documents and Python instances.  In concept it is similar to 
JAXB for Java and CodeSynthesis XSD for C++.

Version 0.4.0, available from https://sourceforge.net/projects/pyxb, is 
fairly complete, and supports the following features:

    * Simple and complex type definitions
    * List and union datatypes
    * Constraints on (simple) datatypes (e.g., minInclusive, length)
    * Model groups and attribute groups
    * Complex content models (all, sequence, choice); minOccurs and maxOccurs
    * Abstract types, xsi:type, substitution groups
    * Nillable elements with xsi:nil
    * Namespace qualified attributes and elements
    * Class constants corresponding to string enumeration constraints

It successfully generates bindings for many of the major WS-I schemas, 
such as WSDL and SOAP, as well as others like KML and SAML.  A variety 
of examples show how to use it with demonstration web services such as 
the National Digital Forecast Database.  The generated code can easily 
be customized by subclassing the generated bindings.  Both DOM and 
SAX-based parsing are supported.

PyXB assumes a fairly strict interpretation of the XML Schema 
specification, so web services that use SOAP encodings with schemas but 
are lax about namespaces and about validation against content models can 
sometimes be difficult to use.  This may be addressed in a future release.

The documentation serves as the project's home page, and can be viewed 
at http://pyxb.sourceforge.net/.

This is the initial public release, and I would appreciate any feedback.

Peter